Designing Enterprise Applications with the J2EE™ Platform, Second Edition
This section presents some design considerations for internationalizing Web-tier components.
An internationalized Web application must be able to determine the encoding of an incoming request and ensure that the response is properly encoded. This section discusses locale and encoding management for Web-tier components.
10.3.1.1 Determining HTTP Request Locale and Encoding
Runtime locale determination is simple and automatic in J2SE applications. An application developer can use J2SE internationalization APIs to set an application's locale programmatically.
Locale semantics for J2EE applications are more complex than for J2SE applications. For example, the system default locale for a Web component is the Web container's default locale. In a distributed environment, this default locale may differ among containers, making the locale dependent on the container servicing the request. Using Locale.getDefault in Web applications is not recommended, because the value returned represents the Web container's locale, not the client's locale.
An internationalized enterprise application's Web tier must somehow determine the encoding of incoming request parameters. As explained previously in this chapter, an encoding defines the relationship between a character set's code points and a data stream's unit size and serialization rules. Correct data interpretation requires knowing how the data are encoded. Unfortunately, there's no standard way to determine HTTP request parameter encoding. An HTML browser encodes each request using the encoding of the page that was the source of the request, but that knowledge is only useful if the original page's encoding is known.
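To make the point about encodings concrete, the following sketch (plain Java 7 or later, no servlet container required; the class name EncodingDemo is hypothetical) serializes the same short string in three standard encodings and then decodes UTF-8 bytes as if they were ISO-8859-1, which is what happens when a server guesses the request encoding wrong.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Shows that one string serializes to different bytes under different encodings. */
final class EncodingDemo {

    /** Encodes text with the given charset and renders the bytes as hex, e.g. "63 61". */
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an acute accent on the final e
        for (Charset cs : Arrays.asList(StandardCharsets.ISO_8859_1,
                                        StandardCharsets.UTF_8,
                                        StandardCharsets.UTF_16BE)) {
            System.out.println(cs.name() + ": " + hex(text, cs));
        }
        // Decoding UTF-8 bytes as ISO-8859-1 does not fail; it silently garbles the text.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

For "café", ISO-8859-1 produces four bytes (63 61 66 E9) and UTF-8 produces five (63 61 66 C3 A9); misreading the UTF-8 bytes as ISO-8859-1 yields "cafÃ©" with no error raised.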
There are several approaches to determining and tracking HTTP request locale:
- Deduce encoding from the Accept-language HTTP header--The Accept-language header does not unambiguously indicate request encoding, but it can provide an appropriate locale for content generation. The method ServletRequest.getLocale returns a preferred Locale that the Web container chooses based on the Accept-language header value. The method ServletRequest.getLocales returns an Enumeration of Locale objects that the client will accept, in decreasing order of preference, based on the contents of multiple Accept-language header values. A Web component can use getLocales to select the most appropriate locale from among the available options.
  However, this approach is unreliable for determining encoding, because there is no unique relationship between the value of the Accept-language header and the request encoding. Most character sets can be represented in a variety of encodings, so the Accept-language value, even if accurate, only narrows the range of possible encodings. For these reasons, relying on Accept-language to determine request encoding is discouraged.
  HTTP defines two other relevant Accept- headers. Accept-charset lists the character sets the browser will accept, which can be useful in choosing a response encoding. Accept-encoding lists the content codings the browser will accept, usually compression schemes such as gzip. Neither of these headers indicates request encoding. See RFC 2616 listed in Section 10.9 on page 345 for details.
- Provide separate application entry points for different locales--In the Web tier, one servlet may be mapped to several URLs, each corresponding to a particular locale. The URL might even contain the locale identifier; for example, http://j2eeserver/j2eeapp/login/en_US for United States English, and http://j2eeserver/j2eeapp/login/de_CH for Swiss German. This approach is especially appropriate for applications that make heavy use of manually localized JSP pages, because such pages are typically already separated by the URL namespace.
- Define an application-wide encoding--If every Web component in an application transmits all of its pages in the same encoding, then requests from those pages will always be in that encoding. This approach simplifies design, but has the drawback that any component that does not set the encoding correctly will not work properly. This drawback can be eliminated using a servlet filter, as described in Section 10.3.1.3. As described previously in this chapter, UTF-8 can encode every Unicode character while remaining compatible with ASCII. Standardizing on UTF-8 is the recommended approach because it provides the broadest coverage of character sets.
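The locale-selection step of the first approach, and the locale-in-URL convention of the second, can be sketched without a servlet container. This sketch assumes Java 8 or later, whose Locale.LanguageRange and Locale.filter implement RFC 4647 matching; in a servlet, the header value would come from request.getHeader("Accept-Language"), or the container's already-parsed getLocales list could be used instead. The class name, the supported-locale list, and the en_US fallback are hypothetical. As the text cautions, the result is a locale for content generation, not an indication of request encoding.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that picks a response locale from a header or a URL path. */
final class LocaleResolver {

    // Locales the application has content for (an assumption for this sketch).
    static final List<Locale> SUPPORTED = Arrays.asList(
            Locale.forLanguageTag("en-US"),
            Locale.forLanguageTag("de-CH"),
            Locale.forLanguageTag("ja-JP"));
    static final Locale FALLBACK = Locale.forLanguageTag("en-US");

    /** Best supported locale for an Accept-language value such as "de,en;q=0.5". */
    static Locale fromAcceptLanguage(String header) {
        if (header == null || header.trim().isEmpty()) {
            return FALLBACK;
        }
        try {
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(header);
            // filter() keeps only supported locales, ordered by the client's q-values.
            List<Locale> matches = Locale.filter(ranges, SUPPORTED);
            return matches.isEmpty() ? FALLBACK : matches.get(0);
        } catch (IllegalArgumentException malformedHeader) {
            return FALLBACK;
        }
    }

    /** Locale from the last path segment, e.g. "/j2eeapp/login/de_CH" yields de_CH. */
    static Locale fromPath(String path) {
        String last = path.substring(path.lastIndexOf('/') + 1);
        Locale candidate = Locale.forLanguageTag(last.replace('_', '-'));
        return SUPPORTED.contains(candidate) ? candidate : FALLBACK;
    }

    public static void main(String[] args) {
        System.out.println(fromAcceptLanguage("fr-CA,de;q=0.8,en-US;q=0.5")); // de_CH
        System.out.println(fromPath("/j2eeapp/login/de_CH"));                  // de_CH
    }
}
```

Here the client's first choice (Canadian French) is unavailable, so the German range matches the Swiss German content before the lower-weighted English range is considered.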
The method ServletRequest.setCharacterEncoding (Java Servlet specification version 2.3 and later only) overrides a servlet request's default encoding with a given encoding, which is thereafter used to interpret request parameters. This method must be called before any request parameters are read or any input is read using getReader; otherwise, it has no effect.
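The effect of this call can be illustrated with java.net.URLDecoder, which decodes a percent-encoded form value much as a container does. The sketch below decodes the same value once with ISO-8859-1 (the servlet default when no request encoding is specified) and once with UTF-8; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Decodes a percent-encoded form value the way a container would, given a charset. */
final class ParameterDecoding {

    static String decode(String rawValue, String charset) {
        try {
            return URLDecoder.decode(rawValue, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "Mueller" spelled with u-umlaut, as a browser submits it from a UTF-8 page.
        String raw = "M%C3%BCller";
        // No setCharacterEncoding call: the ISO-8859-1 default yields two wrong characters.
        System.out.println(decode(raw, "ISO-8859-1"));
        // After request.setCharacterEncoding("UTF-8"): the umlaut is decoded correctly.
        System.out.println(decode(raw, "UTF-8"));
    }
}
```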
In summary, the BluePrints recommendation is to standardize on a single encoding, preferably UTF-8, to provide consistent request encoding, efficient data transmission, broad character set coverage, and wide browser support. When a consistent encoding cannot be used (because of noncompliant browsers, for example), consider storing locale and encoding in session state, as described in the next section, or use separate URLs for each locale or encoding, as described above.
10.3.1.2 Storing Locale and Encoding at Runtime
Instead of determining locale and encoding for each request, an application can store them for use by subsequent requests. There are several ways to accomplish this:
- Store locale and encoding in hidden variables or parameters--The encoding of a page could be stored in hidden form variables or query string parameters, so every request would contain an indication of the request parameter encoding. This approach has several problems. Additional parameters and hidden form variables complicate page creation. Changing the encoding of a page means changing all of its parameters or form variables, which complicates maintenance. Parameters or hidden form variables are appropriate for indicating request encoding only when the application both cannot standardize on a single encoding and does not use stateful components.
- Store encoding as session state--A stateful server-side component (a session bean, or a servlet using HttpSession) can maintain the encoding of generated content in a session attribute. This approach is recommended for applications that must use both multiple encodings and stateful components.
- Store the locale and encoding as a user preference--Most enterprise applications store some user profile information, sometimes only a password. User profile parameters can be used to localize all requests following login. User preference information may be kept in a client-side cookie, stored in a database and accessed with JDBC, stored in Web-tier session state, or accessed as a user profile entity bean.
10.3.1.3 Setting HTTP Response Locale and Encoding
Response encoding of JSP pages and servlets determines both the format of characters in the response and the request encoding of any subsequent request from the served page.
An HTTP server indicates content encoding using part of the Content-type HTTP header. This header's value is either TYPE or TYPE;charset=CHARSET, where TYPE is the content type (RFC 1049) and CHARSET is the name of the encoding as registered with the Internet Assigned Numbers Authority (IANA). The default value for TYPE is text/html; the default value for CHARSET is ISO-8859-1. A reference to the IANA registry of values for charset is listed in Section 10.9 on page 345.
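A minimal sketch of reading this header follows. It assumes only the TYPE;charset=CHARSET form described above, skips any other parameters, unquotes a quoted charset value, and applies the defaults just listed. The ContentTypes class is hypothetical; containers perform this parsing themselves.

```java
import java.util.Locale;

/** Hypothetical helper that splits a Content-type value into type and charset. */
final class ContentTypes {

    static final String DEFAULT_TYPE = "text/html";
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    /** Returns the media type part, e.g. "text/html" for "text/html; charset=UTF-8". */
    static String type(String contentType) {
        if (contentType == null || contentType.trim().isEmpty()) {
            return DEFAULT_TYPE;
        }
        int semi = contentType.indexOf(';');
        return (semi < 0 ? contentType : contentType.substring(0, semi)).trim();
    }

    /** Returns the charset parameter, or ISO-8859-1 when the header omits it. */
    static String charset(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Parameter names are case-insensitive: "charset=" and "Charset=" both match.
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        String header = "text/html;charset=Shift_JIS";
        System.out.println(type(header) + " / " + charset(header)); // text/html / Shift_JIS
        System.out.println(charset("text/plain"));                  // ISO-8859-1
    }
}
```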
There are two ways to set the encoding of a servlet's HTTP response:
- Use ServletResponse.setContentType--Use this method to set the entire Content-type header manually. Include the encoding as the value of charset; for example: response.setContentType("text/html;charset=Shift_JIS");
- Use ServletResponse.setLocale--Use this method to set HTTP headers appropriate for the given locale, including the charset= portion of the Content-type header.
Set the locale or content type before calling ServletResponse.getWriter to ensure that the resulting Writer is configured for the correct encoding.
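The reason for this ordering can be shown with plain J2SE I/O, as an analogy rather than servlet code: an OutputStreamWriter, like the Writer a container returns from getWriter, fixes its encoding when it is created. In this sketch (Java 8 or later; the class name is hypothetical), Japanese text written through an ISO-8859-1 writer is replaced with question marks, while a UTF-8 writer preserves it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** An OutputStreamWriter's encoding is fixed when the writer is constructed. */
final class WriterCharsetDemo {

    /** Writes text through a writer created for the given charset and returns the bytes. */
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(body, charset)) {
            out.write(text); // unmappable characters are replaced, not rejected
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return body.toByteArray();
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c"; // "Nihon" (Japan) in kanji
        // A writer set up for ISO-8859-1 (the HTTP default) cannot represent it.
        System.out.println(write(japanese, StandardCharsets.ISO_8859_1).length); // 2 bytes: "??"
        // A writer set up for UTF-8 encodes each character as three bytes.
        System.out.println(write(japanese, StandardCharsets.UTF_8).length);      // 6
    }
}
```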
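Two attributes of a JSP page's page directive can control encoding:
- contentType--Sets the MIME type of the response and, through its charset parameter, the response encoding; for example, <%@ page contentType="text/html;charset=UTF-8" %>.
- pageEncoding--Specifies the encoding of the JSP page source file itself (JSP 1.2 and later). If pageEncoding is absent, the container uses the charset given in contentType, or ISO-8859-1 if neither is specified.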
A JSP container may issue a runtime error if the encoding for the page is inappropriate for the content type. It may produce a translation-time error when a JSP page specifies an unsupported encoding.
The content type and encoding of a JSP page are fixed at page translation time when they are set using a directive. To set the encoding at runtime, use either a custom tag or a servlet.
An application can use a servlet filter to set response encoding to a single value before a servlet or JSP page receives the request. This technique provides a single point of control for enforcing standardized encoding and ensures that encoding is correct before a servlet uses its response object. The sample application enforces response encoding with a servlet filter. The servlet filter can also serve as a guard, logging an error message if any client makes a request using an unsupported encoding.
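The decision logic such a filter might apply is sketched below, assuming the application standardizes on UTF-8. The filter wiring is omitted so that the example runs without a servlet container; in a javax.servlet.Filter, doFilter would pass request.getCharacterEncoding() to this helper, call request.setCharacterEncoding and response.setContentType with the chosen encoding, and then call chain.doFilter(request, response). The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

/** Hypothetical encoding policy that a servlet filter could delegate to. */
final class EncodingPolicy {

    private static final Logger LOG = Logger.getLogger(EncodingPolicy.class.getName());

    /** The single encoding the application standardizes on (an assumption). */
    static final Charset STANDARD = StandardCharsets.UTF_8;

    /** True if the encoding a client declared for its request is acceptable. */
    static boolean isAcceptable(String declaredCharset) {
        if (declaredCharset == null) {
            // Browsers usually omit the charset; assume the request came from one of
            // the application's own UTF-8 pages.
            return true;
        }
        try {
            return Charset.isSupported(declaredCharset)
                    && Charset.forName(declaredCharset).equals(STANDARD);
        } catch (IllegalCharsetNameException e) {
            return false; // syntactically invalid charset name
        }
    }

    /** Checks the declared encoding, logs violations, and returns the encoding to use. */
    static Charset requestCharset(String declaredCharset) {
        if (!isAcceptable(declaredCharset)) {
            LOG.warning("Request declared unsupported encoding: " + declaredCharset);
        }
        return STANDARD;
    }

    public static void main(String[] args) {
        System.out.println(requestCharset(null));        // UTF-8
        System.out.println(requestCharset("Shift_JIS")); // UTF-8, after logging a warning
    }
}
```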
Automatic selection of language, character set, and encoding makes things easy for users, but it's important always to provide a way for users to change languages manually as well. Page headers or footers are a good place for hyperlinks or dropdown boxes for manual language selection. When you offer users a choice of languages, the name of each language should be written in that language, rather than in the language of the current page.
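J2SE can supply a language's name in that language: Locale.getDisplayLanguage(Locale) called with the locale itself returns, for example, "Deutsch" for German. The sketch below builds labels for a language chooser this way; the class name and the list of offered locales are hypothetical, the lambda requires Java 8 or later, and the exact names depend on the locale data installed in the JRE.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Builds labels for a manual language chooser, each written in its own language. */
final class LanguageMenu {

    /** Maps each locale's language tag to the language's name in that language. */
    static Map<String, String> labels(List<Locale> locales) {
        Map<String, String> labels = new LinkedHashMap<>();
        for (Locale locale : locales) {
            // Passing the locale to itself yields "Deutsch" rather than "German".
            labels.put(locale.toLanguageTag(), locale.getDisplayLanguage(locale));
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Locale> offered = Arrays.asList(Locale.forLanguageTag("en-US"),
                Locale.forLanguageTag("de-CH"), Locale.forLanguageTag("ja-JP"));
        labels(offered).forEach((tag, label) -> System.out.println(tag + " -> " + label));
    }
}
```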
Internationalization and localization are important concerns when designing presentation components such as JSP pages, JavaBeans helper components, and custom tags. Examples of localizable Web-tier components include:
- JSP page fragments included dynamically in a response based on locale
- Helper JavaBeans components that customize their behavior to a locale
- Custom tags that order lists of data using a collating sequence appropriate for a locale
- Custom tags that customize database requests to a specified locale
- Custom tags that use locale to format numbers, dates, currency, percentages, and so on
All of these components may use the J2SE internationalization APIs. Remember always to consider internationalization when designing presentation components.
10.3.2.3.1 Example
This example from the sample application presents a localizable custom tag that displays currency values in a format appropriate for a locale.
The sample application includes a presentation component called a list tag, which formats a list of items from a java.util.Iterator. The list tag evaluates and outputs its body text for each value the iterator returns. Each value is a JavaBeans component that exposes its properties through get and set accessor methods.
The example presentation component, a listItem tag, formats and displays the current item within a list tag's body text. A sample usage of this tag looks like this:
<waf:listItem property="unitCost" formatText="currency" locale="ja_JP" precision="0"/>
The tag's attributes control its behavior in the following ways:
- The tag's property attribute identifies the JavaBeans property to format from the current iterator value. The listItem tag retrieves the value to display by calling the current item's getUnitCost method, which is the get property accessor as defined by the JavaBeans naming convention.
- The listItem tag's formatText attribute indicates how to format the item. In the example above, the tag formats the value as currency; it can also format decimals and percentages.
- The precision attribute indicates how many digits appear after the decimal point. In this case, the currency is Yen, which has no fractional part, so there are 0 digits after the decimal point.
- Finally, the tag formats the item according to the value of the locale attribute. The locale name follows the standard convention described in Section 10.1.1.1 on page 313; in the example, the locale is ja_JP, meaning Japanese (ja) in Japan (JP). The tag handler code maps this string to the corresponding Locale object. It then uses a currency formatter from the standard J2SE class java.text.NumberFormat (obtained with NumberFormat.getCurrencyInstance) to format the value, complete with Yen sign.
Another example usage of the listItem tag might look like this:
<waf:listItem property="unitCost" formatText="currency" locale="en_US" precision="2"/>
The locale in this example is en_US, so the currency formatter uses the formatting conventions of United States English. Because this currency amount is in dollars, the precision is two (to display cents).
Note that this tag does not actually convert currency between Yen and dollars. Rather, it simply formats the value that getUnitCost returns for the specified locale.
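The formatting step described above can be approximated with java.text.NumberFormat. The sketch below is not the sample application's tag handler; the class name, the method signature, and the mapping from formatText values to formatter factories are assumptions made for illustration. Like the listItem tag, it formats the number it is given and performs no currency conversion.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical formatting helper of the kind a listItem-style tag might call. */
final class ListItemFormatter {

    /** Maps a locale name such as "ja_JP" to a Locale object. */
    static Locale toLocale(String localeName) {
        return Locale.forLanguageTag(localeName.replace('_', '-'));
    }

    /** Formats value as "currency", "percent", or otherwise a plain decimal number. */
    static String format(Number value, String formatText, String localeName, int precision) {
        Locale locale = toLocale(localeName);
        NumberFormat format;
        if ("currency".equals(formatText)) {
            format = NumberFormat.getCurrencyInstance(locale); // currency chosen by locale
        } else if ("percent".equals(formatText)) {
            format = NumberFormat.getPercentInstance(locale);
        } else {
            format = NumberFormat.getNumberInstance(locale);
        }
        // precision = exact number of digits after the decimal separator
        format.setMinimumFractionDigits(precision);
        format.setMaximumFractionDigits(precision);
        return format.format(value); // formatting only; no exchange-rate conversion
    }

    public static void main(String[] args) {
        System.out.println(format(1500, "currency", "ja_JP", 0));   // Yen sign, then 1,500
        System.out.println(format(1234.5, "currency", "en_US", 2)); // $1,234.50
    }
}
```

The exact Yen glyph in the first line of output depends on the JRE's locale data.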
Components other than custom tags can also be internationalized. For example, unitCost in the above example is a property of a JavaBeans component, which itself could be localized. The component could return one price, in Yen, for locale ja_JP, and a different price, in dollars, for locale en_US. In such a case, the JavaBeans component (part of the application MVC model) would produce a unit cost appropriate to the locale, while the presentation tag (part of the application MVC view) would format the value in a way appropriate for the locale. (This scenario is hypothetical, as the sample application does not provide this functionality in quite this way.)
The sample application contains other examples of locale-aware presentation components. Localizable presentation components greatly simplify internationalization.
Because locale is primarily about how to present data, localization is most appropriately implemented in MVC views. In the Web tier, MVC views are usually JSP pages. J2EE Web applications should localize content in the Web tier with JSP pages.
Two common approaches exist for localizing JSP pages: creating JSP pages that can be localized with resource bundles or maintaining separate JSP pages for each locale. Each approach has strengths and drawbacks. The next two sections discuss the tradeoffs between these two options.
Use separate JSP pages for each locale when content structure and display logic differ greatly between locales or when messages depend on the target locale. Resource bundles are recommended for error and logging messages (see Section 10.7 on page 341), and when content varies between locales only in data values and not in structure.
10.3.3.4 Localizing JSP Pages with Resource Bundles
A common way to localize JSP pages is to assemble chunks of localized text using locale-aware custom tags. Each time the page is served, the custom tags select text from a resource bundle for the current locale.
Figure 10.2 shows a single internationalized JSP page that is localized with resource bundles for several locales. Benefits of this approach include:
- Easier maintenance--A single JSP page is the source file for a particular screen in all locales. A modification to the JSP page changes the dynamic content generated for all locales.
- Consistent page structure--Because the source code for a screen is shared between locales, the page maintains the same structure in all locales, changing only in data values, message text, and language displayed.
- Easy extensibility--A new locale can be added by simply defining a new resource bundle for the locale.
The consistency provided by this approach is also a major drawback. While changing the content of this page is easy, customizing its structure to locales is harder, because one JSP page produces content for all locales.
This approach shares a single JSP page across locales, so the page encoding must be able to represent the character sets of all locales the application supports. The contentType attribute of the JSP page directive specifies the content type and encoding for the page at page translation time, so all responses that the page produces use the same encoding. For reasons explained earlier in this chapter, standardizing on UTF-8 encoding is recommended.
The recommended way to implement a single JSP page customized to multiple locales is to use resource bundles. Access resource bundles from custom tags in the pages instead of using resource bundles from scriptlet or expression code. Custom tags improve the readability and maintainability of JSP pages, and reduce duplicated code.
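As a minimal sketch of the lookup such a custom tag would delegate to, the code below calls ResourceBundle.getBundle, which searches from the most specific bundle (Messages_de_CH) toward the base bundle (Messages) and chains them as parents, so a key missing from a specific bundle falls through to a more general one. To keep the example self-contained, the bundle text lives in an in-memory map served by a custom ResourceBundle.Control (Java 7 or later); a real application would ship Messages*.properties files instead. The bundle names, keys, and the PageText class are hypothetical. The control also disables fallback to Locale.getDefault, which reflects the container's locale rather than the client's.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

/** Hypothetical helper that looks up localized page text through ResourceBundle. */
final class PageText {

    // Stand-in for Messages.properties, Messages_de.properties, Messages_de_CH.properties.
    private static final Map<String, String> SOURCES = new HashMap<>();
    static {
        SOURCES.put("Messages", "greeting=Welcome\ncart=Shopping cart\n");
        SOURCES.put("Messages_de", "greeting=Willkommen\ncart=Warenkorb\n");
        SOURCES.put("Messages_de_CH", "greeting=Gr\u00fcezi\n");
    }

    /** Loads bundles from SOURCES and never falls back to the JVM default locale. */
    private static final ResourceBundle.Control CONTROL = new ResourceBundle.Control() {
        @Override
        public List<String> getFormats(String baseName) {
            return FORMAT_PROPERTIES;
        }

        @Override
        public Locale getFallbackLocale(String baseName, Locale locale) {
            return null; // Locale.getDefault() is the container's locale, not the client's
        }

        @Override
        public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                        ClassLoader loader, boolean reload) throws IOException {
            String source = SOURCES.get(toBundleName(baseName, locale));
            return source == null ? null : new PropertyResourceBundle(new StringReader(source));
        }
    };

    /** Returns the text for key, searching e.g. Messages_de_CH, Messages_de, Messages. */
    static String text(String key, Locale locale) {
        return ResourceBundle.getBundle("Messages", locale, CONTROL).getString(key);
    }

    public static void main(String[] args) {
        Locale swissGerman = Locale.forLanguageTag("de-CH");
        System.out.println(text("greeting", swissGerman)); // defined in Messages_de_CH
        System.out.println(text("cart", swissGerman));     // inherited from Messages_de
        System.out.println(text("cart", Locale.JAPAN));    // base bundle: Shopping cart
    }
}
```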
10.3.3.5 Locale-Specific JSP Pages
Another approach to localizing JSP pages is to provide a separate JSP page for each locale.
Figure 10.3 Localizing by Creating Separate JSP Pages for Each Locale
Figure 10.3 shows a directory tree of an internationalized JSP page, index.jsp. There are actually four separate versions of the file, each in a separate directory in the server's namespace. In this approach, a servlet or servlet filter forwards each request for a JSP page to the appropriate file based on the requesting client's locale. The names of the directories that separate the different file versions use the standard resource bundle suffix naming convention. An alternative is to use file naming conventions instead of directories. For example, the file for the default locale is named index.jsp, the file localized for Japanese in Japan is named index_ja_JP.jsp, the Swiss German file is named index_de_CH.jsp, and so on. While this approach works, applications with a large number of files and locales can easily become difficult to manage.
Grouping JSP pages, static pages, and other resources such as graphics files in one directory per locale is a BluePrints best practice. Note that JSP pages can be localized selectively with this scheme. The logic for determining which file to forward to is in a dispatching servlet or servlet filter, which can implement the same naming convention scheme as do resource bundles. The forwarding component can always choose the most specific file available and use a default file (with no localization suffix) as a fallback.
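The dispatching logic can reuse the resource bundle candidate list directly. In the sketch below, which assumes the one-directory-per-locale layout and a hypothetical set of deployed paths, ResourceBundle.Control.getCandidateLocales produces the search order (for example, ja_JP, then ja, then the root locale), and the first existing file wins. In a servlet, the existence check might use ServletContext.getResource, which returns null for a missing path, before forwarding with a RequestDispatcher. The resolver class is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.ResourceBundle;
import java.util.Set;

/** Hypothetical resolver for a one-directory-per-locale page layout. */
final class LocalizedPageResolver {

    // The default control supplies the same candidate list ResourceBundle.getBundle uses.
    private static final ResourceBundle.Control CONTROL =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);

    /** Returns the most specific existing path for page, ending with the unlocalized file. */
    static String resolve(String page, Locale locale, Set<String> existingPaths) {
        for (Locale candidate : CONTROL.getCandidateLocales("", locale)) {
            String dir = candidate.toString(); // "ja_JP", "ja", or "" for Locale.ROOT
            String path = dir.isEmpty() ? "/" + page : "/" + dir + "/" + page;
            if (existingPaths.contains(path)) {
                return path;
            }
        }
        return "/" + page; // default file with no localization directory
    }

    public static void main(String[] args) {
        Set<String> deployed = new HashSet<>(Arrays.asList(
                "/index.jsp", "/ja_JP/index.jsp", "/de/index.jsp"));
        System.out.println(resolve("index.jsp", Locale.forLanguageTag("ja-JP"), deployed)); // /ja_JP/index.jsp
        System.out.println(resolve("index.jsp", Locale.forLanguageTag("de-CH"), deployed)); // /de/index.jsp
        System.out.println(resolve("index.jsp", Locale.forLanguageTag("fr-FR"), deployed)); // /index.jsp
    }
}
```

Because only some directories exist, pages can be localized selectively: a Swiss German request is served the general German page until a de_CH version is added.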
The page-per-locale approach has the following benefits:
- Greater customizability--Using resource bundles to customize a single JSP page results in pages whose structure is essentially the same for all locales. Using one JSP page per locale provides maximum customizability of the content for a locale, because customizations are not limited to the contents of a resource bundle. As a result, the page-per-locale approach is preferable when content differs substantially between locales.
- Source clarity--All of the content for a locale appears in a single file (the JSP file for the locale) instead of being separated between a JSP page file with some structural tags, and a properties file or resource bundle class containing named strings.
At the same time, this approach has some drawbacks. Maintaining a consistent look-and-feel between locales is more difficult with separate JSP pages than with resource bundles. Separate files must be created and kept consistent for several locales, which requires more maintenance than the resource bundle approach.
The Web-tier framework and tools you select for creating your application may influence your decision about how to support internationalized content.
The sample application uses a templating mechanism, providing both structural consistency between locales and the flexibility of page-per-locale localization. The templating mechanism uses an XML "screen definitions file" for each locale to assemble localized JSP pages into a single page. The screen definitions file for a locale specifies a template file, and maps localized JSP pages to symbolic names such as "header," "footer," and so on. The template file defines the overall structure for a page, and uses custom tags to include localized JSP pages, which it references by symbolic name. Because the screen definitions file specifies the template, both page layout and "look and feel" can be unified across locales (by using a single template) or customized for particular locales (by using separate templates).
Regardless of which option you choose, setting the JSP page response encoding correctly is crucial. The sample application standardizes all page encoding to UTF-8, and enforces this encoding with a servlet filter for all JSP pages it serves.