One important strength of the Java Platform, Standard Edition (Java SE) has always been its internationalization and localization support. The platform continues to evolve, and Java SE 6 gives developers even more control over how they access and use locale-sensitive resources in their applications. Java SE 6 provides the following major enhancements to its internationalization support:
Resource control and access
Locale-sensitive services
Text normalization
International domain names
Japanese calendars
New supported locales
Resource Control and Access
To provide localized resources in applications, programmers should use resource bundles as defined by the java.util.ResourceBundle class. This class starts searching for and loading localized resources when you invoke its static getBundle method. The method returns a ResourceBundle instance that provides the localized text, images, and other elements for a target locale. A locale is a cultural identifier defined by a language and, optionally, a geographical region.
Although the default algorithms for searching and loading bundles are well defined, Java SE 6 specifies resource caching more clearly and gives you more control over how your applications search for and load localized resources. Applications should continue to use ResourceBundle methods to retrieve resources, but new features give you great flexibility in how and where you store the actual content that ResourceBundle objects provide.
Before Java SE 6, programmers usually stored localized content either in a subclass of ListResourceBundle or in a properties file. Now, however, you can specify different formats for your resource files. You can, for example, create and use an XML-formatted resource file. You might also decide to change the default naming scheme for localized files. This extra control is available through custom ResourceBundle.Control subclasses that you implement.
The ResourceBundle.Control class exposes the major
steps of the bundle-loading process. Each step has a separate method
in the class. You can override those methods to provide customized
strategies for searching, loading, and caching resources. Because the
Control class defines methods that implement the
existing default strategies, you have to implement only the
customized functionality that you want for your particular subclass. By
providing your own Control subclass to the getBundle
method, you control exactly how your application finds and uses
localized resources.
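As a sketch of the XML scenario mentioned earlier, the following hypothetical XmlControl overrides two of those steps: getFormats, to declare a single custom format named "xml", and newBundle, to read a file in the java.util.Properties XML format with Properties.loadFromXML. It follows the pattern suggested in the ResourceBundle.Control API documentation, but the format name, the class names, and the Messages bundle written to a temporary directory are assumptions made for illustration. The code uses Java 8 APIs such as UncheckedIOException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Properties;
import java.util.ResourceBundle;
import java.util.Set;

class XmlControlDemo {

    // Hypothetical Control that reads bundles stored as XML properties files.
    static class XmlControl extends ResourceBundle.Control {
        @Override
        public List<String> getFormats(String baseName) {
            return Collections.singletonList("xml");
        }

        @Override
        public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                        ClassLoader loader, boolean reload) throws IOException {
            if (!"xml".equals(format)) {
                return null;
            }
            // Keep the default suffix naming: Messages + fr -> Messages_fr.xml
            String resourceName = toResourceName(toBundleName(baseName, locale), "xml");
            InputStream in = loader.getResourceAsStream(resourceName);
            if (in == null) {
                return null; // no such bundle; getBundle moves on to the next candidate
            }
            final Properties props = new Properties();
            try (InputStream stream = in) {
                props.loadFromXML(stream);
            }
            return new ResourceBundle() {
                @Override
                protected Object handleGetObject(String key) {
                    return props.getProperty(key); // null lets getObject consult the parent
                }

                @Override
                public Enumeration<String> getKeys() {
                    Set<String> keys = new HashSet<>(props.stringPropertyNames());
                    if (parent != null) {
                        keys.addAll(Collections.list(parent.getKeys()));
                    }
                    return Collections.enumeration(keys);
                }
            };
        }
    }

    // Writes Messages_fr.xml to a temporary directory and loads it through XmlControl.
    static String loadGreeting() {
        try {
            Path dir = Files.createTempDirectory("xmlbundles");
            Properties props = new Properties();
            props.setProperty("greeting", "Bonjour");
            try (OutputStream out = Files.newOutputStream(dir.resolve("Messages_fr.xml"))) {
                props.storeToXML(out, "French messages");
            }
            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
                ResourceBundle bundle = ResourceBundle.getBundle(
                        "Messages", Locale.FRENCH, loader, new XmlControl());
                return bundle.getString("greeting");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadGreeting()); // Bonjour
    }
}
```

Because only getFormats and newBundle are overridden, the default candidate-locale, fallback, and caching behavior still applies to the XML bundles.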
Of course, you don't have to create your own Control subclass. You can always use the predefined default Control, which implements the default behavior. In the following example, the getBundle method uses the default Control:
Locale targetLocale = new Locale("fr", "FR"); // French language, France region
ResourceBundle myResources = ResourceBundle.getBundle("com.sun.demo.intl.AppResource", targetLocale);
If your host's default locale is en_US, the default
Control object searches for the following localized
AppResource names in this example:
com.sun.demo.intl.AppResource_fr_FR
com.sun.demo.intl.AppResource_fr
com.sun.demo.intl.AppResource_en_US
com.sun.demo.intl.AppResource_en
com.sun.demo.intl.AppResource
For each bundle name in the preceding list, the default Control searches for two implementation formats: ResourceBundle subclasses (.class format) and PropertyResourceBundle properties files (.properties format). When it finds a resource in either format, it determines the bundle's parent chain and returns the ResourceBundle instance. Notice also that the bundle names use locale-specific suffixes (for example, fr_FR, fr, and en_US) to differentiate among the various localized bundles that share the base name AppResource. Additionally, the default behavior caches bundles: repeated invocations of getBundle with the same base name, locale, and class loader return the cached bundle instead of loading it again. The Java platform documentation describes the getBundle method behavior in detail.
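The search order listed above can be reproduced with the public hooks of the default Control: getCandidateLocales proposes the locales to try, getFallbackLocale supplies the default locale, and toBundleName adds the suffixes. This is a simplified sketch, not the getBundle implementation: the real method also tries each format, stops at the first bundle it finds, and consults the fallback locale only when nothing more specific than the base bundle turns up. The class name DefaultSearchOrder is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;
import java.util.Set;

class DefaultSearchOrder {
    // Approximates the order in which the default Control proposes bundle names:
    // non-root candidates for the requested locale, then for each fallback locale,
    // and finally the root (base) bundle name.
    static List<String> searchOrder(String baseName, Locale target) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        Set<String> names = new LinkedHashSet<>();
        for (Locale locale = target; locale != null;
                locale = control.getFallbackLocale(baseName, locale)) {
            for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
                if (!Locale.ROOT.equals(candidate)) {
                    names.add(control.toBundleName(baseName, candidate));
                }
            }
        }
        names.add(control.toBundleName(baseName, Locale.ROOT));
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        Locale.setDefault(Locale.US); // match the article's en_US assumption
        for (String name : searchOrder("com.sun.demo.intl.AppResource", Locale.FRANCE)) {
            System.out.println(name);
        }
    }
}
```

The loop ends because getFallbackLocale returns null once it is asked about the default locale itself.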
In some situations, you may prefer a different bundle-loading strategy. The next few sections describe the following scenarios that differ from the default:
Properties-only searches
Locales as part of the package name
Cache controls
Properties-Only Searches
Some bundle-loading strategies don't require a fully customized Control subclass. Instead, you can use the Control class's static getControl method to obtain a control with standard options that differ only slightly from the default. For example, if your application uses properties files exclusively, you can avoid the overhead of searching for ResourceBundle subclasses by retrieving a control that searches only for properties files.
Call the Control.getControl method with a List<String> of the file formats that should be supported. The predefined string values are java.class and java.properties. Three static final constants define unmodifiable lists for the supported combinations:
FORMAT_CLASS defines a list containing the string java.class.
FORMAT_PROPERTIES defines a list containing the string java.properties.
FORMAT_DEFAULT defines a list containing both java.class and java.properties.
Use the Control.FORMAT_PROPERTIES constant to create
a Control object that searches for properties files
only:
Control propOnlyControl = Control.getControl(Control.FORMAT_PROPERTIES);
ResourceBundle bundle = ResourceBundle.getBundle("com.sun.demo.intl.res.Warnings",
propOnlyControl);
Using the propOnlyControl instance, the getBundle method ignores ResourceBundle subclasses (.class files) and searches only for bundles in .properties files.
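A quick way to see what the predefined control reports is to ask it for its formats. The sketch below also shows that getControl accepts only the predefined format lists: passing any other list, here a hypothetical "xml" format, throws IllegalArgumentException, so a custom format requires a Control subclass that overrides getFormats. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.ResourceBundle;

class PropertiesOnlyDemo {
    // The predefined control for FORMAT_PROPERTIES reports a single format.
    static List<String> propertiesOnlyFormats() {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        return control.getFormats("com.sun.demo.intl.res.Warnings");
    }

    // getControl recognizes only the predefined format lists; any other list
    // (here, a hypothetical "xml" format) is rejected.
    static boolean rejectsUnknownFormats() {
        try {
            ResourceBundle.Control.getControl(Arrays.asList("xml"));
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(propertiesOnlyFormats()); // [java.properties]
        System.out.println(rejectsUnknownFormats()); // true
    }
}
```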
Locales as Part of the Package Name
Different localizations of the same base bundle name are usually
differentiated by locale identifier suffixes. For example, the
default or root Warnings bundle is simply named
Warnings.properties. However, a French version of that
bundle would be named Warnings_fr_FR.properties. Using
the default Control, these different bundles would
exist together in the same package. But you can change the way that
localized bundles are named.
Imagine that you prefer to put different localizations of the
same bundle into separately defined subdirectories or packages. You
might create the following property files in your file
structure:
com/sun/demo/intl/res/root/Warnings.properties
com/sun/demo/intl/res/fr_FR/Warnings.properties
com/sun/demo/intl/res/ja_JP/Warnings.properties
To do this, you must define your own Control subclass that overrides the following methods:
getFormats, because your application will use only properties files.
toBundleName, because your application will use the specified locale as part of the bundle's package name rather than append locale-specific suffixes to the bundle base name.
The following code shows a customized Control class
that allows bundle searches for .properties files and
locale-specific package names instead of bundle-name suffixes.
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle.Control;

class SubdirControl extends Control {
    // This control provides only properties file formats.
    // The overridden method receives the base bundle name as a parameter.
    @Override
    public List<String> getFormats(String baseName) {
        return Control.FORMAT_PROPERTIES;
    }

    // Given a base bundle name and a locale, this
    // method creates a bundle name for a specific locale.
    // In this case, the bundle name uses the locale as a part
    // of the package name, not a bundle-name suffix.
    @Override
    public String toBundleName(String bundleName, Locale locale) {
        StringBuilder localizedBundle = new StringBuilder();
        // Find the base bundle name.
        int nBaseName = bundleName.lastIndexOf('.');
        String baseName = bundleName;
        // Create a new name starting with the package name.
        if (nBaseName >= 0) {
            localizedBundle.append(bundleName.substring(0, nBaseName));
            baseName = bundleName.substring(nBaseName + 1);
        }
        String strLocale = locale.toString();
        // Now append the locale identification to the package name.
        // The root locale's string form is empty, so it maps to "root".
        if (localizedBundle.length() > 0) {
            localizedBundle.append('.');
        }
        localizedBundle.append(strLocale.length() > 0 ? strLocale : "root");
        // Now append the base name to the fully qualified package.
        localizedBundle.append('.').append(baseName);
        return localizedBundle.toString();
    }
}
The following code shows how to provide the customized
Control object to the getBundle
method:
String bundleName = "com.sun.demo.intl.res.Warnings";
SubdirControl control = new SubdirControl();
Locale locale = new Locale("fr", "FR");
ResourceBundle bundle = ResourceBundle.getBundle(bundleName, locale, control);
If the default locale is en_US, the
getBundle method will use the Control
object to search the following candidate and fallback bundle
names:
com.sun.demo.intl.res.fr_FR.Warnings
com.sun.demo.intl.res.fr.Warnings
com.sun.demo.intl.res.en_US.Warnings
com.sun.demo.intl.res.en.Warnings
com.sun.demo.intl.res.root.Warnings
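These package-style names land in the subdirectory layout shown earlier because the inherited newBundle method turns each bundle name into a resource path with the Control method toResourceName, which replaces dots with slashes and appends the format suffix. The short sketch below, with a hypothetical class name, applies that mapping to two of the names above.

```java
import java.util.ResourceBundle;

class SubdirResourceNames {
    // Maps a bundle name to the resource path that the properties format looks up.
    static String propertiesPath(String bundleName) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        return control.toResourceName(bundleName, "properties");
    }

    public static void main(String[] args) {
        String[] names = {
            "com.sun.demo.intl.res.fr_FR.Warnings",
            "com.sun.demo.intl.res.root.Warnings"
        };
        for (String name : names) {
            System.out.println(name + " -> " + propertiesPath(name));
        }
    }
}
```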
Cache Controls
The default behavior for loading bundles includes a check to determine whether the bundle has already been loaded and cached. However, you can change how caching works. If you simply want to clear the cache before reloading a bundle, you can call the static clearCache method of the ResourceBundle class, which removes from the cache all bundles that were loaded by the caller's class loader:
ResourceBundle.clearCache();
ResourceBundle myBundle = ResourceBundle.getBundle("com.sun.demo.intl.res.Warnings");
You can even control a cache's expiration by setting a custom
time-to-live value. In your own Control, override the
getTimeToLive method to return the millisecond lifetime
value for the bundle. Two predefined values exist: TTL_DONT_CACHE
and TTL_NO_EXPIRATION_CONTROL.
The default Control
returns TTL_NO_EXPIRATION_CONTROL, which means that
bundles are cached without any expiration value. The value
TTL_DONT_CACHE indicates that the bundle must not be
cached at all. If you would like your bundles to expire every four hours
to support live updates without restarting your application, for
example, you can override the getTimeToLive method like
this:
@Override
public long getTimeToLive(String baseName, Locale locale) {
    return 4L * 60 * 60 * 1000; // 14,400,000 milliseconds is four hours.
}
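To see the effect of TTL_DONT_CACHE, the following sketch uses a hypothetical NoCacheControl that disables caching, loads a Status.properties bundle from a temporary directory, rewrites the file on disk, and loads the bundle again. Because nothing is cached, the second getBundle call reads the updated file. The bundle name, the key, and the class names are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

class NoCacheDemo {

    // Hypothetical Control: properties files only, and loaded bundles are never cached.
    static class NoCacheControl extends ResourceBundle.Control {
        @Override
        public List<String> getFormats(String baseName) {
            return ResourceBundle.Control.FORMAT_PROPERTIES;
        }

        @Override
        public long getTimeToLive(String baseName, Locale locale) {
            return ResourceBundle.Control.TTL_DONT_CACHE;
        }
    }

    static String readStatus(ClassLoader loader, ResourceBundle.Control control) {
        return ResourceBundle.getBundle("Status", Locale.ROOT, loader, control)
                .getString("status");
    }

    // Reads Status.properties, rewrites it on disk, and reads it again.
    static String[] readBeforeAndAfterEdit() {
        try {
            Path dir = Files.createTempDirectory("nocache");
            Path file = dir.resolve("Status.properties");
            Files.write(file, "status=v1\n".getBytes(StandardCharsets.ISO_8859_1));
            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
                ResourceBundle.Control control = new NoCacheControl();
                String before = readStatus(loader, control);
                Files.write(file, "status=v2\n".getBytes(StandardCharsets.ISO_8859_1));
                String after = readStatus(loader, control);
                return new String[] { before, after };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] seen = readBeforeAndAfterEdit();
        System.out.println(seen[0] + " then " + seen[1]); // v1 then v2
    }
}
```

Disabling the cache trades load time for freshness; a finite time-to-live such as the four-hour value above is the middle ground.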
The Control class offers many options for specifying
precise bundle searching and loading. This article presents only a
few. Some of the other methods you may override include the
following:
getCandidateLocales
getFallbackLocale
newBundle
needsReload
See the complete platform
documentation for the ResourceBundle.Control class for more
details on these and other methods.
Locale-Sensitive Services
The java.text and java.util packages support more than 100 locales. Although these locales meet the needs of many geographical regions, the locale-sensitive classes in the Java platform do not yet support some areas. Supporting a locale and its data requires a lot of research, including investigating and confirming date and number formats, country name translations, and sort orders. Sometimes even political influences affect locale data. Unfortunately, it is practically impossible to keep the platform's locale data completely up to date at all times, even though customers want and need access to new locale data in the platform.
One solution is to provide application programming interfaces (APIs) that allow you to use any locale data that your application needs. Java SE 6 provides an interface that developers can use to plug in their own locale data and related services. As a source for such data, the Common Locale Data Repository (CLDR), an active project hosted by the Unicode Consortium, tracks and maintains global locale data. Using the new Locale-Sensitive Services Service Provider Interface (SPI), you can use CLDR data or any other locale data in your application.
To provide locale data and services, you must first decide which
functionality you want to provide to the application. You can provide
locale data for the following locale-sensitive classes:
java.text.BreakIterator
java.text.Collator
java.text.DateFormat
java.text.DateFormatSymbols
java.text.DecimalFormatSymbols
java.text.NumberFormat
java.util.Currency
java.util.Locale
java.util.TimeZone
Once you decide which functionality you want to provide with your locale, you must implement the corresponding service provider interface (SPI), which resides in either the java.text.spi or the java.util.spi package. Imagine that you want to provide a DateFormat object for a new locale. You would implement java.text.spi.DateFormatProvider. Because java.text.spi.DateFormatProvider is an abstract class, you must extend it and implement the following methods:
getAvailableLocales
getDateInstance
getDateTimeInstance
getTimeInstance
Notice that the getAvailableLocales method is declared in the parent class LocaleServiceProvider, so every SPI provider must implement it to declare its supported locales. The other three methods mirror factory methods of the corresponding API class. For example, the getDateInstance method also exists in the java.text.DateFormat class.
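The following sketch shows the shape of such a provider. The locale xx_XX and all of the date and time patterns are invented for illustration, and the class name is hypothetical; a real provider would return patterns researched for its locale. Following the DateFormatProvider contract, the methods reject a null locale, an unsupported locale, and an invalid style. A deployable provider must also be a public class with a public no-argument constructor so that it can be instantiated from its service registration.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.text.spi.DateFormatProvider;
import java.util.Locale;

// Hypothetical provider for an illustrative locale "xx_XX"; the patterns are invented.
class ExampleDateFormatProvider extends DateFormatProvider {
    static final Locale XX = new Locale("xx", "XX");

    @Override
    public Locale[] getAvailableLocales() {
        return new Locale[] { XX };
    }

    @Override
    public DateFormat getDateInstance(int style, Locale locale) {
        return new SimpleDateFormat(datePattern(style), checked(locale));
    }

    @Override
    public DateFormat getTimeInstance(int style, Locale locale) {
        return new SimpleDateFormat(timePattern(style), checked(locale));
    }

    @Override
    public DateFormat getDateTimeInstance(int dateStyle, int timeStyle, Locale locale) {
        return new SimpleDateFormat(
                datePattern(dateStyle) + " " + timePattern(timeStyle), checked(locale));
    }

    // Enforces the SPI contract: null is an error, unsupported locales are rejected.
    private static Locale checked(Locale locale) {
        if (locale == null) {
            throw new NullPointerException("locale");
        }
        if (!XX.equals(locale)) {
            throw new IllegalArgumentException("Unsupported locale: " + locale);
        }
        return locale;
    }

    private static String datePattern(int style) {
        switch (style) {
            case DateFormat.FULL:   return "EEEE, d MMMM yyyy";
            case DateFormat.LONG:   return "d MMMM yyyy";
            case DateFormat.MEDIUM: return "d MMM yyyy";
            case DateFormat.SHORT:  return "dd/MM/yy";
            default: throw new IllegalArgumentException("Invalid style: " + style);
        }
    }

    private static String timePattern(int style) {
        switch (style) {
            case DateFormat.FULL:
            case DateFormat.LONG:   return "HH:mm:ss z";
            case DateFormat.MEDIUM: return "HH:mm:ss";
            case DateFormat.SHORT:  return "HH:mm";
            default: throw new IllegalArgumentException("Invalid style: " + style);
        }
    }
}
```

Calling the provider directly, as the test does, is only for checking it; applications keep calling DateFormat.getDateInstance and let the platform find the provider.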
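After implementing the required methods, you must package your service so that you can deploy it with the Java Runtime Environment (JRE). Because the Locale-Sensitive Services SPIs are based on the standard Java Extension Mechanism, you can package your providers as a JAR file and install it in the JRE extension directory. The JAR file also needs a provider-configuration file in its META-INF/services directory, named after the SPI class (for example, java.text.spi.DateFormatProvider), that lists the fully qualified name of your implementation class. JREs that use your extension can then provide locale data for previously undefined or unsupported locales.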
Text Normalization
The Unicode Standard allows users to create equivalent text in different ways. For example, the é character, named LATIN SMALL LETTER E WITH ACUTE in the Unicode Standard, has the code point value U+00E9. The base character e and the acute accent mark ´ are combined into a single code point called a composite or composed character.
However, you can also represent the same visual character by combining the two separate code points for the lowercase letter e and the combining acute accent. The two code points U+0065 and U+0301 combine to create the same visual and semantic effect, which is the é character. Such a base character followed by one or more combining marks is called a combining character sequence. Other characters in the Unicode Standard can combine to create similar effects with different combining sequences and character forms.
Some character sequences are visually different but have the same meaning for most practical purposes. For example, the three-character sequence 1/2 has essentially the same meaning as the single character ½, or U+00BD. Similarly, the character 2 and the superscript character ² are visually different but similar in meaning. These similarities among characters give users many ways to enter equivalent text. As you might imagine, text operations such as searching and sorting become quite complicated if you must consider all the various ways to form equivalent text.
The Java platform's java.text.Collator class
understands Unicode text forms and normalizes text for accurate
comparisons. The normalization process converts text from disparate
text forms to a single form that allows accurate text processing.
Until Java SE 6, Collator used private APIs to
normalize text. However, those APIs are now public in Java SE 6.
Use java.text.Normalizer to normalize text. You might
want to normalize text for text-processing operations, serialization,
transfer, or database storage. The API has only two static methods:
normalize and isNormalized. As you might
expect, the normalize method will normalize text into a
specific form. The isNormalized method checks whether
text is already normalized to a specific form.
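The two methods are enough to compare text by meaning rather than by code points. In the sketch below, a hypothetical helper brings both strings to NFC before calling equals, using isNormalized so that input already in NFC is returned unchanged. The two spellings of é from the start of this section then compare as equal, although String.equals alone treats them as different.

```java
import java.text.Normalizer;

class NormalizedCompare {
    // Returns the NFC form of s, or s itself when it is already in NFC.
    static String toNfc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC)
                ? s
                : Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Canonically equivalent strings have identical NFC forms.
    static boolean canonicallyEqual(String a, String b) {
        return toNfc(a).equals(toNfc(b));
    }

    public static void main(String[] args) {
        String composed = "\u00E9";     // é, LATIN SMALL LETTER E WITH ACUTE
        String decomposed = "e\u0301";  // e followed by COMBINING ACUTE ACCENT
        System.out.println(composed.equals(decomposed));            // false
        System.out.println(canonicallyEqual(composed, decomposed)); // true
    }
}
```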
The Normalizer.Form enumeration represents each Unicode
normalization form:
NFD (Normalization Form D)
NFC (Normalization Form C)
NFKD (Normalization Form KD)
NFKC (Normalization Form KC)
NFD is canonical decomposition. The decomposition process
converts composed character forms to combining sequences as mapped
by the Unicode Standard. For example, the single code point
U+00F1 for the ñ character becomes
the decomposed combining sequence U+006E U+0303 in
NFD. The new sequence contains the common character
n followed by a combining tilde, ˜.
NFC is canonical decomposition followed by canonical
composition. After canonically decomposing text, the process maps
combining sequences into standard composed code points. For example,
applying NFC to the sequence U+0065 U+0300
creates just a single code point: U+00E8, or è.
NFC is the World Wide Web Consortium's recommended normalization
form to transfer and process text on the Internet.
NFKD is compatibility decomposition. After applying
canonical decomposition, the process applies a compatibility mapping
that transforms some characters to a standard compatible form.
Compatibility is determined by a predefined mapping from one
character to another character, and the Unicode Standard defines and
maintains the mappings. NFKD creates noticeable changes to the TRADE MARK SIGN character, which has code point U+2122.
Compatibility decomposition transforms the single code point to the
characters TM, which are the common characters for
LATIN CAPITAL LETTER T (U+0054) and
LATIN CAPITAL LETTER M (U+004D).
NFKC is compatibility decomposition followed by canonical composition. This normalization form tries to create composed characters that are compatible with the original characters.
Equivalent compatible characters are defined by the Unicode
Standard. If you apply NFKC to the code point
U+1E9B, LATIN SMALL LETTER LONG S WITH DOT
ABOVE, the decomposition step creates the sequence
U+017F U+0307. Finally, the composition step transforms the
sequence to a single composite character U+1E61.
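The following sketch applies each form to the examples above and prints the resulting code points, together with the ½ and ² characters mentioned earlier. Note that NFKC maps ½ to 1, U+2044 FRACTION SLASH, and 2, not to the ASCII sequence 1/2, so compatibility normalization narrows the differences between similar text without removing all of them. The class and helper names are hypothetical.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

class NormalizationForms {
    // Formats a string as space-separated U+XXXX code points for inspection.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    static String show(String input, Form form) {
        return codePoints(Normalizer.normalize(input, form));
    }

    public static void main(String[] args) {
        System.out.println(show("\u00F1", Form.NFD));   // U+006E U+0303 (n + combining tilde)
        System.out.println(show("e\u0300", Form.NFC));  // U+00E8 (è)
        System.out.println(show("\u2122", Form.NFKD));  // U+0054 U+004D ("TM")
        System.out.println(show("\u1E9B", Form.NFC));   // U+1E9B (unchanged)
        System.out.println(show("\u1E9B", Form.NFKC));  // U+1E61
        System.out.println(show("\u00BD", Form.NFKC));  // U+0031 U+2044 U+0032
        System.out.println(show("\u00B2", Form.NFKC));  // U+0032
    }
}
```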
The following code sample shows how to use the Normalizer
class to transform text to Normalization Form D (NFD):
String strName = "Jos\u00E9"; // using a composed é
String strNFD = Normalizer.normalize(strName, Normalizer.Form.NFD);
The resulting string strNFD now contains five code point
values: Jose´. These characters have the Unicode
values U+004A U+006F U+0073 U+0065 U+0301.
You can also test whether text is in a specific normalization form
using the isNormalized method:
boolean bNormalized = Normalizer.isNormalized(strNFD, Normalizer.Form.NFD);
System.out.printf("NFD? %b%n", bNormalized);
International Domain Names
The Internationalizing Domain Names in Applications (IDNA) standard, defined by RFC 3490, allows domain names to use characters beyond the ASCII character set. With a few restrictions, the full set of characters in the Unicode 3.2 standard is available for defining domain names. Unfortunately, Domain Name System (DNS) servers and resolver services are not all capable of reliably storing and using non-ASCII characters. The IDNA solution defines a way to represent non-ASCII characters with an encoding that uses only ASCII characters. As a result, DNS and name resolver software continue to work with an ASCII-compatible encoding (ACE), while end users can use internationalized domain names written with an expanded set of Unicode characters.
Java SE 6 supports the IDNA standard by providing the java.net.IDN class. This class provides methods for converting between a Unicode domain name and its ASCII-compatible encoding. The available operations are toASCII and toUnicode. Applications should convert domain names to ACE using the toASCII method before submitting them to DNS or resolver services. Applications can use the toUnicode method to create the Unicode text that users should see.
If a user enters a non-ASCII domain name into your application, the application should convert the name using the toASCII method before submitting it across the Internet:
// Retrieve the domain name from the user interface.
String strUnicodeName = txtUnicodeName.getText();
// Convert the international domain name to
// an ASCII-compatible encoding.
String strACEName = IDN.toASCII(strUnicodeName);
Using the Japanese domain name shown in Figure 1, the conversion stores the
text xn--wgv71a119e.jp in the strACEName variable. The
new text is the ACE equivalent of the Japanese domain name.
Figure 1. The IDN class creates an encoded equivalent name for DNS and
resolver software.
The text xn--wgv71a119e doesn't mean anything to
most people. It's encoded text, suitable for machine or software
consumption. Your applications should use the toUnicode
method to convert these ASCII-encoded names into a suitable form
that people can typically read and understand. The following code
snippet shows how to convert the text back to its original form:
String strACEName = txtACEName.getText();
String strUnicodeName = IDN.toUnicode(strACEName);
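Outside a user interface, the same round trip fits in a few lines. Because toASCII throws IllegalArgumentException when its input does not conform to RFC 3490 (for example, when a label is longer than 63 characters), the hypothetical toAce helper below catches that exception instead of letting it reach the user. The bücher.example name is a common illustration of the encoding, and the Japanese name 日本語.jp is written with Unicode escapes.

```java
import java.net.IDN;

class IdnDemo {
    // Converts a user-entered host name to ACE, or returns null if IDN rejects it.
    static String toAce(String unicodeHost) {
        try {
            return IDN.toASCII(unicodeHost);
        } catch (IllegalArgumentException e) {
            return null; // input does not conform to RFC 3490
        }
    }

    public static void main(String[] args) {
        String ace = toAce("b\u00FCcher.example");    // "bücher.example"
        System.out.println(ace);                       // xn--bcher-kva.example
        System.out.println(IDN.toUnicode(ace));        // bücher.example
        String japanese = "\u65E5\u672C\u8A9E.jp";     // "日本語.jp"
        System.out.println(IDN.toUnicode(toAce(japanese)).equals(japanese)); // true
    }
}
```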
Japanese Calendars
Your customers in Japan frequently use two calendars: the Gregorian calendar and the traditional Japanese Imperial calendar. Although the Gregorian calendar is familiar and widely used, the Japanese government often uses the Imperial calendar in its forms and documents. The Imperial calendar defines eras based on the reigning periods of Japanese emperors.
The Java platform provides calendars by way of the getInstance method of the java.util.Calendar class. You can construct a Japanese Imperial calendar by using the locale ja_JP_JP like this:
Calendar calJapanese = Calendar.getInstance(new Locale("ja", "JP", "JP"));
Once you've created the calendar, you can use it to set, retrieve,
and manipulate dates using Imperial calendar rules for era and year
names.
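For example, the YEAR field of an Imperial calendar counts years within the current era. The sketch below, with a hypothetical class name, sets an Imperial calendar to a Gregorian date and reads back its ERA and YEAR fields. A date in 2007 falls in year 19 of the Heisei era, which began in 1989, while a date in 1988 falls in year 63 of the preceding Showa era.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

class ImperialCalendarDemo {
    // Returns {era, year} in the Japanese Imperial calendar for a Gregorian date.
    static int[] imperialEraAndYear(int year, int month, int day) {
        Calendar gregorian = new GregorianCalendar(year, month, day);
        Calendar imperial = Calendar.getInstance(new Locale("ja", "JP", "JP"));
        imperial.setTime(gregorian.getTime());
        return new int[] { imperial.get(Calendar.ERA), imperial.get(Calendar.YEAR) };
    }

    public static void main(String[] args) {
        int[] eraYear = imperialEraAndYear(2007, Calendar.JUNE, 15);
        // Heisei began in 1989, so 2007 is Heisei 19.
        System.out.println("era=" + eraYear[0] + ", year=" + eraYear[1]);
    }
}
```

The era is returned as an integer constant; to show its name, format the date with a DateFormat for the ja_JP_JP locale instead.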
The difference between Gregorian and Imperial calendars is most
obvious when you format dates. The java.text.SimpleDateFormat
and java.text.DateFormat classes support date formats
for the new calendar. Create a formatter and display the current date
using code like this:
Date now = new Date();
Locale localeJapanese = new Locale("ja", "JP");
Locale localeImperialJapanese = new Locale("ja", "JP", "JP");
DateFormat dfGregorian = DateFormat.getDateInstance(DateFormat.FULL, localeJapanese);
DateFormat dfImperial = DateFormat.getDateInstance(DateFormat.FULL, localeImperialJapanese);
String strGregorianDate = dfGregorian.format(now);
String strImperialDate = dfImperial.format(now);
txtGregorianDate.setText(strGregorianDate);
txtImperialDate.setText(strImperialDate);
When using the ja_JP locale, DateFormat
produces a Gregorian date using Japanese characters for year, month,
and day terms. When using the ja_JP_JP locale,
DateFormat creates a date string using
the new Imperial calendar. Figure 2 shows a date in which the Gregorian year
2007 shows as the Imperial year Heisei 19.
Figure 2. Java SE 6 provides support for the Japanese Imperial calendar.
New Supported Locales
With Java SE 6, the already long list of supported locales just got longer. The platform now includes the following new locales, which are fully supported by the various locale-sensitive classes. The data for these locales comes from the increasingly popular CLDR. Although the platform uses CLDR data for the new locales, pre-existing locales in the platform are unaffected.
Language | Country | Locale
Chinese (Simplified) | Singapore | zh_SG
English | Malta | en_MT
English | Philippines | en_PH
English | Singapore | en_SG
Greek | Cyprus | el_CY
Indonesian | Indonesia | in_ID
Irish | Ireland | ga_IE
Japanese (Japanese Imperial calendar) | Japan | ja_JP_JP
Malay | Malaysia | ms_MY
Maltese | Malta | mt_MT
Serbian | Bosnia and Herzegovina | sr_BA
Serbian | Serbia and Montenegro | sr_CS
Spanish | United States | es_US
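To check which of these locales a particular runtime actually offers, you can compare them against Locale.getAvailableLocales, as in the sketch below (class and method names are hypothetical). The answer depends on the JRE version and its locale data configuration, so treat the output as specific to the environment in which you run it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class NewLocalesCheck {
    // Reports which of the given locales the running JRE lists as available.
    static Map<Locale, Boolean> availability(Locale... locales) {
        Set<Locale> available = new HashSet<>(Arrays.asList(Locale.getAvailableLocales()));
        Map<Locale, Boolean> result = new LinkedHashMap<>();
        for (Locale locale : locales) {
            result.put(locale, available.contains(locale));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(availability(
                new Locale("es", "US"), new Locale("en", "SG"), new Locale("mt", "MT")));
    }
}
```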
Summary
Java SE 6 updates the platform's already extensive internationalization support by giving developers more control over how resource bundles are found, loaded, and cached. Using the Locale-Sensitive Services SPI, you can add locale support that is not already in Java SE 6. The normalization API, formerly private, is now public as the java.text.Normalizer class, which you can use to normalize text into the four forms defined by the Unicode Standard: NFC, NFD, NFKC, and NFKD. You don't have to limit your application to plain ASCII domain names: the IDN class provides an API to convert non-ASCII domain names to ASCII-compatible encodings suitable for DNS and resolver services. You can now use and format dates with the Japanese Imperial calendar. Finally, more than a dozen new locales are available, and their data comes from the Unicode Consortium's CLDR project. The new CLDR-based locale definitions do not affect existing locales.
About the Authors
John O'Conner is an engineer and writer at Sun
Microsystems. He coaches Little League baseball and AYSO soccer teams, which are always populated
by at least one of his five children.
Naoto Sato is a Java
internationalization engineer in the client software group at Sun
Microsystems. Currently, his work is focused on enhancements of
locales in the Java platform. Before joining Sun, he worked with the internationalization team at IBM Japan.