by Allan Jacobs
October 2000
The C language utility sprintf is for formatting strings of characters and numbers. This article documents the use of a Java programming language
class, PrintfFormat, whose behavior is based on the sprintf specification.
Source code is provided.
The C-language utility sprintf is not in the Java class library. This article documents the use of a Java programming language implementation of sprintf, suitably modified for the Java type system. It is intended for use by programmers who have to translate legacy C or C++ applications into Java, especially those with time deadlines too tight to allow a rewrite to use the Java classes DecimalFormat, NumberFormat, and MessageFormat.
Users of the code provided in this article should note that there is an effort underway to
add a printf-like facility directly into the J2SE platform. This work is being done under
the Java Community Process as part of
JSR-51: New I/O APIs for the Java Platform. There is not necessarily any connection
between the PrintfFormat class presented in this article and the facility being developed
for JSR-51.
Java is derived from C++, which is derived from C. The set of library functions that ships with the JDK free of charge is large. Programmers faced with the task of converting C or C++ code to Java can usually find the tools to do so somewhere in the set of Java APIs. The tools for formatting text and numbers in Java are the classes in the package java.text. These classes include DecimalFormat, NumberFormat, and MessageFormat. Code conversion would be easier if the C library, especially printf, had a closer Java programming language analogue than the members of the java.text package. This is especially true in applications where the formatting has to be done in a field of a particular width and alignment within such a field is important.
The core routine in the printf family of procedures is sprintf. The routine sprintf is a method that takes a control string and an arbitrary set of input arguments. If the types of the arguments are consistent with the control string then the output from sprintf will be a string containing formatted versions of the input arguments.
The following C code fragment formats an int i and a double precision floating
point d in a string x and then prints the string on the console.
int i=10; double d=10.0; char x[32]; /* room for the text plus the terminating null */
sprintf(x,"Example=%d %lg\n",i,d);
printf("%s",x);
Control strings are sequences made from literals and conversion specifications. In this case, the control string is "Example=%d %lg\n". The int i is formatted using the conversion specification "%d" and the double d is formatted by the conversion specification "%lg". The console will display:
Example=10 10
Control strings specify the data type of the argument expected, define field widths, specify alignment within a field, and choose fill characters (space or 0). For integer type arguments, the control string can specify the radix (8, 10, or 16) of the output. For floating-point arguments, the control string controls the number of digits after the decimal point; it can also guide the choice between decimal and scientific notation.
The sprintf utility aids programmers who are converting legacy code into 100% Java code so that it can be run safely across the Internet. The easiest way to do this is to invoke sprintf itself, using Java Native Interface. The advantage of a Java implementation, the path taken here, is that the implementation will run everywhere.
The purpose of sprintf is to format an array of Java objects and return the conversion as a Java string. Usually the Java object will be an instance of one of the wrapper types in java.lang: Byte, Character, Double, Float, Integer, Long, or Short. Unwrapped Java data can be formatted when the control field only contains a single conversion character. If more complex control fields capable of formatting unwrapped data types are desired, users are free to subclass PrintfFormat.
PrintfFormat deviates from C sprintf in several ways. Some deviations are there because the C language utility sprintf recognizes data types that are not supported in Java and are therefore superfluous. These include pointer types of all kinds, unsigned integers, and long doubles. Other deviations are features that were put into sprintf to support localization. These include the use of asterisks in control strings that allow the dynamic assignment of field widths and precisions. Also missing is support for control strings that begin with %n$ where n is a digit string specifying the index of the next argument to be converted.
Readers who understand C sprintf may wish to try a sprintf demonstration applet. If you do not have this knowledge, it may be advisable to read the rest of the article and then return to try the demonstration applet. To run the demonstration, use the "Type" and "Value" Choice widgets to specify an object for printing. Once the object has been specified, add it to the print list by clicking on the "Add" button. Repeat this process as many times as desired.
Once you are done with data input, type the control string with which you want to format the data values into the text field. Then, tell the applet that you are ready to format by clicking on the "Format" button.
Formatting Strings
Formatting strings is the easiest of the operations that sprintf does because it is the operation with the fewest options. The control string can be used to set the field width, the maximum number of characters to be converted, and the alignment within a field.
The following code example formats the input string (the literal "target"), embedding
it into a larger string that begins with "Pre" and ends with "Post". The conversion is done in two steps. The first is to create a new instance of the class PrintfFormat. This is done with a constructor that takes a sprintf control string as an input argument. The second step is to use a sprintf method of the class PrintfFormat to create a string that can be used for other purposes (printing to the console, for example).
System.out.println(
new PrintfFormat("Pre %s Post").sprintf("target"));
Embedding this fragment into a Java program and executing it causes the program to print the following on the console:
Pre target Post
Note that control strings begin with percent signs (%). How do you specify a literal containing a %? Use a conversion specification for percent signs: %%. The following example appends "Post%":
System.out.println(new
PrintfFormat("Pre %s Post%%").sprintf("target")
);
Executing a Java example that contains the above line of code prints the following on the console:
Pre target Post%
Blanks in control strings are not separators. The Java fragment:
new PrintfFormat("Pre%sPost").sprintf("target");
is perfectly legitimate. Its output is a string whose value is "PretargetPost".
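Readers who do not have the PrintfFormat source at hand can try the same literal-handling rules with String.format, which is backed by the java.util.Formatter class added to the platform in J2SE 5.0, after this article was written. The following is only a sketch under that assumption: it does not use PrintfFormat, and the class name LiteralSketch and the method embed are invented for illustration.

```java
final class LiteralSketch {
    // Literal text in the control string is copied through unchanged;
    // %s is replaced by the argument and %% produces a single percent sign.
    static String embed(String control, String arg) {
        return String.format(control, arg);
    }

    public static void main(String[] args) {
        System.out.println(embed("Pre %s Post", "target"));
        System.out.println(embed("Pre %s Post%%", "target"));
        // Blanks are ordinary literals, not separators.
        System.out.println(embed("Pre%sPost", "target"));
    }
}
```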
Because of the way Java is lexically analyzed, escape sequences are handled with a minimum of fuss. Backspaces (\b), horizontal tabs (\t), linefeeds (\n), form feeds (\f), carriage returns (\r), double quotes (\"), single quotes (\'), and backslashes (\\) can all be placed in control strings and will have the same meanings that they normally have in string literals.
Field width for strings is adjusted by a number before the "s" in the conversion specification for a string. Alignment within that field is controlled with an optional "-" flag character after the "%" in the conversion specification. The following will result in a string that is six characters wide between the quotes:
new PrintfFormat("\'%6s\'").sprintf("12345")
In this case, the default padding character (the blank) is added to the default location (the beginning). The result is:
' 12345'
The alignment flag goes in front of the field width in a string conversion specification. The maximum number of characters to convert is specified in digits following the decimal point in the numerical part of the conversion specification.
// '  12345'
new PrintfFormat("\'%7s\'").sprintf("12345");
// '12345  '
new PrintfFormat("\'%-7s\'").sprintf("12345");
// '    123'
new PrintfFormat("\'%7.3s\'").sprintf("12345");
// '123    '
new PrintfFormat("\'%-7.3s\'").sprintf("12345");
The conversion specification "%7s" formats a string argument in a field of width seven. If the string has fewer than seven characters, it is right justified within the field and blanks are printed to the left. A conversion specification "%-7s" changes this behavior by demanding left justification within a field of width seven, so the blanks are printed to the right. The conversion specifications "%7.3s" and "%-7.3s" also format a string within a field of width seven. However, only the first three characters are formatted and the remainder, if any, are discarded.
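The width, precision, and alignment rules for strings described above can be written out by hand in a few lines. The sketch below is not taken from PrintfFormat; the class StringFieldSketch, the method format, and its parameters are hypothetical names chosen for illustration, and a negative maxChars is assumed to mean "no precision given".

```java
final class StringFieldSketch {
    // width: minimum field width; maxChars: precision (negative = unlimited);
    // leftJustify: true when the "-" flag is present.
    static String format(String s, int width, int maxChars, boolean leftJustify) {
        // The precision truncates the argument before any padding is applied.
        String body = (maxChars >= 0 && s.length() > maxChars)
                ? s.substring(0, maxChars) : s;
        StringBuilder pad = new StringBuilder();
        for (int i = body.length(); i < width; i++) {
            pad.append(' ');
        }
        // Padding goes on the right for left justification, else on the left.
        return leftJustify ? body + pad : pad + body;
    }

    public static void main(String[] args) {
        System.out.println("'" + format("12345", 7, -1, false) + "'");
        System.out.println("'" + format("12345", 7, -1, true) + "'");
        System.out.println("'" + format("12345", 7, 3, false) + "'");
        System.out.println("'" + format("12345", 7, 3, true) + "'");
    }
}
```

Note that the field width never shortens the string; only the precision does.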
Conversion Specifications
Now that string conversions have provided some experience with conversion specifications, a more formal discussion of conversion specifications will make more sense. Readers with C language experience will find some, perhaps all, of the material that follows familiar. Conversion specifications for all data types look similar to those used for strings. Every conversion specification begins with a percent sign (%). After the leading percent sign, there follows a sequence: zero or more flags, an optional minimum field width, an optional precision, an optional h or l (C sprintf allows for an optional L, but the contexts where L is useful are not present in Java programs), and a character that specifies the type of formatting to be done.
Conversion Specifiers and Modifiers
Conversion specifiers give PrintfFormat instances information about what type of data is going to appear at which position. The conversion specifiers that PrintfFormat uses are those that are useful in Java programs; one of c, d, e, E, f, g, i, o, s, x, and X. They appear at the end of a conversion specification.
The d and i specifiers indicate that there is an Integer (or int, depending on the sprintf variant) argument to be converted to a string of the form [-]dddd. The precision (default 1) is the minimum number of digits to appear. If the value can be represented in fewer digits, the result is expanded with leading zeros.
The o specifier is used to format an integer value in base eight. The x and X specifiers format integer arguments in base sixteen. Primitive integer types like byte, short, and char are converted to int automatically at the sprintf call by the usual method argument conversions. The same conversion is done for arguments of reference types Byte, Short, Character, and Integer; their values are retrieved and cast to int internally and a string representation is constructed from this int value.
Values of the small integer types (byte, short, char) passed to sprintf are converted to int by the usual Java method conversion mechanism. What if you want to have the output format reflect the size of the input data? For instance, if you are requesting a hex string representation of the input, you probably want the conversion to reflect the number of bytes in the input data. The printf utility uses the letter h to indicate that the data it is manipulating is to be cast to a short before formatting. If the d, i, o, x, or X specifier is preceded by an h modifier then the PrintfFormat instance knows that the corresponding argument is to be formatted as a value of type Short instead of Integer (or short instead of int for the printf method for unwrapped data).
A similar mechanism can be used to cast the input argument to a long before formatting. Of course, this is most useful when the input is a Long (or long for the routines that handle primitive data types). In this case, the l modifier tells PrintfFormat that the corresponding argument is of type Long instead of Integer (or long instead of int).
Examples and their output strings appear below:
// 'ffff'
new PrintfFormat("\'%hx\'").sprintf(-1);
// 'ffffffff'
new PrintfFormat("\'%x\'").sprintf(-1L);
// 'FFFFFFFFFFFFFFFF'
new PrintfFormat("\'%lX\'").sprintf(-1L);
The h and l modifiers are used in these examples to change the size of the output string in a manner pretty much independent of the size of the integral value that is to be formatted.
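The effect of the h and l modifiers on hexadecimal output can be imitated with plain casts and the toHexString methods of java.lang.Integer and java.lang.Long. The sketch below is an assumption about what the casts amount to, not an excerpt from PrintfFormat; the class SizeModifierSketch and its method names are invented for illustration.

```java
final class SizeModifierSketch {
    // Like an h modifier: narrow to 16 bits, then print the bits unsigned.
    static String hexAsShort(long value) {
        return Integer.toHexString(((short) value) & 0xFFFF);
    }

    // No modifier: narrow to 32 bits.
    static String hexAsInt(long value) {
        return Integer.toHexString((int) value);
    }

    // Like an l modifier: use all 64 bits.
    static String hexAsLong(long value) {
        return Long.toHexString(value);
    }

    public static void main(String[] args) {
        System.out.println(hexAsShort(-1));
        System.out.println(hexAsInt(-1L));
        // The X specifier differs from x only in the case of the digits.
        System.out.println(hexAsLong(-1L).toUpperCase());
    }
}
```

The length of the output depends on the cast that is requested, not on the declared type of the argument.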
The c specifier is used to format char input. In the example below, the int 97 is formatted as a character using the %c control specification.
// 'a'
new PrintfFormat("\'%c\'").sprintf(97);
Floating point values and objects of type Float and Double can be formatted using sprintf and specifiers e, E, f, g, and G.
Flags in Conversion Specifications
The flag characters are -, +, #, space, ', and 0. The number of flag characters and their order are unrestricted, though only certain combinations make any sense.
The flag character - means that the result of the conversion will be left justified within the field. The default behavior is to right justify. This flag character is applicable to all types of conversions and has the same meaning for all of them.
The flag character + has meaning for some numeric conversions (d, i, e, f, g, E, F, and G conversion specifiers). Its use signifies that the conversion will always begin with a plus or minus sign. The default is to omit plus signs when formatting non-negative numeric arguments. For the other formats (c, o, s, x, X conversion specifiers) PrintfFormat instances ignore the + flag (the C language specification says that the behavior is undefined).
A space character, when used as a flag character, prefixes a space character to signed numeric conversions (specifiers d, i, e, f, g, E, F, and G) that either do not begin with a sign or that are zero length (the latter situation can occur when a zero argument is converted with a d, i, o, x, or X specifier with a precision of 0).
A 0 flag character changes the padding character from a space character to a zero. It is applicable when the conversion is right justified so that the padding occurs on the left. This flag character is only applicable in numeric conversions (specifiers d, i, e, f, g, E, F, G, o, x, and X). PrintfFormat instances ignore 0 flag characters for other conversion types (the C language printf specification says that the behavior is undefined). Refer to a C language reference for the interactions between a 0 flag, the precision (when it is specified), and a - flag.
The purpose of the # flag character is to assert the desire for an "alternate form" of conversion. For floating-point conversions (specifiers e, f, g, E, F, G) the result will always contain a decimal point character, even when no digits follow it. Trailing zeros are not removed from the result in g or G conversions. For o conversions, the # flag forces the first digit of the result to be zero. For x and X conversions, the # flag causes the result to begin with a 0x prefix. PrintfFormat instances ignore # flag characters when used with any other conversion type (the C language printf specification says that the behavior is undefined).
An apostrophe ('), when used as a flag character, signifies that a thousands separator (a comma in the English speaking locales) is to be used in formatting a floating-point number. This flag has meaning in floating-point conversions (specifiers e, f, g, E, F, G) and is ignored when used in any other conversion type.
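Most of these flags can be tried out with String.format, which is backed by java.util.Formatter (added in J2SE 5.0). This is a sketch that assumes the standard Formatter rather than PrintfFormat, and the two differ in places: in particular, Formatter spells the thousands-separator flag as a comma instead of an apostrophe. The class FlagSketch and the method fmt are hypothetical names.

```java
import java.util.Locale;

final class FlagSketch {
    // Formats one argument with a fixed locale so the output is repeatable.
    static String fmt(String control, Object arg) {
        return String.format(Locale.US, control, arg);
    }

    public static void main(String[] args) {
        System.out.println(fmt("%+d", 42));          // always show a sign
        System.out.println(fmt("% d", 42));          // space in place of a plus sign
        System.out.println(fmt("%07d", -42));        // zero padding on the left
        System.out.println(fmt("%-7d|", 42));        // left justification
        System.out.println(fmt("%#x", 255));         // alternate form for hex
        System.out.println(fmt("%#o", 8));           // alternate form for octal
        System.out.println(fmt("%,.1f", 1234567.5)); // thousands separator
    }
}
```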
Field Width and Precision
The field width is an optional positive integer that occurs, when used, after the flags in a conversion specification. The field width is a minimum field width. If a conversion is shorter than this minimum it is padded with either blanks or zeros. If a conversion does not fit in the width a user specifies, the conversion uses more space. No conversion is truncated by a field width.
The C language allows field widths to be asterisks. This is meant to indicate that the value of the field width will be set by an argument to the conversion routine. This feature is not supported by PrintfFormat.
Precisions are non-negative integers that appear after decimal points in conversion specifications. The meaning of this integer depends on the conversion type. For d, i, o, x, and X conversions, the precision gives the minimum number of digits. With e, E, and f specifiers, the precision is the number of digits after the decimal point. For g and G conversions, the precision is the maximum number of significant digits.
Formatting Integral Data
Integers can be fed to PrintfFormat instances as primitive data (byte, short, char, int, and long) or as reference data (Byte, Short, Character, Integer, and Long). There are really only two input forms that need to be considered: int and long. Shorter integers (byte, short, char, Byte, Short, Character, and Integer) are converted to int, either at the call interface or internally. The wrapped long is retrieved internally, so Long inputs are formatted the same way that primitive long data is formatted.
Once the input data has been reduced to either int or long, it can be cast one more time to either a short or long, if requested. The cast to short is performed when the control specifier for the PrintfFormat instance is preceded by an h modifier. A cast to long is done when the control specifier is preceded by an l modifier.
The output string can contain either a decimal (i or d control specifiers), octal (with an o specifier), or hexadecimal (x or X specifiers) representation.
In the examples below, the most negative short is formatted as a decimal, octal, and hexadecimal string. The size of the octal and hexadecimal representations can be changed by using the h or l modifiers. The first of the examples uses the %d format to show that the short s has the value -32768. The second, third, and fourth examples construct 16, 32, and 64-bit octal representations of -32768, respectively. The fifth, sixth, and seventh examples construct 16, 32, and 64-bit hexadecimal representations of -32768, respectively.
short s=Short.MIN_VALUE;
// '-32768'
new PrintfFormat("\'%d\'").sprintf(s);
// '100000'
new PrintfFormat("\'%ho\'").sprintf(s);
// '37777700000'
new PrintfFormat("\'%o\'").sprintf(s);
// '1777777777777777700000'
new PrintfFormat("\'%lo\'").sprintf(s);
// '8000'
new PrintfFormat("\'%hx\'").sprintf(s);
// 'ffff8000'
new PrintfFormat("\'%x\'").sprintf(s);
// 'ffffffffffff8000'
new PrintfFormat("\'%lx\'").sprintf(s);
Field width is controlled by specifying it in front of the specifier. The two examples below format ints in a field of size 7. By default the field is right justified. The + flag is used to require a sign in the output. So, in this case, the padding consists of five blanks written to the left of the sign.
// '     +1'
new PrintfFormat("\'%+7d\'").sprintf(1);
// '     -1'
new PrintfFormat("\'%+7d\'").sprintf(-1);
The padding character can be changed from a blank to a zero character by using a 0 flag. There is no reason why a 0 flag cannot be used at the same time as a + flag to get a signed, zero-padded conversion.
// '+000001'
new PrintfFormat("\'%+07d\'").sprintf(1);
// '-000001'
new PrintfFormat("\'%07d\'").sprintf(-1);
The following pair of examples require a minimum of three digits in the conversion. This is done in the precision field of the control string, which is the part of a control string immediately preceding its conversion specifier and after a decimal point. The field width is seven so at most four spaces will be used as padding. In these examples, the alignment within the field has been changed so that the conversion is left justified. This is done with a - flag character.
// '-001   '
new PrintfFormat("\'%-7.3d\'").sprintf(-1);
// '001    '
new PrintfFormat("\'%-7.3d\'").sprintf(1);
Sometimes it is useful to precede non-negative integral data formats with blanks instead of nothing or with a plus sign. This is done by using a space as a flag character. The first of the next two code fragments demonstrates that space characters have no effect on negative data inputs. The second shows that a space appears in front of the output when formatting positive integral data values.
// '-001   '
new PrintfFormat("\'%- 7.3d\'").sprintf(-1);
// ' 001   '
new PrintfFormat("\'%- 7.3d\'").sprintf(1);
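The interaction of precision, field width, the - flag, and the space flag for decimal integers can be spelled out by hand. The sketch below is a simplified illustration, not the PrintfFormat implementation; the class IntFieldSketch, the method format, and its parameters are hypothetical, and the + and 0 flags are deliberately left out.

```java
final class IntFieldSketch {
    // minDigits is the precision, width the minimum field width.
    static String format(long value, int width, int minDigits,
                         boolean leftJustify, boolean spaceFlag) {
        // Work from the decimal text so that Long.MIN_VALUE needs no special case.
        String digits = Long.toString(value);
        boolean negative = digits.startsWith("-");
        if (negative) {
            digits = digits.substring(1);
        }
        StringBuilder body = new StringBuilder();
        if (negative) {
            body.append('-');
        } else if (spaceFlag) {
            body.append(' '); // the space flag only affects non-negative values
        }
        // The precision pads the digits (not the sign) with leading zeros.
        for (int i = digits.length(); i < minDigits; i++) {
            body.append('0');
        }
        body.append(digits);
        // The field width then pads the whole conversion with blanks.
        StringBuilder pad = new StringBuilder();
        for (int i = body.length(); i < width; i++) {
            pad.append(' ');
        }
        return leftJustify ? body.toString() + pad : pad.toString() + body;
    }

    public static void main(String[] args) {
        System.out.println("'" + format(-1, 7, 3, true, false) + "'");
        System.out.println("'" + format(1, 7, 3, true, false) + "'");
        System.out.println("'" + format(1, 7, 3, true, true) + "'");
    }
}
```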
Formatting Floating Point Data
Floating point values can be fed to a PrintfFormat instance as either primitive data (float or double) or reference data values (Float or Double). There are five distinct formats for floating-point data: e, E, f, g, and G.
The size of the input, 64 or 32 bits, can be indicated by the absence or presence of an L modifier, respectively. In this implementation of sprintf, the L flag really only controls the number of characters devoted to the exponent in the scientific notation formats (e, E, g, and G).
The simplest of the floating-point formats is the f format. It is used to specify the use of simple decimal format. The field width and number of digits after the decimal point can be set with a decimal number in front of the f. In the first of the examples below, 1.0 is formatted in a field of width 10. The precision defaults to 5. In the second, the precision is set to 3. This causes a total of five spaces to be employed as pad characters. In both cases the decimal representation is right justified within its field of 10.
// '   1.00000'
new PrintfFormat("\'%10f\'").sprintf(1.0);
// '     1.000'
new PrintfFormat("\'%10.3f\'").sprintf(1.0);
Scientific notation is requested when using the e or E formats. Field widths and precisions can be specified just as they can be specified in f formats.
// '     3.30e+97'
new PrintfFormat("%13.2e").sprintf(0.33e+98);
// '     3.30e-99'
new PrintfFormat("%13.2e").sprintf(0.33e-98);
The purpose of the g format is to use decimal (f) format for numbers of moderate size (exponents greater than -5 and less than the precision), and to use scientific notation (e format) otherwise. Field width is controlled just as it is in e and f formats. Precisions in control specifications, however, only control the maximum number of significant digits in output strings. Use of the g specifier tells the PrintfFormat instance to remove trailing zeros after the decimal point and to remove the decimal point if it has no trailing digits.
// ' 1e-05'
new PrintfFormat("%6.4g").sprintf(1.0e-5);
// '0.0001'
new PrintfFormat("%6.4g").sprintf(1.0e-4);
// '  1000'
new PrintfFormat("%6.4g").sprintf(1.0e3);
// ' 1e+04'
new PrintfFormat("%6.4g").sprintf(1.0e4);
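The choice that a g conversion makes between decimal and scientific notation can be expressed as a small predicate. The sketch below follows the rule as the C standard states it (take the exponent that an e conversion would produce, then compare it with -4 and with the precision); it relies on String.format from J2SE 5.0 and later, it is not code from PrintfFormat, and the names GStyleSketch and usesDecimalNotation are invented for illustration.

```java
import java.util.Locale;

final class GStyleSketch {
    // Returns true when a g conversion with this precision would use
    // decimal (f) notation for x, false when it would use e notation.
    static boolean usesDecimalNotation(double x, int precision) {
        // A precision of zero is treated as one.
        int p = (precision == 0) ? 1 : precision;
        // Format in e style with p significant digits, then read the exponent back.
        String sci = String.format(Locale.ROOT, "%." + (p - 1) + "e", x);
        int exponent = Integer.parseInt(sci.substring(sci.indexOf('e') + 1));
        return exponent >= -4 && exponent < p;
    }

    public static void main(String[] args) {
        double[] samples = {1.0e-5, 1.0e-4, 1.0e3, 1.0e4};
        for (double x : samples) {
            String style = usesDecimalNotation(x, 4) ? "f" : "e";
            System.out.println(x + " -> " + style);
        }
    }
}
```

With a precision of 4, the four sample values select e, f, f, and e notation, which matches the four examples above.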
The +, space, -, and 0 flags have the same meanings as they have when used with integral formats and data. That is, a + flag tells PrintfFormat instances to format non-negative values with a leading plus sign; a space flag to format non-negative values with a leading space. A - flag changes the alignment within the field from the default right-justification to left-justification. The use of a 0 flag changes the pad character from a space to a zero. The next five examples illustrate the use of these flags.
// '001.1e+04'
new PrintfFormat("\'%09.2g\'").sprintf(1.1e+4);
// '  1.1e+04'
new PrintfFormat("\'% 9.2g\'").sprintf(1.1e+4);
// ' +1.1e+04'
new PrintfFormat("\'%+9.2g\'").sprintf(1.1e+4);
// ' 1.1e+04 '
new PrintfFormat("\'%- 9.2g\'").sprintf(1.1e+4);
// '+1.1e+04 '
new PrintfFormat("\'%-+9.2g\'").sprintf(1.1e+4);
Localization
The PrintfFormat class, like the C language sprintf, tries to customize for a particular language, usually that spoken by the user of the computer running the code. There are two sorts of customizations involved. One of them is that the decimal point is not always a period as it is in English language locales. The other is that the thousands separator is not always a comma as it is in English language locales.
It is not usually necessary to use the PrintfFormat constructor with the Locale argument, but it is available. To use it, just pass a locale as the first argument. The default German locale, for example, has a different floating-point format than the English locales. To see this, use the apostrophe flag and an f control specifier. The purpose of the apostrophe flag is to insist that thousands separators be used in the output string.
// 12,345,678.9
Locale loc=Locale.ENGLISH;
new PrintfFormat(loc,"%'12.1f").sprintf(12345678.9);
// 12.345.678,9
loc=Locale.GERMAN;
new PrintfFormat(loc,"%'12.1f").sprintf(12345678.9);
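The same locale effect can be observed without PrintfFormat by passing a Locale to String.format, available in J2SE 5.0 and later. This is a sketch under that assumption; note that java.util.Formatter writes the thousands-separator flag as a comma where PrintfFormat uses an apostrophe. The class LocaleSketch and the method grouped are hypothetical names.

```java
import java.util.Locale;

final class LocaleSketch {
    // Formats value with thousands separators, one digit after the
    // decimal point, in a field at least twelve characters wide.
    static String grouped(Locale loc, double value) {
        return String.format(loc, "%,12.1f", value);
    }

    public static void main(String[] args) {
        System.out.println(grouped(Locale.ENGLISH, 12345678.9));
        System.out.println(grouped(Locale.GERMAN, 12345678.9));
    }
}
```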
Conclusion
While not a perfect replica of the C language sprintf utility, the PrintfFormat class should be helpful in converting code from C to Java. Users are encouraged, however, to use the java.text classes for formatting data because those are regularly maintained and are a much better path towards internationalizing your software than PrintfFormat.
Source Code:
PrintfFormat.java
PrintfApplet.java
If you encounter problems with the above source code, or if you expect a slightly different flavor of sprintf than that presented here, there is another site offering sprintf classes that might satisfy your needs:
Henrik Bengtsson's fprintf, printf, and sprintf
About the Author
Allan Jacobs is a software quality engineer at Sun Microsystems. He has
worked on parallel computer software at Applied Parallel Research and geophysical software at Chevron.