In This Issue
Welcome to the Core Java Technologies Tech Tips for April 2007. Core Java
Technologies Tech Tips provides tips and hints for using core Java technologies
and APIs in the Java Platform, Standard Edition 6 (Java SE 6).
This issue provides tips for the following:
» Compiling with the Java Compiler API
» Regular Expressions
These tips were developed using Java SE 6. You can download Java SE 6 from the Java SE Downloads page.
The author of this month's tips is John Zukowski, president and principal consultant of JZ Ventures, Inc.
Compiling with the Java Compiler API
From day one, the Java platform has lacked a standard interface for invoking
its compiler and generating Java byte codes from within a program. With Sun's
implementation of the platform, you could call the non-standard Main class
of the com.sun.tools.javac package (found in the tools.jar file in the lib
subdirectory) to compile your code. However, that package doesn't provide
standard, public programming interfaces, and users of other implementations
don't necessarily have access to the class. With Java SE 6 and its new Java
Compiler API, defined by JSR 199, you can access the javac compiler tool from
your own applications through standard interfaces.
There are two ways to use the tool. One way is simple, and the other is more
complicated but gives you control over more options. You'll use the simpler
way first to compile the "Hello, World" program, shown here:
public class Hello {
  public static void main(String args[]) {
    System.out.println("Hello, World");
  }
}
|
To invoke the Java compiler from your Java programs, you need to access the
JavaCompiler interface. Among other things, the interface allows you to set
the source path, the classpath, and the destination directory. Each source
file to compile is represented by a JavaFileObject instance. However, you
don't need to know about JavaFileObject just yet.
Use the ToolProvider class to request the default implementation of the
JavaCompiler interface. The ToolProvider class provides a
getSystemJavaCompiler() method, which returns an instance of the JavaCompiler
interface.
JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
|
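One caveat worth keeping in mind: getSystemJavaCompiler() is documented to return null when the platform provides no compiler, which can happen when a program runs on a JRE rather than a JDK. The following is a minimal sketch of a guard; the class name CompilerCheck and the message text are illustrative and not part of the API.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class CompilerCheck {
    // Returns a short description of whether a system compiler is available.
    static String describe() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            // Illustrative message; the API itself only returns null.
            return "No system Java compiler found; try running with a JDK.";
        }
        return "Compiler available: " + compiler.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```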
The simplest way to compile with the JavaCompiler is to use the run() method,
which JavaCompiler inherits from the Tool interface:
int run(InputStream in,
        OutputStream out,
        OutputStream err,
        String... arguments)
|
Pass null for any of the first three arguments to use the defaults of
System.in, System.out, and System.err, respectively. The varargs set of
String objects represents the arguments to pass to the compiler, such as the
names of the files to compile. Thus, to compile the Hello class source
previously shown, you need the following:
int results = compiler.run(null, null, null, "Hello.java");
|
Assuming no compilation errors, this will generate the Hello.class file in the
destination directory. If there is an error, the run() method sends output
to the error stream passed as its third argument (System.err by default).
The method returns a non-zero result when errors occur.
You can use the following code to compile the Hello.java source file:
import java.io.*;
import javax.tools.*;

public class CompileIt {
  public static void main(String args[]) throws IOException {
    JavaCompiler compiler =
        ToolProvider.getSystemJavaCompiler();
    int results = compiler.run(
        null, null, null, "Hello.java");
    System.out.println("Result code: " + results);
  }
}
|
Once you compile the CompileIt program, you can run it as many times as you
like, without recompiling it, whenever you change the Hello.java source and
need to recompile it. Assuming no errors, running CompileIt will produce the
following output:
> java CompileIt
Result code: 0
|
Running CompileIt also produces a Hello.class file in the same directory:
> ls
CompileIt.class
CompileIt.java
Hello.class
Hello.java
|
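Because run() accepts any OutputStream for its third argument, you can also keep the compiler's error text in memory instead of letting it go to System.err. The following is a rough sketch under that assumption; the class name CaptureCompile and the -1 code used when no compiler is available are illustrative choices, not part of the API.

```java
import java.io.ByteArrayOutputStream;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class CaptureCompile {
    // Runs the compiler on fileName and returns the result code followed by
    // any text the compiler wrote to its error stream.
    static String compile(String fileName) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            // -1 is an illustrative sentinel, not a javac result code.
            return "Result code: -1\nNo system Java compiler available";
        }
        ByteArrayOutputStream err = new ByteArrayOutputStream();
        // null for in and out keeps the System.in and System.out defaults.
        int result = compiler.run(null, null, err, fileName);
        return "Result code: " + result + "\n" + err.toString();
    }

    public static void main(String[] args) {
        String fileName = args.length > 0 ? args[0] : "Hello.java";
        System.out.println(compile(fileName));
    }
}
```

With the text in a String, a program can log it, show it in a dialog, or search it, although the diagnostics approach described next gives more structured access.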
You could stop there, since that is sufficient to use the now-standard
compiler, but there is more. A second way to access the compiler gives you
better access to the results. More specifically, it lets you present the
compilation results in a more meaningful manner, rather than just passing
along the error text that went to stderr. This approach takes advantage of
the StandardJavaFileManager interface. The file manager provides a way to
work with regular files for both input and output operations. It also reports
diagnostic messages with the help of a DiagnosticListener instance. The
DiagnosticCollector class you will be using is one implementation of that
listener.
Before identifying what needs to be compiled, you need a file manager. Create
a file manager in two basic steps: create a DiagnosticCollector and then ask
the JavaCompiler for the file manager with its getStandardFileManager() method.
Pass the DiagnosticListener object to the getStandardFileManager() method. This
listener reports non-fatal problems and you can optionally share it with the
compiler by passing it into the getTask() method later.
DiagnosticCollector<JavaFileObject> diagnostics =
new DiagnosticCollector<JavaFileObject>();
StandardJavaFileManager fileManager =
compiler.getStandardFileManager(diagnostics, aLocale, aCharset);
|
You could provide a null diagnostics listener to the call, but that is
just about the same as using the earlier compilation method.
Before looking at the details of StandardJavaFileManager, note that the
compilation process involves a single method of JavaCompiler called getTask().
It takes six arguments and returns an instance of a nested interface
called CompilationTask:
JavaCompiler.CompilationTask getTask(
    Writer out,
    JavaFileManager fileManager,
    DiagnosticListener<? super JavaFileObject> diagnosticListener,
    Iterable<String> options,
    Iterable<String> classes,
    Iterable<? extends JavaFileObject> compilationUnits)
|
Most of these arguments can be null, in which case the following defaults apply:
* out: System.err
* fileManager: compiler's standard file manager
* diagnosticListener: compiler's default behavior
* options: no command-line options to compiler
* classes: no class names for annotation processing
The last argument, compilationUnits, really shouldn't be null, as it
identifies what you want to compile. That brings us back to
StandardJavaFileManager. Notice the argument type:
Iterable<? extends JavaFileObject>. Two methods of StandardJavaFileManager
return this type. You can start with either a List of File objects or a List
of String objects representing the file names:
Iterable<? extends JavaFileObject> getJavaFileObjectsFromFiles(
Iterable<? extends File> files)
Iterable<? extends JavaFileObject> getJavaFileObjectsFromStrings(
Iterable<String> names)
|
Actually, anything that is Iterable can be used to identify the collection of
items to compile here, not just a List. A List just happens to be the easiest
to create:
String[] filenames = ...;
Iterable<? extends JavaFileObject> compilationUnits =
    fileManager.getJavaFileObjectsFromStrings(Arrays.asList(filenames));
|
You now have all the necessary information to compile your source files. The
JavaCompiler.CompilationTask interface returned from getTask() extends
Callable<Boolean>. So, to start the task, invoke the call() method:
JavaCompiler.CompilationTask task =
compiler.getTask(null, fileManager, null, null, null, compilationUnits);
Boolean success = task.call();
|
The call() method compiles all the files identified by the compilationUnits
variable, including all compilable dependencies. To find out whether
everything succeeded, check the Boolean return value. The call() method
returns Boolean.TRUE only if all the compilation units compile. On any error,
the method returns Boolean.FALSE.
Before showing the working example, let us add one last thing, the
DiagnosticListener, or more specifically, its implementer DiagnosticCollector.
Passing the listener as the third argument to getTask() allows you to
ask after compilation for the diagnostics:
for (Diagnostic<? extends JavaFileObject> diagnostic :
    diagnostics.getDiagnostics()) {
  // System.out is used because System.console() can return null
  System.out.printf(
      "Code: %s%n" +
      "Kind: %s%n" +
      "Position: %s%n" +
      "Start Position: %s%n" +
      "End Position: %s%n" +
      "Source: %s%n" +
      "Message: %s%n",
      diagnostic.getCode(), diagnostic.getKind(),
      diagnostic.getPosition(), diagnostic.getStartPosition(),
      diagnostic.getEndPosition(), diagnostic.getSource(),
      diagnostic.getMessage(null));
}
|
And lastly, you should call the file manager's close() method.
Putting all that together gives us the following program, to again just compile
the Hello class:
import java.io.*;
import java.util.*;
import javax.tools.*;

public class BigCompile {
  public static void main(String args[]) throws IOException {
    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    DiagnosticCollector<JavaFileObject> diagnostics =
        new DiagnosticCollector<JavaFileObject>();
    StandardJavaFileManager fileManager =
        compiler.getStandardFileManager(diagnostics, null, null);
    Iterable<? extends JavaFileObject> compilationUnits =
        fileManager.getJavaFileObjectsFromStrings(Arrays.asList("Hello.java"));
    JavaCompiler.CompilationTask task = compiler.getTask(
        null, fileManager, diagnostics, null, null, compilationUnits);
    Boolean success = task.call();
    for (Diagnostic<? extends JavaFileObject> diagnostic :
        diagnostics.getDiagnostics()) {
      // System.out is used because System.console() can return null
      System.out.printf(
          "Code: %s%n" +
          "Kind: %s%n" +
          "Position: %s%n" +
          "Start Position: %s%n" +
          "End Position: %s%n" +
          "Source: %s%n" +
          "Message: %s%n",
          diagnostic.getCode(), diagnostic.getKind(),
          diagnostic.getPosition(), diagnostic.getStartPosition(),
          diagnostic.getEndPosition(), diagnostic.getSource(),
          diagnostic.getMessage(null));
    }
    fileManager.close();
    System.out.println("Success: " + success);
  }
}
|
Compiling and running this program will just print out the success message:
> javac BigCompile.java
> java BigCompile
Success: true
|
However, if you change the println call in Hello.java to the mistyped pritnln,
you instead get the following when you run BigCompile:
> java BigCompile
Code: compiler.err.cant.resolve.location
Kind: ERROR
Position: 80
Start Position: 70
End Position: 88
Source: Hello.java
Message: Hello.java:3: cannot find symbol
symbol : method pritnln(java.lang.String)
location: class java.io.PrintStream
Success: false
|
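An editor that wants to highlight the offending line usually needs a line and column rather than a character offset. The Diagnostic interface also offers getLineNumber() and getColumnNumber() for that purpose. The sketch below is one possible way to use them. To stay self-contained it compiles source held in a String through a small SimpleJavaFileObject subclass, a technique this tip does not otherwise cover; the names LineDiagnostics, StringSource, and firstErrorLine are hypothetical.

```java
import java.net.URI;
import java.util.Arrays;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class LineDiagnostics {
    // Holds source code in a String so the sketch needs no file on disk.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/')
                + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Compiles the source, prints the first error in file:line:column form,
    // and returns its line number, or -1 if no error was reported.
    static long firstErrorLine(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            return -1; // no compiler available, for example on a JRE
        }
        DiagnosticCollector<JavaFileObject> diagnostics =
            new DiagnosticCollector<JavaFileObject>();
        compiler.getTask(null, null, diagnostics, null, null,
            Arrays.asList(new StringSource(className, code))).call();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                System.out.printf("%s.java:%d:%d: %s%n", className,
                    d.getLineNumber(), d.getColumnNumber(), d.getMessage(null));
                return d.getLineNumber();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String code =
            "public class Hello {\n" +
            "  public static void main(String args[]) {\n" +
            "    System.out.pritnln(\"Hello, World\");\n" +
            "  }\n" +
            "}\n";
        System.out.println("First error on line " + firstErrorLine("Hello", code));
    }
}
```

Note that if the source compiles cleanly, the default file manager writes the resulting class file to the current directory, so the sketch is best tried with source that contains an error.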
Using the Compiler API, you can do much more than what has been presented in
this brief tip. For example, you can control the input and output directories
or highlight the compilation errors in an integrated editor environment. Now,
thanks to the Java Compiler API, you can do all that with standard API calls.
For more information on the Java Compiler API and JSR 199, see the JSR 199 specification.
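As an illustration of controlling the output directory, the fourth argument of getTask() accepts the same options you would type on the javac command line. The following sketch passes -d to send class files to a chosen directory. It is only one possible approach, and the class name CompileToDirectory, the demo() helper, and the scratch directory name are assumptions made for the example.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

class CompileToDirectory {
    // Compiles source and asks the compiler to place class files in outputDir.
    static boolean compile(File source, File outputDir) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            return false; // no compiler available, for example on a JRE
        }
        StandardJavaFileManager fileManager =
            compiler.getStandardFileManager(null, null, null);
        try {
            Iterable<? extends JavaFileObject> units =
                fileManager.getJavaFileObjectsFromFiles(Arrays.asList(source));
            // Options are spelled as on the javac command line.
            Iterable<String> options = Arrays.asList("-d", outputDir.getPath());
            return compiler.getTask(
                null, fileManager, null, options, null, units).call();
        } finally {
            fileManager.close();
        }
    }

    // Writes a small Hello.java into a scratch directory (illustrative path),
    // compiles it, and reports whether Hello.class appeared in the output directory.
    static boolean demo() {
        File dir = new File(System.getProperty("java.io.tmpdir"), "compiler-api-demo");
        File out = new File(dir, "classes");
        try {
            if (!out.isDirectory() && !out.mkdirs()) {
                return false;
            }
            File source = new File(dir, "Hello.java");
            FileWriter writer = new FileWriter(source);
            try {
                writer.write("public class Hello {\n"
                    + "  public static void main(String args[]) {\n"
                    + "    System.out.println(\"Hello, World\");\n"
                    + "  }\n"
                    + "}\n");
            } finally {
                writer.close();
            }
            return compile(source, out) && new File(out, "Hello.class").isFile();
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Compiled to output directory: " + demo());
    }
}
```

The output directory must already exist when the task runs, which is why demo() creates it first.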
Regular Expressions in Java SE
Regular expression (regex) support has been a part of the Java Platform since
version 1.4. Found in the java.util.regex package, the regex classes support
pattern matching similar to what the Perl language provides, but use Java
language syntax and classes. The whole package is limited to three
classes: Pattern, Matcher, and PatternSyntaxException. Version 1.5 introduced
the MatchResult interface.
Use the two classes Pattern and Matcher together. Define the regular
expression with the Pattern class. Then use the Matcher class to check the
pattern against the input source. A PatternSyntaxException is thrown when the
regular expression has a syntax error.
Neither class has a public constructor. Instead, you compile a regular
expression to get a Pattern, and then ask that Pattern for a Matcher based on
some input source.
Pattern pattern = Pattern.compile( <regular expression> );
Matcher matcher = pattern.matcher( <input source> );
|
Once you have a Matcher, you typically process the input source to
find all the contained matches. Use the find() method to locate matches of the
pattern in the input source. Each call to find() continues from where the last
call left off, or from position 0 for the first call. The group() method then
returns the text that was matched.
while (matcher.find()) {
  System.out.printf("Found: \"%s\" from %d to %d.%n",
      matcher.group(), matcher.start(), matcher.end());
}
|
The following code shows a basic regular expression program, which prompts the
user for both the regular expression and the string to compare it against:
import java.io.Console;
import java.util.regex.*;

public class Regex {
  public static void main(String args[]) {
    Console console = System.console();
    // Get regular expression
    String regex = console.readLine("%nEnter expression: ");
    Pattern pattern = Pattern.compile(regex);
    // Get source
    String source = console.readLine("Enter input source: ");
    Matcher matcher = pattern.matcher(source);
    // Show matches
    while (matcher.find()) {
      System.out.printf("Found: \"%s\" from %d to %d.%n",
          matcher.group(), matcher.start(), matcher.end());
    }
  }
}
|
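Since the expression comes from the user, it may not be valid. One way to handle that, sketched here, is to catch the PatternSyntaxException that Pattern.compile() throws and report its description and index. The class name RegexCheck and the method problemWith() are hypothetical names for this example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexCheck {
    // Returns null when the expression compiles, otherwise a short
    // description of the syntax problem and where it was detected.
    static String problemWith(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " at index " + e.getIndex();
        }
    }

    public static void main(String[] args) {
        // The second expression is missing its closing bracket.
        String[] samples = { "co[cl]a", "co[cl" };
        for (String regex : samples) {
            String problem = problemWith(regex);
            System.out.println(regex + ": " + (problem == null ? "ok" : problem));
        }
    }
}
```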
So, what exactly does a regular expression look like? The documentation for
the Pattern class provides more in-depth details, but basically a regular
expression is a sequence of characters that tries to match another sequence
of characters. For instance, you can look for the double-l string literal
pattern "ll" in the "Hello, World" string. The previous program would find
the "ll" pattern at starting position 2 and ending position 4. The ending
position is the position of the next character after the end of the matched
text.
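If you prefer to check this without typing at a console, the same loop can be wrapped in a small helper. This is a minimal sketch; the class name FindAll and its findAll() method are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAll {
    // Collects one line of text per match, in the format used by the Regex program.
    static List<String> findAll(String regex, String source) {
        List<String> found = new ArrayList<String>();
        Matcher matcher = Pattern.compile(regex).matcher(source);
        while (matcher.find()) {
            found.add(String.format("Found: \"%s\" from %d to %d.",
                matcher.group(), matcher.start(), matcher.end()));
        }
        return found;
    }

    public static void main(String[] args) {
        for (String line : findAll("ll", "Hello, World")) {
            System.out.println(line);
        }
    }
}
```

Running it prints a single line: Found: "ll" from 2 to 4.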
Pattern strings like "ll" are not very interesting, reporting only where they
are literally in the input source. Regular expression patterns can include
special metacharacters. Metacharacters provide regular expressions with
powerful matching abilities. You can use the 15 characters "([{\^-$|]})?*+." as
metacharacters in regular expressions.
Some metacharacters indicate character groupings. For instance, the bracket
characters [ and ] allow you to specify a set of characters in which a match
succeeds if any one of the enclosed characters is found in the text. For
instance, the pattern "co[cl]a" will match the words coca and cola. It won't
match cocla since [] matches only a single character. Quantifiers, covered
shortly, handle the cases where it is okay to match something multiple times.
Besides matching individual characters, you can use the bracket characters
[ and ] to match a range of characters, like the letters from j to z,
specified as [j-z]. A range can also be combined with a string literal, as in
"foo[j-z]", which would succeed in finding the match fool but fail with food,
since l is within the range j to z and d is not. You can also use the ^
character inside the brackets to represent negation of a set of characters or
a range. The pattern "foo[^j-z]" will match words that start with foo but do
not end with a letter from j through z. So the string food would now succeed
in matching. Multiple ranges can be combined, like [a-zA-Z], to mean the
letters a through z as lowercase or uppercase characters.
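The bracket examples above can be checked with a few lines of code. This sketch uses Pattern.matches(), which succeeds only when the entire input matches the pattern, so it is a stricter test than find(). The class name BracketDemo is illustrative.

```java
import java.util.regex.Pattern;

class BracketDemo {
    // True only when the whole word matches the pattern.
    static boolean matches(String regex, String word) {
        return Pattern.matches(regex, word);
    }

    public static void main(String[] args) {
        String[][] cases = {
            { "co[cl]a", "coca" }, { "co[cl]a", "cola" }, { "co[cl]a", "cocla" },
            { "foo[j-z]", "fool" }, { "foo[j-z]", "food" },
            { "foo[^j-z]", "food" }, { "foo[^j-z]", "fool" },
            { "[a-zA-Z]", "Q" }, { "[a-zA-Z]", "7" }
        };
        for (String[] c : cases) {
            System.out.printf("%-10s %-6s %b%n", c[0], c[1], matches(c[0], c[1]));
        }
    }
}
```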
While string literals are great for a first lesson on regular expressions,
most people more typically use the predefined character classes. This is
where the metacharacters . and \ come into play. The period . represents any
single character (by default, any character except a line terminator). So,
the regular expression ".oney" would match money and honey, and any other
sequence of 5 characters that ends in oney. The \ on the other hand is used
with other characters to represent a whole set of characters. For instance,
while you could use [0-9] to represent the set of digits, you can also use \d.
Likewise, you can use [^0-9] to represent the set of characters that aren't
digits, or you can use the predefined character class \D. Because these
character classes aren't all easy to remember, they are defined in the Java
platform documentation for the Pattern class. Here is a subset of the
predefined character classes:
* \s -- whitespace
* \S -- non-whitespace
* \w -- word character [a-zA-Z_0-9]
* \W -- non-word character
* \p{Punct} -- punctuation
* \p{Lower} -- lowercase [a-z]
* \p{Upper} -- uppercase [A-Z]
|
One important point about the predefined character classes isn't immediately
noticeable. If you use one of these strings in the Regex program shown above,
you enter it as shown: \s matches whitespace. If, however, you want to
hardcode the regular expression in your Java source file, you have to
remember that the \ character is treated specially in string literals. You
must escape the backslash in your source:
String regexString = "\\s";
|
Here, the \\ represents a single backslash in the string. There are other
escape sequences for representing special characters:
* \t -- tab
* \n -- newline
* \r -- carriage return
* \xhh -- hex character 0xhh
* \uhhhh -- hex character 0xhhhh
|
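To see the predefined character classes and the doubled backslash together, here is a small sketch that counts matches in a sample string. The class name PredefinedClasses, the count() helper, and the sample text are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClasses {
    // Counts how many times the pattern is found in the source.
    static int count(String regex, String source) {
        Matcher matcher = Pattern.compile(regex).matcher(source);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String source = "Room 101, floor 3";
        // Each backslash is doubled because these are Java string literals.
        System.out.println("digits: " + count("\\d", source));
        System.out.println("whitespace: " + count("\\s", source));
        System.out.println("word characters: " + count("\\w", source));
        System.out.println("punctuation: " + count("\\p{Punct}", source));
        System.out.println("uppercase: " + count("\\p{Upper}", source));
    }
}
```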
Quantifiers make regular expressions more interesting, at least when
combined with other expressions like character classes. For instance, if you
want to match a string of three characters from a-z, you could use the pattern
"[a-z][a-z][a-z]" but you don't have to. Instead of repeating the string, you
add a quantifier after the pattern. For this specific example, "[a-z][a-z][a-z]"
can also be written as "[a-z]{3}". For a specific count, the number goes
inside {} braces. You can also use ?, *, or + to represent zero or one time,
zero or more times, or one or more times, respectively.
The pattern [a-z]? matches a character from a-z zero times or once. The
pattern [a-z]* matches a character from a-z zero or more times. The pattern
[a-z]+ matches a character from a-z one or more times.
Use quantifiers carefully, paying special attention to quantifiers that allow
zero matches, because such a pattern also matches the empty string.
When using the brace symbols {} as a quantifier, you specify either an exact
count or a range. {3} means exactly 3 times, but you could also say {3,},
which means at least 3 times. The quantifier {3,5} matches a pattern from 3
to 5 times.
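The quantifiers can be compared side by side with a short sketch. It again uses Pattern.matches(), which requires the whole input to match, and adds a find() counter to show how a quantifier that allows zero occurrences also produces empty matches. The class name QuantifierDemo and its helpers are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // True only when the whole input matches the pattern.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Counts the matches that find() reports, including empty ones.
    static int countFinds(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(matches("[a-z]{3}", "abc"));       // exactly three
        System.out.println(matches("[a-z]?", ""));            // zero or one
        System.out.println(matches("[a-z]+", ""));            // needs at least one
        System.out.println(matches("[a-z]{3,}", "abcdef"));   // three or more
        System.out.println(matches("[a-z]{3,5}", "abcdef"));  // six is too many
        // "ab", then an empty match before "1", then an empty match at the end.
        System.out.println(countFinds("[a-z]*", "ab1"));
    }
}
```

The last call is the kind of surprise the warning about zero matches refers to: [a-z]* reports three matches in "ab1", two of them empty, while [a-z]+ reports only one.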
There is much more to regular expressions than shown here. The art of using
them involves coming up with the right regular expression for the situation at
hand. Try out some different expressions with the earlier Regex program to see
if they match what you are expecting. Be sure to try out the different
quantifiers to get an understanding of their differences. Notice that
quantifiers typically try to include the largest number of characters for a
possible match.
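The following sketch illustrates that behavior. It also shows the reluctant form of a quantifier, written by adding ? after it, which the Pattern class supports but this tip does not otherwise discuss. The class name GreedyDemo and the sample text are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the text of the first match, or null when nothing matches.
    static String firstMatch(String regex, String source) {
        Matcher matcher = Pattern.compile(regex).matcher(source);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String source = "<b>bold</b>";
        // Greedy: .+ takes as many characters as it can before the final >.
        System.out.println(firstMatch("<.+>", source));
        // Reluctant: .+? stops at the first > that allows a match.
        System.out.println(firstMatch("<.+?>", source));
    }
}
```

The first call prints the whole string, <b>bold</b>, while the second prints only <b>.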
For a more detailed look at regular expressions, try out the Regular Expressions
lesson in The Java Tutorial online.
Also, visit the javadoc for the Pattern class.