Core Java Technologies Tech Tips

Compiler APIs and Regular Expressions

In This Issue

Welcome to the Core Java Technologies Tech Tips for April 2007. Core Java Technologies Tech Tips provides tips and hints for using core Java technologies and APIs in the Java Platform, Standard Edition 6 (Java SE 6).

This issue provides tips for the following:

» Compiling with the Java Compiler API
» Regular Expressions

These tips were developed using Java SE 6. You can download Java SE 6 from the Java SE Downloads page.

The author of this month's tips is John Zukowski, president and principal consultant of JZ Ventures, Inc.

Compiling with the Java Compiler API

From day one, the standard Java platform has lacked standard interfaces for invoking its compiler to generate Java byte codes. Using Sun's implementation of the platform, you could access the non-standard Main class of the com.sun.tools.javac package to compile your code (the class is found in the tools.jar file in the lib subdirectory). However, that package doesn't provide standard, public programming interfaces, and users of other implementations don't necessarily have access to the class. With Java SE 6 and its new Java Compiler API defined by JSR 199, you can access the javac compiler tool from your own applications.

There are two ways to use the tool. One way is simple, and the other is more complicated but allows you to manipulate more options. You'll use the simpler way first to compile the "Hello, World" program, shown here:

public class Hello {
  public static void main(String args[]) {
    System.out.println("Hello, World");
  }
}
 

To invoke the Java compiler from your Java programs, you need to access the JavaCompiler interface. Among other things, accessing the interface allows you to set the source path, the classpath, and the destination directory. Specifying each of the compilable files as a JavaFileObject instance allows you to compile each of them. However, you don't quite need to know about JavaFileObject just yet.

Use the ToolProvider class to request the default implementation of the JavaCompiler interface. The ToolProvider class provides a getSystemJavaCompiler() method, which returns an instance of the JavaCompiler interface.

JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
 

The simplest way to compile with the JavaCompiler is to use the run() method, which is defined in the Tool interface that JavaCompiler extends:

int run(InputStream in, 
    OutputStream out, 
    OutputStream err, 
    String... arguments)
 

Pass null for any of the first three arguments to use the defaults of System.in, System.out, and System.err, respectively. The varargs set of String objects represents the command-line arguments to pass into the compiler, such as options and the names of the source files.

Thus, to compile the Hello class source previously shown, you need the following:

int results = compiler.run(null, null, null, "Hello.java");
 

Assuming no compilation errors, this will generate the Hello.class file in the destination directory. Had there been an error, the run() method sends output to the standard error stream, which is the third argument of the run() method. The method returns a non-zero result when errors occur.
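If you would rather inspect the error text than let it go to System.err, you can pass your own stream as the third argument of run(). The following is a minimal sketch under a few assumptions: the class name CaptureCompilerErrors and its Result holder are made up for illustration, the source is written to a temporary directory, and the code uses java.nio.file and UncheckedIOException, so it needs a release newer than Java SE 6.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper class, not part of the Compiler API.
class CaptureCompilerErrors {

  /** Result code from run() plus the text the compiler wrote to its error stream. */
  static final class Result {
    final int code;
    final String errors;

    Result(int code, String errors) {
      this.code = code;
      this.errors = errors;
    }
  }

  /** Writes the source to a temporary Hello.java and compiles it with run(). */
  static Result compile(String source) {
    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    if (compiler == null) {
      throw new IllegalStateException("No system compiler available; run on a JDK");
    }
    try {
      Path file = Files.createTempDirectory("capture-demo").resolve("Hello.java");
      Files.write(file, source.getBytes());
      ByteArrayOutputStream err = new ByteArrayOutputStream();
      // in and out stay null (the defaults); only the error stream is redirected.
      int code = compiler.run(null, null, err, file.toString());
      return new Result(code, err.toString());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Result result = compile(
        "public class Hello {\n"
        + "  public static void main(String[] args) {\n"
        + "    System.out.pritnln(\"Hello, World\");\n"
        + "  }\n"
        + "}\n");
    System.out.println("Result code: " + result.code);
    System.out.print(result.errors);
  }
}
```

The temporary directory is not cleaned up here; a real tool would delete it or compile in place.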

You can use the following code to compile the Hello.java source file:

import java.io.*;
import javax.tools.*;

public class CompileIt {
  public static void main(String args[]) throws IOException {
    JavaCompiler compiler =
        ToolProvider.getSystemJavaCompiler();
    int results = compiler.run(
        null, null, null, "Hello.java");
    System.out.println("Result code: " + results);
  }
}

 

Once you compile the CompileIt program, you can run it multiple times without recompiling it whenever you need to recompile a changed Hello.java source. Assuming no errors, running CompileIt will produce the following output:

> java CompileIt
Result code: 0
 

Running CompileIt also produces a Hello.class file in the same directory:

> ls
CompileIt.class
CompileIt.java
Hello.class
Hello.java
 

You could stop there since that is sufficient to use the now standard compiler, but there is more. You have a second way to access the compiler for when you want better access to the results. More specifically, this second way allows developers to present the compilation results in a more meaningful manner, rather than just pass along the error text that went to stderr. The better approach to using the compiler takes advantage of the StandardJavaFileManager interface. The file manager provides a way to work with regular files for both input and output operations. It also reports diagnostic messages with the help of a DiagnosticListener instance. The DiagnosticCollector class you will be using is just one such implementation of that listener.

Before identifying what needs to be compiled, you need a file manager. Create a file manager in two basic steps: create a DiagnosticCollector and then ask the JavaCompiler for the file manager with its getStandardFileManager() method. Pass the DiagnosticListener object to the getStandardFileManager() method. This listener reports non-fatal problems and you can optionally share it with the compiler by passing it into the getTask() method later.

DiagnosticCollector<JavaFileObject> diagnostics =
    new DiagnosticCollector<JavaFileObject>();
StandardJavaFileManager fileManager =
    compiler.getStandardFileManager(diagnostics, aLocale, aCharset);
 

You could provide a null diagnostics listener to the call, but that is just about the same as using the earlier compilation method.

Before looking at the details of StandardJavaFileManager, note that the compilation process involves a single method of JavaCompiler called getTask(). It takes six arguments and returns an instance of a nested interface called CompilationTask:

JavaCompiler.CompilationTask getTask(
    Writer out,
    JavaFileManager fileManager,
    DiagnosticListener<? super JavaFileObject> diagnosticListener,
    Iterable<String> options,
    Iterable<String> classes,
    Iterable<? extends JavaFileObject> compilationUnits)
 

Most of these arguments can be null, with logical defaults.
* out: System.err
* fileManager: compiler's standard file manager
* diagnosticListener: compiler's default behavior
* options: no command-line options to compiler
* classes: no class names for annotation processing

The last argument, compilationUnits, really shouldn't be null as that is what you want to compile. That brings us back to StandardJavaFileManager. Notice the argument type: Iterable<? extends JavaFileObject>. Two methods of StandardJavaFileManager give this result. You can either start with a List of File objects, or a List of String objects, representing the file names:

Iterable<? extends JavaFileObject> getJavaFileObjectsFromFiles(
    Iterable<? extends File> files)
Iterable<? extends JavaFileObject> getJavaFileObjectsFromStrings(
    Iterable<String> names)
 

Actually, anything that is Iterable can be used to identify the collection of items to compile here, not just a List. A List just happens to be the easiest to create:

String[] filenames = ...;
Iterable<? extends JavaFileObject> compilationUnits =
    fileManager.getJavaFileObjectsFromStrings(Arrays.asList(filenames));
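To illustrate that any Iterable will do, here is a small sketch that hands a Set of File objects to getJavaFileObjectsFromFiles() and reports the kind of each resulting file object. The class and method names are invented for this example, and the try-with-resources statement assumes Java SE 7 or later. The named files do not need to exist just to create the file objects.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

// Hypothetical helper class, not part of the Compiler API.
class CompilationUnitsFromSet {

  /** Wraps each name in a File, passes the Set to the file manager, and lists each unit's kind. */
  static List<JavaFileObject.Kind> kindsOf(String... names) {
    Set<File> files = new LinkedHashSet<File>();
    for (String name : names) {
      files.add(new File(name));
    }

    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    try (StandardJavaFileManager fileManager =
        compiler.getStandardFileManager(null, null, null)) {
      List<JavaFileObject.Kind> kinds = new ArrayList<JavaFileObject.Kind>();
      // A Set is Iterable, so it is accepted just like a List.
      for (JavaFileObject unit : fileManager.getJavaFileObjectsFromFiles(files)) {
        kinds.add(unit.getKind());
      }
      return kinds;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(kindsOf("Hello.java", "Other.java"));
  }
}
```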

 

You now have all the necessary information to compile your source files. The JavaCompiler.CompilationTask returned from getTask() implements Callable. So, to start the task, invoke the call() method:

JavaCompiler.CompilationTask task =
    compiler.getTask(null, fileManager, null, null, null, compilationUnits);
Boolean success = task.call();

 

Assuming no compilation warnings or errors, the call() method will compile all the files identified by the compilationUnits variable, including all compilable dependencies. To find out if everything succeeded, check the Boolean return value for success. The call() method returns Boolean.TRUE only if all the compilation units compile. On any error, the method returns Boolean.FALSE.

Before showing the working example, let us add one last thing, the DiagnosticListener, or more specifically, its implementer DiagnosticCollector. Passing the listener as the third argument to getTask() allows you to ask after compilation for the diagnostics:

for (Diagnostic<? extends JavaFileObject> diagnostic :
    diagnostics.getDiagnostics()) {
  System.out.printf(
      "Code: %s%n" +
      "Kind: %s%n" +
      "Position: %s%n" +
      "Start Position: %s%n" +
      "End Position: %s%n" +
      "Source: %s%n" +
      "Message:  %s%n",
      diagnostic.getCode(), diagnostic.getKind(),
      diagnostic.getPosition(), diagnostic.getStartPosition(),
      diagnostic.getEndPosition(), diagnostic.getSource(),
      diagnostic.getMessage(null));
}
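Besides character positions, each Diagnostic also offers getLineNumber() and getColumnNumber(), which are often easier to present to a user. The sketch below, with the made-up class name ErrorLines, compiles a source string from a temporary file and collects the line number of every ERROR diagnostic. It assumes a JDK newer than Java SE 6 for java.nio.file and try-with-resources.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

// Hypothetical helper class, not part of the Compiler API.
class ErrorLines {

  /** Compiles the source as Hello.java in a temporary directory; returns the line of each error. */
  static List<Long> errorLines(String source) {
    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    DiagnosticCollector<JavaFileObject> diagnostics =
        new DiagnosticCollector<JavaFileObject>();
    try (StandardJavaFileManager fileManager =
        compiler.getStandardFileManager(diagnostics, null, null)) {
      Path file = Files.createTempDirectory("diag-demo").resolve("Hello.java");
      Files.write(file, source.getBytes());
      // Passing the collector as the third argument gathers the compiler's diagnostics.
      compiler.getTask(null, fileManager, diagnostics, null, null,
          fileManager.getJavaFileObjectsFromStrings(Arrays.asList(file.toString())))
          .call();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }

    List<Long> lines = new ArrayList<Long>();
    for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
      if (d.getKind() == Diagnostic.Kind.ERROR) {
        lines.add(d.getLineNumber());
      }
    }
    return lines;
  }

  public static void main(String[] args) {
    System.out.println(errorLines(
        "public class Hello {\n"
        + "  public static void main(String[] args) {\n"
        + "    System.out.pritnln(\"Hello, World\");\n"
        + "  }\n"
        + "}\n"));
  }
}
```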
 

And lastly, you should call the file manager's close() method.

Putting all that together gives us the following program, to again just compile the Hello class:

import java.io.*;
import java.util.*;
import javax.tools.*;

public class BigCompile {
  public static void main(String args[]) throws IOException {
    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    DiagnosticCollector<JavaFileObject> diagnostics =
        new DiagnosticCollector<JavaFileObject>();
    StandardJavaFileManager fileManager =
        compiler.getStandardFileManager(diagnostics, null, null);
    Iterable<? extends JavaFileObject> compilationUnits =
        fileManager.getJavaFileObjectsFromStrings(Arrays.asList("Hello.java"));
    JavaCompiler.CompilationTask task = compiler.getTask(
        null, fileManager, diagnostics, null, null, compilationUnits);
    Boolean success = task.call();
    for (Diagnostic<? extends JavaFileObject> diagnostic :
        diagnostics.getDiagnostics()) {
      System.out.printf(
          "Code: %s%n" +
          "Kind: %s%n" +
          "Position: %s%n" +
          "Start Position: %s%n" +
          "End Position: %s%n" +
          "Source: %s%n" +
          "Message:  %s%n",
          diagnostic.getCode(), diagnostic.getKind(),
          diagnostic.getPosition(), diagnostic.getStartPosition(),
          diagnostic.getEndPosition(), diagnostic.getSource(),
          diagnostic.getMessage(null));
    }
    fileManager.close();
    System.out.println("Success: " + success);
  }
}
 

Compiling and running this program will just print out the success message:

> javac BigCompile.java
> java BigCompile
Success: true
 

However, if you change the println method to the mistyped pritnln method, you instead get the following when run:

> java BigCompile
Code: compiler.err.cant.resolve.location
Kind: ERROR
Position: 80
Start Position: 70
End Position: 88
Source: Hello.java
Message:  Hello.java:3: cannot find symbol
symbol  : method pritnln(java.lang.String)
location: class java.io.PrintStream
Success: false
 

Using the Compiler API, you can do much more than what has been presented in this brief tip. For example, you can control the input and output directories or highlight the compilation errors in an integrated editor environment. Now, thanks to the Java Compiler API, you can do all that with standard API calls. For more information on the Java Compiler API and JSR 199, see the JSR 199 specification.
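As one illustration of controlling the output directory, the sketch below calls setLocation() on the file manager with StandardLocation.CLASS_OUTPUT, which has roughly the effect of the -d option of the javac command. The class name OutputDirectory and the temporary directory names are assumptions made for this example, and the code relies on java.nio.file and try-with-resources from releases after Java SE 6.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import javax.tools.JavaCompiler;
import javax.tools.StandardJavaFileManager;
import javax.tools.StandardLocation;
import javax.tools.ToolProvider;

// Hypothetical helper class, not part of the Compiler API.
class OutputDirectory {

  /** Compiles a trivial Hello class into a fresh directory; returns it, or null on failure. */
  static Path compileHello() {
    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    try (StandardJavaFileManager fileManager =
        compiler.getStandardFileManager(null, null, null)) {
      Path source = Files.createTempDirectory("src-demo").resolve("Hello.java");
      Files.write(source, "public class Hello { }\n".getBytes());
      Path classes = Files.createTempDirectory("classes-demo");

      // The output directory must already exist before it is set as a location.
      fileManager.setLocation(
          StandardLocation.CLASS_OUTPUT, Arrays.asList(classes.toFile()));

      Boolean success = compiler.getTask(null, fileManager, null, null, null,
          fileManager.getJavaFileObjectsFromFiles(Arrays.asList(source.toFile())))
          .call();
      return success ? classes : null;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path classes = compileHello();
    System.out.println("Class files written to: " + classes);
  }
}
```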

Regular Expressions in Java SE

Regular expression or regex support has been a part of the Java Platform since version 1.4. Found in the java.util.regex package, regex classes support pattern matching similar to what the Perl language provides but use Java language syntax and classes. The whole of the package is limited to three classes: Pattern, Matcher, and PatternSyntaxException. Version 1.5 introduced the MatchResult interface.

Use the two classes Pattern and Matcher together. Define the regular expression with the Pattern class. Then use the Matcher class to check the pattern against the input source. You encounter the exception when the pattern has a syntax error in the expression.

Neither class has a public constructor. Instead, you compile a regular expression to get a pattern, and then ask the Pattern returned for its Matcher based on some input source.

Pattern pattern = Pattern.compile( <regular expression> );
Matcher matcher = pattern.matcher( <input source> );
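Because Pattern.compile() throws the unchecked PatternSyntaxException for a malformed expression, code that accepts expressions from a user may want to catch it. Here is a minimal sketch; the class and method names are invented for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class for illustration.
class SafeCompile {

  /** Returns the compiled pattern, or null if the expression has a syntax error. */
  static Pattern compileOrNull(String regex) {
    try {
      return Pattern.compile(regex);
    } catch (PatternSyntaxException e) {
      // The exception carries the offending pattern, an index, and a description.
      System.err.printf("Bad expression \"%s\" near index %d: %s%n",
          e.getPattern(), e.getIndex(), e.getDescription());
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(compileOrNull("co[cl]a") != null); // valid expression
    System.out.println(compileOrNull("co[cl") != null);   // unclosed bracket
  }
}
```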
 

Once you have a Matcher, you typically process the input source to find all the contained matches. Use the find() method to locate matches of the pattern in the input source. Each call to find() continues from where the last call left off, or position 0 for the first call. The group() method then returns the text that was matched.

while (matcher.find()) {
   System.out.printf("Found: \"%s\" from %d to %d.%n",
       matcher.group(), matcher.start(), matcher.end());
}
 

The following code shows a basic regular expression program, which prompts the user for both the regular expression and the string to compare it against:

import java.io.Console;
import java.util.regex.*;

public class Regex {

   public static void main(String args[]) {
       Console console = System.console();

       // Get regular expression
       String regex = console.readLine("%nEnter expression: ");
       Pattern pattern = Pattern.compile(regex);

       // Get source
       String source = console.readLine("Enter input source: ");
       Matcher matcher = pattern.matcher(source);

       // Show matches
       while (matcher.find()) {
           System.out.printf("Found: \"%s\" from %d to %d.%n",
               matcher.group(), matcher.start(), matcher.end());
       }
   }
}
 

So, what exactly does a regular expression look like? The Pattern class provides more in-depth details, but basically a regular expression is a sequence of characters that tries to match another sequence of characters. For instance, you can look for the double el "ll" string literal pattern in the "Hello, World" string. The previous program would find the "ll" pattern at starting position 2 and ending position 4. The ending position is the position of the next character after the end of the matched string pattern.
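If you want to check the "ll" example without typing at a console, a non-interactive variant of the find() loop might look like the following sketch. The FindAll class and the text@start-end result format are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class FindAll {

  /** Returns each match as text@start-end, where end is one past the last matched character. */
  static List<String> matches(String regex, String input) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    List<String> found = new ArrayList<String>();
    while (matcher.find()) {
      found.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
    }
    return found;
  }

  public static void main(String[] args) {
    System.out.println(matches("ll", "Hello, World")); // prints [ll@2-4]
  }
}
```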

Pattern strings like "ll" are not very interesting, reporting only where they are literally in the input source. Regular expression patterns can include special metacharacters. Metacharacters provide regular expressions with powerful matching abilities. You can use the 15 characters "([{\^-$|]})?*+." as metacharacters in regular expressions.

Some metacharacters indicate character groupings. For instance, the bracket characters [ and ] allow you to specify a set of characters in which a match succeeds if any of the enclosed characters is found in the text. For instance, the pattern "co[cl]a" will match the words coca and cola. It won't match cocla since [] is used to match only a single character. Quantifiers, covered shortly, let you match something multiple times.

Besides trying to match individual characters, you can use the bracket characters [ and ] to match a range of characters, like the letters from j-z, specified as [j-z]. These can also be combined with a string literal, as in "foo[j-z]" which would succeed in finding the match fool, but fail with food, since l is within the range j to z and d is not. You can also use the ^ character to represent negation of a set of characters or a range. The pattern "foo[^j-z]" will match words that start with foo but do not end with a letter from j through z. So the string food would now succeed in matching. Multiple ranges can be combined together like [a-zA-Z] to mean the letters a through z as lowercase or uppercase characters.
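The bracket examples above can be tried out with the static Pattern.matches() method, which succeeds only when the entire input matches the expression. This is a small sketch; the class name CharacterSets and its helper method are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class CharacterSets {

  /** True only if the whole input matches the expression. */
  static boolean wholeMatch(String regex, String input) {
    return Pattern.matches(regex, input);
  }

  public static void main(String[] args) {
    System.out.println(wholeMatch("co[cl]a", "coca"));   // true
    System.out.println(wholeMatch("co[cl]a", "cocla"));  // false: [] matches one character
    System.out.println(wholeMatch("foo[j-z]", "fool"));  // true: l is within j-z
    System.out.println(wholeMatch("foo[j-z]", "food"));  // false: d is outside j-z
    System.out.println(wholeMatch("foo[^j-z]", "food")); // true: negated range
    System.out.println(wholeMatch("[a-zA-Z]", "Q"));     // true: combined ranges
  }
}
```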

While string literals are great for a first lesson on regular expressions, the more typical thing most people use in regular expressions is the predefined character classes. This is where the metacharacters . and \ come into play. The period . is used to represent any character. So, the regular expression ".oney" would match money and honey, and any other set of 5 characters that ends in oney. The \ on the other hand is used with other characters to represent a whole set of letters. For instance, while you could use [0-9] to represent the set of digits, you can also use \d. You can also use [^0-9] to represent the set of characters that aren't digits. Or, you can use the predefined character class string of \D. All of these character class strings are defined in the Java platform documentation for the Pattern class, as they aren't all as easy to remember. Here is a subset of some special predefined character classes:

* \s -- whitespace
* \S -- non-whitespace
* \w -- word character [a-zA-Z_0-9]
* \W -- non-word character
* \p{Punct} -- punctuation
* \p{Lower} -- lowercase [a-z]
* \p{Upper} -- uppercase [A-Z]
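To get a feel for these classes, the following sketch reports which of the listed classes a single character belongs to. The class name PredefinedClasses and the classify() method are assumptions for this example. Note that each backslash is doubled because the expressions are written as Java string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class PredefinedClasses {

  // The character classes from the list above, written as Java string literals.
  static final String[] CLASSES = {
    "\\s", "\\S", "\\w", "\\W", "\\p{Punct}", "\\p{Lower}", "\\p{Upper}"
  };

  /** Returns the listed character classes that match the given character. */
  static List<String> classify(char c) {
    List<String> result = new ArrayList<String>();
    for (String characterClass : CLASSES) {
      if (Pattern.matches(characterClass, String.valueOf(c))) {
        result.add(characterClass);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(classify('a'));
    System.out.println(classify(' '));
    System.out.println(classify('!'));
  }
}
```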
 

One thing that is important to point out with the predefined character strings isn't immediately noticeable. If you use one of these strings in the Regex program shown above, you enter it as shown. \s matches whitespace. If, however, you want to hardcode the regular expression in your Java source file, you have to remember that the \ character is treated specially. You must escape the string in your source:

String regexString = "\\s";
 

Here, the \\ represents a single backslash in the string. There are other special strings for representing other string literals:

* \t -- tab
* \n -- newline
* \r -- carriage return
* \xhh -- hex character 0xhh
* \uhhhh -- hex character 0xhhhh
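The sketch below shows a few of these strings hardcoded in Java source, where every backslash meant for the regular expression is doubled. The class and constant names are invented for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class EscapedPatterns {

  static final String WHITESPACE = "\\s";    // two characters: a backslash and an s
  static final String TAB = "\\t";           // the regex engine interprets the tab escape
  static final String HEX_A = "\\x41";       // hex escape for the letter A
  static final String UNICODE_A = "\\u0041"; // four-digit hex escape for the letter A

  /** True only if the whole input matches the expression. */
  static boolean wholeMatch(String regex, String input) {
    return Pattern.matches(regex, input);
  }

  public static void main(String[] args) {
    System.out.println(WHITESPACE.length());        // 2
    System.out.println(wholeMatch(TAB, "\t"));      // true
    System.out.println(wholeMatch(HEX_A, "A"));     // true
    System.out.println(wholeMatch(UNICODE_A, "A")); // true
  }
}
```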
 

Quantifiers make regular expressions more interesting, at least when combined with other expressions like character classes. For instance, if you want to match a string of three characters from a-z, you could use the pattern "[a-z][a-z][a-z]" but you don't have to. Instead of repeating the string, you add a quantifier after the pattern. For this specific example, "[a-z][a-z][a-z]" can also be represented as "[a-z]{3}". For a specific amount, the number goes within {} braces. You can also use ?, *, or + to represent zero or once, zero or more, or one or more, respectively.

The pattern [a-z]? matches a character from a-z zero or once. The pattern [a-z]* matches a character from a-z zero or more times. The pattern [a-z]+ matches a character from a-z one or more times.

Use quantifiers carefully, paying special attention to quantifiers that allow zero matches.
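One reason for that caution is that a quantifier allowing zero occurrences can succeed with an empty match at positions where nothing in the input fits. The following sketch, with the made-up class name ZeroLengthMatches, lists the start and end of every match so that empty matches show up as equal positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class ZeroLengthMatches {

  /** Returns start-end for every match; an empty match has start equal to end. */
  static List<String> spans(String regex, String input) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    List<String> spans = new ArrayList<String>();
    while (matcher.find()) {
      spans.add(matcher.start() + "-" + matcher.end());
    }
    return spans;
  }

  public static void main(String[] args) {
    // The input has no letters, yet * still reports empty matches.
    System.out.println(spans("[a-z]*", "12"));
    // With +, at least one letter is required, so nothing is found.
    System.out.println(spans("[a-z]+", "12"));
  }
}
```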

When using the brace symbols {} as quantifiers, you specify either an exact count or a range. {3} means exactly 3 times, but you could also say {3,}, which means at least 3 times. The quantifier {3,5} matches a pattern from 3 to 5 times. Write the range without spaces.
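The counted forms can be compared side by side with a short sketch like this one. The class name CountedQuantifiers is an assumption for the example, and Pattern.matches() is used so that the whole input has to fit the count.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class CountedQuantifiers {

  /** True only if the whole input matches the expression. */
  static boolean wholeMatch(String regex, String input) {
    return Pattern.matches(regex, input);
  }

  public static void main(String[] args) {
    System.out.println(wholeMatch("[a-z]{3}", "abc"));      // true: exactly 3
    System.out.println(wholeMatch("[a-z]{3}", "abcd"));     // false: one too many
    System.out.println(wholeMatch("[a-z]{3,}", "abcdef"));  // true: at least 3
    System.out.println(wholeMatch("[a-z]{3,5}", "abcde"));  // true: between 3 and 5
    System.out.println(wholeMatch("[a-z]{3,5}", "abcdef")); // false: more than 5
    System.out.println(wholeMatch("[a-z]?", ""));           // true: zero or once
    System.out.println(wholeMatch("[a-z]+", ""));           // false: needs one or more
  }
}
```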

There is much more to regular expressions than shown here. The art of using them involves coming up with the right regular expression for the situation at hand. Try out some different expressions with the earlier Regex program to see if they match what you are expecting. Be sure to try out the different quantifiers to get an understanding of their differences. Notice that quantifiers typically try to include the largest number of characters for a possible match.
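To see how a quantifier takes the largest number of characters it can, consider this sketch, which returns only the first match of an expression. The class name GreedyMatch and its firstMatch() method are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class GreedyMatch {

  /** Returns the text of the first match, or null if there is none. */
  static String firstMatch(String regex, String input) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    return matcher.find() ? matcher.group() : null;
  }

  public static void main(String[] args) {
    // .* runs to the last o in the input, not the first one.
    System.out.println(firstMatch("H.*o", "Hello, World"));     // prints Hello, Wo
    // + takes both l characters rather than stopping after one.
    System.out.println(firstMatch("l+", "Hello, World"));       // prints ll
    // {1,3} takes three letters even though one would satisfy it.
    System.out.println(firstMatch("[a-z]{1,3}", "abcdef"));     // prints abc
  }
}
```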

For a more detailed look at regular expressions, try out the Regular Expressions lesson in The Java Tutorial online.

Also, visit the javadoc for the Pattern class.
