JCanyon: GRAND CANYON FOR JAVA

As seen at JavaOne 2001

JCanyon screenshot

Kenneth Russell, Sun Microsystems, Inc.

Contents

i. Quick Start with Java Web Start

System requirements:

  • 330 MB of free disk space
  • 192 MB or more RAM
  • OpenGL(R) (is pre-installed on Windows(R) PCs and many GNU/Linux distributions)

System recommendations:

  • 3D accelerator card with hardware texture mapping (has been tested extensively with NVIDIA(R) GeForce(R) family of products)

The launch links below require Java Web Start. As of this writing J2SE 1.4 was still in beta, so JDK 1.4 must also be installed before launching the jcanyon demo. The follow-on release to JDK 1.4 Beta will provide substantial performance improvements for this and other NIO-intensive applications. The links below are designed to work on the following operating systems:

  • Solaris/SPARC with OpenGL for Solaris/SPARC
  • Solaris/x86 with Mesa
  • GNU/Linux on x86 (Mesa is usually pre-installed; vendors may provide drivers for hardware acceleration)
  • Windows 95/98/ME/NT/2000 (OpenGL is pre-installed)

NOTE: the Web Start links do not work yet on Linux and Solaris due to bug 4423128, which will be fixed in 1.4. It is recommended that you change your display settings to 640x480 before launching the demo to simulate full-screen mode; there is currently poor interaction (4468476) between 1.4's full-screen support and OpenGL on Windows.

Launch the jcanyon demo for JDK 1.4 Beta 1 ("1.4.0-beta")

Launch the jcanyon demo for subsequent JDK 1.4 releases

The first time the program is launched, a 20 MB jar file will be downloaded and decompressed onto your local disk into a directory you choose. The location of this directory is written into a file called ".jcanyon" located in your home directory (as specified by the system property "user.home"). If problems occur during data downloading or decompression, delete the .jcanyon file as well as the associated data directory and relaunch the demonstration.
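
As a small illustration of where that file lives, the following sketch resolves the ".jcanyon" path from the "user.home" system property. The class and method names are hypothetical, and the sketch deliberately does not parse the file, since its internal format is not described here.

```java
import java.io.File;

// Hypothetical helper; not part of the jcanyon sources.
final class JCanyonConfigLocator {
    // Returns the expected location of the ".jcanyon" file in the user's home directory.
    static File configFile() {
        return new File(System.getProperty("user.home"), ".jcanyon");
    }

    public static void main(String[] args) {
        File f = configFile();
        // Report whether the file is present so it can be deleted by hand if a download failed.
        System.out.println(f.getAbsolutePath() + (f.exists() ? " exists" : " does not exist"));
    }
}
```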

The complete source code for jcanyon is available below.

Mouse controls:

  • Mouse left/right: roll left/right
  • Mouse up/down: pitch down/up
  • Left button click: slow down
  • Right button click: speed up

Keyboard controls:

  • a: Toggle afterburner
  • f: Toggle fillets
  • g: Toggle fog
  • q: Quit
  • r: Reset simulator
  • t: Toggle control panel
  • z: Toggle between java.nio direct buffers and JDK 1.3-era JNI calls

1. Introduction

Before JDK 1.4, the only mechanisms for transferring large amounts of data into and out of the Java platform were the JNI Get<PrimitiveType>ArrayElements and GetPrimitiveArrayCritical entry points. The implementation of Get<PrimitiveType>ArrayElements either incurs a memory allocation and data copy, which is expensive, or requires pinning support in the garbage collector, which is difficult or impossible depending on the garbage collection algorithm used by the Java virtual machine implementation (JVM). GetPrimitiveArrayCritical is fast, but it has severe limitations on how it can be used, which make it unacceptable for many kinds of applications.

1.1. java.nio Direct Buffers

JDK 1.4 introduces the concept of direct buffers in the new java.nio package (JSR-000051). Direct buffers provide a safe platform-, vendor-, and JVM-independent zero-copy mechanism that has improved I/O throughput significantly in JDK 1.4. They are applicable to more domains than just I/O; any high-bandwidth application, such as I/O, 2D, 3D, database access, and inter-language interoperability, can potentially benefit from direct buffers. The Java HotSpot Client and Server VMs have been optimized to generate very efficient code for buffer operations.
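
A minimal sketch of allocating and filling a direct buffer is shown below. The class and method names are hypothetical and do not come from the jcanyon sources; only the java.nio calls themselves are standard. Native byte order is chosen on the assumption that the data will be consumed by native code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical example class; not part of the jcanyon sources.
final class DirectBufferSketch {
    // Allocates a direct buffer holding the given number of floats (4 bytes each)
    // in the platform's native byte order.
    static FloatBuffer allocateDirectFloats(int count) {
        return ByteBuffer.allocateDirect(count * 4)
                         .order(ByteOrder.nativeOrder())
                         .asFloatBuffer();
    }

    public static void main(String[] args) {
        FloatBuffer buf = allocateDirectFloats(3);
        buf.put(1.0f).put(2.0f).put(3.0f);
        buf.flip(); // switch from writing to reading
        System.out.println("direct=" + buf.isDirect() + " remaining=" + buf.remaining());
    }
}
```

Because the storage lives outside the garbage-collected heap, native code can be handed its address without an intervening copy.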

The jcanyon demonstration is a simple flight simulator which visualizes a large data set (roughly three hundred megabytes in size) using a slightly modified version of the free software OpenGL binding "OpenGL for Java" with a few minor additions to support java.nio direct buffers for holding geometric data. JCanyon is intended solely as a technology demonstration. Standard Java programming language extensions which operate on large amounts of data, such as Java3D and Java Advanced Imaging, will be able to take similar advantage of the java.nio features once suitable additions have been made to their APIs.

JCanyon uses a multiresolution rendering algorithm to avoid holding the entire data set in memory at once, and uses memory mapped files, accessed via java.nio, to minimize data copying and heap size. The innermost rendering loop decimates the geometric data at run time while sending it to the graphics card using java.nio direct buffers. The entire application, including the physically-accurate flight model and rendering loop, is written in the Java programming language. No native code aside from the OpenGL binding is required or used. The simulator runs at real-time rates using the Java HotSpot Client VM on a single-CPU 750 MHz Pentium III PC with an NVIDIA GeForce or GeForce 2 graphics card.

The rest of this paper is organized as follows. Section 2 gives an overview of the jcanyon demonstration, its use of java.nio, and a description of the inner rendering loop. Section 3 provides details on the multiresolution algorithm and the flight dynamics. Section 4 contains answers to frequently asked questions. Section 5 provides links to the complete source code of the jcanyon demonstration. Section 6 contains acknowledgments, Section 7 the alias for feedback, and Section 8 references for further reading.

2. JCanyon Overview

The jcanyon demonstration uses java.nio in two ways; first, it takes advantage of the new file I/O facilities to minimize data copying when reading data from the disk, and second, it uses java.nio direct buffers to send data to the graphics card via OpenGL.

2.1. Use of New File I/O

The data set which jcanyon renders consists of roughly 100 MB (uncompressed) of elevation data, which provides the shape of the canyon and mountains, and 200 MB of satellite imagery, which provides the surface color. The data is too large to fit in RAM on most machines; a multiresolution algorithm, described in Section 3.1, is used to shrink the data down to a manageable size. Memory-mapped file I/O is used to transfer both the elevation and image data into the application. The elevation data is currently always mapped in at the highest resolution and is decimated at run time. The image data is downsampled into pyramids ahead of time and only the data of the needed resolution is mapped into memory at any given time. Imagery is fetched from the disk in a background thread and is scanned by that thread to ensure the data is in the disk cache before it becomes accessible to the main thread; this reduces the pauses associated with fetching data from the disk, especially on multi-CPU systems.
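
The mapping and pre-scanning described above might look roughly like the following sketch. The class name, method names, and the one-byte-per-page scan are assumptions made for illustration; the actual jcanyon file layout and scanning strategy are not specified here.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical example class; not part of the jcanyon sources.
final class MappedTileSketch {
    // Maps an entire file into memory read-only.
    static MappedByteBuffer mapReadOnly(String path) throws IOException {
        RandomAccessFile file = new RandomAccessFile(path, "r");
        try {
            FileChannel channel = file.getChannel();
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        } finally {
            file.close(); // the mapping stays valid after the channel is closed
        }
    }

    // Reads one byte per page so the operating system pulls the data into its cache.
    // The sum is returned only so the reads have an observable result.
    static int touchPages(MappedByteBuffer buf, int pageSize) {
        int sum = 0;
        for (int i = 0; i < buf.limit(); i += pageSize) {
            sum += buf.get(i);
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: MappedTileSketch <file>");
            return;
        }
        final MappedByteBuffer tile = mapReadOnly(args[0]);
        // Scan in a background thread, as the text describes for imagery.
        Thread fetcher = new Thread(new Runnable() {
            public void run() {
                touchPages(tile, 4096); // 4096 is an assumed page size
            }
        });
        fetcher.start();
        fetcher.join(); // only after this would the tile be handed to the renderer
    }
}
```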

2.2. Rendering Loop

The inner rendering loop strides through the geometric data for a portion of the data set and assembles 3D points, as well as references to the 2D imagery in the form of texture coordinates, into a large buffer containing single-precision floating-point values. This buffer is then transferred to the graphics card for rendering. In order for OpenGL to render the buffer, it requires a persistent pointer to the buffer's starting address.
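
The following sketch illustrates the general shape of such a loop under simplifying assumptions: a square tile on a regular grid, an interleaved x, y, z, s, t vertex layout, and height stored on the y axis. The class name, parameter names, and layout are hypothetical and are not taken from the jcanyon sources.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical example class; not part of the jcanyon sources.
final class TileVertexFiller {
    // Writes x, y, z, s, t for every stride-th sample of a size-by-size elevation grid.
    // 'out' must have room for 5 floats per emitted vertex. Returns the vertex count.
    static int fill(FloatBuffer elevation, int size, int stride, float spacing, FloatBuffer out) {
        out.clear();
        int count = 0;
        for (int row = 0; row < size; row += stride) {
            for (int col = 0; col < size; col += stride) {
                out.put(col * spacing);                    // x
                out.put(elevation.get(row * size + col));  // y (height sample)
                out.put(row * spacing);                    // z
                out.put((float) col / (size - 1));         // s texture coordinate
                out.put((float) row / (size - 1));         // t texture coordinate
                count++;
            }
        }
        out.flip(); // ready to be read by the consumer
        return count;
    }

    public static void main(String[] args) {
        int size = 5;
        FloatBuffer elevation = FloatBuffer.allocate(size * size);
        // Direct output buffer sized for a 3x3 decimated grid (stride 2).
        FloatBuffer out = ByteBuffer.allocateDirect(9 * 5 * 4)
                                    .order(ByteOrder.nativeOrder())
                                    .asFloatBuffer();
        System.out.println("vertices=" + fill(elevation, size, 2, 30.0f, out));
    }
}
```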

There are two versions of the inner rendering loop; one uses a java.nio direct buffer to hold the data to be sent to OpenGL, and the other uses a standard Java programming language float[]. When using java.nio, since the buffer is "direct", it is not managed by the Java garbage-collected heap and therefore a pointer to its data can be safely passed to OpenGL using the JNI entry point GetDirectBufferAddress, which is new in 1.4. Because of the semantics of the OpenGL APIs, the only correct method of sending data down to the card from a Java programming language float[] is to either use Get<PrimitiveType>ArrayElements (which requires a memory allocation and data copy on any JVM which does not support pinning) or to use GetPrimitiveArrayCritical and copy the data manually; the native code associated with this demonstration uses the latter technique to avoid repeated memory allocations. While the need to perform a data copy is in some sense an artifact of the OpenGL APIs, there are many domains, as described in the introduction, in which an intervening data copy has been necessary up until JDK 1.4. Such a data copy can easily be the bottleneck in an application.

This demonstration was originally intended to use the NVIDIA GL_NV_vertex_array_range (VAR) extension to OpenGL for high-throughput rendering of the terrain mesh. However, after implementing the terrain engine as well as supporting code to allow the use of the VAR extension, it was discovered that enabling VAR caused a slowdown when the polygon count went up; more work is needed to resolve this performance penalty. In the meantime, VAR can be enabled by passing in the -nv command line option to DBFlyer when running on a GeForce-based card with the latest Detonator drivers.

3. Algorithm Details

3.1. Multiresolution Algorithm

The terrain renderer uses a variant of the algorithm described in [LKR96]. The data set is subdivided into roughly 200 tiles and measures 13 by 15 tiles. Each tile contains both elevation and image data. At the highest resolution, each tile measures 513x513 elevation samples and 512x512 image samples. Tiles are downsampled into pyramids by recursively dropping every other sample. Image pyramids for texture mapping are created off-line. The geometric data is decimated at run time by iterating through the highest-resolution data with an appropriate stride.
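
The pyramid construction by dropping every other sample can be sketched as follows. The class and method names are hypothetical; the only facts taken from the text are the 513-sample tile edge and the drop-every-other-sample rule, which gives edge lengths of 513, 257, 129, and so on.

```java
// Hypothetical example class; not part of the jcanyon sources.
final class PyramidSketch {
    // Stride through the highest-resolution data that corresponds to a pyramid level.
    static int strideForLevel(int level) {
        return 1 << level;
    }

    // Number of elevation samples along one tile edge at a level (513 at level 0).
    static int elevationSamples(int level) {
        return (512 >> level) + 1;
    }

    // Builds the next-coarser level of a square grid whose edge length is 2^n + 1
    // by keeping every other sample in each direction.
    static float[] dropEveryOther(float[] src, int size) {
        int half = (size - 1) / 2 + 1;
        float[] dst = new float[half * half];
        for (int r = 0; r < half; r++) {
            for (int c = 0; c < half; c++) {
                dst[r * half + c] = src[(2 * r) * size + (2 * c)];
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        for (int level = 0; level < 4; level++) {
            System.out.println("level " + level + ": stride " + strideForLevel(level)
                    + ", samples per edge " + elevationSamples(level));
        }
    }
}
```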

Maximum deltas as described in [LKR96] are computed off-line for each level of the elevation data pyramid. These deltas are defined as the maximum world-coordinate error incurred in the vertical direction by rendering a given tile at the next-lower resolution. The level-of-detail (LOD) selection algorithm is somewhat dependent on the deltas being monotonically increasing, so a quadratic curve is fitted to these deltas and they are subsequently resampled. However, this technique is not currently guaranteed to produce monotonically increasing deltas and, in fact, several tiles in the database produce deltas which are not increasing. The resulting visible effects have not yet been characterized.

During rendering of each frame, the position of the camera is compared to the centroid of each visible tile. The maximum screen-space error is computed and compared against the precomputed maximum deltas; the appropriate decimation level is chosen to not exceed the current error. Thus, LOD selection is performed for each tile every frame and each tile's LOD is independent.
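
A simplified sketch of per-tile LOD selection follows. It is not the formulation from [LKR96]; it assumes a basic perspective model in which a pixel tolerance is converted to a world-space error at the tile's distance, and it assumes deltas[i] holds the error incurred by rendering at level i, with deltas[0] equal to zero. All names and parameters are hypothetical.

```java
// Hypothetical example class; not part of the jcanyon sources.
final class GeometryLodSketch {
    // World-space error that projects to 'pixelTolerance' pixels at the given
    // distance, using a simple perspective model (an assumption of this sketch).
    static double allowedWorldError(double distance, double pixelTolerance,
                                    double fovYRadians, int viewportHeight) {
        double worldPerPixel = 2.0 * distance * Math.tan(fovYRadians / 2.0) / viewportHeight;
        return pixelTolerance * worldPerPixel;
    }

    // Picks the coarsest level whose precomputed delta does not exceed the allowed
    // error. The loop stops at the first level that exceeds it, so it relies on
    // the deltas being non-decreasing.
    static int selectLevel(double[] deltas, double allowedError) {
        int level = 0;
        for (int i = 1; i < deltas.length && deltas[i] <= allowedError; i++) {
            level = i;
        }
        return level;
    }

    public static void main(String[] args) {
        double[] deltas = {0.0, 1.0, 4.0, 9.0}; // made-up values for illustration
        double allowed = allowedWorldError(500.0, 2.0, Math.toRadians(60.0), 480);
        System.out.println("selected level: " + selectLevel(deltas, allowed));
    }
}
```

The early exit in selectLevel shows why non-monotonic deltas are a concern: a single out-of-order delta can stop the search before a coarser, still acceptable level is considered.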

The two primary contributions of jcanyon's rendering algorithm are its texture selection technique and a method for eliminating cracks in the surface called filleting.

3.1.1. Texture selection

It was observed that using the same algorithm for texture LOD as is used for geometry LOD caused unacceptable visual artifacts, as too-coarse textures were frequently selected. Synthetic "deltas" for the textures were created and the texture LOD selection decoupled from the geometry, but tuning of the texture deltas did not yield acceptable results; among the reasons for this was a propensity of the LOD algorithm to incorrectly shift to too-coarse LODs for tiles directly underneath the camera. Instead, a new algorithm was devised for selection of texture LODs. This algorithm is an approximation to screen-space error for images, and takes into account that the visible area of the texture decreases first as the tile moves further away from the camera, and second as the angle between the normal vector of the tile and the forward vector of the camera approaches 90 degrees. It attempts to select an LOD for the texture at which magnification of texels will not be visible.

First, the relative areas of the downsampled tiles' textures are computed ahead of time. The zeroth (highest resolution) level has a relative area of one; the first, a relative area of 1/4; the second, a relative area of 1/16; and so on. The distance between the camera and the centroid of the tile is computed; the screen-space area of the rendered tile decreases as 1 / distance^2. The current camera parameters are consulted to discover the distance from the camera at which the tile precisely fills the screen either horizontally or vertically; currently the system assumes that the window is either square or wider than it is tall. This "filling distance" is squared and divided by the square of the camera distance. The result is an approximation to the area, relative to the highest resolution version of the data, which the tile fills in the output window.

The second factor considered is the angle at which the camera views the tile. The dot product of the camera's forward vector in world coordinates is taken with the world-coordinate normal vector of the tile. It was observed that using the square of the cosine of this angle yields visually better results than simply the cosine, and that weighting the cosine term too heavily when the tile is too close to the camera yields visible artifacts. A linear ramp is therefore used to weight the distance from the camera more when the tile is close to the near clipping plane and to weight the angle the tile makes with the camera more when the tile is close to the far clipping plane.

These two factors are combined to compute the relative area the tile currently makes in the field of view; see Equation 1. The precomputed relative areas are searched to find the appropriate level at which the texture should be rendered.

relArea = (1.0f - cos2Alpha) * ooLength2 + (cos2Alpha * dotp * dotp) * ooLength2
where
ooLength2 = (dist0 * dist0) / diffLen2

dist0 = distance from eye point in world-coordinate units when a face-on tile fills the window completely either horizontally or vertically

diffLen2 = squared length of vector (eye_position - tile_position)

cos2Alpha = (diffLen2 - dist0 * dist0) / (farClipDist * farClipDist - dist0 * dist0)

dotp = abs(dot(camera_forward_vector, tile_normal_vector))
Equation 1. Computation of relative area of current tile.
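
Equation 1 translates directly into code. The sketch below uses the equation's variable names; the class name, the method names, and in particular the search rule (pick the coarsest level whose precomputed relative area is still at least the computed area, so that texels are not magnified) are assumptions made for illustration. No clamping of cos2Alpha is applied, since none is specified.

```java
// Hypothetical example class; not part of the jcanyon sources.
final class TextureLodSketch {
    // Precomputed relative areas: 1, 1/4, 1/16, ... for successive pyramid levels.
    static float[] relativeAreas(int levels) {
        float[] areas = new float[levels];
        for (int i = 0; i < levels; i++) {
            areas[i] = 1.0f / (1 << (2 * i));
        }
        return areas;
    }

    // Equation 1: relative screen area of a tile from its squared distance and view angle.
    static float relativeArea(float dist0, float diffLen2, float farClipDist, float dotp) {
        float dist0Sq = dist0 * dist0;
        float ooLength2 = dist0Sq / diffLen2;
        float cos2Alpha = (diffLen2 - dist0Sq) / (farClipDist * farClipDist - dist0Sq);
        return (1.0f - cos2Alpha) * ooLength2 + (cos2Alpha * dotp * dotp) * ooLength2;
    }

    // Assumed search rule: coarsest level whose relative area still covers relArea.
    static int selectLevel(float[] relAreas, float relArea) {
        int level = 0;
        for (int i = 1; i < relAreas.length && relAreas[i] >= relArea; i++) {
            level = i;
        }
        return level;
    }

    public static void main(String[] args) {
        float[] areas = relativeAreas(5);
        float relArea = relativeArea(10.0f, 900.0f, 100.0f, 0.8f); // made-up inputs
        System.out.println("relArea=" + relArea + " level=" + selectLevel(areas, relArea));
    }
}
```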

This image level selection algorithm is completely automatic and does not require any tunable parameters. Further, it has been observed to select appropriate levels of texture detail for large terrain data sets. An attempt was made to replace the geometry LOD selection algorithm from [LKR96] with the above-mentioned texture LOD algorithm; this did not work well, as the geometry was in general selected at too high a level of detail, causing the frame rate to drop. Further work is needed to address the degenerate cases of the geometry LOD algorithm.

3.1.2. Fillets

Previous terrain rendering algorithms, for example [LKR96] and [DWS97], have been global; dependencies propagated between tiles of the terrain to generate a continuous mesh to avoid visible cracks between adjacent tiles rendered at differing levels of detail. The recording and processing of these dependencies is extremely CPU-intensive, as it requires many or all vertices of the mesh to be processed using binary tree or quadtree algorithms.

An alternative solution was sought that would enable completely independent ("local") processing of tiles in the terrain database. Observation of the behavior of the system indicated that cracks in the terrain were usually a minor feature. It appeared that one could eliminate them by applying an image processing filter to fill in the gaps by interpolating along vertical segments that remained unfilled by the core rendering algorithm. However, reading the image back from the frame buffer to perform such processing is computationally expensive.

A scheme called filleting was devised to solve the problem geometrically. Figure 1 shows one tile from the Grand Canyon data set. A band of polygons is drawn around the edge of each tile which stretches down to the minimum elevation in the database. Each side of this band is textured with the appropriate line of the edge texels. This band covers the space, if any, between adjacent tiles, allowing them to be rendered independently and avoiding global mesh algorithms. In practice, only the forward-facing fillets need be drawn. Figure 2 compares the mesh output with and without fillets. With fillets, there are effectively no visible artifacts, especially when the terrain is in motion. In our application, fillets added roughly 4% to the rendering time, which is relatively inexpensive compared to performing complex selection algorithms on a per-vertex basis. While popping is occasionally visible in the mesh since no vertex morphing is performed, this appears to be largely caused by degenerate behavior of the geometry LOD algorithm and we hope to correct this defect in the future without resorting to per-vertex algorithms.
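
The geometry of a fillet can be sketched as a triangle strip along one tile edge: each edge sample contributes a vertex on the surface and a second vertex directly below it at the database's minimum elevation, both sharing the edge texel's texture coordinates. The class name, the interleaved x, y, z, s, t layout, and the choice of the row-0 edge are assumptions made for this illustration.

```java
import java.nio.FloatBuffer;

// Hypothetical example class; not part of the jcanyon sources.
final class FilletSketch {
    // Emits a triangle-strip band along the row-0 edge of a size-by-size tile.
    // 'out' must have room for 10 floats per emitted edge sample.
    // Returns the number of vertices written.
    static int fillEdgeFillet(FloatBuffer elevation, int size, int stride,
                              float spacing, float minElevation, FloatBuffer out) {
        out.clear();
        int count = 0;
        for (int col = 0; col < size; col += stride) {
            float x = col * spacing;
            float s = (float) col / (size - 1);
            // Top vertex lies on the tile surface at the edge.
            out.put(x).put(elevation.get(col)).put(0.0f).put(s).put(0.0f);
            // Bottom vertex drops to the minimum elevation; it reuses the same
            // texture coordinates so the edge texel is stretched down the band.
            out.put(x).put(minElevation).put(0.0f).put(s).put(0.0f);
            count += 2;
        }
        out.flip();
        return count;
    }

    public static void main(String[] args) {
        int size = 5;
        FloatBuffer elevation = FloatBuffer.allocate(size * size);
        FloatBuffer out = FloatBuffer.allocate(size * 10);
        System.out.println("fillet vertices=" + fillEdgeFillet(elevation, size, 1, 30.0f, -100.0f, out));
    }
}
```

Because the band depends only on the tile's own edge samples, it can be generated without consulting neighboring tiles, which is what keeps the per-tile processing independent.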


Figure 1. Relationship of fillets to the tile surface.


Figure 2. Surface without and with fillets.

3.2. Flight Dynamics

The flight dynamics are a port of Prof. Richard Murray's public domain simplified F-16 model from Fortran to the Java programming language. An option has been added to allow the angle-of-attack and sideslip aerodynamic forces to be disabled, improving the simulator's stability at the expense of realism. This option is enabled in the jcanyon demonstration.

4. Frequently Asked Questions

  • Q. Isn't Java slow?

    A. No.

  • Q. Is the comparison with earlier JNI calls really fair?

    A. The rendering loop which uses pre-1.4 JNI calls has not been optimized, and performs some unnecessary data copying. Based on earlier experiments, optimization could probably bring the pre-1.4 version of the inner rendering loop within a factor of three of the 1.4 version rather than the factor of forty slower it currently is.

5. Source Code

The source code for the jcanyon demonstration is available in two formats:

The source code is covered by the modified BSD license (the version without the advertising clause); see the enclosed documentation for details.

To reduce the download size of the jcanyon sources, the source code for the version of "OpenGL for Java" which jcanyon uses is provided separately. The modifications that were made to these sources do not include the java.nio integration; that is included in the hiperf package in the jcanyon sources above. Rather, the changes made to OpenGL for Java were minor additions to the GLCanvas and GLAnimCanvas classes to support more-efficient OpenGL context handling, and have already been checked in to the OpenGL for Java source tree. The standard distribution is available from http://www.jausoft.com/ and is covered by the GNU Library General Public License. (Three bundles are provided to ensure that the version for the specified operating system has been built and tested.)

6. Acknowledgments

  • Sven Goethel, author of "OpenGL for Java"
  • Prof. Richard Murray
  • Rene W. Schmidt
  • Benjamin Sleeter, US Geological Survey
  • Maintainers of aria.arizona.edu
  • US Geological Survey and EROS Data Center in Sioux Falls, SD

7. Feedback

Please direct all feedback to jcanyon-feedback@eng.sun.com. While we will read all email sent, we cannot guarantee a reply.

8. References

[LKR96] P. Lindstrom, D. Koller, W. Ribarsky, L. Hodges, N. Faust, and G. Turner. "Real-time, continuous level of detail rendering of height fields", in proceedings of SIGGRAPH 96, August 4-9, 1996, pp. 109-118.

[DWS97] M. Duchaineau, M. Wolinsky, D. Sigeti, M. Miller, C. Aldrich, and M. Mineev-Weinstein. "Roaming terrain: real-time optimally adapting meshes", IEEE Visualization 97, November 1997, pp. 81-88.

9. Trademarks

Java, HotSpot, Solaris, Sun, and Sun Microsystems are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.

NVIDIA and GeForce are registered trademarks or trademarks of NVIDIA Corporation in the United States and/or other countries.

OpenGL is a registered trademark of Silicon Graphics, Inc.

All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the United States and other countries.

Windows is a registered trademark of Microsoft Corporation in the United States and/or other countries.
