Exploring the New Frontier: Java Technology Powers the "Post-Genomic" Era

 

by Steve Meloan

RELATED STORY: Exploring the New Frontier, Part 1

October 9, 2001 -- While the mapping of the human genome was officially completed in June of 2000, those engaged in the fields of computational biology and bioinformatics know that the real work has only just begun. A rough draft of the sequences contained within human DNA has now been completed. But the next step is one of meaningfully interpreting this massive volume of information--a task that amounts to deciphering a text of three billion characters, in a language that is only passably understood (even by molecular biologists), and without the aid of spaces or punctuation.

In short, genomics, and the companion study of proteomics--the mapping and understanding of the proteins coded for in a given genome--present genetic researchers with computational tasks on a scale never before seen. Without the aid of massive databases and networked computational systems, it would effectively be impossible to process and interpret the avalanche of biological data now being generated on an almost daily basis. And with genomic research centers scattered across the globe, in both private and academic settings, using a myriad of different hardware and operating systems (on everything from desktops to supercomputers), the secure, network-aware, cross-platform power of Java technology is increasingly proving indispensable to this ongoing task of reverse-engineering our inner workings.

Physiome Sciences, Inc.'s computer-based biological simulation technologies, and Bioinformatics Solutions Inc.'s PatternHunter, a genomic search and analysis facility, are just two examples of the growing adoption of Java technology in the fields of bioinformatics and computational biology.

Part 2: Physiome Sciences, Inc.

Physiome Sciences, formed in 1995, is a privately held company focused on enabling pharmaceutical firms to better develop drugs through the use of computer-based biological simulations. The avalanche of genomic, proteomic, and other biological research data in the past decade has now reached a critical mass of knowledge sufficient to simulate many biological functions "in-silico"--on computers. "The ever-increasing amounts of data about gene expressions, pathways, tissues, and cells is now sufficient for building innovative, predictive models," says Jeremy Levin, CEO of Physiome Sciences.

Biological simulations have become all the more important in an era of more precise, sophisticated, and targeted drugs. As seen with the advent of protease inhibitors in the battle against AIDS--drugs targeted at specific and narrow biochemical pathways--pharmaceutical development has become far more designer-oriented. As a result, the development process has become considerably more expensive, and the human testing phases more dangerous. Pharmaceutical researchers spent just $3 billion on drug development in 1980, according to the Biotechnology Industry Organization. By 2000, that figure had ballooned to $23 billion. Yet only $2.4 billion of that money was spent on drugs that made it successfully through FDA clinical trials. "The industry is built on failure," says Levin. Clearly, a means of speeding drug development, as well as making it safer, is needed. And for Physiome Sciences, that means is "in-silico" simulations.

Enter Java Technology

The field of computational biology is often hampered by the use of diverse computer languages, hardware systems, and data formats, making it difficult for scientists from around the world to effectively share information, findings, and computational facilities. It is for this reason that Physiome Sciences' technology solutions are built almost entirely using the scalable, cross-platform, network-aware power of Java technology. "The company was founded by Professor Dennis Noble of Oxford University, Professor Peter Hunter of the University of Auckland, and Professor Rai Winslow of Johns Hopkins University," says Dr. Scott Lett, Distinguished Computational Scientist for Physiome Sciences. "The technologies at those different centers were very disparate--Linux desktop machines and supercomputers at one, IBM supercomputers and PCs at another, and at a third, mainly Macintosh systems."

After initial attempts at developing cell-modeling systems in C++, Physiome Sciences turned to Java technology. "We made use of Java technology's native ability to run on all of these different platforms, as well as its facility to communicate across networks," says Lett. "And we used XML to define the mathematical models, so that we could translate them on any platform. As a result, we were able to get real, working systems up and running in less than a year's time. We would still be programming if we were trying to write this in any other language."

In Silico Cell Architecture, CellML Modeling Language, PathwayPrism Technology Platform, CardioPrism Technology Platform

Physiome Sciences' Java technology-based solutions include tools, application frameworks, and complex databases, all of which are licensed to pharmaceutical drug developers. Underlying the company's product offerings is the core technology of In Silico Cell architecture, which supports the hierarchical modeling of biological systems, and the creation of more complex models from simpler ones.

Physiome Data Sources
In Silico Cell technology allows biotechnology and pharmaceutical researchers to rapidly evaluate and generate research data using computer-based models of cells, tissues, and pathways.
In Silico Cell allows researchers to interface with the technology in the fashion that is most intuitive given their particular scientific background. "If that background is mathematical, then they can write text in equations," says Lett. "In Silico Cell will translate that into numerics, tie it to the graphics, and actually solve the problem for them." Meanwhile, researchers who might not normally work with mathematics can draw diagrams of their pathway functions using a modern software tool similar to PowerPoint. "Those diagrams are turned into mathematics, and then simulations, right before their eyes," says Lett. Users can also access and alter the mathematics underlying a given simulation model. "People who are more mathematically oriented may want to work with the equations," says Lett. "Meanwhile, if you change the math, it then changes the diagrams as well."

The In Silico Cell modeling process is further facilitated using the CellML modeling language, a free, open standard, XML-based modeling language for describing biological processes at the cellular and subcellular levels. "CellML provides researchers with the ability to integrate biological models, experimental data, and text documents, in a platform-independent and web-accessible way," says Lett. "It even allows researchers to share models if they are using different model building software."
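
To make the idea of an XML-described model more concrete, here is a minimal sketch in Java. It is an illustration only: the element and attribute names (model, variable, units, initial), the sample values, and the class name ModelXmlSketch are all invented for this example. They are not the actual CellML schema, and this is not Physiome Sciences' code. The sketch uses the standard JAXP DOM parser that ships with the Java platform to read a small model description and list its variables.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical, simplified model format for illustration; NOT the real CellML schema.
class ModelXmlSketch {

    // An invented two-variable model description.
    static final String SAMPLE =
        "<model name=\"toy_cell\">"
      + "<variable name=\"V\" units=\"millivolt\" initial=\"-85.0\"/>"
      + "<variable name=\"Ca\" units=\"micromolar\" initial=\"0.1\"/>"
      + "</model>";

    /** Parses the XML text and returns one "name=initial units" string per variable element. */
    static List<String> describeVariables(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("variable");
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // Convert the attribute text to a number, as a solver would need.
                double initial = Double.parseDouble(e.getAttribute("initial"));
                out.add(e.getAttribute("name") + "=" + initial + " " + e.getAttribute("units"));
            }
            return out;
        } catch (Exception ex) {
            // Wrap parser and I/O failures so callers need not handle checked exceptions.
            throw new IllegalArgumentException("Could not parse model XML", ex);
        }
    }

    public static void main(String[] args) {
        for (String line : describeVariables(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

Because the model lives in a plain XML document rather than in compiled code, the same file could in principle be read by any tool, on any platform, that has an XML parser available. That is the portability Lett describes.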

The CellML technology is a collaborative effort between Physiome Sciences and the University of Auckland, with a scientific advisory board drawn from such prestigious institutions as the University of Cambridge, Cornell University, and Columbia University. CellML works in conjunction with the companion technologies of AnatML and FieldML, which were developed at the University of Auckland for describing anatomical data and the distribution of biological properties in three dimensions, respectively. Together, these XML-based technologies provide a complete vocabulary for describing "virtual" biological systems--from the subcellular to the organism level.

Sitting on top of In Silico Cell is Physiome Sciences' PathwayPrism technology platform. PathwayPrism facilitates the mapping, modeling, and simulation of molecular interactions and biochemical pathways in cell signaling--the chemical means by which a cell's processing and behavior are affected. Such models provide researchers with new insights, and allow them to rapidly test new pharmaceutical hypotheses. And because Physiome Sciences' technology platform is network-aware, PathwayPrism is web-enabled, providing researchers with the facility to share models, as well as the ability to integrate data, collaboratively modify and update pathways, and access public databases. "PathwayPrism can construct pathways using a variety of qualitative or quantitative data," says Lett, "everything from 2D gels, fluorescence data, protein affinity studies, and enzyme kinetics."

PathwayPrism provides the capability to map, analyze and simulate molecular interactions in cellular pathways.

The company's CardioPrism technology platform also sits functionally on top of In Silico Cell, and contains a customizable suite of cardiac models to assess and predict drug-induced repolarization abnormalities. Many pharmaceutical drugs can affect the heart's electrical cycle, causing sometimes-fatal arrhythmias. CardioPrism's accuracy has been verified by simulating the effects of numerous repolarization-impacting drugs, and it has even retroactively predicted gender-specific effects that resulted in deaths during phase III clinical trials of a drug back in 1996. "CardioPrism could have saved lives," says Levin.

The component-driven design of the Physiome Sciences platform allows for the development of predictive models at every level--from biochemical pathways, to cells, to tissues, and on up to living systems. And because the platform is network-enabled, multiple users in different locations can share data and work together more effectively. Researchers are able to view and edit underlying mathematical equations, create and merge complex pathway maps, access in-house bioinformatics data, and link to external databases via the Internet--thus speeding understanding, and drug discovery.

CardioPrism is a unique suite of customizable cardiac cell models designed to assist in assessing the risk of adverse cardiac effects during drug development.

Meeting Critical Needs

Java technology has allowed Physiome Sciences' software engineers to provide worldwide connectivity, seamless database access, rich scientific data visualization, data security, and XML-based data portability. "All of our desktop applications make use of the Swing framework for the controls and the graphical user interfaces," says Lett. "We also make use of the Java 2D API graphics environment for visualization of biochemical signaling pathways. And we use several other of the visualization classes, like JTree, to organize the material for easy viewing. Things that, in previous efforts at Oxford University, took a year, took us a couple of weeks using Java technology."
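
As a rough illustration of how a tree component can organize hierarchical material, the following Java sketch builds the kind of data model that a Swing JTree displays. The node labels and the class name ModelTreeSketch are invented for this example and do not come from Physiome Sciences' software. Only the standard javax.swing.tree classes are used, so the sketch runs without opening a window.

```java
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.DefaultTreeModel;

// Illustrative only: the hierarchy and labels below are hypothetical.
class ModelTreeSketch {

    /** Builds a small hierarchy: organ -> cell -> pathway -> components. */
    static DefaultTreeModel buildTree() {
        DefaultMutableTreeNode root = new DefaultMutableTreeNode("Heart model");
        DefaultMutableTreeNode cell = new DefaultMutableTreeNode("Cardiac cell");
        DefaultMutableTreeNode pathway = new DefaultMutableTreeNode("Signaling pathway");
        pathway.add(new DefaultMutableTreeNode("Receptor"));
        pathway.add(new DefaultMutableTreeNode("Kinase"));
        cell.add(pathway);
        root.add(cell);
        return new DefaultTreeModel(root);
    }

    /** Counts the leaf nodes (nodes with no children) beneath the root. */
    static int countLeaves(DefaultTreeModel model) {
        return ((DefaultMutableTreeNode) model.getRoot()).getLeafCount();
    }

    public static void main(String[] args) {
        DefaultTreeModel model = buildTree();
        System.out.println("Leaf nodes: " + countLeaves(model));
    }
}
```

In a desktop application, the model returned by buildTree() could be handed to a JTree constructor (new JTree(model)) and placed in a JScrollPane, giving users an expandable outline of the material.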

Meanwhile, JDBC technology allows the company to hide the differences between databases from the scientists who use the software. And the availability of XML parsing technology for Java technology-based programs has allowed Physiome Sciences to create a structure by which users can define the mathematical model of a cell without knowing any programming, and have that definition be portable to different environments--and even to other programming languages. "The fact that there is so much in XML technology--the libraries that are available from different sources for Java technology--made it possible for us to get a lot done without having to write very much of it ourselves," says Lett.

Java technology has also proven vital in terms of communicating with legacy systems at the various national laboratories that lie at the heart of the work being done in post-genomic research. "We use CORBA technology to communicate with various legacy supercomputing applications," says Lett. "And we make use of the Java Native Interface (JNI), in the small number of cases where certain numerics libraries are available only in a particular language. In many languages, we would have had to rewrite the library code into that language."

And finally, on the hardware front, Physiome Sciences considers Sun its preferred vendor for its bio databases, having chosen Sun Enterprise 3500 and 4500 servers, running the Solaris operating environment and Oracle, for its ever-growing and mission-critical database needs.

The Future

Java technology appears to be riding a wave of increasing adoption in the biotechnology realm. The needs for scalability, cross-platform compatibility, network awareness, and security all play to Java technology's core strengths. And now, even processing capability is increasingly being added to that list. "Bioinformatics is all about processing bigger and bigger data," says Lett. "And Java technology's increasing performance abilities, in terms of being able to deal with large amounts of data, has really helped us out. Sun, and the JavaGrande group, have both contributed to making Java technology better and better for scientific computing." It was recently estimated that accurate modeling of heart function--including information down to the genomic level--would require on the order of ten to the 16th unknowns to be solved. "That means that we are going to have to tap into super computing capabilities," says Lett. "But such companies are increasingly making high-performance Java technology capabilities available to us, so it's very conceivable that the entire solution can be done using Java technology."
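
A back-of-the-envelope calculation helps convey what ten to the 16th unknowns implies. The Java sketch below assumes, purely for illustration, that each unknown is stored as one 8-byte double-precision value. That storage assumption and the class name ScaleSketch are not from the source. It then expresses a single copy of all the unknowns in units of 2 to the 50th bytes.

```java
// Rough scale estimate; the 8-bytes-per-unknown figure is an assumption, not a sourced number.
class ScaleSketch {

    static final long UNKNOWNS = 10_000_000_000_000_000L; // ten to the 16th
    static final long BYTES_PER_UNKNOWN = 8L;             // assumed: one 64-bit double each
    static final long TWO_TO_THE_50TH = 1L << 50;         // 1,125,899,906,842,624 bytes

    /** Bytes needed to hold one value per unknown, under the assumption above. */
    static long bytesForOneStateVector() {
        return UNKNOWNS * BYTES_PER_UNKNOWN; // 8 * 10^16, which still fits in a long
    }

    /** Converts a byte count into multiples of 2^50 bytes. */
    static double toPebibytes(long bytes) {
        return (double) bytes / (double) TWO_TO_THE_50TH;
    }

    public static void main(String[] args) {
        long bytes = bytesForOneStateVector();
        System.out.printf("%d bytes is about %.2f x 2^50 bytes%n", bytes, toPebibytes(bytes));
    }
}
```

Under that assumption, a single snapshot of the unknowns would occupy roughly 71 multiples of 2 to the 50th bytes, before any working storage for the solver itself, which gives a sense of why supercomputing resources enter the picture.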

Meanwhile, Java technology's strengths also fit well with the overall evolutionary direction of the biotechnology and pharmaceutical industries. "The fact that many of the pharmaceutical companies are the result of mergers of smaller companies points toward a need to make various legacy systems and diverse databases talk to one another," says Lett. "The adoption of Java technology mitigates risk in terms of ever-changing computing and laboratory technology. In our own case, if we'd tried to do this in another language, we could very well still be writing a lot of the code that scientists are now productively using."

For now, Physiome Sciences is forging ahead with ever more sophisticated biological models. "We now have modeling technology that couples the intact heart to the structure of the torso, so that we can even model the propagation of electrical signals emanating from the heart out to the skin--effectively predicting what one's EKG will look like under different circumstances." And the company is currently developing modeling capabilities for simulating lung, bone, kidney, and pancreatic function.

Meanwhile, the company is also busily forming both corporate and private partnerships. Its PACE (Physiome Academic Centers of Excellence) program provides academic collaborators with Physiome Sciences' technology and model-building capabilities. And the shared data resulting from these collaborations is expected to grow exponentially. "Once we have the data generated by PACE centers," says Lett, "we will be modeling, storing, and visualizing information at the petabyte (2 to the 50th bytes) level."

Conclusion

In the spring of 2002, representatives of both Physiome Sciences and the PatternHunter development team are expected to speak at the second "Computational Challenges in the Post-Genomic Age" conference, co-sponsored by Pacific Northwest National Laboratory, the San Diego Supercomputer Center, and Sun Microsystems.

See Also

Exploring the New Frontier, Part 1
(http://java.sun.com/features/2001/09/genome.html)

Physiome Sciences, Inc.
(http://www.physiome.com/)

Multimedia (Flash) Overview of Physiome Sciences' Modeling Technology
(http://www.physiome.com/images/screenshots/modelframe.htm)

CellML.org
(http://www.cellml.org/)

Java Technology & XML
(http://java.sun.com/xml/)

JDBC Data Access API
(http://java.sun.com/products/jdbc/)

Java 2D API
(http://java.sun.com/javase/technologies/desktop/media/2D/)

Java Native Interface
(http://java.sun.com/j2se/1.3/docs/guide/jni/)

JTree Class
(http://java.sun.com/j2se/1.3/docs/api/javax/swing/JTree.html)

Java Grande Forum
(http://www.javagrande.org/)

"Computational Challenges in the Post-Genomic Age II" Conference
(http://www.sdsc.edu/Workshops/postgenomic/)