Ask Espresso Man

 
 

 

The Java Developer Connection welcomes Espresso Man and Little Grasshopper whose Q & A sessions have been a long-running feature in Sun publications.

In this session, they discuss the more important implications of the sudden emergence of the eXtensible Markup Language (XML) and the tremendous opportunities this provides to developers of Java applications.

 

 

Q. XML? Isn't that an extension to HTML?

A. Well Little Grasshopper, the two languages certainly do have the same "look and feel".

Actually, HTML is an application of Standard Generalized Markup Language (SGML) technology to a specific problem space (document presentation), whereas XML is a subset (or simplification) of SGML itself, adapted for the Web. Aside from its obvious document markup capabilities, XML is also fast becoming the standard for specifying business-to-business (B2B) data interchange across the Internet.

Q. Why XML and not HTML?

A. As a result of sharing a common heritage, both XML and HTML use tag names to "tag" text strings, and both surround these tag names with angle brackets (<...>). However HTML tag names are limited to a predefined set and are primarily used to indicate how the enclosed text data is to be DISPLAYED. The set of XML tag names on the other hand is unbounded, and they are used to indicate what the enclosed data MEANS.

So, for example, while a typical HTML tag might be a command to the presentation layer to "display the enclosed data in bold font":

  <b>John Jones 1234</b>

the equivalent XML tags might be commands to the message recipient to "treat this data as the customer name and ID":

  <cname>John Jones</cname>
  <cid DataType="int">1234</cid>

As a result, XML tag names define and identify data fields in an XML message in the same way that a schema does for a database, except that with XML, these tag names are carried along as part of the message data itself.
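To make the idea concrete, here is a minimal sketch of how a recipient written in the Java programming language might pull these two fields out of a message by tag name, using the standard JAXP DOM parser. The enclosing <customer> element and the class name are assumptions made for illustration only; a well-formed XML document needs a single root element, and the fragment above does not show one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch only: the <customer> wrapper element is an assumption, added
// because a well-formed XML document needs exactly one root element.
class CustomerMessageReader {

    static final String MESSAGE =
        "<customer>"
      + "<cname>John Jones</cname>"
      + "<cid DataType=\"int\">1234</cid>"
      + "</customer>";

    // Parses the message text into a DOM tree.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Returns the text enclosed by the first element with the given tag name.
    static String field(Document doc, String tagName) {
        return doc.getElementsByTagName(tagName).item(0).getTextContent();
    }

    // Returns the DataType attribute carried on the <cid> element.
    static String cidType(Document doc) {
        Element cid = (Element) doc.getElementsByTagName("cid").item(0);
        return cid.getAttribute("DataType");
    }

    public static void main(String[] args) {
        Document doc = parse(MESSAGE);
        System.out.println("name = " + field(doc, "cname"));
        System.out.println("id   = " + Integer.parseInt(field(doc, "cid")));
        System.out.println("type = " + cidType(doc));
    }
}
```

Note that the recipient looks fields up by name rather than by position, which is what carrying the tag names inside the message makes possible.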

Q. So XML messages are self-descriptive! But what good does transmitting a tag name to identify a data field do, unless the recipient was already expecting to receive this data?

A. You raise an important point. Before a pair of applications can exchange and correctly interpret a set of XML data messages, they must first agree on the type of data that the message will contain, and the tag names used to identify this data. In an exactly analogous manner, two CORBA programs must first agree on the interface to a service before the client can invoke methods on the server object that implements that interface.

In the case of an XML message, transmitter/recipient agreement is achieved via publicizing a message schema, typically defined according to one of two widely recognized standards:

  • Document Type Definition (DTD)

  • XML Schema

Either schema can be located in the front of the XML message, or referenced from an external location commonly identified via a URL embedded within the message itself. This allows the recipient to automatically "validate" that the actual contents of the XML message are as promised.

In the case of DTDs, for example, the following lines would specify the presence of the XML tag names above:

  <!ELEMENT cname (#PCDATA)>
  <!ELEMENT cid   (#PCDATA)>
  <!ATTLIST cid
	DataType CDATA #REQUIRED>
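As a rough illustration of automatic validation, the sketch below places these declarations at the front of a message and asks the standard JAXP parser to validate against them. The <customer> root element and its content model are assumptions added so that the document is complete; the cname and cid declarations are the ones shown above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {

    // DTD located at the front of the message. The <customer> root
    // element is an assumption made for this sketch.
    static final String DTD =
        "<!DOCTYPE customer [\n"
      + "  <!ELEMENT customer (cname, cid)>\n"
      + "  <!ELEMENT cname (#PCDATA)>\n"
      + "  <!ELEMENT cid   (#PCDATA)>\n"
      + "  <!ATTLIST cid DataType CDATA #REQUIRED>\n"
      + "]>\n";

    // Parses with DTD validation switched on and returns every problem
    // reported; an empty list means the message is as promised.
    static List<String> validate(String body) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) {
                    problems.add(e.getMessage()); // validity error: keep parsing
                }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // not well-formed: stop
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            problems.add(e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        String good = "<customer><cname>John Jones</cname>"
                    + "<cid DataType=\"int\">1234</cid></customer>";
        String bad  = "<customer><cname>John Jones</cname>"
                    + "<cid>1234</cid></customer>"; // DataType is missing
        System.out.println("good: " + validate(good));
        System.out.println("bad:  " + validate(bad));
    }
}
```

The second message is well-formed but omits the required DataType attribute, so the validating parser reports it without any checking code in the recipient.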

XML Schema is an improvement in that:

  • Unlike a DTD, an XML Schema is itself an XML document. This allows creation of "meta" XML Schema specifications (which define other XML Schemas rather than XML messages).
  • An XML Schema allows enclosed text to represent simple data types other than strings (example: integer customer IDs). These can then be automatically interpreted and verified for the message recipient, whereas with DTDs, such verification must be explicitly provided by the recipient code.
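The second point can be sketched with the javax.xml.validation API that ships with the Java platform. The schema below is a hypothetical one written for this example (the <customer> root element is again an assumption); it declares cid as an integer, so the validator rejects a non-numeric ID with no type-checking code in the recipient.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

class SchemaValidationSketch {

    // Hypothetical schema for illustration: cname is a string, cid an int.
    // Note that the schema is itself an XML document.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"customer\">"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name=\"cname\" type=\"xs:string\"/>"
      + "<xs:element name=\"cid\" type=\"xs:int\"/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true when the message conforms to the schema above.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no error handler set, the first validity error is thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String ok  = "<customer><cname>John Jones</cname><cid>1234</cid></customer>";
        String bad = "<customer><cname>John Jones</cname><cid>12x4</cid></customer>";
        System.out.println("numeric id valid:     " + isValid(ok));
        System.out.println("non-numeric id valid: " + isValid(bad));
    }
}
```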

The important point is that whether an XML Schema or a DTD is used to specify the data fields in an XML message, the actual contents of the resulting message are identical.

Q. Sounds confusing! You have to have separate schemas defined for each kind of XML message exchanged by two applications, and the schemas can be scattered all over the web. Is this really useful?

A. Extremely useful, oh dubious one! Such a collection of schemas might define the complete set of messages exchanged by subsystems within a typical application space (example: Retail, Education, Hospitality), and thus constitute an XML standard for an entire industry. Rather than being scattered, therefore, the entire collection of message schemas is usually gathered together and then published and maintained on a web site owned by the industry trade group that developed the standard.

There are two other places such a standard might be published:

  1. www.biztalk.org

    Biztalk is a Microsoft hosted website that imposes several proprietary constraints on all submitted XML message schemas. These include the requirement to first replace all DTD or XML Schema message specifications with a nonstandard, Biztalk-only equivalent (XML-Data).

  2. www.xml.org

    This site is hosted by the OASIS organization, a non profit consortium whose membership list includes Sun, IBM, Oracle, SAP and Microsoft.

    xml.org registers and publicizes both DTDs and XML Schema message specifications as a way to promote standardization of non proprietary XML applications in electronic commerce.

Q. So an industry standard body simply defines a collection of XML message schemas, registers them with say xml.org, and then any two industry applications can use this new standard to intercommunicate, despite the language they are written in, and the operating system and hardware they are deployed on. Wow!

A. Ah ... it's not quite that simple. While an XML message is universally interpretable, taken by itself it doesn't guarantee a "platform-neutral" wire. To do that, XML needs to be coupled with a platform-neutral transport protocol, for the same reason that HTML has to be coupled with HTTP before it can be received and interpreted by any browser connected to the Internet.

Q. A "platform-neutral wire"?

A. A platform-neutral wire is one that does not impose proprietary restrictions on those applications that use it to intercommunicate. To create such a wire, an industry XML standard must specify that:

  • All data is exchanged via a set of predefined XML messages.

  • The underlying transport protocol is non proprietary and widely supported.

Only by specifying BOTH the XML message formats AND the underlying transport protocol can the standard ensure that all compliant applications will share information seamlessly, despite (as you noted) having possibly been written in two different languages and run on top of two different operating systems.

This in turn guarantees end users who deploy such standard-conformant applications across a "platform-neutral wire", that they:

  • Retain the freedom to select "best of breed" hardware, operating system and middleware, without having these choices mandated by a requirement to first deploy a vendor-specific transport protocol.
  • Can integrate any of these solutions directly into their enterprise, layered on top of their existing legacy software and hardware platforms.

Q. So what are the logical choices for the transport protocol of such a "platform-neutral wire" industry XML standard?

A. There are actually several candidates for such a protocol. As we shall see, while each has its strengths and weaknesses, one clear choice does emerge.

1. IIOP

This is the transport protocol underlying the CORBA architecture. It is independent of both the language a communicating application is written in, and the operating systems that such an application runs on.

Requiring all XML messages to be sent over IIOP will create a level playing field for competing system suppliers, thereby increasing the end-user's choice of suppliers.

IIOP is however an object transport protocol and as such, imposes a fair amount of design conformance between communicating applications. In addition, IIOP suffers from the following restrictions:

  • It is designed more for a LAN than the Internet, and as a result does not easily operate across institutional firewalls.
  • Most legacy applications are message rather than object based, and as a result, will not easily adapt to support an object protocol.

Additionally, the goal of an XML standard is to provide "loose coupling" between applications developed and deployed by different organizations, particularly where these organizations are communicating across the Internet. Therefore using the richness of an object transport to convey what are essentially a series of text messages does not represent an optimal solution (see Table 1).

2. Proprietary Message Service (MOM)

Incorporating the use of a commercially available message oriented middleware (MOM) product such as JMQ, MSMQ, MQSeries or objectEvents in a standard can provide significant additional XML messaging functionality for the developers of the standard.

However such a solution suffers from the following restrictions:

  • As with IIOP, MOM products are designed more for a Corporate network than the Internet, and as a result do not easily operate across institutional firewalls.
  • They are also proprietary in the sense that the functionality provided by MOM products is often achieved through a private lower-level protocol that is owned and controlled by a single company, and which is often difficult if not impossible for other vendors to connect to.

Specifying a MOM product as the transport protocol for XML data messages thus destroys the level playing field that is essential for wide adoption of any standard, because it anoints one vendor as being "more equal than others" (see Table 2).

Such a situation sets the stage for the fragmentation of the market between conformant (but incompatible) solutions based around different message-service providers. This is EXACTLY what most XML industry standards are designed to prevent!

3. HTTP/HTTPS

HTTP is a message-oriented rather than an object-oriented transport protocol, and so does not dictate any design constraints on the applications that use it. HTTP is also:

  • Both O/S and language independent
  • The transport layer over which the vast majority of Internet messages are conveyed today. It can leap most corporate firewalls at a single bound!
  • Currently supported by every existing Web Browser and Web Server
  • Used (and understood) by a wide range of existing applications.
  • Available in a "secure (encrypted) session" version (HTTPS)

As a result, requiring XML to be sent over HTTP/HTTPS offers clear advantages if the primary goal of the standard is the interoperability of all compliant applications, because it enforces a truly platform-neutral wire between them. The Schools Interoperability Framework (SIF) XML standard for K-12 education now mandates the use of HTTPS for just these reasons. See: Schools Interoperability Framework

Q. But doesn't HTTP provide a "lower level" protocol than either of the other two? How is the missing functionality supplied?

A. It is true that such messaging features as encryption, guaranteed error-free delivery, authentication, event publish/subscribe, automatic message queueing, and support for disconnected operations are all important to provide for an XML-based standard, and an application developer should be shielded from as much of the detail as possible.

When HTTP/HTTPS is used as the underlying XML transport, it must either provide these features directly or they must somehow be supplied by data fields within the XML, usually in a "header" placed in front of all messages defined by the standard.

The following functionality can be provided by HTTPS:

  • Message encryption (the ability of two communicating applications to prevent outside parties from correctly interpreting any of the data they are exchanging).
  • Guaranteed error-free delivery. This requires use of the HTTPS "POST" request, which conveys back success or failure within the same HTTPS connection as the original message.
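The delivery check described above might look roughly like this from the sender's side. This is a sketch, assuming Java 11 or later for the java.net.http client; the endpoint URL, the text/xml content type and the rule that any 2xx status counts as success are all assumptions made for illustration, not requirements of any particular XML standard.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class XmlPostSketch {

    // Builds a POST request carrying the XML text as its body.
    static HttpRequest buildRequest(URI endpoint, String xml) {
        return HttpRequest.newBuilder(endpoint)
            .header("Content-Type", "text/xml; charset=utf-8")
            .POST(HttpRequest.BodyPublishers.ofString(xml, StandardCharsets.UTF_8))
            .build();
    }

    // Sends the request and reports whether the receiver answered with a
    // 2xx status on the same connection (assumed here to mean "accepted").
    static boolean deliver(HttpClient client, HttpRequest request) {
        try {
            HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() / 100 == 2;
        } catch (IOException e) {
            return false; // connection, TLS or protocol failure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; an https URI selects the encrypted session.
        URI endpoint = URI.create("https://receiver.example.invalid/messages");
        HttpRequest request = buildRequest(endpoint,
            "<customer><cname>John Jones</cname><cid>1234</cid></customer>");
        System.out.println(request.method() + " " + request.uri());
        // To send it against a real endpoint:
        // boolean ok = deliver(HttpClient.newHttpClient(), request);
    }
}
```

A sender that gets false back knows the message was not acknowledged and can retry or queue it, which is the part of "guaranteed delivery" that the wire itself can supply.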

The remaining messaging functionality is normally provided by a MOM product located at the destination, which relies on the "header" XML fields in the arriving message. In other words, once the XML data arrives over the platform-neutral wire, it is immediately passed along to platform-specific messaging middleware, which supplies the missing functionality.

As an example, the single XML element:

  <Authentication Type="X.509">Shhhhhh</Authentication>

is used by the SIF standard to optionally convey a digital signature for the XML message, which can provide the receiver with absolute assurances of:

  • Authentication (Is sender who she claims to be?)

  • Authorization (Is sender allowed to do what she requested?)

  • Data Integrity (Has any part of this message been altered?)
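The data-integrity check in particular can be sketched with the java.security API. This is not the SIF wire format: the key pair is generated on the spot (a real deployment would take the public key from the sender's X.509 certificate), the SHA256withRSA algorithm is an assumption, and the message is signed as raw text with no XML canonicalization.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

class MessageSignatureSketch {

    // Stand-in for the sender's certificate and private key.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sender side: signs the message text, returns the signature as Base64
    // so that it can travel inside an XML element.
    static String sign(String message, PrivateKey key) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(key);
            signer.update(message.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiver side: true only if the message is unaltered and was signed
    // with the private key matching the given public key.
    static boolean verify(String message, String signature, PublicKey key) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(key);
            verifier.update(message.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(Base64.getDecoder().decode(signature));
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        KeyPair sender = newKeyPair();
        String message = "<cid DataType=\"int\">1234</cid>";
        String signature = sign(message, sender.getPrivate());
        System.out.println("original verifies: "
            + verify(message, signature, sender.getPublic()));
        System.out.println("tampered verifies: "
            + verify(message.replace("1234", "9999"), signature, sender.getPublic()));
    }
}
```

A successful verification covers the first and third questions above; deciding whether an authenticated sender is allowed to make the request remains a policy decision at the receiver.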

Though proprietary middleware at the destination may be supplying key messaging functionality, this fact remains TRANSPARENT to the sender, because the wire itself is platform-neutral.

Q. So XML over HTTP/HTTPS can provide a feature rich platform-neutral wire. But you are Espresso Man! How can we have made our way through an entire "Ask Espresso Man" column without once mentioning "Java?"

A. We can't and we won't. But prior to explaining why the Java platform is the best one on which to implement XML, we had to provide some background on XML first.

Now consider how far we have come. We have demonstrated how defining an industry XML standard around a platform-neutral wire (XML/HTTP) allows for any two compliant subsystems to interconnect despite the OS and messaging middleware they use. Now let's consider the advantages from the developer's viewpoint.

Implementing such a subsystem using Visual Basic or C++ components constricts the eventual product deployment to a single OS and a single messaging service.

Writing an equivalent set of components in the Java programming language provides the ability to deploy the resulting product on any OS. Further, by conforming to the Java Message Service (JMS) API, these components will be able to transparently utilize a wide range of existing MOM products, providing a major competitive advantage for the Java-based products. See: Vendors

Finally, packaging these Java components as Enterprise JavaBeans (EJBs) enables the developer to reduce complexity by simplifying most multi-threaded, transactional and persistence issues. So an additional benefit for the EJB developer is a reduction in "time to market".

Q. What about the end user? What's the value proposition there?

A. The end user gets to deploy simplified platform-neutral EJB components across a platform-neutral wire. This represents the first true realization of "plug & play" components for Business to Business (B2B) Internet applications, and is the ONLY way to deliver on XML's promise of universal interoperability.

Therefore you can expect to see such EJB components appearing soon, for use in a variety of vertical industries. Several industries. Very soon.

Table 1:

Application Interaction Standards: Objects vs. Methods

| Comparator             | CORBA / COM+ (Object-Based)                | XML (Message-Based)                            |
|------------------------|--------------------------------------------|------------------------------------------------|
| Data Source            | Client                                     | Transmitter - Publisher                        |
| Data Destination       | Service                                    | Receiver - Subscriber                          |
| On-the-wire Data       | Object method to invoke & method arguments | Document - Event                               |
| Schema                 | Interface definition language (IDL)        | Document Type Definition (DTD), XML Schema     |
| Standards Organization | OMG (CORBA), Microsoft (COM+)              | W3C                                            |
| Schema Registration    | OMG (CORBA), Microsoft (COM+)              | OASIS (xml.org), Microsoft (biztalk.org)       |
| Application Coupling   | Close                                      | Loose                                          |
| Transport Layer        | IIOP (CORBA), DCOM (COM+)                  | HTTP/HTTPS or message-oriented middleware (MOM) |
| Typical Topology       | Intranet                                   | Intranet / Extranet / Internet                 |
| Typical Deployment     | Corporate "internal"                       | Business to business (B2B)                     |
| Enables                | Distributed enterprise                     | Distributed trading partners                   |

Table 2:

XML Transport Protocol

| Comparator                  | Microsoft MOM (MSMQ) | CORBA based MOM (JMQ, MQSeries, ...) | HTTP       |
|-----------------------------|----------------------|--------------------------------------|------------|
| Standards Org               | None (Microsoft)     | OMG                                  | IETF       |
| Application API             | MSMQ (Microsoft)     | Java Message Service (JMS)           | HTTP       |
| Operating System Neutrality | No                   | Yes                                  | Yes        |
| Message Server Neutrality   | No                   | Limited                              | Yes        |
| Web Server Availability     | Some                 | Some                                 | All        |
| Internet Capabilities       | Limited              | Limited                              | Complete   |
| Naming Service Interface    | Active Directory     | JNDI (LDAP, DNS, NDS, NIS, ...)      | DNS        |
| Message Security            | Provided             | Provided                             | External   |
| Supplier                    | Microsoft only       | Multiple MOM vendors                 | Any vendor |

About the Author

Ron Kleinman is the Chief Technical Evangelist for Sun Developer Relations, and serves as Sun's representative on multiple industry-wide Java and XML standards committees. He has extensive experience consulting with developers who are trying to "java-tize" their existing applications. He has prepared and delivered numerous presentations on Java technologies both in the U.S. and overseas. His particular areas of expertise include Java on the Server (EJBs and server-side APIs), Jini, Java-based device access control and management, and more recently, XML.