IntroductionToday's Internet-driven economy has accelerated users' expectations for unfettered access to information resources and transparent data exchange among applications. This exchange requires a deep knowledge of how each application, tool, or service structures and interprets the information it uses. The term metadata is used to describe both how this information is structured (its syntax) and what it means (its semantics). Unfortunately, most applications define metadata differently. Each uses slightly different programming structures, syntax, and semantics to model their metadata. These incompatibilities make it challenging for one application to discover and interact with the data maintained by another application. To overcome this hurdle, companies typically spend lots of developer time and money building hard-wired interfaces between applications so they can share data in sophisticated ways. This costly and ongoing effort is the result of a lack of common metadata -- or more specifically, a common model for creating metadata (a metamodel). This stifles the development of robust solutions to common business problems that require combining or using data from multiple, heterogeneous applications. Examples of such business problems include:
Difficulties in the management of metadata limit the ability of developers to create integrated, interoperable applications in these and other domains. Standardizing on XML DTDs or XML Schemas, which many industries are attempting to do as a solution to this problem, is insufficient, as they do not have the capability to model complex, semantically rich, hierarchical metadata. This problem becomes more critical as the number and complexity of applications, services, data sources, interchange protocols, and deployment platforms continues to increase in the e-economy. To facilitate metadata interoperability, a host of vendors have teamed up through the Java Community Process (JCP) to provide a platform-independent specification for metadata. It is the Java Metadata Interface (JMI) specification, which is JSR-40 on the JCP website at jcp.org. The JMI specification defines a dynamic, platform-neutral infrastructure that enables the creation, storage, access, discovery, and exchange of metadata using Java interfaces. JMI is based on the Meta Object Facility (MOF) specification from the Object Management Group (OMG), an industry-endorsed standard for metadata management. This paper describes what metadata is, why its management is so important, what is required of a metadata management infrastructure, and how JMI satisfies these requirements. What is Metadata?Metadata can be defined as information about data, or simply data about data. In practice, metadata is what tools, databases, applications and other information services use to define the structure and meaning of their objects, services, and other computing artifacts. Examples of metadata include:
The examples above refer to what is called technical metadata. There is also business metadata (also called process metadata), which is used to capture semantic content about such things as business rules, business nomenclature and business terminology. A glossary is an example of business metadata. Then there are systems that provide a means of notating metadata. XML DTDs are an example of a way to notate and interchange semantic information about structured documents. The Unified Modeling Language (UML) defines a way to formally describe the function, structure, and behavior of a system. We will show later how UML can be used as part of a metadata management infrastructure. Why is Metadata Important?Without metadata, or semantics, one has no way of knowing what a data object represents. The integer value "39" encountered in a program, for example, could mean almost anything. Currently, metadata management is made extremely difficult by the fact that many, perhaps most, of these semantics tend to be embedded in the systems that use them. The lack of a common way to represent and share metadata thus leads to great difficulties in sharing even the simplest data, let alone in the integration of complex applications, components, and systems. People are excited about XML in part because at one level (using data encapsulated within descriptor tags) it appears to solve this problem. However, XML has significant limitations as a general solution to metadata representation and management. The current flurry of activity in various industries to standardize on a set of XML DTDs to represent the semantics of their particular artifacts, while a valuable and worthwhile first step, is really focused at the wrong level of abstraction for enabling true metadata interchange. Where is metadata management required? Well, basically wherever there are significant metadata artifacts to be found. And this turns out to be just about everywhere. Application development tools and frameworks need to manage and understand: models, record definitions, and database definitions. Component-based development environments must deal with a host of interfaces, classes, and components, possibly in different languages. Data warehouses and information portals have to manage data that is organized as tables, columns, schemas, cubes, or flat files in different formats. Finally, comprehensive metadata management is a necessary prerequisite for enabling the kind of services-based architecture that we are all moving towards. You cannot share or use a service unless you understand the semantics of the objects and features that it provides. For example, the higher-level artifacts of ebXML (a standard for enabling e-commerce) such as business objects, business processes, company profiles, and trading partner agreements all need to be described in a platform and implementation-independent way. The reason we point out ebXML is because the people working on it are using the metadata technologies we will be describing to characterize their artifacts.
An Example: Challenges of Metadata Management in Data WarehousingData warehousing applications, because of the variety of interoperational challenges they face, illustrate some of the problems encountered by the lack of a common metadata standard. Data warehousing applications typically deal with many different data sources. Each data source will of course have its own unique metadata (its schema). Not only do data warehousing applications have to integrate these different databases, but the databases themselves usually capture different aspects of the business (that is, not only are the schemas structurally different, they often refer to different things). Further, these databases are often located on different systems that span the company. Additionally, other data sources can contribute, for example, files that have been generated that capture web site click-stream data. Also, applications themselves can be sources of data, such as the output of ERP or CRM systems. All of these different kinds of data need to be Extracted, Transformed, and Loaded into a data warehouse. This is what ETL stands for in the middle box in the figure above. This process is so complex that there are over 250 ETL vendors in the industry today that make their living by writing separate interfaces to each different kind of data source on the input end and each data warehouse on the output end. And any one company that is putting a data warehouse solution in place cannot typically use only one ETL vendor, but needs many to handle their own unique set of data source requirements. Then we even have a problem on the output end, where one would like to take the data that has been distilled into a data warehouse and analyze it with different reporting tools. Of course, these tools themselves also expect their data to be in a certain (different and unique) format. What is Required to Manage Metadata?We have seen that there are tremendous benefits to be had in managing metadata. If one has a standard way to represent and share metadata descriptions, then application-level integration is greatly facilitated. Application-level integration is largely unaddressed by the Java 2 Platform, Enterprise Edition (J2EE) framework, which addresses (and solves) mostly platform-level integration issues. So what is required to describe and manage metadata? The first thing we need is a common, standard way to represent metadata. Another way of saying this is that we need a way to model metadata. A well-defined model will have very precise definitions of what the features and attributes of particular model instances (the metamodels) mean. These precise definitions will then allow us to define exact and unambiguous mappings of the model features to particular languages and interchange formats. Also, this model must be able to describe metadata at different levels of abstraction, for one can always move up and down in meta-levels. One system's data is another system's metadata is another system's meta-metadata. When you think about it, what we really need is a meta-metamodel. In other words, a model for defining metamodels that describe metadata (whew!). In effect, a language for defining metadata at any level of abstraction. This is what the Meta Object Facility (MOF) specification from the Object Management Group (OMG) provides. It defines a small set of concepts (such as package, class, method, attribute, operation, reference, constraint, association, and data type) that allow one to define and manipulate models of metadata. These concepts are described using a subset of the Unified Modeling Language (UML) notation. Since the MOF uses UML notation, it is easy for people familiar with the UML to begin to use it to define and characterize their metadata. The MOF provides an open-ended information modeling capability, providing both a base set of metamodeling constructs that can be used to describe technologies and application domains, and a mapping of those constructs to CORBA IDL (Interface Definition Language) for automatically generating model-specific APIs. The MOF also defines a reflective programming capability that allows for applications to query a model at run time to determine the structure and semantics of the modeled system. The MOF standard was approved by the OMG in November 1997. However, the MOF is only half the story, since we also need to define a mapping from this admittedly very high-level and abstract definition to actual language semantics. This is where JMI comes in. It defines how the constructs used by the MOF to model metadata map to the Java language. As the Java language mapping to MOF, JMI provides a common Java programming model for metadata access for the Java platform. JMI provides a natural and easy-to-use mapping from a MOF-compliant data abstraction (again, usually defined in UML) to the Java programming language. Using JMI, applications and tools which specify their metamodels using MOF-compliant UML can have the Java interfaces to the models automatically generated. Further, metamodel and metadata interchange via XML is enabled by JMI's use of the XML Metadata Interchange (XMI) specification, an XML-based mechanism for interchanging metamodel information among applications. Java applications can create, update, delete, and retrieve information contained in a JMI compliant metadata service. The flexibility and extensibility of the MOF allows JMI to be used in a wide range of usage scenarios. Thus, one can define metadata clearly and unambiguously with UML-like semantics, and JMI will automatically generate the Java APIs that will create, manage, access, and query metadata instances. Additionally, JMI uses the XML Metadata Interchange (XMI) standard to define precise mappings from MOF-based metamodels to XML DTDs, and mappings from metadata instances to XML streams, enabling standard XML-based interchange. What about XML?We mentioned previously that XML alone is not sufficient to model metadata. Why is that the case? Isn't XML a language for defining other languages, and as such could be used to describe metadata? While it is true that XML is great for data and metadata interchange, it turns out that it is not rich enough for capturing high level semantic information. Features like hierarchical relationships, associations, data types, and other critical information necessary for describing metadata can only be represented in XML by extending XML itself. Some extensions like these are found in XML Schema, but that still does not provide a common metadata modeling environment. For example, although any one XML Schema can represent the metadata for a particular application, component, or service, it does not give a general solution to modeling metadata across many different domains. What we need, again, is a set of high-level constructs that can be used to build models of any kind of metadata. The MOF model provides these constructs. So now we can see that the XML DTD standardization efforts referred to earlier are often focused at the wrong level of abstraction. These groups are essentially standardizing on an interchange format. While this is very useful, their efforts would be even more effective if they could standardize on the actual objects used in their domain, the way the data warehousing industry has done with the Common Warehouse Metamodel (CWM). In this case, industry leaders have agreed on the high-level definitions of such industry-specific concepts as dimension, schema, column, relational table, and OLAP cube. By standardizing the metadata, data warehousing vendors are significantly increasing the ability of their systems to interoperate. JMI -- Enabling a Metadata Management InfrastructureImplementations of the JMI specification will provide a metadata management infrastructure that will greatly facilitate the integration of applications, tools, and services. Formerly, complete interoperability and integration of disparate systems has been difficult because there has been no standard way to represent their unique characteristics. JMI provides the metadata framework that captures these semantics. EJBs have proven to be highly effective at masking the complexities of the computing platform and enabling developers to build components without needing to directly handle transactions, security, resource pooling, and a host of other low-level programming tasks. Similarly, JMI will allow developers to mask complexities in their business logic by creating high-level models of their specific technology and business domains. JMI thus reduces the complexity of interoperability and integration by providing:
A host of companies are participating in the definition of the JMI specification. They are: Adaptive, DSTC, Hyperion Solutions Corporation, IBM Corporation, Iona, Novosoft, Oracle Corporation, Perfekt-UML, Rational Software Corporation, SAS Institute, Sun Microsystems, Sybase, and Unisys Corporation, which is the JCP Specification Lead. The JMI specification has been released for public review. Unisys, Sun, IBM, Adaptive, and other companies are in the process of implementing the specification. Others, such as Hyperion and Oracle, will be implementing JMI interfaces and functionality as part of embedded metadata management services within their Java for OLAP (JSR-69) and Java for Data Mining (JSR-73) product offerings. Compliance to the specification will be ensured by the JMI Technology Compatibility Kit (TCK), which will be available, along with a Reference Implementation of the specification, when the standard becomes final. The Value of JMILet's examine all of the features that a JMI implementation provides:
Model-Driven DevelopmentIncreasingly, software developers are looking to find ways to capture and leverage best practices when developing their systems. They are doing this with patterns, high-level descriptions of software solutions to recurring problems. Some well known examples are the Model-View-Controller pattern, which decouples the presentation, control, and data management aspects of an application; or a Data Access Object, which encapsulates loosely-coupled access to back-end data resources. These patterns allow for reusability in a much broader context than the particular domain in which the pattern was first identified. As artifacts that are captured at multiple levels of abstraction, patterns are most often characterized using UML notation. These patterns are just another kind of metamodel. Coupled with the facilities that JMI provides of automatically generating the Java interfaces and XML DTDs to the models, patterns have the potential to drastically speed up the design and development of well-built, successful systems. The OMG has based its future architecture and development on just such a model-based approach, known as the Model Driven Architecture (MDA), in which MOF and JMI are core components. Other Specifications Leveraging JMIThere are a number of specifications that are using the metadata management facilities of JMI:
We continue to have discussions with other JSR Specification Leads to explore other areas where this technology can be leveraged. SummaryWe have seen how a standard for metadata management is becoming increasingly important for enabling the integration and interoperability of applications, tools, and services. We have seen that a common way to represent metadata is required, and that this representation must be both capable of modeling multiple levels of abstraction as well as use generic, precisely defined artifacts. With these precisely defined artifacts, unambiguous mappings to specific languages such as Java, or interchange mechanisms such as XML can be defined. The JMI specification provides this standard. JMI will allow Java applications to specify, store, access, and interchange metadata using standard metadata services. By describing metadata using the widely-adopted UML notation, developers can take advantage of the capabilities of the JMI framework to automatically generate the APIs necessary to manage and interchange their metadata. The specification is likely to increase the adoption of standards-based metadata and hence accelerate the creation of robust applications and solutions in which there are no barriers to information exchange. The ability to reuse modules and systems is also increased, because the components can be well characterized. Finally, comprehensive metadata management is an enabler for the services-based architecture. A standard for describing objects and services is essential in building services that can query, negotiate, and interact with each other. The final release of the JMI specification, along with its Reference Implementation and compatibility test suite is expected by June 2002. An early implementation of JMI that works with the NetBeans open-source IDE is the Metadata Repository (MDR). It can be downloaded from netbeans.org. More information, including pointers to all specifications, whitepapers, presentations, and other collateral, can be found at the JMI website. ReferencesData Warehousing, CWM And MOF Resource Page Common Warehouse Metamodel (CWM) OMG's Model-Driven Architecture About the AuthorChuck Mosher is a Staff Engineer in Market Development Engineering at Sun Microsystems. He has been with Sun for over 12 years, and works with ISVs in the data warehousing and business intelligence market segments. His recent focus has been to help drive Java standards in the areas of metadata management and data analytics. | ||||||||||
|
| ||||||||||||