
Chapter 1


Introduction
 

Speech technology, once limited to the realm of science fiction, is now available for use in real applications. The Java™ Speech API, developed by Sun Microsystems in cooperation with speech technology companies, defines a software interface that allows developers to take advantage of speech technology for personal and enterprise computing. By leveraging the inherent strengths of the Java platform, the Java Speech API enables developers of speech-enabled applications to incorporate more sophisticated and natural user interfaces into Java applications and applets that can be deployed on a wide range of platforms.

 


 

1.1     What is the Java Speech API?

The Java Speech API defines a standard, easy-to-use, cross-platform software interface to state-of-the-art speech technology. Two core speech technologies are supported through the Java Speech API: speech recognition and speech synthesis. Speech recognition provides computers with the ability to listen to spoken language and to determine what has been said. In other words, it processes audio input containing speech by converting it to text. Speech synthesis provides the reverse process of producing synthetic speech from text generated by an application, an applet or a user. It is often referred to as text-to-speech technology.

Enterprises and individuals can benefit from a wide range of applications of speech technology using the Java Speech API. For instance, interactive voice response systems are an attractive alternative to touch-tone interfaces over the telephone; dictation systems can be considerably faster than typed input for many users; and speech technology improves accessibility to computers for many people with physical limitations.

Speech interfaces give Java application developers the opportunity to implement distinct and engaging personalities for their applications and to differentiate their products. Java application developers will have access to state-of-the-art speech technology from leading speech companies. With a standard API for speech, users can choose the speech products which best meet their needs and their budget.

The Java Speech API was developed through an open development process. With the active involvement of leading speech technology companies, with input from application developers and with months of public review and comment, the specification has achieved a high degree of technical excellence. Because the Java Speech API specifies a rapidly evolving technology, Sun will support and enhance it to maintain its leading capabilities.

The Java Speech API is an extension to the Java platform. Extensions are packages of classes written in the Java programming language (and any associated native code) that application developers can use to extend the functionality of the core part of the Java platform.

 


 

1.2     Design Goals for the Java Speech API

Along with the other Java Media APIs, the Java Speech API lets developers incorporate advanced user interfaces into Java applications. The design goals for the Java Speech API included:

 


 

1.3     Speech-Enabled Java Applications

The existing capabilities of the Java platform make it attractive for the development of a wide range of applications. With the addition of the Java Speech API, Java application developers can extend and complement existing user interfaces with speech input and output. For existing developers of speech applications, the Java platform now offers an attractive alternative with:

1.3.1     Speech and other Java APIs

The Java Speech API is one of the Java Media APIs, a suite of software interfaces that provide cross-platform access to audio, video and other multimedia playback, 2D and 3D graphics, animation, telephony, advanced imaging, and more. The Java Speech API, in combination with the other Java Media APIs, allows developers to enrich Java applications and applets with rich media and communication capabilities that meet the expectations of today's users, and can enhance person-to-person communication.

The Java Speech API leverages the capabilities of other Java APIs. The internationalization features of the Java programming language plus the use of the Unicode character set simplify the development of multi-lingual speech applications. The classes and interfaces of the Java Speech API follow the design patterns of JavaBeans™. Finally, Java Speech API events integrate with the event mechanisms of AWT, JavaBeans and the Java Foundation Classes (JFC).

 


 

1.4     Applications of Speech Technology

Speech technology is becoming increasingly important in both personal and enterprise computing as it is used to improve existing user interfaces and to support new means of human interaction with computers. Speech technology allows hands-free use of computers and supports access to computing capabilities away from the desk and over the telephone. Speech recognition and speech synthesis can improve computer accessibility for users with disabilities and can reduce the risk of repetitive strain injury and other problems caused by current interfaces.

The following sections describe some current and emerging uses of speech technology. The lists of uses are far from exhaustive. New speech products are being introduced on a weekly basis and speech technology is rapidly entering new technical domains and new markets. The coming years should see speech input and output truly revolutionize the way people interact with computers and present new and unforeseen uses of speech technology.

1.4.1     Desktop

Speech technology can augment traditional graphical user interfaces. At its simplest, it can be used to provide audible prompts with spoken "Yes/No/OK" responses that do not distract the user's focus. But increasingly, complex commands are enabling rapid access to features that have traditionally been buried in sub-menus and dialogs. For example, the command "Use 12-point, bold, Helvetica font" replaces multiple menu selections and mouse clicks.
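As a rough illustration of how one spoken command can stand in for several menu selections, the sketch below turns the text of such a command into a font request. It assumes the recognizer has already delivered the utterance as a string; the class name SpokenFontCommand, its fields and the short list of font families are hypothetical and are not part of the Java Speech API.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for a font request extracted from recognized text. */
final class SpokenFontCommand {
    // Matches sizes spoken as "12-point"; limited to three digits.
    private static final Pattern SIZE = Pattern.compile("\\b(\\d{1,3})-point");
    // Illustrative family names only; an application would use its own font list.
    private static final String[] FAMILIES = {"Helvetica", "Times", "Courier"};

    final int pointSize;
    final boolean bold;
    final boolean italic;
    final String family;

    private SpokenFontCommand(int pointSize, boolean bold, boolean italic, String family) {
        this.pointSize = pointSize;
        this.bold = bold;
        this.italic = italic;
        this.family = family;
    }

    /**
     * Parses text such as "Use 12-point, bold, Helvetica font".
     * Returns null when no size or no known family is present.
     */
    static SpokenFontCommand parse(String recognizedText) {
        String text = recognizedText.toLowerCase(Locale.ROOT);
        Matcher size = SIZE.matcher(text);
        if (!size.find()) {
            return null;
        }
        String family = null;
        for (String candidate : FAMILIES) {
            if (text.contains(candidate.toLowerCase(Locale.ROOT))) {
                family = candidate;
                break;
            }
        }
        if (family == null) {
            return null;
        }
        return new SpokenFontCommand(Integer.parseInt(size.group(1)),
                text.contains("bold"), text.contains("italic"), family);
    }
}
```

Calling SpokenFontCommand.parse("Use 12-point, bold, Helvetica font") yields a request with size 12, bold set and family "Helvetica", which the application could then apply in a single step.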

Drawing, CAD and other hands-busy applications can be enhanced by using speech commands in combination with mouse and keyboard actions to improve the speed at which users can manipulate objects. For example, while dragging an object, a speech command could be used to change its color and line type, all without moving the pointer to the menu bar or a tool palette.

Natural language commands can provide improvements in efficiency but are increasingly being used in desktop environments to enhance usability. For many users it's easier and more natural to produce spoken commands than to remember the location of functions in menus and dialog boxes. Speech technology is unlikely to make existing user interfaces redundant any time soon, but spoken commands provide an elegant complement to existing interfaces.

Speech dictation systems are now affordable and widely available. Dictation systems can provide typing rates exceeding 100 words per minute and word accuracy over 95%. These rates substantially exceed the typing ability of most people.

Speech synthesis can enhance applications in many ways. Speech synthesis of text in a word processor is a reliable aid to proof-reading, as many users find it easier to detect grammatical and stylistic problems when listening rather than reading. Speech synthesis can provide background notification of events and status changes, such as printer activity, without requiring a user to lose current context. Applications which currently include speech output using pre-recorded messages can be enhanced by using speech synthesis to reduce the storage space by a factor of up to 1000, and by removing the restriction that the output sentences be defined in advance.
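The storage saving can be checked with simple arithmetic. The sketch below compares the bytes needed per second of pre-recorded audio with the bytes needed for the equivalent text. All figures are assumptions chosen for illustration, not values from this guide: uncompressed telephone-quality audio at 8000 samples per second, a speaking rate of 150 words per minute, and about 6 bytes of text per word. The class name SpeechStorageEstimate is hypothetical.

```java
/** Back-of-the-envelope comparison of pre-recorded audio and text storage. */
final class SpeechStorageEstimate {

    /** Bytes per second of uncompressed, single-channel audio. */
    static double audioBytesPerSecond(int sampleRateHz, int bytesPerSample) {
        return (double) sampleRateHz * bytesPerSample;
    }

    /** Bytes per second of plain text spoken at the given rate. */
    static double textBytesPerSecond(double wordsPerMinute, double bytesPerWord) {
        return wordsPerMinute * bytesPerWord / 60.0;
    }

    /** How many times smaller the text is than the recorded audio. */
    static double reductionFactor(int sampleRateHz, int bytesPerSample,
                                  double wordsPerMinute, double bytesPerWord) {
        return audioBytesPerSecond(sampleRateHz, bytesPerSample)
                / textBytesPerSecond(wordsPerMinute, bytesPerWord);
    }
}
```

Under these assumptions the text needs 15 bytes per second. With 8-bit samples the audio needs 8000 bytes per second, a factor of about 533; with 16-bit samples it needs 16000 bytes per second, a factor of about 1067. Both are of the same order as the factor of 1000 quoted above, although compressed audio formats would narrow the gap.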

In many situations where keyboard input is impractical and visual displays are restricted, speech may provide the only way to interact with a computer. For example, surgeons and other medical staff can use speech dictation to enter reports when their hands are busy and when touching a keyboard represents a hygiene risk. In vehicle and airline maintenance, warehousing and many other hands-busy tasks, speech interfaces can provide practical data input and output and can enable computer-based training.

1.4.2     Telephony Systems

Speech technology is being used by many enterprises to handle customer calls and internal requests for access to information, resources and services. Speech recognition over the telephone provides a more natural and substantially more efficient interface than touch-tone systems. For example, speech recognition can "flatten out" the deep menu structures used in touch-tone systems.
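One way to picture this flattening is a lookup in which each action is reachable either through a sequence of key presses or through a single spoken phrase. The sketch below is only an illustration under that assumption: the class FlatVoiceMenu, the key paths and the action names are hypothetical, and the recognized text is assumed to arrive as a plain string.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical menu in which every action has a key path and a spoken phrase. */
final class FlatVoiceMenu {
    // Touch-tone access: the caller steps through the menu levels, e.g. "2-1-3".
    private final Map<String, String> actionByKeyPath = new HashMap<>();
    // Speech access: one phrase leads directly to the same action.
    private final Map<String, String> actionByPhrase = new HashMap<>();

    void register(String keyPath, String phrase, String action) {
        actionByKeyPath.put(keyPath, action);
        actionByPhrase.put(normalize(phrase), action);
    }

    /** Returns the action at the end of a key sequence, or null if unknown. */
    String selectByKeys(String keyPath) {
        return actionByKeyPath.get(keyPath);
    }

    /** Returns the action for a recognized phrase, or null if unknown. */
    String selectByPhrase(String recognizedText) {
        return actionByPhrase.get(normalize(recognizedText));
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

After menu.register("2-1-3", "Savings balance", "READ_SAVINGS_BALANCE"), a caller reaches the action either by pressing three keys in turn or by saying "savings balance" once.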

Systems are already available for telephone access to email, calendars and other computing facilities that have previously been available only on the desktop or with special equipment. Such systems allow convenient computer access by telephones in hotels, airports and airplanes.

Universal messaging systems can provide a single point of access to multiple media such as voice-mail, email, fax and pager messages. Such systems rely upon speech synthesis to read out messages over the telephone. For example: "Do I have any email?" "Yes, you have 7 messages including 2 high priority messages from the production manager." "Please read me the mail from the production manager." "Email arrived at 12:30pm...".

1.4.3     Personal and Embedded Devices

Speech technology is being integrated into a range of small-scale and embedded computing devices to enhance their usability and reduce their size. Such devices include Personal Digital Assistants (PDAs), telephone handsets, toys and consumer product controllers.

Speech technology is particularly compelling for such devices and is being used increasingly as the computer power of these devices increases. Speech recognition through a microphone can replace input through a much larger keyboard. A speaker for speech synthesis output is also smaller than most graphical displays.

PersonalJava™ and EmbeddedJava™ are the Java application environments targeted at these same devices. PersonalJava and EmbeddedJava are designed to operate on constrained devices with limited computing power and memory, and with more constrained input and output mechanisms for the user interface.

As an extension to the Java platform, the Java Speech API can be provided as an extension to PersonalJava and EmbeddedJava devices, allowing the devices to communicate with users without the need for keyboards or other large peripherals.

1.4.4     Speech and the Internet

The Java Speech API allows applets transmitted over the Internet or intranets to access speech capabilities on the user's machine. This provides the ability to enhance World Wide Web sites with speech and support new ways of browsing. Speech recognition can be used to control browsers, fill out forms, control applets and enhance the WWW/Internet experience in many other ways. Speech synthesis can be used to bring web pages alive, inform users of the progress of applets, and dramatically shorten browsing time by reducing the amount of audio sent across the Internet.

The Java Speech API utilizes the security features of the Java platform to ensure that applets cannot maliciously use system resources on a client. For example, explicit permission is required for an applet to access a dictation recognizer since otherwise a recognizer could be used to bug a user's workspace.

 


 

1.5     Implementations

The Java Speech API can enable access to the most important and useful state-of-the-art speech technologies. Sun is working with speech technology companies on implementations of the API. Already speech recognition and speech synthesis are available through the Java Speech API on multiple computing platforms.

The following are the primary mechanisms for implementing the API.

 


 

1.6     Requirements

To use the Java Speech API, a user must have certain minimum software and hardware available. The following is a broad sample of requirements. The individual requirements of speech synthesizers and speech recognizers can vary greatly and users should check product requirements closely.

*As used on this web site, the terms "Java virtual machine" or "JVM" mean a virtual machine for the Java platform.




Java™ Speech API Programmer's Guide
Copyright © 1997-1998 Sun Microsystems, Inc. All rights reserved
Send comments or corrections to javaspeech-comments@sun.com