1. Introduction

Speech recognition systems provide computers with the ability to listen to user speech and determine what is said. Current technology does not yet support unconstrained speech recognition: the ability to listen to any speech in any context and transcribe it accurately. To achieve reasonable recognition accuracy and response time, current speech recognizers constrain what they listen for by using grammars.

The Java Speech Grammar Format (JSGF) defines a platform-independent, vendor-independent way of describing one type of grammar, a rule grammar (also known as a command and control grammar or regular grammar). It uses a textual representation that is readable and editable by both developers and computers, and can be included in Java source code. The other major grammar type, the dictation grammar, is not discussed in this document.

A rule grammar specifies the types of utterances a user might say (a spoken utterance is similar to a written sentence). For example, a simple window control grammar might listen for "open a file", "close the window", and similar commands.

What the user can say depends upon the context: is the user controlling an email application, reading a credit card number, or selecting a font? Applications know the context, so applications are responsible for providing a speech recognizer with appropriate grammars.

This document is the specification for the Java Speech Grammar Format. First, the basic naming and structural mechanisms are described. Following that, the basic components of the grammar, the grammar header and the grammar body, are described. The grammar header declares the grammar name and lists the imported rules and grammars. The grammar body defines the rules of this grammar as combinations of speakable text and references to other rules. Finally, some simple examples of grammar declarations are provided.
1.1 Related Documentation

The following is a list of related documentation that may be helpful in understanding and using the Java Speech Grammar Format.

The Java Speech Grammar Format has been developed for use with recognizers that implement the Java Speech API. However, it may also be used by other speech recognizers and in other types of applications. Readers interested in the programmatic use of the Java Speech Grammar Format with the Java Speech API are referred to the technical documentation and the Programmer's Guide for the API. Among other issues, those documents define the mechanisms for loading grammars into recognizers, methods for controlling and modifying loaded grammars, error handling and so on.

The Java Speech Grammar Format adopts some of the style and conventions of the Java Programming Language. Readers interested in a comprehensive specification are referred to The Java Language Specification, Gosling, Joy and Steele, Addison Wesley, 1996 (GJS96).

Finally, like the Java Programming Language, grammars in the Java Speech Grammar Format may be written with the Unicode character set, as defined in The Unicode Standard, Version 2.0, The Unicode Consortium, Addison-Wesley Developers Press, 1996 (Uni96).
2. Definitions

2.1 Grammar Names and Package Names

Each grammar defined by the Java Speech Grammar Format has a unique name that is declared in the grammar header. Legal structures for grammar names are:

    packageName.simpleGrammarName
    simpleGrammarName

The first form (package name + simple grammar name) is a full grammar name. The second form is a simple grammar name (grammar name only). As a hypothetical illustration, com.example.apps.numbers is a full grammar name whose package name is com.example.apps and whose simple grammar name is numbers.

The package name and grammar name have the same format as packages and classes in the Java programming language. A full grammar name is a dot-separated list of Java identifiers [1] (see GJS96, §3.8 and §6.5).

The grammar naming convention also follows the naming convention for classes in the Java Programming Language (see GJS96). The convention minimizes the chance of naming conflicts. Following that convention, the package name should begin with the reversed Internet domain name of the developing organization, followed by any local packaging.
2.2 Rulenames

A grammar is composed of a set of rules that together define what may be spoken. Rules are combinations of speakable text and references to other rules. Each rule has a unique rulename. A reference to a rule is represented by the rule's name in surrounding <> characters (less-than and greater-than).

A legal rulename is similar to a Java identifier but allows additional symbols. It is an unlimited-length sequence of Unicode characters, drawn from the characters that may appear in a Java identifier plus a set of additional punctuation symbols.
Grammar developers should be aware of two specific constraints. First, rulenames are compared with exact Unicode string matches, so case is significant. For example, <name>, <NAME> and <Name> are three different rulenames. Second, white space is not permitted in rulenames.

The rulenames <NULL> and <VOID> are reserved; they are described in the section on Special Rulenames.

The Unicode character set includes most writing scripts from the world's living languages, so rulenames can be written in Chinese, Japanese, Korean, Thai, Arabic, European languages, and many more. For example, <hello>, <Zürich> and <user_test> are legal rulenames.

2.2.1 Qualified and Fully-Qualified Names
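Although rulenames are unique within a grammar, separate grammars may reuse the same simple rulename. A later section introduces the import statement, which allows one grammar to reference the public rules of another. Qualified and fully-qualified rulenames identify the grammar to which a referenced rule belongs.

A fully-qualified rulename includes the full grammar name and the simple rulename. For example, for a rule <hello> in a hypothetical grammar named com.example.greetings.english:

    <com.example.greetings.english.hello>

A qualified rulename includes only the simple grammar name and the rulename and is a useful shorthand representation. For example:

    <english.hello>

The following conditions apply to the use of rulenames: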
2.2.2 Resolving Rulenames

It is an error to use an ambiguous reference to a rulename. The following defines behavior for resolving references:
When a rulename reference cannot be resolved (not defined locally and not a public rule of an imported grammar), the handling of the reference is defined by the recognizer's software interface [4].

2.2.3 Special Rulenames

The Java Speech Grammar Format defines two special rules, <NULL> and <VOID>. These rules are universally defined - they are available in any grammar without an explicit import statement - and they cannot be redefined. Both names are fully-qualified so no qualifying grammar name is required.
The <NULL> rule is matched automatically, without the user speaking any word. The <VOID> rule can never be spoken, so any expansion that requires it cannot be spoken either.

2.3 Tokens

A token, sometimes called a terminal symbol, is the part of a grammar that defines what may be spoken by a user. Most often, a token is equivalent to a word. Tokens may appear in isolation, for example,

    hello

or as sequences of tokens separated by whitespace characters, for example,

    this is a test

In Java Speech Grammar Format, a token is a character sequence bounded by whitespace, by quotes or delimited by the other symbols that are significant in the grammar:

    ; = | * + <> () [] {} /* */ //

A token is a reference to an entry in a recognizer's vocabulary, often referred to as the lexicon. The recognizer's vocabulary defines the pronunciation of the token. With the pronunciation, the recognizer is able to listen for that token.

The Java Speech Grammar Format allows multi-lingual grammars, that is, grammars that include tokens from more than one language. However, most recognizers operate mono-lingually so a typical grammar will contain only one language. It is the responsibility of the application that loads a grammar into a recognizer to ensure that it has appropriate language support. As an illustration, a rule such as <greeting> = hello | bonjour | hola; mixes English, French and Spanish tokens.

Most recognizers have a comprehensive vocabulary for each language they support. However, it is never possible to include 100% of a language. For example, names, technical terms and foreign words are often missing from the vocabulary. For tokens missing from the vocabulary, there are three possibilities:
Tokens do not need to be normal written words of a language, assuming that the token is properly defined in the recognizer's vocabulary. For example, to handle the two pronunciations of "read" (past tense sounds like "red", present tense sounds like "reed") an application could define two separate tokens "read_past" and "read_present" with appropriate pronunciations.

2.3.1 Quoted Tokens

A token does not need to be a word. A token may be a sequence of words or a symbol. Quotes can be used to surround multi-word tokens and special symbols. For example, "New York" and "+" are each a single token.

A multi-word token is useful when the pronunciation of words varies because of the context. Multi-word tokens can also be used to simplify the processing of results, for example, getting single-token results for "New York", "Sydney" and "Rio de Janeiro."

Quoted tokens can be included in the recognizer's vocabulary like any other token. If a multi-word quoted token is not found in the vocabulary, then the default behavior is to determine the pronunciation of each space-separated token within the quotes, but otherwise treat the text within quotes as a single token.

To include a quote symbol in a token, surrounding quotes must be used and the quote within the token must be preceded by a backslash `\'. Similarly, to include a backslash in a quoted token, it should be preceded by another backslash. For example, the following are two tokens representing a single backslash and a single quote character.

    "\\"  "\""

White space is significant in quoted tokens.

2.3.2 Symbols and Punctuation

Most speech recognizers provide the ability to handle common symbols and punctuation forms. For example, recognizers for English usually handle apostrophes ("Mary's", "it's") and hyphens ("new-age").

There are, however, many textual forms that are difficult for a recognizer to handle unambiguously. In these instances, a grammar developer should use tokens that are as close as possible to the way people will speak and that are likely to be built into the vocabulary. Numbers, dates, abbreviations and special symbols are common examples.
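As an illustration of this advice, the sketch below writes a number, an abbreviation and a symbol the way a speaker would say them. The grammar name, rulenames and word choices are assumptions made for this example, and whether each token is in the vocabulary depends on the recognizer in use.

```jsgf
#JSGF V1.0;
grammar spokenForms;   // hypothetical grammar name

// "3.14" written as it is spoken
<pi> = three point one four;

// "Mr." written as the spoken word (the surname is illustrative)
<title> = mister smith;

// "&" written as either of two spoken forms
<ampersand> = and | ampersand;
```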
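2.4 Comments

Comments may appear in both the header and body. The comment style of the Java Programming Language is adopted (see GJS96). There are two forms of comment: a traditional comment, which begins with /* and ends with */, and a single-line comment, which begins with // and runs to the end of the line.

Comments do not nest. Furthermore, the characters // have no special meaning inside a comment that begins with /*, and the characters /* have no special meaning inside a comment that begins with //. Comments may appear anywhere in a grammar definition except within tokens, quoted tokens, rulenames, tags and weights.

The Java Speech Grammar Format supports documentation comments with a similar style to the documentation comments of the Java Programming Language (GJS96, §18). These special comments are defined in the section on Documentation Comments.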
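The two comment forms can be sketched as follows. The grammar name and the rule are hypothetical and exist only to give the comments somewhere to sit.

```jsgf
#JSGF V1.0;
/* A traditional comment: it may span
   several lines. */
grammar commentDemo;   // hypothetical grammar name

// A single-line comment runs to the end of the line.
public <greeting> = hello /* a comment between two tokens */ world;
```

Note that the comment in the rule definition sits between tokens, not inside one, which is consistent with the placement restrictions described above.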
3. Grammar Header

A single file defines a single grammar. The grammar definition contains two parts: the grammar header and the grammar body. The grammar header includes a self-identifying header, declares the name of the grammar and declares imports of rules from other grammars. The body defines the rules of the grammar, some of which may be public.

3.1 Self-Identifying Header

A Java Speech Grammar Format document starts with a self-identifying header. This header identifies that the document contains JSGF and indicates the version of JSGF being used (currently "V1.0"). Next, the header optionally specifies the character encoding used in the document. The header may also optionally specify the locale of the grammar specified in the document. The locale [6] specifies the language and optionally the country or regional variant that the grammar supports. The header is terminated by a semi-colon character and a newline character.

The header format is:

    #JSGF version char-encoding locale;

The following are examples of self-identifying headers.

    #JSGF V1.0;
    #JSGF V1.0 ISO8859-5;
    #JSGF V1.0 JIS ja;

The first example does not provide a character encoding or locale, so a reasonable default would be assumed. In the US, the default might be ISO8859-1 (a standard character set) and "en" (the symbol for English). The second example defines the ISO8859-5 set (Cyrillic) but the default locale is assumed. The final example defines the "JIS" character set (one of the Japanese character sets) and defines the language as "ja" (Japanese).

The Java platform handles a very wide range of character sets which are converted internally to the Unicode character set. When using the Unicode character set, JSGF is suited to writing grammars for nearly all modern languages.

The hash character (`#') must be the first character in the document and all characters in the self-identifying header must be in the ASCII subset of the encoding being used.

3.2 Grammar Name Declaration

The grammar's name must be declared as the first statement of that grammar.
The declaration must use the full grammar name (package name + simple grammar name). Thus, the declaration format is either of the following:

    grammar packageName.simpleGrammarName;
    grammar simpleGrammarName;

The naming of packages and grammars is described in the section on Grammar Names and Package Names.

3.3 Import

The grammar header can optionally include import declarations. The import declarations follow the grammar declaration and must come before the grammar body (the rule definitions). An import declaration allows one or all of the public rules of another grammar to be referenced locally.

The format of the import statement is one of the following:

    import <fullyQualifiedRuleName>;
    import <fullGrammarName.*>;

The first form is an import of a single rule by its fully-qualified rulename. The use of the asterisk in the second form requests the import of all public rules of the named grammar.

Note that because both a grammar name and a rulename or asterisk are required, the import statement is never of the form:

    import <ruleName>;

An imported rule can be referenced in three ways: by its simple rulename, by its qualified rulename, or by its fully-qualified rulename.

The name resolving behavior is defined earlier in this document in Resolving Rulenames. Note that an import statement is optional when an imported rule is always referenced by its fully-qualified rulename.
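The header elements described above can be put together as in the following sketch. The package name com.example, the grammar names and the rulenames are assumptions made for illustration, and the rule <thing> is assumed to be a public rule of the hypothetical grammar com.example.objects.

```jsgf
#JSGF V1.0 ISO8859-1 en;

// Grammar name declaration: package name + simple grammar name.
grammar com.example.apps.commands;

// Import a single public rule by its fully-qualified rulename.
import <com.example.politeness.startPolite>;

// Import all public rules of another grammar.
import <com.example.objects.*>;

// Imported rules referenced by simple and by qualified rulename.
public <command> = <startPolite> open <objects.thing>;

// The same imported rule referenced by its fully-qualified rulename.
public <shortCommand> = open <com.example.objects.thing>;
```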
4. Grammar Body

4.0.1 Rule Definitions

The grammar body defines rules. Each rule is defined in a rule definition. A rule is defined once in a grammar. The order of definition of rules is not significant [7].

The two patterns for rule definitions are:

    <ruleName> = ruleExpansion ;
    public <ruleName> = ruleExpansion ;

The components of the rule definition are (1) an optional public declaration, (2) the name of the rule being defined, (3) an equals sign `=', (4) the expansion of the rule, and (5) a closing semi-colon `;'. White space is ignored before the definition, between the public keyword and the rulename, around the equal-sign character, and around the semi-colon. White space is significant within the rule expansion.

The rule expansion defines how the rule may be spoken. It is a logical combination of tokens (text that may be spoken) and references to other rules. The term "expansion" is used because an expansion defines how a rule is expanded when it is spoken - a single rule may expand into many spoken words plus other rules which are themselves expanded. Later sections define the legal expansions.

4.0.2 Public Rules
Any rule in a grammar may be declared as public by the use of the public keyword. A public rule may be referenced in the rule definitions of other grammars, either through an import statement or by its fully-qualified rulename.

Without the public declaration, a rule is implicitly private [8] and can only be referenced within rule definitions in the local grammar.

4.1 Rule Expansions

The simplest rule expansions are a reference to a token and a reference to a rule.
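A minimal sketch of these simplest expansions follows. The grammar names and rulenames are hypothetical, and the rule <animal> of com.example.zoo is assumed to be public.

```jsgf
#JSGF V1.0;
grammar expansionDemo;   // hypothetical grammar name

// Expands to a single token: the user must say "elephant".
<animal> = elephant;

// Expands to a local rule: the user must say something matching <animal>.
<pet> = <animal>;

// Expands to a rule of another grammar, named by its fully-qualified rulename.
<visitor> = <com.example.zoo.animal>;
```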
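In more formal terms, the following are legal expansions: any token; a reference to a rule that is defined locally or that is a public rule of another grammar; and a reference to one of the special rules <NULL> and <VOID>.

These reference policies are defined locally, that is, by the scope of the current rule. For example, a rule in one grammar may reference a public rule of a second grammar even though that public rule is itself defined in terms of private rules that the first grammar cannot reference directly.

An empty definition is not legal. Definition of a rule as either <NULL> or <VOID> is legal.

The following sections explain ways in which more complex rules can be defined by logical combinations of legal expansions using composition (sequences and alternatives), grouping and unary operators.

4.2 Composition

4.2.1 Sequences

A rule may be defined by a sequence of expansions. A sequence of legal expansions, each separated by white space, is itself a legal expansion. For example, because both tokens and rule references are legal expansions, a sequence of tokens, a sequence of rule references, and a sequence that mixes the two are all legal rule definitions.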
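Sequences of this kind might look as follows. All names and word choices here are assumptions made for the sketch.

```jsgf
#JSGF V1.0;
grammar sequenceDemo;   // hypothetical grammar name

// A sequence of tokens: the user must say "I live in Boston", in that order.
<where> = I live in Boston;

// Tokens mixed with rule references: "this", then something matching
// <object>, then "is", then something matching <ok>.
<statement> = this <object> is <ok>;

<object> = window;
<ok> = fine;
```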
To speak a sequence, each item in the sequence must be spoken in the defined order. The items in a sequence may be any legal expansion. This includes the structures described below for alternatives, groups and so on.

4.2.2 Alternatives

A rule may be defined as a set of alternative expansions separated by vertical bar characters `|' and optionally by whitespace. To say a rule defined in this way, the speaker must say one, and only one, of the alternatives.

Sequences have higher precedence than alternatives. For example,

    <country> = South Africa | New Zealand | Papua New Guinea;

is a set of three alternatives, each naming a country. An empty alternative is not legal.

4.2.3 Weights

Not all ways of speaking a grammar are equally likely. Weights may be attached to the elements of a set of alternatives to indicate the likelihood of each alternative being spoken. A weight is a floating point number [10] surrounded by forward slashes, e.g. /3.14/. The higher the weight, the more likely it is that an entry will be spoken. The weight is placed before each item in a set of alternatives.
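A sketch of weighted alternatives follows. The first rule is reconstructed from the description of "small", "medium" and "large" in this section; the grammar name and the second rule are assumptions added for illustration.

```jsgf
#JSGF V1.0;
grammar weightDemo;   // hypothetical grammar name

// "small" is 10 times as likely as "large" and 5 times as likely as "medium".
<size> = /10/ small | /2/ medium | /1/ large;

// Weights may also be placed before alternatives that are sequences.
<action> = please (/20/ save files | /1/ delete all files);
```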
The weights should reflect the occurrence patterns of the elements of a set of alternatives. For example, by giving "small", "medium" and "large" the weights 10, 2 and 1 respectively, a grammar writer indicates that "small" is 10 times more likely to be spoken than "large" and 5 times more likely than "medium." The following conditions must be met when specifying weights:
Appropriate weights are difficult to determine and guessing weights does not always improve recognition performance. Effective weights are usually obtained by study of real speech and textual data.

Not all recognizers use weights in the recognition process. However, as a minimum, a recognizer is required to ensure that any alternative with a weight of zero cannot be spoken.

4.3 Grouping

4.3.1 (Parentheses)

Any legal expansion may be explicitly grouped using matching parentheses `()'. Grouping has high precedence and so can be used to ensure the correct interpretation of rules. It is also useful for improving clarity. For example, because sequences have higher precedence than alternatives, parentheses are required in the following rule definition so that "please close" and "please delete" are legal.

    <action> = please (open | close | delete);

Similarly, in a sequence of three items where each item is a set of alternatives, each set must be surrounded by parentheses to ensure correct grouping.
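The grouping of a sequence of alternative sets can be sketched as below. The grammar name, rulename and word lists are assumptions made for this example.

```jsgf
#JSGF V1.0;
grammar groupDemo;   // hypothetical grammar name

// A sequence of three items, each a parenthesized set of alternatives.
<command> = (open | close) (windows | doors) (immediately | later);
```

To say something matching this rule, the speaker says one word from each of the three sets in order, for example "open windows immediately" or "close doors later".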
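If a grouping surrounds a single expansion, then the entity is defined to be a sequence of one item [11]. For example, ( start ) is a sequence containing the single token "start". Empty parentheses are not legal.

4.3.2 [Optional Grouping]

Square brackets may be placed around any rule expansion to indicate that the contents are optional. In other respects, it is equivalent to parentheses for grouping and has the same precedence. For example,

    <polite> = please | kindly | oh mighty computer;
    <command> = [ <polite> ] don't crash;

allows a user to say "don't crash" and to optionally add one form of politeness such as "oh mighty computer don't crash" or "kindly don't crash". Empty brackets are not legal.

4.4 Unary Operators

There are three unary operators in the Java Speech Grammar Format: the Kleene star, the plus operator and tags. The unary operators share the following features: a unary operator may be attached to any legal rule expansion; a unary operator has high precedence and attaches to the immediately preceding rule expansion; and only one unary operator may be attached to a rule expansion, with an exception for tags that is described below.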
Because the precedence of unary operators is higher than that of sequences and alternatives, parentheses must be used to surround a sequence or set of alternatives to attach an operator to the entire entity.

4.4.1 * Kleene Star

A rule expansion followed by the asterisk symbol indicates that the expansion may be spoken zero or more times. The asterisk symbol is known as the Kleene star (after Stephen Cole Kleene, who originated the use of the symbol). For example, where <polite> is a rule listing forms of politeness such as "please" and "oh mighty computer",

    <command> = <polite> * don't crash;

allows a user to say things like "please don't crash", "oh mighty computer please please don't crash", or to ignore politeness with "don't crash".

As a unary operator, this symbol has high precedence. For example, in

    <song> = sing New York *;

the operator applies to the most immediate preceding legal expansion: the token "York". Thus to speak <song>, a user must say "sing New" followed by zero or more instances of "York". With the sequence grouped,

    <song> = sing (New York) *;

the rule does match "sing New York New York".

4.4.2 + Plus Operator

A rule expansion followed by the plus symbol indicates the expansion may be spoken one or more times. For example,

    <command> = <polite> + don't crash;

requires at least one form of politeness. So, it allows a user to say "please please don't crash". However, "don't crash" is not legal.

4.5 Tags

Tags provide a mechanism for grammar writers to attach application-specific information to parts of rule definitions. Applications typically use tags to simplify or enhance the processing of recognition results.

Tag attachments do not affect the recognition of a grammar. Instead, the tags are attached to the result object returned by the recognizer to an application. The software interface of the recognizer defines the mechanism for providing tags [12].

A tag is a unary operator. As such it may be attached to any legal rule expansion. The tag is a string delimited by curly braces `{}'. All characters within the braces are considered a part of the tag, including white-space. Empty braces "{}" are a legal tag. In this instance the tag should be interpreted as a zero-length string ("" in the Java programming language).

To include a closing brace character in a tag, it must be escaped with a backslash `\' character. To include a backslash character, it must also be escaped by a backslash. For example, the tag

    {nasty \\ \} tag}

is processed as the following string:

    nasty \ } tag

The tag attaches to the immediate preceding rule expansion (intervening whitespace is ignored). For example:

    <rule> = <action> {tag in here};
    <command> = please (open {OPEN} | close {CLOSE}) the file;

As a unary operator, tag attachment has higher precedence than sequences and alternatives. For example, in

    <item> = book | magazine | newspaper {thing};

the "thing" tag is attached only to the "newspaper" token. Parentheses may be used to modify tag attachment:

    <item> = (book | magazine | newspaper) {thing};

Unlike the other unary operators, more than one tag may follow a rule expansion. For example,

    <openFolder> = <folder> {tag1} {tag2} {tag3};

This is interpreted as if parentheses were used:

    <openFolder> = ((<folder> {tag1}) {tag2}) {tag3};

This syntactic convenience is only permitted for tags. The other unary operators cannot be stacked in this way; for example, <folder> * + is not permitted.

4.5.1 Using Tags

Tags can simplify the writing of applications by simplifying the processing of recognition results. The content of tags and the use of tags are entirely at the discretion of the developer.

One important use of tags is in internationalizing applications. Consider rule definitions for four grammars, each for a separate language, such as English, Japanese, German and French. The application loads the grammar for the language spoken by the user into an appropriate recognizer. In each grammar the tokens are written in the local language, but the tags remain the same across all languages and thus simplify the application software that processes the results. Typically, the grammar name will include the language identifier so that it can be programmatically located and loaded.

4.6 Precedence

The following defines the relative precedence of the entities of a rule expansion in the Java Speech Grammar Format. The list proceeds from highest to lowest precedence: (1) a rulename contained within angle brackets, and a quoted or unquoted token; (2) `()' parentheses for grouping and `[]' brackets for optional grouping; (3) the unary operators (`+', `*' and tag attachment); (4) a sequence of rule expansions; (5) the `|' separated set of alternative rule expansions.
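The effect of these precedence levels can be sketched with a few rules. The grammar name, rulenames and tag strings are hypothetical; each comment states which grouping the precedence rules imply.

```jsgf
#JSGF V1.0;
grammar precedenceDemo;   // hypothetical grammar name

// Unary operators bind tighter than sequences:
// the tag attaches only to the token "file".
<a> = open the file {OBJ};

// Grouping binds tighter than unary operators:
// the tag attaches to the whole sequence.
<b> = (open the file) {CMD};

// Sequences bind tighter than alternatives:
// two alternatives, "open file" and "close window".
<c> = open file | close window;

// Parentheses change that reading:
// "open file window" or "open close window".
<d> = open (file | close) window;
```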
4.7 Recursion

Recursion is the definition of a rule in terms of itself. Recursion is a powerful tool that enables representation of many complex grammatical forms that occur in spoken languages.

Recognizers supporting the Java Speech Grammar Format allow right recursion. In right recursion, the rule refers to itself as the last part of its definition. For example:

    <command> = <action> | (<action> and <command>);
    <action> = stop | start | pause | resume | finish;

allows the following commands: "stop", "stop and finish", "start and resume and finish".

Nested right recursion is also permitted. Nested right recursion is a definition of a rule that references another rule that refers back to the first rule with each recursive reference being the last part of the definition. For example,

    <X> = something | <Y>;
    <Y> = another thing <X>;

Nested right recursion may occur across grammars. However, this is strongly discouraged, as it introduces unnecessary complexity and potential maintenance problems.

Any right recursive rule can be re-written using the Kleene star `*' and/or the plus operator `+'. For example, the following rule definitions are equivalent:

    <command> = <action> | (<action> and <command>);
    <command> = <action> (and <action>) *;

Although it is possible to re-write right recursive grammars using the `+' and `*' operators, the recursive form is permitted because it allows simpler and more elegant representations of some grammars. Other forms of recursion (left recursion, embedded recursion) are not supported because the re-write condition cannot be guaranteed [13].

4.8 Uses of <NULL> and <VOID>
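The two Special Rulenames defined earlier in this document - <NULL> and <VOID> - can be used in rule definitions in several ways.

A reference to <NULL> can express optionality: an optional grouping around an expansion is equivalent to a set of two alternatives, the expansion and <NULL>. The Kleene star and right recursion can be mapped in a similar way: an expansion followed by the Kleene star is equivalent to a right-recursive rule in which one alternative is <NULL>.

For the two previous cases, the grammars are identical in the sense that the user may speak exactly the same utterances. However, there may be programmatic differences in the representation of results produced by a recognizer when the user says something matching the grammar.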
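The equivalences described above can be sketched as follows. The grammar name, rulenames and tokens are assumptions made for illustration; each pair of rules is intended to accept the same utterances.

```jsgf
#JSGF V1.0;
grammar nullVoidDemo;   // hypothetical grammar name

<polite> = please | kindly;

// Optional grouping and its <NULL> equivalent.
<openA> = open [the] window;
<openB> = open (the | <NULL>) window;

// Kleene star and its right-recursive equivalent.
<stopA> = <polite> * stop;
<stopB> = <politeList> stop;
<politeList> = <NULL> | (<polite> <politeList>);

// A sequence that requires <VOID> can never be spoken.
<gate> = <VOID>;
<erase> = <gate> erase everything;
```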
A final use of <NULL> and <VOID> is to enable or disable part of a grammar. When a rule referenced within a sequence is defined as <VOID>, that sequence cannot be spoken; when the same rule is defined as <NULL>, the sequence can be spoken as if the reference were absent.

As a caution, to work effectively, this technique requires a recognizer that can efficiently re-compile the grammar after changing the definition of a rule.

4.9 Documentation Comments

The Java Speech Grammar Format supports documentation comments with a similar style to the documentation comments of the Java Programming Language (GJS96, §18). Such comments can appear before the grammar declaration, before import statements and before the definition of any rule. Hypertext web pages and other forms of documentation can be automatically generated from these comments.

A documentation comment commences with the characters /** and ends with */. The first sentence of the comment (terminated by the first period character) should be a summary sentence with a concise description of the entity being commented.

Unlike Java code documentation, documentation comments in the Java Speech Grammar Format do not allow HTML tags. This is because HTML tags and JSGF rulename references use the same format. For example, <b> could be read either as an HTML bold tag or as a reference to a rule named "b".

A tagged paragraph is marked by a line of a documentation comment that begins with the @ character followed by one of the keywords defined below. A tagged paragraph includes the following lines of the comment up to the start of the next tagged paragraph or to the end of the documentation comment. (Documentation tags are not related to tags in rule definitions.)

In outline, a documentation comment consists of the opening /**, a summary sentence, any further description, zero or more tagged paragraphs, and the closing */. Example 3: Documentation Comments shows an example of comments for a simple grammar.
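The basic structure of a documentation comment might be sketched as below. The grammar, rules and example text are hypothetical, and writing the @see target as a rulename in angle brackets is an assumption based on the description of rulename references in this section.

```jsgf
#JSGF V1.0;
grammar docDemo;   // hypothetical grammar name

/**
 * Summary sentence for the rule, ending at the first period.
 * Any further description follows the summary.
 *
 * @example turn the volume up
 * @see <direction>
 */
public <volume> = turn the volume <direction>;

<direction> = up | down;
```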
4.9.1 The @author Tag

The @author tag may be used in the documentation comment for the grammar declaration. Although there is no defined structure for the @author tag, it is recommended that each author be listed in a separate tag. Multiple @author tags may be included in a single comment.

4.9.2 The @version Tag

A single @version tag may be included in the documentation comment of a grammar declaration. There is no defined structure or format.

4.9.3 The @see Tag

Any number of @see tags may be used in any documentation comment. The tag indicates a cross-reference to another rule (local or imported) or to another grammar. The .* suffix indicates the reference is to a grammar.

4.9.4 The @example Tag

The @example tag can be provided in documentation comments for rule declarations to provide an example of how the rule may be spoken. Appropriate examples make grammars and rules easier to understand. The example text may also be used by grammar tools to verify the correctness of the grammar.

Developers are encouraged to use the same tokenization in examples as used in the rule definition. For example, quotes around "south america" in an example indicate that the two words form a single token. However, tool developers should consider the possibility that the example text includes human-readable formatting for clarity. For English this might include punctuation (period, comma, question mark, exclamation point etc.), capitalization of some tokens, and modified tokenization (e.g. missing quotes).
An @example tag whose text contains a rulename reference would expand out with each example tag defined for the referenced rule.
5. Examples

By combining simple rules together, it is possible to build complex grammars that capture what users can say. The following are examples of grammars with complete headers and bodies.

5.1 Example 1: Simple Command and Control

This example shows two basic grammars that define spoken commands that control a window. Optional politeness is included to show how speech interaction can be made a little more natural and conversational.
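A minimal sketch of two such grammars follows. The package name com.example, the rulenames, the weights and the word lists are assumptions chosen for illustration. Because a single file defines a single grammar, each grammar is shown as its own file.

```jsgf
#JSGF V1.0 ISO8859-1 en;
// File 1 (hypothetical): forms of politeness for other grammars to import.
grammar com.example.politeness;

public <startPolite> = (please | kindly | could you | oh mighty computer) *;
public <endPolite> = [ please | thanks | thank you ];
```

```jsgf
#JSGF V1.0 ISO8859-1 en;
// File 2 (hypothetical): window commands wrapped in optional politeness.
grammar com.example.commands;

import <com.example.politeness.startPolite>;
import <com.example.politeness.endPolite>;

public <basicCmd> = <startPolite> <command> <endPolite>;

<command> = <action> <object>;
<action> = /10/ open | /2/ close | /1/ delete | /1/ move;
<object> = [the | a] (window | file | menu);
```

With these definitions a user could say "open the window", "please close a file" or "oh mighty computer kindly move the menu thanks". The politeness grammar is of little use on its own; it exists to be imported by grammars such as the commands grammar.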
5.2 Example 2: Resolving Names
The following example grammar illustrates the use of fully-qualified names for an application that deals with clothing. The two imported grammars define rules regarding pants and shirts, including the lists of colors that shirts and pants may have. The local grammar defines its own list of colors, so references to the color rules of the imported grammars are written with fully-qualified names to avoid ambiguity.

5.3 Example 3: Documentation Comments

The following example grammar illustrates the use of documentation comments for a simple grammar. It shows all three types of use of documentation comments: for the grammar declaration, for import statements and for rule definitions.
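A sketch of a grammar documented in all three places follows. Every name, author and version value is hypothetical, the imported grammar com.example.politeness is assumed to define a public rule <startPolite>, and the angle-bracket notation in the @see tags is an assumption based on the description of that tag.

```jsgf
#JSGF V1.0;

/**
 * Spoken commands for controlling a window.
 * All names in this grammar are hypothetical.
 *
 * @author Jane Example
 * @version 0.1
 */
grammar com.example.windows;

/**
 * Politeness phrases that may precede a command.
 *
 * @see <com.example.politeness.*>
 */
import <com.example.politeness.startPolite>;

/**
 * A window command with optional politeness.
 *
 * @example please open the window
 * @example close the window
 * @see <action>
 */
public <command> = [ <startPolite> ] <action> the window;

/**
 * The actions that can be applied to a window.
 *
 * @example open
 */
<action> = open | close;
```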
[1] The java.lang.Character class defines methods to test identifiers and document the character sets in more detail: isJavaIdentifierStart and isJavaIdentifierPart.
[4] The Java Speech API throws a
[6] The documentation for the
[7] These properties differ from some linguistic and computational grammar formats which permit multiple alternate definitions of non-terminals or for which order is significant.
[8] Unlike the Java Programming Language, the Java Speech Grammar
[9] As defined in the section on Rulenames, if
[10] The Java programming language requires an `f' or `F' suffix for floating point numbers. This is not required in the Java Speech Grammar Format.
[11] This definition provides consistency in parsing and other grammar analysis.
[12] In the Java Speech API, both the
[13] Technically speaking, the Java Speech Grammar Format defines what is formally called a regular grammar. Some features typically associated with a context-free grammar are permitted for clarity or convenience.