Mobile X+V 1.2

2 September 2005

This version:
http://www.voicexml.org/specs/multimodal/x+v/mobile/12/
Latest version:
http://www.voicexml.org/specs/multimodal/x+v/mobile/12/
Editors:
Jonny Axelsson, Opera Software <jax@opera.no>
Chris Cross, IBM <xcross@us.ibm.com>
Jim Ferrans, Motorola <James.Ferrans@motorola.com>
Gerald McCobb, IBM <mccobb@us.ibm.com>
T. V. Raman, IBM <tvraman@us.ibm.com>
Les Wilson, IBM <lesw@us.ibm.com>

Abstract

The Mobile X+V profile brings spoken interaction to standard web content by integrating W3C standards for the visual and voice Web. XHTML is for rendering visual content, VoiceXML for spoken interaction, and the different modalities are integrated using XML Events to author DOM2 Event bindings. Using this integration framework, voice handlers can be attached to XHTML elements and respond to specific DOM events, thereby reusing the event model familiar to web developers. Voice interaction features are integrated with XHTML and CSS and can consequently be used directly within XHTML content. This specification builds on existing mobile profiles of W3C technologies such as XHTML Basic and CSS Mobile to create a multimodal specification suitable for use on mobile devices.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. This document is current with respect to the following W3C recommendations:

  • VoiceXML 2.0
  • XHTML 1.1
  • XML Events 1.0
  • XHTML Modularization

Table of Contents

    1 Introduction
        1.1 Motivation And Applications
        1.2 Differences between Mobile X+V and XHTML+Voice Profile 1.2
        1.3 Design Principles
        1.4 Mobile X+V Processing Model
            1.4.1 Processing within one Document
                1.4.1.1 Specifying External VoiceXML Documents
                1.4.1.2 VoiceXML Dialog Activation
                1.4.1.3 Accessing Speech Dialog Results from XHTML
                1.4.1.4 Returning from a VoiceXML Form
                1.4.1.5 Voice Handler Execution Context
            1.4.2 Cancel
            1.4.3 Declarative Synchronization of Input Modes
            1.4.4 Events and Event Handling
            1.4.5 Aural Style Sheets
            1.4.6 Voice Handler Resource Fetching
        1.5 Accessibility and Mobile X+V
    2 VoiceXML 2.0 Modules
        2.1 Modularization of VoiceXML 2.0
        2.2 Speech Dialogs
        2.3 Executable Content
        2.4 Speech Grammars
        2.5 Speech And Non-speech Audio Output
        2.6 Event Handling
        2.7 Script
        2.8 VoiceXML Container for X+V
            2.8.1 Document Conformance
            2.8.2 User Agent Conformance
            2.8.3 VoiceXML Namespace Integration
    3 XHTML Modularization
        3.1 Document Conformance
        3.2 User Agent Conformance
        3.3 XHTML Namespace Integration
        3.4 Mobile X+V
    4 XML-Events Module
        4.1 Listener
        4.2 Event Types
        4.3 X+V Event Propagation
    5 X+V Extension Module
        5.1 Sync
        5.2 Cancel
        5.3 VoiceXML Field ID Attribute
        5.4 VoiceXML Prompt SRC and EXPR Attributes
            5.4.1 Styling External Prompt Resources
            5.4.2 Dynamic Prompt Generation in XHTML
            5.4.3 Invalid Prompt Resource
            5.4.4 Prompt Resource Fetching Properties

    Appendices

    A Examples
        A.1 What You See Is What You Can Say
        A.2 Mixed-initiative Conversational Interface
    B DTD
        B.1 Mobile X+V 1.2 DTD
    C Schema
        C.1 Mobile X+V 1.2 Schema
        C.2 VoiceXML Container for Mobile X+V
    D Sync Grammars for XHTML Controls
    E References
        E.1 Normative References
        E.2 Informative References


    1 Introduction

    This document defines a mobile profile of XHTML+Voice (X+V). X+V is a member of the XHTML family of document types, as specified by XHTML Modularization [XHTML Modularization]. XHTML is extended with the XML-Events module and a module containing a small number of attribute extensions to both XHTML and VoiceXML. The latter module facilitates the sharing of multimodal input data between the VoiceXML dialog and XHTML input and text elements.

    The XML-Events module [XML Events] provides XML host languages the ability to uniformly integrate event listeners and associated event handlers with Document Object Model (DOM) Level 2 [DOM2 Events] event interfaces. The result is an event syntax for XHTML-based languages that enables an interoperable way of associating behaviors with document-level markup.

    VoiceXML [VoiceXML 2.0] has been designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations. VoiceXML 2.0 is modularized in X+V and a suitable mobile profile of VoiceXML based on this modularization supports authoring speech dialogs for updating XHTML forms and form elements.

    The modularization of VoiceXML 2.0 as defined by X+V also specifies DOM event types specific to voice interaction for use with the XML Events module. Speech dialogs authored in VoiceXML 2.0 can then be treated as event handlers to add voice-interaction specific behaviors to XHTML documents. The language integration supports all of the modules defined in XHTML Modularization, and adds speech interaction functionality via XML Events to enable multimodal applications. This makes the resulting Mobile X+V suitable for adoption by mobile devices that already use XHTML Basic and the XHTML Modularization framework to integrate custom XML modules into the host language.

    1.1 Motivation And Applications

    XHTML 1.1 [XHTML 1.1] and XML Events [XML Events] are integrated using [XHTML Modularization] to enable interaction with VoiceXML 2.0 [VoiceXML 2.0] speech dialogs and thereby bring spoken interaction to the mobile web. The design leverages open industry APIs like the W3C DOM to create interoperable web content that can be deployed across a variety of end-user devices. Mobile X+V also enables the deployment of network-based voice services that resource-limited thin clients can access to perform advanced multimodal functions.

    Today, mobile web applications are authored in XHTML Basic [XHTML Basic] with user interaction created via XHTML form elements. The W3C is presently working on XForms [XForms], the next generation of web forms that bring the power of XML to web application development. The combination of XHTML and Voice described in this document can leverage the semantic richness of web applications created using XForms, while providing a smooth transition for today's developers wishing to deploy multimodal applications by adding spoken interaction to present-day web content. Integrating the work of the W3C voice browser working group into mainstream XHTML content has the advantage of ensuring that future enhancements to the voice browser component such as natural language understanding will be incorporated. This provides a smooth transition path for mobile web developers wishing to deliver increasingly smart user interaction for their web applications; further, the ability to use standardized voice technologies enables mobile service providers to deploy advanced voice solutions on the network that provide added value.

    1.2 Differences between Mobile X+V and XHTML+Voice Profile 1.2

    • The <xv:sync> html-form-id attribute is required (Sync).

    • JavaScript variables or input controls defined within the XHTML container cannot be accessed within VoiceXML forms. For example, the VoiceXML <assign> element cannot be used to update an HTML control. However, the VoiceXML standard application variables are still available on the client (Accessing Speech Dialog Results from XHTML).

    • VoiceXML forms cannot be embedded within an XHTML document. The external VoiceXML forms are referenced within a Mobile X+V document with the XML Events handler attribute. Each external reference must contain a fragment identifier specifying the VoiceXML form ID appended to the absolute or relative URI (Processing Within One Document).

    • The external VoiceXML forms referenced by a Mobile X+V document must be placed in a VoiceXML 2.0 document (e.g., root element is <vxml>). The VoiceXML 2.0 document must include an XML processing instruction that identifies its VoiceXML forms as Mobile X+V VoiceXML forms (VoiceXML Container for X+V).

    • Document linking with voice is neither required nor disallowed by the X+V 1.0 mobile profile (Document Linking with Voice).

    • An X+V Mobile Profile user agent does not need to keep the document dynamically updated. If the document is updated after it is loaded, for example by modifying the DOM through dynamic HTML, the text referenced by a VoiceXML <prompt> src attribute may not be updated (Dynamic Prompt Generation in XHTML).

    • The location of a VoiceXML document can be specified using the XHTML <link> element (Specifying External VoiceXML Documents).

    • XHTML+Voice namespace prefixing is removed from the <sync> element's input, field, and html-form-id attributes (Sync).

    • The voice-handler attribute is removed from the <cancel> element (Cancel).

    • The scope attribute on a VoiceXML <form> element is supported by Mobile X+V (VoiceXML Dialog Activation).

    1.3 Design Principles

    Mobile X+V is an XML application [XML 1.0].

    1. XHTML is the host language.
    2. Mobile X+V extends XHTML Basic [XHTML Basic] with an appropriate mobile subset of VoiceXML 2.0, as well as XML-Events.
    3. Mobile X+V makes authoring easy for common types of multimodal interactions.
    4. VoiceXML modularization is used to create a mobile profile that meets the needs of mobile clients with limited resources.
    5. VoiceXML modularization does not alter the VoiceXML execution model. Specifically, a speech dialog is run as specified by the VoiceXML form interpretation algorithm. This enables mobile devices to rely on network-based voice services built on the VoiceXML platform.
    6. VoiceXML modularization does not modify the function of the VoiceXML 2.0 elements and attributes that are part of the profile.

    1.4 Mobile X+V Processing Model

    Mobile X+V is designed for creating multimodal dialogs that combine in a straightforward way the visual input mode represented by XHTML and speech input and output as represented by VoiceXML. Here is a "Hello World" example of Mobile X+V.

    <?xml version="1.0"?>
    <html 
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:ev="http://www.w3.org/2001/xml-events"
    xmlns:xv="http://www.voicexml.org/2002/xhtml+voice" >
      <head>
        <title>Hello World!</title>
      </head>
      <body>
        <h1>X+V Example</h1>
        <p ev:event="click" ev:handler="helloworld.vxml#sayHello">
          Hello World!
        </p>
      </body>
    </html>
    Mobile X+V Hello World!

    <?xml version="1.0" encoding="UTF-8"?>
    <?xv version="1.2"?> 
    <vxml xmlns="http://www.w3.org/2001/vxml" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.w3.org/2001/vxml
        http://www.voicexml.org/specs/multimodal/x+v/mobile/12/schema/vxml.xsd"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
      version="2.0">
      <!-- voice handler -->
      <form id="sayHello">
        <block>Hello World!</block>
      </form>
    </vxml>
    VoiceXML 2.0 (helloworld.vxml)

    The speech dialog identified by "sayHello" is activated when the user clicks anywhere on the "Hello World!" paragraph. The speech dialog is a VoiceXML form whose <block> contains the text to be synthesized. The speech output is "Hello World!"

    When deploying to mobile devices, the voice functions may reside either on the device, or on the network; Mobile X+V permits both forms of deployment, thereby enabling mobile service providers to create value-add voice services that run on the network and are accessed by resource-limited clients.

    1.4.1 Processing within one Document

    A speech dialog is defined within Mobile X+V as an external [VoiceXML 2.0] form with a unique ID. The VoiceXML form is activated by an XML-event with an associated handler that references the form's unique ID. The XML-event is generated from a user interaction with an XHTML element, generally a form control, or from a document event such as load or unload. Activating the VoiceXML form sets all form and field item variables to their initial values. This clears the guard conditions on all form items that don't have an initial value set with the expr attribute. The form is run according to the form interpretation algorithm (FIA) specified by VoiceXML.
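
    As an illustration only, and not as part of this specification, the following JavaScript sketch splits a handler reference such as the one used in the Hello World example into the location of the VoiceXML document and the ID of the form to activate. The function name parseVoiceHandlerRef and the property names documentUri and formId are hypothetical.

```javascript
// Hypothetical helper (not defined by Mobile X+V): split an ev:handler
// value such as "helloworld.vxml#sayHello" into the VoiceXML document
// URI and the ID of the form to activate.
function parseVoiceHandlerRef(handler) {
  const hashIndex = handler.indexOf("#");
  // A reference to an external voice handler needs both a document URI
  // and a non-empty fragment identifier naming the form. Anything else
  // (for example "#scr1", a same-document script) is not treated as one.
  if (hashIndex <= 0 || hashIndex === handler.length - 1) {
    return null;
  }
  return {
    documentUri: handler.slice(0, hashIndex),
    formId: handler.slice(hashIndex + 1),
  };
}

console.log(parseVoiceHandlerRef("helloworld.vxml#sayHello"));
```

    A user agent would resolve documentUri against the base URI of the Mobile X+V document before fetching it; that step is omitted here.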

    1.4.1.1 Specifying External VoiceXML Documents

    The XHTML <link> element can be used to specify the location of an external VoiceXML document. Use the <link> attributes as follows:

    • Set the value of the href attribute to the location of the VoiceXML document. The value of href is a URI.
    • Set the value of the type attribute to the VoiceXML media type: application/voicexml+xml.
    • Set the value of the rel attribute to voicexml.

    1.4.1.2 VoiceXML Dialog Activation

    When the browser loads the body of a Mobile X+V document, a "load" event is generated. This begins the event cycle specified by the DOM Level 2 Events model. While the event cycle is running, events propagate through the HTML tree. An XML-Events listener can observe an event on either a target HTML node or, if the event bubbles, an ancestor of that node. An XML-Events listener activates a handler in response to the observed event. The handler can be a voice dialog activated in response to a "click" event on an HTML input, for example.

    A voice dialog will also be activated if the scope attribute on a VoiceXML <form> is set to "document" and one of its grammars is matched while another dialog is active.

    1.4.1.3 Accessing Speech Dialog Results from XHTML

    Speech dialog results may be accessed from XHTML in one of the following ways:

    1. Through the X+V <sync> element, which is described in the X+V Extension Module section.
    2. Through the VoiceXML standard application variables, which are available to a Mobile X+V application as global JavaScript variables. Each variable listed is an array of elements [0..i..n], where each element represents a possible result. See [VoiceXML 2.0] for more details:

      • application.lastresult$[i].confidence
      • application.lastresult$[i].utterance
      • application.lastresult$[i].inputmode
      • application.lastresult$[i].interpretation
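
    The following JavaScript sketch, given for illustration only, shows one way a script might read these variables. It assumes that element 0 is the most likely result and uses mock data in place of application.lastresult$; the function name topResult, the threshold parameter, and the sample values are hypothetical.

```javascript
// Hypothetical helper: return the first (assumed most likely) entry of
// a lastresult$-style array if its confidence reaches a threshold,
// otherwise null.
function topResult(lastresult, minConfidence) {
  if (!lastresult || lastresult.length === 0) {
    return null;
  }
  const top = lastresult[0];
  return top.confidence >= minConfidence ? top : null;
}

// Mock data standing in for application.lastresult$ (invented values).
const mockLastResult = [
  { confidence: 0.82, utterance: "boston", inputmode: "voice", interpretation: "BOS" },
  { confidence: 0.41, utterance: "austin", inputmode: "voice", interpretation: "AUS" },
];

const accepted = topResult(mockLastResult, 0.5);
console.log(accepted ? accepted.utterance : "no result accepted");
```

    In a Mobile X+V document the first argument would be application.lastresult$ itself rather than the mock array.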

    1.4.1.4 Returning from a VoiceXML Form

    When an event is captured within a voice dialog the author may choose to end the dialog and return to the XHTML container. Mobile X+V uses the VoiceXML <return> element for this purpose. If the <return> element is run within executable content of a top level voice handler (i.e., one that is not called as a subdialog), the voice handler will end its execution and return to the XHTML. The following example shows how the <return> element can be used:

    <?xml version="1.0"?> 
    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:xv="http://www.voicexml.org/2002/xhtml+voice" >
       <head><title>Find City or Airport</title>
          <link rel="voicexml" type="application/voicexml+xml" href="cityorairport.vxml"/> 
    
          <xv:sync input="city" field="cityorairport.vxml#cityorairport"
    			html-form-id="xform" />      
    
       </head>
       <body bgcolor="#FFFFFF">
          <h3>City or Airport</h3>
          <form id="xform" action="cgi/cityorairport.jsp">
             <p id="cityorairportprompt">Enter city or airport:<br/>
                <input type="text" name="city"
                         ev:event="focus" ev:handler="cityorairport.vxml#vform"/>
             </p>
          </form>
       </body>
    </html>
    Mobile X+V Return Example

    <?xml version="1.0" encoding="UTF-8"?>
    <?xv version="1.2"?>
    <vxml xmlns="http://www.w3.org/2001/vxml" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.w3.org/2001/vxml
        http://www.voicexml.org/specs/multimodal/x+v/mobile/12/schema/vxml.xsd"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
      version="2.0">
    
          <!-- voice handler -->
          <form id="vform">
             <field name="cityorairport" xv:id="cityorairport">
                <grammar src="cityorairport.grxml"/>
                <prompt src="cityorairport.html#cityorairportprompt"/> 
                <catch event="error.badfetch">
                   Error fetching grammar!
                   <return/>
                </catch>
             </field> 
          </form>
    
    </vxml>
    VoiceXML 2.0 (cityorairport.vxml)

    When the <return> element is specified within a top-level voice form, its namelist attribute has no meaning and is ignored. However, either the event or the eventexpr attribute can be used to return a VoiceXML event to the XHTML container. Otherwise, by default, running <return> returns no VoiceXML event to the XHTML container.

    1.4.1.5 Voice Handler Execution Context

    An activated voice handler executes in a new execution context, similar to that of the VoiceXML <subdialog> element. The context includes all the declarations and state information for the voice handler and its VoiceXML container, with counters reset and variables initialized. The voice handler proceeds until the execution of a <return> element, or until no form items remain eligible for the FIA to select. A <return> element causes control to be returned to the XHTML container, with each filled field synchronized with an XHTML control as specified by a <sync> element. When the voice handler returns, its execution context is deleted.

    1.4.2 Cancel

    Mobile X+V does not allow multiple speech dialogs to run simultaneously. A speech dialog runs in its own thread and, for many devices, the audio subsystem can be owned by only one thread at a time. Also, other resources that are not guaranteed to be thread-safe may cause a voice handler to block indefinitely. Therefore, only one speech dialog can be running at a time per loaded Mobile X+V document. Consequently, a newly activated speech dialog must cancel the currently running dialog. This is the default behavior. The running dialog should also be canceled when the current Mobile X+V document is unloaded.

    The document author can cancel the currently running speech dialog with the <cancel> element that can be specified by an XHTML element as a handler for an XML Event. The X+V Extension Module section provides more details.

    Cancel is a message from the visual browser that must be handled by the VoiceXML FIA. It is separate from the cancel event supported by VoiceXML that cancels the currently running prompt. The cancel message from the visual browser modifies the FIA in the sense that it must be checked throughout the FIA, and if it is received then the FIA must terminate.
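
    The default behavior described in this section can be modeled with a short JavaScript sketch. It is an illustration only: the class name DialogManager and its methods are hypothetical and do not correspond to any interface defined by this specification.

```javascript
// Hypothetical model of the "one running speech dialog per loaded
// document" rule. Dialogs are represented only by their form IDs.
class DialogManager {
  constructor() {
    this.running = null; // ID of the currently running dialog, if any
    this.log = [];       // record of what happened, for inspection
  }

  // Activating a dialog first cancels whichever dialog is running.
  activate(formId) {
    if (this.running !== null) {
      this.cancel();
    }
    this.running = formId;
    this.log.push("start " + formId);
  }

  // Corresponds to the cancel message sent by the visual browser.
  cancel() {
    if (this.running !== null) {
      this.log.push("cancel " + this.running);
      this.running = null;
    }
  }

  // Unloading the document also cancels the running dialog.
  unload() {
    this.cancel();
  }
}

const manager = new DialogManager();
manager.activate("vform1");
manager.activate("vform2");
manager.unload();
console.log(manager.log.join(", "));
```

    In an actual user agent the cancel message is checked throughout the FIA, as described above; this model records only the resulting order of starts and cancellations.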

    1.4.3 Declarative Synchronization of Input Modes

    The X+V <sync> element provides a declarative synchronization of XHTML form control elements and the VoiceXML <field> element. The <sync> element specifies the following behaviors. First, sync allows input from one modality, speech or visual, to set the field in the other modality. Second, setting the focus of an <input> element that is synchronized with a VoiceXML field updates the FIA to visit that VoiceXML field. This is useful when there are multiple fields within a VoiceXML form. Sync is both a message to the VoiceXML FIA from the visual browser, like cancel, and a message from the FIA to the visual browser. The X+V Extension Module section provides more details.

    1.4.4 Events and Event Handling

    The nomatch, noinput, help, and error VoiceXML event types are propagated as XML-events to XHTML. They can be linked to an XML-Events handler using the XML-events syntax for specifying target, observer, event, and handler. The events are propagated regardless of whether the event has already been caught and handled properly within the VoiceXML form. The VoiceXML event types nomatch, noinput, help, and error propagate to the XHTML container as the X+V event types vxmlnomatch, vxmlnoinput, vxmlhelp, and vxmlerror, respectively.

    Within VoiceXML a chain of events can be created, where one event is caught and another event is thrown, and so on. Because the entire chain of events is propagated to XHTML, the application author should be careful not to chain multiple events of the same type. The VoiceXML error event subtypes error.semantic, error.badfetch, error.unsupported.element, etc., are propagated as the vxmlerror event type to XHTML. This is in accordance with the VoiceXML specification. This allows the application to define additional error subtypes that can be handled by the visual browser. More general application-defined event types are also supported. If an application-defined event type is defined within the VoiceXML form, such as "foo.bar", then when that event is thrown within the form, it is propagated to XHTML as an XML-event. In the example below, both the vxmlnoinput and foo.bar events are handled by the visual browser via the XML-events listener tag. Note that the VoiceXML form exits because the foo.bar event is not handled within the form.

    <?xml version="1.0"?>
    <html  xmlns="http://www.w3.org/1999/xhtml" 
           xmlns:ev="http://www.w3.org/2001/xml-events"
           xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">
      <head><title>XML Events Example</title>
    
        <script>
               var isDocumentLoaded = false;
        </script>
    
        <script id="scr1" declare="declare">
           if (isDocumentLoaded) {
              alert("Please say yes or no");
           }
    
           isDocumentLoaded = true;
        </script>
    
        <ev:listener ev:observer="ex1" ev:event="vxmlnoinput" ev:handler="#scr1"/>
        <ev:listener ev:observer="ex1" ev:event="foo.bar"
            ev:handler="eventexample.vxml#vform2"/>
    
        <xv:sync input="button" field="eventexample.vxml#fld1" html-form-id="fo1"/>
    
      </head>
      <body id="bd1" ev:event="load" ev:handler="eventexample.vxml#vform1">
         <form id="fo1" action="http://www.example.com/x+v/cgi/example.pl">
             <input name="button" type="radio" value="yes"/>
             <input name="button" type="radio" value="no"/>
         </form>
      </body>
    </html>
    
    Mobile X+V XML Events Example

    <?xml version="1.0" encoding="UTF-8"?>
    <?xv version="1.2"?>
    <vxml xmlns="http://www.w3.org/2001/vxml" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.w3.org/2001/vxml
        http://www.voicexml.org/specs/multimodal/x+v/mobile/12/schema/vxml.xsd"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
      version="2.0">
    
      <!-- voice handlers -->
      <form id="vform1">
         <catch event="noinput">
           <throw event="foo.bar"/>
         </catch>
         <field name="fld1" xv:id="fld1">
            <grammar type="boolean"/>
            <prompt>Say yes or no</prompt>
         </field>
      </form>
    
      <form id="vform2">
         <block>
           Foo bar event received.
         </block>
      </form>
    
    </vxml>
    
    VoiceXML 2.0 (eventexample.vxml)

    Note that the XML Events <listener> elements observe the VoiceXML events on the XHTML element that activated the VoiceXML form, in this case the <body> element.

    In addition to the VoiceXML event types listed above, Mobile X+V supports the vxmldone event type. The vxmldone event is generated when the currently running VoiceXML form completes without an error, or due to running a <return> element. All the event types that Mobile X+V supports are listed in the XML-Events Module.
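
    The mapping of event types described in this section can be summarized with a JavaScript sketch. It is offered as an illustration only; the function name toXhtmlEventType is hypothetical, and the sketch covers only the cases stated in this section.

```javascript
// Hypothetical helper: given the type of an event thrown in a VoiceXML
// form, return the event type under which it is propagated to the
// XHTML container, following the rules stated in this section.
function toXhtmlEventType(vxmlEventType) {
  const direct = {
    nomatch: "vxmlnomatch",
    noinput: "vxmlnoinput",
    help: "vxmlhelp",
    error: "vxmlerror",
  };
  if (Object.prototype.hasOwnProperty.call(direct, vxmlEventType)) {
    return direct[vxmlEventType];
  }
  // Error subtypes such as error.badfetch are propagated as vxmlerror.
  if (vxmlEventType.startsWith("error.")) {
    return "vxmlerror";
  }
  // Application-defined types such as "foo.bar" keep their own name.
  return vxmlEventType;
}

for (const type of ["noinput", "error.badfetch", "foo.bar"]) {
  console.log(type + " -> " + toXhtmlEventType(type));
}
```

    The vxmldone event type is not produced by this mapping, because it is generated when a form completes rather than in response to a VoiceXML event.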

    1.4.5 Aural Style Sheets

    With the addition of the src and expr attributes to the VoiceXML <prompt> element, Mobile X+V is able to support Aural style sheets declared according to [CSS2]. Within XHTML, a paragraph with id set to "warnPara" can be styled with the CSS "warn" class:

    <p id="warnPara" class="warn">warning</p>

    The CSS has visual and aural rules for class "warn." When the VoiceXML <form> processes a prompt with the src attribute set to that paragraph, the aural style rules for "warn" are invoked. The VoiceXML Prompt SRC and EXPR Attributes section provides more details and a complete example.

    1.4.6 Voice Handler Resource Fetching

    Each time a voice handler is activated, its expiration status must be checked according to the caching policy specified in Section 6.1.2 of [VoiceXML 2.0]. Note that the VoiceXML max-age and max-stale attributes and properties mentioned in Section 6.1.2 are not provided by a Mobile X+V document.

    1.5 Accessibility and Mobile X+V

    This section is TBD.

    2 VoiceXML 2.0 Modules

    This section specifies a profile for VoiceXML to be used in Mobile X+V. VoiceXML 2.0 was first modularized in XHTML+Voice 1.0. This section gives a high-level overview of each of the modules that make up the VoiceXML profile.

    2.1 Modularization of VoiceXML 2.0

    Table 1: VoiceXML Modules

    Module | Purpose | Elements | Mobile X+V?
    ------ | ------- | -------- | -----------
    Events | Events thrown by VoiceXML processor | catch help noinput nomatch error throw | Y
    Executable statements | Statements for use in voice handlers | assign clear log reprompt | Y
    Filled | Voice handlers invoked when a slot is filled | filled | Y
    Flow control | Flow control constructs from VoiceXML | if else elseif return | Y
    Forms | Encapsulate voice dialogs | form field record subdialog block initial | Y
    Script | ECMAScript support in the VoiceXML container | var script | Y
    Non-Local | Non-local transfers in VoiceXML | exit goto submit link | N
    Menus | VoiceXML menus | menu choice | N
    Object | Foreign objects for VoiceXML | object | N
    Resources | Specifying resources for VoiceXML | param property | Y
    Root | VoiceXML container for Mobile X+V | vxml meta metadata | Y
    Enumerate | Enumerate choices or options available to user | enumerate | Y
    Option | Specify option in a field | option | Y
    Output | Speech and audio output | prompt value audio desc emphasis lexicon mark voice break prosody say-as sub phoneme p s meta metadata | Y
    Telephony | Telephony control | transfer disconnect | N
    User Input | Speech input constructs from VoiceXML | grammar lexicon example tag token item meta metadata one-of rule ruleref | Y
    Attributes | Common attributes used in VoiceXML | NA | Y
    Datatypes | Common datatypes used in VoiceXML | NA | Y
    Document Model | Defines content model for VoiceXML elements | NA | N

    2.2 Speech Dialogs

    Modules vxml-exec-1.xsd, vxml-filled-1.xsd, vxml-resource-1.xsd, vxml-flow-1.xsd, vxml-enumerate-1.xsd, vxml-option-1.xsd, and vxml-form-1.xsd support authoring handlers that implement speech dialogs.

    2.3 Executable Content

    Modules vxml-filled-1.xsd, vxml-flow-1.xsd, vxml-exec-1.xsd, and vxml-resource-1.xsd declare constructs for use within voice handlers. The semantics of these constructs are as defined in the VoiceXML 2.0 specification.

    2.4 Speech Grammars

    The speech grammar modules provide constructs for authoring speech grammars as specified in VoiceXML 2.0. The modules are provided by the normative VoiceXML 2.0 schema and are unchanged: grammar-core.xsd, grammar.xsd, vxml-grammar-restriction.xsd, and vxml-grammar-extension.xsd. The restriction and extension modules allow the elements and attributes normatively specified by the speech grammar specification [Speech Grammars] to be included within the VoiceXML 2.0 namespace.

    2.5 Speech And Non-speech Audio Output

    The speech and audio output modules define constructs for producing spoken and non-spoken audio output. The modules are provided by the normative VoiceXML 2.0 schema and are unchanged: synthesis-core.xsd, synthesis.xsd, vxml-synthesis-restriction.xsd, and vxml-synthesis-extension.xsd. As with the speech grammar modules, the elements and attributes normatively defined in the SSML specification [SSML 1.0] are included within the VoiceXML 2.0 namespace.

    2.6 Event Handling

    Module vxml-events-1.xsd declares the event types defined in VoiceXML 2.0.

    2.7 Script

    Module vxml-script-1.xsd declares the <script> and <var> elements for ECMAScript support in the VoiceXML container. The <script> element is not allowed below the VoiceXML <form> element.

    ECMAScript functions and variables can be accessed by voice handlers by declaring <script> and <var> elements below the VoiceXML container's <vxml> root element. However, global variables are initialized each time a voice handler is activated, which prevents data from being shared between voice handlers in the same container.
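
    The consequence of initializing global variables on each activation can be illustrated with a JavaScript sketch. It is a simplified model only; the names createContainer, activate, and counter are hypothetical, and a real container declares its variables with <var> and <script> elements.

```javascript
// Hypothetical model: a container declares initial values for its
// global variables, and every activation of a voice handler receives
// its own freshly initialized copy of them.
function createContainer(initialGlobals) {
  return {
    activate(handler) {
      const globals = { ...initialGlobals }; // re-initialized every time
      return handler(globals);
    },
  };
}

const container = createContainer({ counter: 0 });

// A stand-in for a voice handler that increments a global variable.
const incrementCounter = (globals) => {
  globals.counter += 1;
  return globals.counter;
};

const firstActivation = container.activate(incrementCounter);
const secondActivation = container.activate(incrementCounter);
console.log(firstActivation, secondActivation);
```

    Both activations observe the same value, because the change made during the first activation is discarded along with its execution context.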

    2.8 VoiceXML Container for X+V

    Module vxml-root-1.xsd declares the root <vxml> document element and child elements <meta> and <metadata>. The <meta> and <metadata> elements may also be declared below the <grammar> and <prompt> elements.

    2.8.1 Document Conformance

    A conforming VoiceXML container for an X+V document is a document that requires only the facilities described as mandatory in this specification. Such a document must meet all of the following criteria:

    1. It must validate against the XML Schema for the VoiceXML container found in the appendix of this document.

    2. The root element of the document must be vxml.

    3. The name of the default namespace on the root element must be the VoiceXML 2.0 namespace name: http://www.w3.org/2001/vxml.

    4. There must be an XML processing instruction that identifies the forms contained in the VoiceXML document as X+V 1.2 voice handlers:

      <?xv version="1.2"?>
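
    Criteria 2 through 4 can be approximated with simple text checks, as in the following JavaScript sketch. It is an illustration only: it does not perform the schema validation required by criterion 1, it relies on regular expressions rather than an XML parser, and the function name checkContainerBasics is hypothetical.

```javascript
// Hypothetical, deliberately simplistic check of criteria 2 to 4.
// Returns a list of problems; an empty list means none were found.
function checkContainerBasics(xmlText) {
  const problems = [];

  // Criterion 4: the X+V processing instruction must be present.
  if (!/<\?xv\s+version\s*=\s*(["'])1\.2\1\s*\?>/.test(xmlText)) {
    problems.push("missing xv version 1.2 processing instruction");
  }

  // Drop processing instructions and comments, then take the first
  // start tag as the root element (DOCTYPE declarations do not match).
  const stripped = xmlText
    .replace(/<\?[\s\S]*?\?>/g, "")
    .replace(/<!--[\s\S]*?-->/g, "");
  const root = /<([A-Za-z_][\w.:-]*)([^>]*)>/.exec(stripped);

  if (!root || root[1] !== "vxml") {
    // Criterion 2: the root element must be vxml.
    problems.push("root element is not vxml");
  } else if (
    !/(^|\s)xmlns\s*=\s*(["'])http:\/\/www\.w3\.org\/2001\/vxml\2/.test(root[2])
  ) {
    // Criterion 3: the default namespace must be the VoiceXML 2.0 one.
    problems.push("default namespace is not the VoiceXML 2.0 namespace");
  }
  return problems;
}

const sampleContainer = [
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<?xv version="1.2"?>',
  '<vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">',
  '  <form id="sayHello"><block>Hello World!</block></form>',
  "</vxml>",
].join("\n");

console.log(checkContainerBasics(sampleContainer));
```

    A production check would parse the document and validate it against the schema in the appendix instead of matching text.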

    2.8.2 User Agent Conformance

    The user agent must conform to the "User Agent Conformance" section of the VoiceXML 2.0 specification [VoiceXML 2.0], Appendix F, and the conformance requirements detailed in the VoiceXML modules supported by the mobile profile.

    The user agent must conform to the following additional user agent rule:

    1. When the user agent claims to support facilities defined within the VoiceXML 2.0 specifications or facilities required by this specification through normative reference, it must do so in ways consistent with the facilities' definition. Support for the voice modules may be provided directly on the device, or by accessing a network-based voice service.

    2.8.3 VoiceXML Namespace Integration

    The default XML namespace of a VoiceXML container document is VoiceXML. Mobile X+V extends VoiceXML with a small set of attribute extensions. The X+V extension attributes are included through an additional namespace declaration:

    <vxml xmlns="http://www.w3.org/2001/vxml"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
          xsi:schemaLocation="http://www.w3.org/2001/vxml
        http://www.voicexml.org/specs/multimodal/x+v/mobile/12/schema/vxml.xsd"
          xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">

    The name of the unique prefix identifier for each namespace within the document, for example, xv for the X+V attributes, is left to the document author's discretion.

    3 XHTML Modularization

    3.1 Document Conformance

    A conforming Mobile X+V document is a document that requires only the facilities described as mandatory in this specification. Such a document must meet all of the following criteria:

    1. It must validate against the XML Schema found in the appendix of this document.

    2. The root element of the document must be html.

    3. The name of the default namespace on the root element must be the XHTML namespace name: http://www.w3.org/1999/xhtml.

    3.2 User Agent Conformance

    The user agent must conform to the "User Agent Conformance" section of the XHTML Basic specification [XHTML Basic], section 3.2.

    The user agent must conform to the following additional user agent rule:

    1. When the user agent claims to support facilities required by this specification through normative reference, it must do so in ways consistent with the facilities' definition.

    3.3 XHTML Namespace Integration

    The default XML namespace of a Mobile X+V document is XHTML. Mobile X+V extends XHTML with XML-events and X+V extensions. The XML-events and X+V extension elements and attributes are included through additional namespace declarations:

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">

    The name of the unique prefix identifier for each namespace within the document, for example, ev for XML Events elements and attributes, is left to the document author's discretion.

    3.4 Mobile X+V

    The XHTML functionality in the Mobile X+V document type is based upon the XHTML modules defined in [XHTML Modularization]. The Mobile X+V profile includes the XHTML modules defined in [XHTML Basic], such as the basic XHTML forms and tables modules. Added to the XHTML Basic modules are the following modules:

    • The XHTML scripting module.
    • XML Events as defined by the XML Events module, [XML Events]. XML-events with VoiceXML event types and handlers allow the XHTML author to associate voice-interaction specific behaviors.
    • An X+V Extension module for facilitating the authoring of the interaction between the visual and speech modules.

    The notation, terms and document conventions used here are borrowed from [XHTML 1.1].

    In summary, the profile includes the XHTML Basic modules defined in [XHTML Basic], the XHTML scripting module defined in [XHTML 1.1], the XML Events module defined in [XML Events], the X+V Extension module, and the VoiceXML 2.0 modules as defined by the VoiceXML Profile.

    4 XML-Events Module

    4.1 Listener

    X+V extends XHTML with the XML-Events <listener> element and its attributes. The <listener> attributes are added to XHTML elements primarily for activating voice handlers. The <listener> element and attributes belong to the XML-Events namespace:

    xmlns:ev="http://www.w3.org/2001/xml-events"
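
    As an illustrative sketch only, the following fragment shows the same kind of binding written once with the <listener> element and once with the XML-Events attributes placed directly on an XHTML element. The file name voice.vxml, the form IDs, and the control names are hypothetical and are not defined by this specification.

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">
      <head>
        <title>Listener Sketch</title>
        <!-- element form: observe "focus" on the control whose id is "city" -->
        <ev:listener event="focus" observer="city"
          handler="voice.vxml#voice_city"/>
      </head>
      <body>
        <form action="cgi/submit">
          <input type="text" id="city" name="city"/>
          <!-- attribute form: the element carrying the attributes is the observer -->
          <input type="text" id="hotel" name="hotel"
            ev:event="focus" ev:handler="voice.vxml#voice_hotel"/>
        </form>
      </body>
    </html>

    Listener Element and Attribute Forms (illustrative)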
    

    4.2 Event Types

    For a given XML language extended with XML Events, a set of event types must be specified independently of the [XML Events] module. The XML Event types supported by the Mobile X+V profile include all event types defined for [HTML 4.01] intrinsic events. VoiceXML handler activation is specified by including with an XHTML element one of these event types as an XML event and an ID reference to the VoiceXML form as an XML event handler.

    The Mobile X+V profile supports the following VoiceXML 2.0 event types: nomatch, noinput, error, and help. These event types are emitted to the XHTML container as the following X+V event types: vxmlnomatch, vxmlnoinput, vxmlerror, and vxmlhelp, respectively. The VoiceXML exit and cancel event types are supported within the VoiceXML form but are not propagated to the visual browser. Event types defined by the author within VoiceXML, also known as application-defined event types, are also propagated to the visual browser. However, the VoiceXML <form> element does not support adding the XML-Events attributes.

    An additional X+V event type, vxmldone, is supported. The vxmldone event is generated when the voice handler completes without an error and the VoiceXML <return> element is not run. However, the <return> element can explicitly return the vxmldone event.

    The Mobile X+V profile extends the XHTML <script> element with XML Events. The <script> element doesn't generate any events of its own, so the observer attribute is required to specify observing an XML event on another node in the XHTML tree. The <script> element can observe any HTML 4.01 intrinsic event or VoiceXML event. Here is an example of how a <script> element can be a handler for a vxmldone event. The value of XHTML input "drink" is updated when the voice handler "fid" completes:

    <?xml version="1.0"?>
    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:vxml="http://www.w3.org/2001/vxml"
          xmlns:xv="http://www.voicexml.org/2002/xhtml+voice" >
      <head><title>Script Event Handler</title>
    
        <script type="text/javascript">
          var isDocumentLoaded = false;
        </script>
    
        <script type="text/javascript" 
          ev:event="vxmldone" ev:observer="drink" declare="declare">
          if (isDocumentLoaded) {
            document.xform.drink.value = application.lastresult$[0].utterance;
          }
    
          isDocumentLoaded = true;
        </script>
    
      </head>
      <body>
        <form name="xform" action="cgi/submit">
          <input type="text" id="drink"
            ev:event="focus" ev:handler="scripthandler.vxml#fid"/>
        </form>
      </body>
    </html>
    
    Mobile X+V Script Handler Example

    <?xml version="1.0" encoding="UTF-8"?>
    <?xv version="1.2"?>
    <vxml xmlns="http://www.w3.org/2001/vxml" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.w3.org/2001/vxml
        http://www.voicexml.org/specs/multimodal/x+v/mobile/12/schema/vxml.xsd"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
      version="2.0">
    
      <form id="fid">
        <field name="f1">
          <grammar src="drink.grxml"/>
          <prompt>Coffee, tea, or milk?</prompt>
        </field>
      </form>
    
    </vxml>
    
    VoiceXML 2.0 (scripthandler.vxml)

    Note that the "declare" attribute is necessary to prevent the <script> contents from being evaluated on document load. For backwards compatibility with older web browsers a JavaScript variable, such as isDocumentLoaded in the above example, should be used to prevent the document load evaluation.

    The following table matches the X+V event types with the XHTML or VoiceXML elements that support them. When the <listener> event attribute is added to an XHTML element, it must specify an event type supported by the element, as listed in the right-hand column. Because the HTML 4.01 event types have been translated into XML-event types, the "on" prefix for these event types has been removed.

    Table 2: X+V Event Types
    Elements                                            Event Type
    XHTML body                                          load, unload
    Most XHTML elements                                 click, dblclick, mousedown, mouseup, mouseover, mouseout, keypress, keydown, keyup
    XHTML elements: a, label, input, select, textarea   focus, blur
    XHTML form                                          submit, reset
    XHTML elements: input, textarea                     select
    XHTML elements: input, select, textarea             change
    VoiceXML form                                       vxmlnomatch, vxmlnoinput, vxmlerror, vxmlhelp, vxmldone, "application defined"

    4.3 X+V Event Propagation

    All XML-events in a mobile X+V document propagate within the XHTML tree. The following diagram shows the capture and bubbling phases of an XML-Event as it travels from the HTML root tag to the target input tag, and from the target input tag back to the HTML root. The flow of events shown could be in response to the user clicking on a text input control.

    [Figure: Flow of XML-Events in X+V]

    For an XML-event emitted from a VoiceXML form, the target of the event is the XHTML element that activated the VoiceXML form. This allows a listener to observe a VoiceXML event, such as vxmldone, on the XHTML form or body element, where the target is an input element contained by the form. Here is an example that shows how the vxmldone event can be observed on the XHTML body element.

    <?xml version="1.0" encoding="iso-8859-1"?>
    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:vxml="http://www.w3.org/2001/vxml"
          xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">
    <head><title>Flight Information</title>
    
    <script>
       var isDocumentLoaded = false;
    </script>
    
    <script id="selectFlight" ev:event="vxmldone" ev:observer="bodyelm" declare="declare">
       if (isDocumentLoaded) {
          alert("You entered "+document.getElementById('in1').value);
       }
    
       isDocumentLoaded = true;
    </script>
    
    <xv:sync input="in1" field="flightno.vxml#flightno" html-form-id="formelm"/>
    
    </head>
    <body id="bodyelm" ev:event="load" ev:handler="flightno.vxml#flightform">
      <form id="formelm" action="">
          <label id="flt-lbl">Please choose a flight:<br/>
            <input type="text" name="in1" id="in1"/>
        </label>
      </form>
    </body>
    </html>
    
    Mobile X+V "vxmldone" Example

    <?xml version="1.0" encoding="UTF-8"?>
    <?xv version="1.2"?>
    <vxml xmlns="http://www.w3.org/2001/vxml" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.w3.org/2001/vxml
        http://www.voicexml.org/specs/multimodal/x+v/mobile/12/schema/vxml.xsd"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
      version="2.0">
    
    <form id="flightform">
       <field name="flightno" xv:id="flightno">
        <prompt xv:src="#flt-lbl"/>
        <grammar src="flightinfo.grxml" type="application/srgs+xml"/>  
        <catch event="help nomatch noinput">
           Say United 480, for example.
        </catch>
      </field>
    </form>
    
    </vxml>
    
    VoiceXML 2.0 (flightno.vxml)

    5 X+V Extension Module

    The X+V Extension module includes the <sync> element, the <cancel> element, the src and expr attributes of the VoiceXML <prompt> element, and the id attribute of the VoiceXML <field> element. The elements and attributes in this module belong to their own namespace:

    xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
    

    5.1 Sync

    The X+V <sync> element adds support for synchronization of data entered via either speech or visual input. It synchronizes the value property of an XHTML input control with a VoiceXML field, as follows:

    1. Speech dialog results are returned to both the VoiceXML field and the XHTML <input> element. Speech results are returned to the XHTML <input> element after all filled processing has been performed.
    2. Keyboard data entered into the <input> element updates both the VoiceXML field and the XHTML <input> element.
    3. Keyboard data entered into the <input> element satisfies the guard condition on the VoiceXML field.
    4. For an active VoiceXML form with multiple fields, if the user gives focus to the input field, the VoiceXML Form Interpretation Algorithm (FIA) is instructed to visit the referenced VoiceXML field as the next item. This includes the mixed initiative case.

    Sync does not activate a voice handler. If the <sync> element has specified an XHTML input control but no VoiceXML form is currently active, nothing will happen. If an event and event handler are also specified, then when the user clicks on the input control the VoiceXML form is activated and the guard conditions of the VoiceXML form items are cleared. The XHTML input control is not cleared if data is already there.

    Only changes made while a VoiceXML form is active are synchronized. An existing XHTML input value does not update the synchronized VoiceXML <field> when the VoiceXML form is activated.

    The <sync> element attributes are:

    Table 3: <sync> attributes
    input         The name of an XHTML input control.
    field         A URI reference to a field ID within a VoiceXML form.
    html-form-id  A reference to the ID of the XHTML form enclosing the input field.

    All <sync> element attributes are required. The type of the input attribute is NMTOKEN. The type of the field attribute is URI. The URI must include a fragment identifier that references a VoiceXML <field> ID. If the <field> element is in an external file, then the fragment identifier is appended to the URI. The type of the html-form-id attribute is IDREF.

    5.2 Cancel

    The X+V <cancel> element allows a document author to cancel a running speech dialog. It is a stand-alone element with no content that can be referenced as an XML Events event handler.

    The <cancel> element has one attribute:

    Table 4: <cancel> attributes
    id Unique document identifier.

    The id attribute is required. The type of the id attribute is ID.

    <?xml version="1.0" encoding="iso-8859-1"?>
    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:vxml="http://www.w3.org/2001/vxml"
          xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">
    <head><title>Cancel Example</title>
    
      <xv:sync field="example.vxml#fld1" input="in1" html-form-id="xform"/>
      <xv:sync field="example.vxml#fld2" input="in2" html-form-id="xform"/>
    
      <xv:cancel id="can1"/>
    
    </head>
      <body id="bd1" ev:event="load" ev:handler="example.vxml#form1">
        <form id="xform" action=".">
          <input type="text" name="in1"/>&nbsp;&nbsp;
          <input type="text" name="in2"/><br/>
          <input type="reset" name="reset" ev:event="click" ev:handler="example.vxml#form1"/><br/>
          <input type="button" name="cancel" value="Cancel Voice" ev:event="click" ev:handler="#can1"/>
      </form>
      </body>
    </html>
    

    The example above shows how <cancel> can be used to cancel the currently running speech dialog. The reset button in the example restarts the speech dialog identified by "form1." The "Cancel Voice" button cancels the currently running dialog because its handler attribute references the <cancel> element "can1," which is activated when the button is clicked.

    5.3 VoiceXML Field ID Attribute

    X+V adds an optional id attribute to the VoiceXML <field> element. The id attribute is used by the <sync> element's field attribute to uniquely specify a VoiceXML <field> element.
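
    A minimal sketch of the pairing, assuming a hypothetical voice document named order.vxml, a hypothetical XHTML form "order_form," and a hypothetical field "field_size": the fragment identifier in the <sync> field attribute matches the xv:id value on the VoiceXML <field> element.

    <!-- in the XHTML document head -->
    <xv:sync input="size" field="order.vxml#field_size" html-form-id="order_form"/>

    <!-- in order.vxml (hypothetical file) -->
    <field xv:id="field_size" name="field_size">
      <grammar src="size.grxml" type="application/srgs+xml"/>
      <prompt>Small, medium, or large?</prompt>
    </field>

    Field ID Referenced by <sync> (illustrative)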

    5.4 VoiceXML Prompt SRC and EXPR Attributes

    X+V extends the VoiceXML <prompt> element with two attributes, src and expr. The src attribute allows for the specification of a text source for speech output in an external document. In addition, the source may be [SSML 1.0] or text styled according to the aural styling rules defined in [CSS2]. For example, a style sheet may have the following styling rules for the XHTML <p> element:

    p.romeo { voice-family: male; volume: loud; pause-before: 20ms; }
    p.juliet { voice-family: female; volume: soft; }
    

    A voice handler can play two prompts from two different text sources in the document, as follows:

    <?xml version="1.0"?>
    <html xmlns="http://www.w3.org/1999/xhtml" 
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">
      <head>
        <title>Prompt src Example</title>
      </head>
    
      <body ev:event="load" ev:handler="promptsrc.vxml#sayHello">
        <p id="hello_romeo" class="juliet">
          Romeo, Romeo, where art thou?
        </p>
        <p id="hello_juliet" class="romeo">
          I am here.
        </p>
      </body>
    </html>
    Mobile X+V Prompt "src" Example

    <?xml version="1.0" encoding="UTF-8"?>
    <?xv version="1.2"?>
    <vxml xmlns="http://www.w3.org/2001/vxml" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.w3.org/2001/vxml
        http://www.voicexml.org/specs/multimodal/x+v/mobile/12/schema/vxml.xsd"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
       version="2.0">
      <form id="sayHello">
        <block><prompt xv:src="promptsrc.html#hello_romeo"/>
               <prompt xv:src="promptsrc.html#hello_juliet"/>
        </block>
      </form>
    </vxml>
    VoiceXML 2.0 (promptsrc.vxml)

    The first prompt plays a soft female voice. The second prompt plays a loud male voice after a 20 ms pause.

    The expr attribute allows for the text source to be determined dynamically. The value of the expr attribute is an expression that evaluates to a URI with a fragment identifier. Both the URI referenced by src and the URI resolved by expr include a fragment identifier that references the id attribute of an XML element containing text for the prompt. The type of the src attribute is URI and the type of the expr attribute is CDATA. Exactly one of src or expr may be specified; if both are specified an error.badfetch event is thrown.

    A voice handler can play a prompt from a text source determined dynamically by the expr attribute expression, as follows:

    <?xml version="1.0"?>
    <html xmlns="http://www.w3.org/1999/xhtml" 
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">
      <head>
        <title>Prompt expr Example</title>
      </head>
    
      <body ev:event="load" ev:handler="promptexpr.vxml#sayHello">
        <p id="hello_romeo" class="juliet">
          Romeo, Romeo, where art thou?
        </p>
        <p id="hello_juliet" class="romeo">
          I am here.
        </p>
      </body>
    </html>
    Mobile X+V Prompt "expr" Example

    <?xml version="1.0" encoding="UTF-8"?>
    <?xv version="1.2"?>
    <vxml xmlns="http://www.w3.org/2001/vxml" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.w3.org/2001/vxml
        http://www.voicexml.org/specs/multimodal/x+v/mobile/12/schema/vxml.xsd"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
      version="2.0">
    
    <form id="sayHello">
      <var name="count" expr="0"/>
      <block name="block_1">
         <prompt xv:expr="count == 0 ?
           'promptexpr.html#hello_romeo' : 'promptexpr.html#hello_juliet'"/>
         <assign name="count" expr="count+1"/>
         <assign name="block_1" expr="undefined"/>
      </block>
    </form>
    
    </vxml>
    VoiceXML 2.0 (promptexpr.vxml)

    5.4.1 Styling External Prompt Resources

    If the prompt resource is in an external file, the rules for styling the resource apply only to the retrieved XML element and its children. For example, the style attribute on the XML element should be honored, while style rules inherited from its parent elements in the external document can be ignored. It is also the author's responsibility to reference the style sheets used to style the external resource in the originating document. Style sheet references in the external document can be ignored.

    5.4.2 Dynamic Prompt Generation in XHTML

    An X+V Mobile Profile user agent does not need to keep the document dynamically updated. If the document is updated after it is loaded, for example by modifying the DOM through dynamic HTML, the text referenced by a VoiceXML <prompt> src attribute may not be updated.

    5.4.3 Invalid Prompt Resource

    If the prompt resource cannot be played (e.g., 'src' referencing or 'expr' evaluating to an invalid URI), the content of the <prompt> element is played instead. If the prompt resource cannot be played and the content of the <prompt> element is empty, the prompt is not played and no error event is thrown. This behavior follows the specification of the VoiceXML 2.0 <audio> element.
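
    The fallback rule above can be sketched as follows. The document name hotel.html and the IDs are hypothetical. If the referenced text cannot be fetched, the content of the <prompt> element would be spoken instead.

    <form id="askCity">
      <field xv:id="field_city" name="field_city">
        <grammar src="city.grxml" type="application/srgs+xml"/>
        <!-- the element content serves as the fallback text -->
        <prompt xv:src="hotel.html#city_label">Please say a city.</prompt>
      </field>
    </form>

    Prompt with Fallback Content (illustrative)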

    5.4.4 Prompt Resource Fetching Properties

    The VoiceXML 2.0 attributes that govern fetching the content associated with a URI also apply to the fetching of <prompt> text. While Mobile X+V does not add the fetching attributes to the <prompt> element, the fetch of the src URI, or URI resolved from evaluating expr, is governed by the VoiceXML 2.0 "documentfetchhint," "fetchtimeout," "documentmaxage," and "documentmaxstale" properties. Section 6.1 of the VoiceXML 2.0 specification provides details.
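
    As a hedged sketch, assuming the VoiceXML <property> element is available in the voice handler document, the fetching properties could be set at form scope as shown here. The values are arbitrary examples, not recommendations.

    <form id="sayHello">
      <!-- hypothetical values; they apply to fetches made within this form -->
      <property name="fetchtimeout" value="5s"/>
      <property name="documentfetchhint" value="safe"/>
      <property name="documentmaxage" value="60"/>
      <block>
        <prompt xv:src="promptsrc.html#hello_romeo"/>
      </block>
    </form>

    Fetching Properties for Prompt Text (illustrative)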



    Appendices

    A Examples

    A.1 What You See Is What You Can Say

    <?xml version="1.0"?>
    <html xmlns="http://www.w3.org/1999/xhtml" 
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">
      <head>
        <title>What You See Is What You Can Say</title>
    
          <!-- declare inputs synchronized with VoiceXML fields -->
          <xv:sync input="city"
            field="example01.vxml#field_city" html-form-id="hotel_query"/>
          <xv:sync input="hotel"
            field="example01.vxml#field_hotel" html-form-id="hotel_query"/>
      </head>
      <body ev:event="load" ev:handler="./example01.vxml#voice_city_hotel">
        <h1>What You See Is What You Can Say</h1>
    
        <p>This example permits the user to enter data using
          either the GUI or voice.
        </p>
        <h2>Hotel Picker</h2>
    
        <form id="hotel_query" method="post" action="cgi/hotel.pl">
          <label id="city_label">Please enter city:<br/>
            <input name="city" type="text"/>
          </label>
          
          <label id="hotel_label">Please enter hotel:<br/>
            <input name="hotel" type="text"/>
          </label>
    
          <input type="submit" value="Submit"/>
          <input type="reset" value="Reset"
            ev:event="click" ev:handler="example01.vxml#voice_city_hotel"/>
        </form>
      </body>
    </html>
    
    Example01.html

    <?xml version="1.0" encoding="UTF-8"?>
    <?xv version="1.2"?>
    <vxml xmlns="http://www.w3.org/2001/vxml" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.w3.org/2001/vxml
        http://www.voicexml.org/specs/multimodal/x+v/mobile/12/schema/vxml.xsd"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
      version="2.0">
    
      <!-- voice handlers -->   
      <form id="voice_city_hotel">
        <field xv:id="field_city" name="field_city">
          <grammar src="city.grxml" type="application/srgs+xml"/>
          <prompt xv:src="example01.html#city_label"/>
          <catch event="help nomatch noinput">
            For example, say Chicago.
          </catch>
        </field>
    
        <field xv:id="field_hotel" name="field_hotel">
          <grammar src="hotel.grxml" type="application/srgs+xml"/>
          <prompt xv:src="example01.html#hotel_label"/>
          <catch event="help nomatch noinput">
            For example, say Hilton.
          </catch>
          <filled>
            <prompt>
              You selected <value expr="field_hotel"/>.
            </prompt>
          </filled>
        </field>
      </form>
    
    </vxml>
    
    Example01.vxml

    A.2 Mixed-initiative Conversational Interface

    <?xml version="1.0"?>
    <html xmlns="http://www.w3.org/1999/xhtml" 
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">
      <head>
        <title>Mixed Initiative Conversational Interface</title>
    
          <!-- declare inputs synchronized with VoiceXML fields -->
          <xv:sync input="city" field="example02.vxml#field_city"
            html-form-id="visual_city_hotel"/>
          <xv:sync input="hotel" field="example02.vxml#field_hotel"
            html-form-id="visual_city_hotel"/>
      </head>
      <body>
        <h1>Mixed-Initiative Conversational Interface</h1>
    
        <p>Here is a mixed-initiative dialog.  The user can either specify
           both hotel and city in a single utterance, or can fill one
           field at a time.
        </p>
        
        <h2>Hotel Picker</h2>
        <p>This voice-enabled application lets you pick a city and a hotel.
        </p>
        <form id="visual_city_hotel" method="post" action="cgi/hotel.pl"
          ev:event="click" ev:handler="example02.vxml#voice_city_hotel" >
          <p id="please_choose">
            Please choose a city and hotel where you wish to stay.
          </p>
    
          <!-- input name attrib required except for type "text" -->
          <input name="city" type="text"/>
          <input name="hotel" type="text"/>
    
          <input type="submit" value="Submit" />
          <input type="reset" value="Reset"
            ev:event="click" ev:handler="example02.vxml#voice_city_hotel"/>
        </form>
      </body>
    </html>
    
    Example02.html

    <?xml version="1.0" encoding="UTF-8"?>
    <?xv version="1.2"?>
    <vxml xmlns="http://www.w3.org/2001/vxml" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.w3.org/2001/vxml
        http://www.voicexml.org/specs/multimodal/x+v/mobile/12/schema/vxml.xsd"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
      version="2.0">
    
      <!-- VoiceXML form supporting a mixed-initiative grammar -->
      <form id="voice_city_hotel">
        <grammar src="city_hotel.gram" type="application/srgs+xml"/>
         
        <!-- Mixed initiative form begins with initial prompt -->
        <initial name="start">
          <prompt>Please say a city and a hotel.</prompt>
          <help>
            Please say the name of a city and a hotel to make a reservation.
          </help>
          <!-- If user is silent, reprompt once, then try directed prompts -->
          <noinput count="1"><reprompt/>
          </noinput>
          <noinput count="2">
            <reprompt/>
            <assign name="start" expr="true"/>
          </noinput>
        </initial>
    
        <field xv:id="field_city" name="field_city">
          <grammar src="city.grxml" type="application/srgs+xml"/>
          <prompt>Please say a city.</prompt>
          <catch event="help nomatch noinput">
            For example, say Chicago.
          </catch>
        </field>
          
        <field xv:id="field_hotel" name="field_hotel">
          <grammar src="hotel.grxml" type="application/srgs+xml"/>
          <prompt>Please say a hotel.</prompt>
          <catch event="help nomatch noinput">
            For example, say Hilton.
          </catch>      
          <filled>
            <prompt>
              You selected <value expr="field_hotel"/>.
            </prompt>
          </filled>
        </field>
      </form>
    
    </vxml>
    
    Example02.vxml

    B DTD

    This section defines the DTD used to define the Mobile X+V 1.2 profile.

    B.1 Mobile X+V 1.2 DTD

    The individual modules making up the DTD for the Mobile X+V 1.2 profile along with the top-level driver file mobile-x+v12.dtd are packaged together and available at mobile-xv12-DTD.zip.

    Packaged within the same file are also the individual modules making up the DTD for the VoiceXML container for Mobile X+V voice handlers along with the top-level driver file vxml-container.dtd. Although the use of the DTD in place of the SCHEMA requires the elements and attributes specified by both [Speech Grammars] and [SSML 1.0] to be placed within their respective namespaces, this DTD informally puts these elements in the VoiceXML 2.0 namespace.

    C Schema

    This section is normative.

    This section defines the formal XML Schema used to define the Mobile X+V 1.2 profile and the VoiceXML container for Mobile X+V voice handlers.

    C.1 Mobile X+V 1.2 Schema

    The individual modules making up the SCHEMA for the Mobile X+V 1.2 profile along with the top-level driver file mobile-x+v12.xsd are packaged together and available at mobile-xv12-SCHEMA.zip.

    C.2 VoiceXML Container for Mobile X+V

    The individual modules making up the SCHEMA for the VoiceXML Container for Mobile X+V 1.2 along with the top-level driver file vxml-container.xsd are packaged together and available at vxml-container-SCHEMA.zip.

    D Sync Grammars for XHTML Controls

    A VoiceXML field is filled when the user's utterance matches a word or phrase in the field's grammar. The grammar, along with [Semantic Interpretation], determines how the VoiceXML field is filled and can also be used to determine how a field's contents update an arbitrary XHTML control, or group of controls, using <sync>. The following example grammars [Speech Grammars] can be used with the <sync> element for synchronizing the various HTML control types such as radio button and radio group, checkbox and checkbox group, hidden, password, file, text, textarea, select-one, select-multiple, submit, reset, and button.

    Here is an example of a grammar for a single selection list (i.e., <select>) and a radio group (i.e., multiple HTML inputs of type "radio" with the same name). The radio button that has a value that matches the value contained in the filled VoiceXML field is set.

    <rule id="crust" scope="public">
      <one-of>
        <item>thin</item>
        <item>medium</item>
        <item>thick</item>
        <item>chicago <item repeat="0-1">style</item></item>
        <item>cheese</item>
      </one-of>
    </rule>
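
    As an illustrative sketch, a radio group could be synchronized with a field that uses the grammar above as follows. The file names pizza.vxml and crust.grxml and all IDs are hypothetical. Under these assumptions, saying "thin" would presumably fill the field with a value that matches the first radio button.

    <!-- in the XHTML head -->
    <xv:sync input="crust" field="pizza.vxml#field_crust" html-form-id="pizza_form"/>

    <!-- in the XHTML body: all buttons in the group share the name "crust" -->
    <form id="pizza_form" action="cgi/order">
      <input type="radio" name="crust" value="thin"/> Thin
      <input type="radio" name="crust" value="medium"/> Medium
      <input type="radio" name="crust" value="thick"/> Thick
    </form>

    <!-- in pizza.vxml (hypothetical file) -->
    <field xv:id="field_crust" name="field_crust">
      <grammar src="crust.grxml" type="application/srgs+xml"/>
      <prompt>Which crust would you like?</prompt>
    </field>

    Radio Group Synchronized with a Field (illustrative)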

    Here is an example of a grammar for a multiple selection list (i.e., <select multiple="multiple">) and a checkbox group (i.e., multiple HTML inputs of type "checkbox" with the same name). Each selected item is pushed onto an array. The filled VoiceXML field is an array containing the selected items. Each checkbox or selection that has a value matching a value contained in the array stored in the VoiceXML field is set.

    <rule id="meat_toppings" scope="public">
      <ruleref special="NULL"/><tag><![CDATA[$= new Array;]]></tag>
      <item repeat="1-">
           <ruleref uri="#meats"/>
           <item repeat="0-1">and</item>
           <tag><![CDATA[$.push($meats)]]></tag>
      </item>
    </rule>
    <rule id="meats">
      <one-of>
        <item>bacon</item>
        <item>chicken</item>
        <item>ham</item>
        <item>meatball</item>
        <item>sausage</item>
        <item>pepperoni</item>
      </one-of>
    </rule>

    Here is an example of a grammar for a single radio button, checkbox, or button (button includes the submit and reset buttons). For the radio button or checkbox, the "checked" attribute is toggled according to the semantic interpretation tag contained in the filled VoiceXML field. For the button input type, a semantic interpretation value of "true" causes the button to be clicked.

    <rule id="pizza_extra" scope="public">
      <one-of>
        <item>no<tag><![CDATA[$=false]]></tag></item>
        <item>nope<tag><![CDATA[$=false]]></tag></item>
        <item>next<tag><![CDATA[$=false]]></tag></item>
        <item>yes<tag><![CDATA[$=true]]></tag></item>
        <item>yep<tag><![CDATA[$=true]]></tag></item>
      </one-of>
    </rule>

    The grammar for the text, textarea, password, hidden, and file input types do not require any semantic interpretation. These controls are filled with any arbitrary grammar specified for the synchronized VoiceXML field. The value attribute of these input types is set to the contents of the filled VoiceXML field.

    The user should always have the option of declining to update the HTML control, for example by saying "none" or "next." This can be supported by adding a second grammar to the VoiceXML field alongside the grammar used to fill that field. Here is an example of a grammar, added to the grammar for a multiple selection list, that allows the user to say "none," "no," "next," or "skip":

    <grammar root="meat_toppings">
      <rule id="meat_toppings" scope="public">
        <ruleref special="NULL"/><tag><![CDATA[$= new Array;]]></tag>
        <item repeat="1-">
           <ruleref uri="#meats"/>
           <item repeat="0-1">and</item>
           <tag><![CDATA[$.push($meats)]]></tag>
        </item>
      </rule>
      <rule id="meats">
        <one-of>
          <item>bacon</item>
          <item>chicken</item>
          <item>ham</item>
          <item>meatball</item>
          <item>sausage</item>
          <item>pepperoni</item>
        </one-of>
      </rule>
    </grammar>
    <grammar root="no_sel">
      <rule id="no_sel">
        <one-of>
          <item>none</item>
          <item>no</item>
          <item>next</item>
          <item>skip</item>
        </one-of>
      </rule>
    </grammar>

    E References

    E.1 Normative References

    XHTML Basic
    XHTML Basic, 19 December 2000, Mark Baker, Masayasu Ishikawa, Shinichi Matsui, Peter Stark, Ted Wugofski, Toshihiko Yamakami.
    CSS2
    Cascading Style Sheets, level 2 (CSS2) Specification, Bert Bos, Håkon Wium Lie, Chris Lilley, Ian Jacobs, 1998. W3C Recommendation available at: http://www.w3.org/TR/REC-CSS2/.
    DOM2 Events
    Document Object Model (DOM) Level 2 Events Specification, Tom Pixley, 2000. W3C Recommendation available at: http://www.w3.org/TR/DOM-Level-2-Events/.
    HTML 4.01
    HTML 4.01 Specification, Dave Raggett, Arnaud le Hors, Ian Jacobs, 1999. W3C Recommendation available at: http://www.w3.org/TR/html4/.
    RFC 2396
    RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, Tim Berners-Lee, et al., 1998. Available at: http://www.ietf.org/rfc/rfc2396.txt.
    Semantic Interpretation
    Semantic Interpretation for Speech Recognition, Luc Van Tichelen. W3C Working Draft, April 2003, available at: http://www.w3.org/TR/semantic-interpretation/.
    Speech Grammars
    Speech Recognition Grammar Specification Version 1.0, Andrew Hunt and Scott McGlashan. W3C Recommendation, March 2004, available at: http://www.w3.org/TR/speech-grammar/.
    SSML 1.0
    Speech Synthesis Markup Language Specification, Mark Walker, Dan Burnett, and Andrew Hunt. W3C Candidate Recommendation, December 2003, available at: http://www.w3.org/TR/speech-synthesis/.
    VoiceXML 2.0
    Voice Extensible Markup Language (VoiceXML), Scott McGlashan et al. W3C Recommendation, March 2004, available at: http://www.w3.org/tr/voicexml20/.
    XHTML 1.0
    XHTML 1.0: The Extensible HyperText Markup Language - A Reformulation of HTML 4 in XML 1.0, Steven Pemberton, et al., 2000. W3C Recommendation available at: http://www.w3.org/TR/xhtml1/.
    XHTML 1.1
    XHTML 1.1 - Module-based XHTML, Murray Altheim, Shane McCarron. Available at: http://www.w3.org/TR/xhtml11/.
    XHTML+Voice 1.0
    XHTML+Voice Profile 1.0, Jonny Axelsson, et al., December 21, 2001. Available at: http://www.w3.org/TR/xhtml+voice.
    XHTML+Voice 1.1
    XHTML+Voice Profile 1.1, Jonny Axelsson, et al., January 28, 2003. Available at: http://www-3.ibm.com/software/pervasive/multimodal/x+v/11/spec.htm.
    XHTML+Voice 1.2
    XHTML+Voice Profile 1.2, Jonny Axelsson, et al., March 16, 2004. Available at: http://www.voicexml.org/specs/multimodal/x+v/12/.
    XHTML Modularization
    Modularization of XHTML, Murray Altheim, Frank Boumphrey, et al. Available at: http://www.w3.org/TR/xhtml-modularization/.
    XML 1.0
    Extensible Markup Language (XML) 1.0 (Second Edition), Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, 2000. W3C Recommendation available at: http://www.w3.org/TR/REC-xml.
    XML Events
    XML Events - An events syntax for XML, Steven Pemberton, T. V. Raman and Shane P McCarron, October 14, 2003. W3C Recommendation available at: http://www.w3.org/TR/xml-events/.
    XML Names
    Namespaces in XML, Tim Bray, Dave Hollander, Andrew Layman, 1999. W3C Recommendation available at: http://www.w3.org/TR/REC-xml-names/.
    XSchema-1
    XML Schema Part 1: Structures, Henry S. Thompson, David Beech, Murray Maloney, Noah Mendelsohn, 2001. W3C Recommendation available at: http://www.w3.org/TR/xmlschema-1/.
    XSchema-2
    XML Schema Part 2: Datatypes, Paul V. Biron, Ashok Malhotra, 2001. W3C Recommendation available at: http://www.w3.org/TR/xmlschema-2/.

    E.2 Informative References

    ECMA 262
    ECMA-262: ECMAScript Language Specification, European Computer Manufacturers' Association (ECMA), 1999. Available at ftp://ftp.ecma.ch/ecma-st/Ecma-262.pdf.
    RFC 2141
    RFC 2141: URN Syntax, R. Moats, 1997. Available at: http://www.ietf.org/rfc/rfc2141.txt.
    XForms
    XForms 1.0, Micah Dubinko, Josef Dietl, Roland Merrick, Dave Raggett, T. V. Raman, Linda Bucsay Welsh, 2001. W3C Candidate Recommendation available at: http://www.w3.org/TR/xforms/.
    XSchema-0
    XML Schema Part 0: Primer, David C. Fallside, 2001. W3C Recommendation available at: http://www.w3.org/TR/xmlschema-0/.
    XSLT
    XSL Transformations (XSLT) Version 1.0, James Clark, 1999. W3C Recommendation available at: http://www.w3.org/TR/xslt.