XHTML+Voice 1.2 Specification Errata


Abstract

This document records all known errors in the XHTML+Voice 1.2 Specification. The errata are listed in reverse chronological order of their date of publication.

Please email error reports to mccobb@us.ibm.com.

Table of Contents

1 Errata as of 2005-05-11
    1.1 Aural Style Sheets
    1.2 Normative References
    1.3 Informative References
2 Errata as of 2005-02-15
    2.1 Event Types Example
    2.2 Mixed-Initiative Conversational Interface Example
3 Errata as of 2004-07-30
    3.1 The vxmldone event
4 Errata as of 2004-07-21
    4.1 XHTML+Voice supports the <option> element's accept attribute
    4.2 SRGS Standard Grammars


1 Errata as of 2005-05-11

1.1 Aural Style Sheets

1.3.6 Aural Style Sheets

XHTML+Voice supports aural style sheets declared according to [CSS3 Speech], because the CSS3 Speech Module is compatible with [SSML 1.0]. Therefore, [CSS3 Speech] is a normative reference and [CSS2] is informative. These updates to the references sections are shown below.

1.2 Normative References

Appendix I.1 Normative References

CSS3 Speech
CSS3 Speech Module, Dave Raggett, Daniel Glazman, and Claudio Santambroglio. W3C Working Draft, 16 December 2004, available at: http://www.w3.org/TR/css3-speech/.
SSML 1.0
Speech Synthesis Markup Language Specification, Mark Walker, Dan Burnett, and Andrew Hunt. W3C Candidate Recommendation, December 2003, available at: http://www.w3.org/TR/speech-synthesis/.

1.3 Informative References

Appendix I.2 Informative References

CSS2
Cascading Style Sheets, level 2 (CSS2) Specification, Bert Bos, Håkon Wium Lie, Chris Lilley, Ian Jacobs, 1998. W3C Recommendation available at: http://www.w3.org/TR/REC-CSS2/.

2 Errata as of 2005-02-15

2.1 Event Types Example

4.2 Event Types

The example in section 4.2 Event Types is incorrect. The value of the <script> element's ev:observer attribute should be "xform". The corrected example is shown below.

<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice" >
  <head><title>Script Event Handler</title>

    <script type="text/javascript" 
      ev:event="vxmldone" ev:observer="xform" declare="declare">
      document.getElementById('drink').value = application.lastresult$[0].utterance;
    </script>
    <vxml:form id="fid">
      <vxml:field name="f1">
        <vxml:grammar src="drink.gram"/>
        <vxml:prompt>Coffee, tea, or milk?</vxml:prompt>
      </vxml:field>
    </vxml:form>

  </head>
  <body>
    <form id="xform" action="cgi/submit">
      <input type="text" id="drink" ev:event="focus" ev:handler="#fid"/>
    </form>
  </body>
</html>

2.2 Mixed-Initiative Conversational Interface Example

B.2 Mixed-Initiative Conversational Interface

The example in section B.2 Mixed-Initiative Conversational Interface is incorrect. The value of the XHTML <form> element's ev:event attribute should be "click" because the "focus" event does not bubble, so a listener on the form never observes a "focus" event dispatched to one of its inputs. See DOM Level 2 Events and HTML Event Types. The corrected example is shown below.


<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml" 
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
>
  <head>
    <title>Mixed Initiative Conversational Interface</title>

      <!-- VXML form supporting a mixed-initiative grammar -->
      <vxml:form id="voice_city_hotel">
        <vxml:grammar src="city_hotel.jsgf" type="application/x-jsgf"/>
     
        <!-- Mixed initiative form begins with initial prompt -->
        <vxml:initial name="start">
          <vxml:prompt xv:src="#please_choose"/>
          <vxml:help>
            Please say the name of a city and a hotel to make 
            a reservation.
          </vxml:help>
          <!-- If user is silent, reprompt once, then try 
               directed prompts. -->
          <vxml:noinput count="1">
             <vxml:reprompt/>
          </vxml:noinput>
          <vxml:noinput count="2">
             <vxml:reprompt/>
             <vxml:assign name="start" expr="true"/>
          </vxml:noinput>
        </vxml:initial>

        <vxml:field xv:id="field_city" name="field_city">
          <vxml:grammar src="city.jsgf" type="application/x-jsgf"/>
          <vxml:prompt>Please choose a city.</vxml:prompt>
          <vxml:catch event="help nomatch noinput">
            For example, say Chicago.
          </vxml:catch>
        </vxml:field>
      
        <vxml:field xv:id="field_hotel" name="field_hotel">
          <vxml:grammar src="hotel.jsgf" type="application/x-jsgf"/>
          <vxml:prompt>Select your hotel.</vxml:prompt>
          <vxml:catch event="help nomatch noinput">
            For example, say Hilton.
          </vxml:catch>      
          <vxml:filled>
              <vxml:prompt>
                You selected <vxml:value expr="field_hotel"/>.
              </vxml:prompt>
          </vxml:filled>
        </vxml:field>
      </vxml:form>
      <!-- done voice handlers -->

      <!-- declare inputs synchronized with VoiceXML fields -->
      <xv:sync xv:input="city" xv:field="#field_city"/>
      <xv:sync xv:input="hotel" xv:field="#field_hotel"/>
      <xv:cancel id="voice_cancel" xv:voice-handler="#voice_city_hotel"/>
  </head>
  <body>
    <h1>Mixed-Initiative Conversational Interface</h1>

    <p>In this example, we demonstrate a mixed-initiative dialog.  By 
       activating a grammar capable of recognizing both cities and
       hotel names, for the entire application, the user can specify
       both hotel and city in a single utterance.  Alternatively,
       the user can fill one field at a time.
    </p>
    
    <h2>Hotel Picker</h2>
    <p>This voice-enabled application lets you pick a 
       city and a hotel.
    </p>
    <form id="visual_city_hotel" method="post" action="cgi/hotel.pl"
          ev:event="click" ev:handler="#voice_city_hotel">
      <p id="please_choose">
      Please choose a city and hotel where you wish to stay.
      </p>

      <!-- input name attrib required except for type "text" -->
      <input name="city" type="text"/>
      <input name="hotel" type="text"/>

      <input type="submit" value="Submit" />
      <input type="reset" value="Reset"
             ev:event="click" ev:handler="#voice_cancel"/>
    </form>
  </body>
</html>
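
Because an XML Events listener declared on an element observes events whose target is that element, an alternative to listening for "click" on the form is to declare a "focus" listener on each input, as the input in the section 4.2 Event Types example does. The following is only a sketch of that alternative and is not part of the specification's example. It reuses the ids from the example above and assumes the rest of the document is unchanged.

```xml
<form id="visual_city_hotel" method="post" action="cgi/hotel.pl">
  <!-- focus is observed at its target, so it does not need to bubble -->
  <input name="city" type="text"
         ev:event="focus" ev:handler="#voice_city_hotel"/>
  <input name="hotel" type="text"
         ev:event="focus" ev:handler="#voice_city_hotel"/>
</form>
```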

3 Errata as of 2004-07-30

3.1 The vxmldone event

4.2 Event Types

The vxmldone event is generated when the voice handler completes without an error and the VoiceXML <return> element is not run. However, the <return> element can also return the vxmldone event explicitly, as the example below does. Note that the voice handler would return vxmldone implicitly if the <return> element were removed from the example.

<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice" >
  <head><title>Event Handler</title>

    <vxml:form id="fid">
      <vxml:field name="f1">
        <vxml:grammar src="drink.gram"/>
        <vxml:prompt>Coffee, tea, or milk?</vxml:prompt>
      </vxml:field>
      <vxml:block>
         <vxml:return event="vxmldone"/>
      </vxml:block>
    </vxml:form>

  </head>
  <body ev:event="load" ev:handler="#fid">
    <form id="xform" action="cgi/submit">
      <input type="text" id="drink" />
    </form>
  </body>
</html>
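
For comparison, the implicit case described above can be sketched by omitting the <block> that contains <return>. This fragment is illustrative only; it reuses the form from the example and assumes the surrounding document is unchanged.

```xml
<vxml:form id="fid">
  <vxml:field name="f1">
    <vxml:grammar src="drink.gram"/>
    <vxml:prompt>Coffee, tea, or milk?</vxml:prompt>
  </vxml:field>
  <!-- No <vxml:return> here: vxmldone is generated implicitly
       when the form completes without an error. -->
</vxml:form>
```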

4 Errata as of 2004-07-21

4.1 XHTML+Voice supports the <option> element's accept attribute

3.5 XHTML+Voice Abstract Modules

The VoiceXML <option> element has an accept attribute. This attribute is supported by XHTML+Voice but is missing from the list of attributes for <option> in the XHTML+Voice Abstract Modules Table.
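
As a hedged illustration, the fragment below shows the attribute on an <option> inside a field. It assumes the VoiceXML 2.0 meaning of accept, where "exact" (the default) requires the whole phrase and "approximate" also accepts a sub-phrase. The field name and option values are hypothetical.

```xml
<vxml:field name="drink">
  <vxml:prompt>Orange juice, tea, or milk?</vxml:prompt>
  <!-- accept="approximate": saying just "juice" can match this option -->
  <vxml:option accept="approximate" value="oj">orange juice</vxml:option>
  <!-- accept defaults to "exact" when omitted -->
  <vxml:option value="tea">tea</vxml:option>
  <vxml:option value="milk">milk</vxml:option>
</vxml:field>
```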

4.2 SRGS Standard Grammars

5.1.1 Standard Grammars for XHTML Controls

Standard grammars are used with the <sync> element for synchronizing a VoiceXML <field> element with an HTML control, such as a radio button or radio button group, checkbox or checkbox group, text box, text area, button, or selection box. The JSGF examples have been replaced by the following equivalent Speech Recognition Grammar Specification (SRGS 1.0) examples. SRGS 1.0 is a W3C Recommendation.

Here is an example of a grammar for a single selection list (i.e., <select>) and a radio group (i.e., multiple HTML inputs of type "radio" with the same name).

<rule id="crust" scope="public">
  <one-of>
    <item>thin</item>
    <item>medium</item>
    <item>thick</item>
    <item>chicago <item repeat="0-1">style</item></item>
    <item>cheese</item>
  </one-of>
</rule>
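
The rule above is a fragment. A minimal sketch of a complete SRGS 1.0 XML grammar document that contains it might look like the following. The xml:lang value is an assumption; the namespace, version, and root attributes follow SRGS 1.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="crust">
  <!-- root names the rule that is activated when the grammar is referenced -->
  <rule id="crust" scope="public">
    <one-of>
      <item>thin</item>
      <item>medium</item>
      <item>thick</item>
      <item>chicago <item repeat="0-1">style</item></item>
      <item>cheese</item>
    </one-of>
  </rule>
</grammar>
```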

Here is an example of a grammar for a multiple selection list (i.e., <select multiple="multiple">) and a checkbox group (i.e., multiple HTML inputs of type "checkbox" with the same name). Each selected item is pushed onto an array. The filled VoiceXML field is an array containing the selected items.

<rule id="meat_toppings" scope="public">
  <ruleref special="NULL"/><tag><![CDATA[$= new Array;]]></tag>
  <item repeat="1-">
       <ruleref uri="#meats"/>
       <item repeat="0-1">and</item>
       <tag><![CDATA[$.push($meats)]]></tag>
  </item>
</rule>
<rule id="meats">
  <one-of>
    <item>bacon</item>
    <item>chicken</item>
    <item>ham</item>
    <item>meatball</item>
    <item>sausage</item>
    <item>pepperoni</item>
  </one-of>
</rule>
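
A sketch of how such a grammar might be tied to a multiple selection list follows. It borrows the <xv:sync> usage from the B.2 example. The file name meat_toppings.grxml, the ids, and the prompt text are hypothetical, and the grammar file is assumed to be a complete SRGS document containing the two rules above.

```xml
<!-- In <head>: a field that uses the multiple-selection grammar -->
<vxml:form id="voice_toppings">
  <vxml:field xv:id="field_meats" name="field_meats">
    <vxml:grammar src="meat_toppings.grxml" type="application/srgs+xml"/>
    <vxml:prompt>Which meat toppings would you like?</vxml:prompt>
  </vxml:field>
</vxml:form>
<xv:sync xv:input="meats" xv:field="#field_meats"/>

<!-- In <body>: the synchronized control -->
<select name="meats" multiple="multiple">
  <option>bacon</option>
  <option>chicken</option>
  <option>ham</option>
  <option>meatball</option>
  <option>sausage</option>
  <option>pepperoni</option>
</select>
```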

Here is an example of a grammar for a single radio button, checkbox, or button (button includes the submit and reset buttons). For the radio button or checkbox, the "checked" attribute is toggled according to the semantic interpretation tag contained in the filled VoiceXML field. For the button input type, a semantic interpretation value of "true" causes the button to be clicked.

<rule id="pizza_extra" scope="public">
  <one-of>
    <item>no<tag><![CDATA[$=false]]></tag></item>
    <item>nope<tag><![CDATA[$=false]]></tag></item>
    <item>next<tag><![CDATA[$=false]]></tag></item>
    <item>yes<tag><![CDATA[$=true]]></tag></item>
    <item>yep<tag><![CDATA[$=true]]></tag></item>
  </one-of>
</rule>

The grammar for the text, textarea, password, hidden, and file input types does not require any semantic interpretation. The contents of the filled VoiceXML field are assigned to the value attribute of these input types. Here is an example:

<rule id="one_twenty" scope="public">
  <one-of>
    <item>1</item><item>2</item><item>3</item>
    <item>4</item><item>5</item><item>6</item>
    <item>7</item><item>8</item><item>9</item>
    <item>10</item><item>11</item><item>12</item>
    <item>13</item><item>14</item><item>15</item>
    <item>16</item><item>17</item><item>18</item>
    <item>19</item><item>20</item>
  </one-of>
</rule>

The user should always have the option of declining to update the HTML control, for example by saying "none" or "next". This is supported by adding a second grammar to the VoiceXML field, separate from the standard grammar used for that field. Here is an example of a grammar, added to the grammar for a multiple selection list, that allows the user to say "none", "no", "next", or "skip":

<grammar root="meat_toppings">
  <rule id="meat_toppings" scope="public">
    <ruleref special="NULL"/><tag><![CDATA[$= new Array;]]></tag>
    <item repeat="1-">
       <ruleref uri="#meats"/>
       <item repeat="0-1">and</item>
       <tag><![CDATA[$.push($meats)]]></tag>
    </item>
  </rule>
  <rule id="meats">
    <one-of>
      <item>bacon</item>
      <item>chicken</item>
      <item>ham</item>
      <item>meatball</item>
      <item>sausage</item>
      <item>pepperoni</item>
    </one-of>
  </rule>
</grammar>
<grammar root="no_sel">
  <rule id="no_sel">
    <one-of>
      <item>none</item>
      <item>no</item>
      <item>next</item>
      <item>skip</item>
    </one-of>
  </rule>
</grammar>
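
One way to attach both grammars to the same field is sketched below. The file names, id, and prompt text are hypothetical, and each file is assumed to hold one of the grammars above as a complete SRGS document.

```xml
<vxml:field xv:id="field_meats" name="field_meats">
  <!-- standard grammar for the multiple selection list -->
  <vxml:grammar src="meat_toppings.grxml" type="application/srgs+xml"/>
  <!-- additional grammar that lets the user decline -->
  <vxml:grammar src="no_sel.grxml" type="application/srgs+xml"/>
  <vxml:prompt>Which meat toppings would you like? Say none to skip.</vxml:prompt>
</vxml:field>
```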

Last updated $Date: 2005/02/15 14:14:50 $ by $Author: mccobb $