VoiceXML's History
VoiceXML traces its lineage back to several informal gatherings held in 1995 by Dave Ladd, Chris Ramming, Ken Rehor, and Curt Tuckey of AT&T Research. They were brainstorming ideas about how the Internet would affect telephony applications when all the pieces fit into place: why not have a gateway system running a voice browser that interprets a voice dialog markup language and delivers web content and services to ordinary phones? Thus began the AT&T Phone Web project. When AT&T spun off Lucent, a separate Phone Web project continued there as well. Chris remained at AT&T, Ken went with Lucent, and Dave and Curt moved on to Motorola.
By early 1999 AT&T and Lucent had incompatible dialects of the Phone Markup Language (PML), Motorola had its new VoxML, and other companies were also experimenting with these ideas, in particular IBM with SpeechML. A standard language had to be designed to enable the voice web. The original Phone Web people remained close friends, so AT&T, Lucent, and Motorola began organizing the VoiceXML Forum. IBM joined as a founder soon afterwards. From March to August of 1999 a small team of Forum technologists worked together to produce a new language, VoiceXML 0.9, combining the best features of the earlier languages and pushing into new areas, especially DTMF (touch-tone key) support and mixed-initiative dialogs. After 0.9 was published, an extensive period of comment from the growing VoiceXML Forum community began. These comments resulted in huge improvements to the language, including client-side scripting, properties, and subdialogs. VoiceXML 1.0 came out in March 2000, and almost overnight fifteen or twenty different implementations sprang up.
The following month, the VoiceXML Forum submitted the 1.0 language to the World Wide Web Consortium (W3C) for consideration. In May, the W3C "accepted" VoiceXML, an event that generated a lot of press coverage but merely acknowledged receipt of the submission. The W3C's Voice Browser Working Group, however, eagerly took on the job of producing the next revision.
The W3C process has taken more time than any of us
expected, but the emphasis on consensus among the many
participating companies has led to a strong standard.
The first public Working Draft of VoiceXML 2.0 was
published in October 2001, the Last Call Working Draft
came out in April 2002, and VoiceXML 2.0 became a
Candidate Recommendation in January 2003.
The changes from VoiceXML 1.0 to 2.0 were fairly conservative. Much thought and effort went into clarifying expected behaviors and correcting a few errors in the specification. A large amount of additional work went into developing and weaving in new standards for speech recognition grammars and text-to-speech markup. There were a few extensions, but overall there is a high degree of similarity between 1.0 and 2.0.
VoiceXML's Future
The W3C is now completing the Implementation Report, part of which
consists of hundreds of interoperability tests to ensure that the
VoiceXML standard is implementable, and that different implementations
of VoiceXML can execute the same content in the same way. The VoiceXML
Forum's Conformance Committee will then round these tests out into a
complete conformance suite, which will be a powerful tool to ensure
interoperability between VoiceXML implementations.
Beginning in 2003, the W3C's Voice Browser Working Group
will start work on VoiceXML 3.0. Some suggestions that
were too large to incorporate in 2.0 will be addressed, as
well as other new extensions. Some of the improvements
being discussed are:
- Using the proposed W3C Natural Language Semantics Markup Language to represent recognition results.
- Currently the <form> element ties together the notions of input tasks and the data filled by those input tasks. Should a new high-level, task-oriented dialog construct parallel to <form> and <menu> be defined? (A minimal sketch of how <form> works today appears after this list.)
- In some cases the Form Interpretation Algorithm (FIA) does not give application developers close enough control. Should a new low-level, procedural dialog construct parallel to <form> and <menu> be defined?
- Should grammar and audio resources be defined centrally and then referenced by "id" attributes elsewhere?
- What about standardized audio playback controls for changing the speed and volume of the audio, and for moving backward and forward in the audio stream? These would be analogous to CD player controls.
- Should standard speaker verification features be added to VoiceXML for additional security? What about enabling the generation of speaker-trained grammars, for use in personal address books and similar applications?
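For readers less familiar with these constructs, here is a minimal sketch of how a VoiceXML 2.0 <form> works today: each <field> both defines an input task and names the variable that stores its result, and the Form Interpretation Algorithm visits the unfilled fields until the form is complete. The field names, prompts, and submit URL below are purely illustrative.

  <?xml version="1.0" encoding="UTF-8"?>
  <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
    <form id="transfer">
      <!-- Each <field> is both an input task and the variable that
           holds the collected value; on each pass the FIA selects
           the first unfilled field until the form is complete. -->
      <field name="amount" type="currency">
        <prompt>How much would you like to transfer?</prompt>
      </field>
      <field name="confirm" type="boolean">
        <prompt>Transfer <value expr="amount"/>. Is that correct?</prompt>
      </field>
      <!-- Runs once both fields are filled. -->
      <filled>
        <if cond="confirm">
          <submit next="http://example.com/transfer" namelist="amount"/>
        <else/>
          <clear namelist="amount confirm"/>
        </if>
      </filled>
    </form>
  </vxml>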
There will also likely be changes to VoiceXML to support new multimodal markup standards. The conceptually cleanest approaches to multimodality use XHTML as a container for mode-specific markup (XHTML for visual, VoiceXML for voice, InkXML for ink, and so on) and then define how the modes interact using XML Events. As part of this effort, a modularization of VoiceXML would be defined so that one subset could be used for multimodal markup.
The final official act of the original VoiceXML 1.0 language design team was to sign the Taylor Brewing Company Accord. The TBCA sought to rectify the chief imperfection of the VoiceXML 1.0 standard: its lack of author names. Here they are, for posterity: Linda Boyer, IBM; Peter Danielsen, Lucent; Jim Ferrans, Motorola; Gerald Karam, AT&T; David Ladd, Motorola; Bruce Lucas, IBM; and Kenneth Rehor, Lucent. We hope you have as much fun learning and using VoiceXML as we did putting it together. Enjoy!