Introduction

VoiceXML is the HTML of the voice web, the open standard
markup language for voice applications. VoiceXML harnesses
the massive web infrastructure developed for HTML to make it
easy to create and deploy voice applications. Like HTML,
VoiceXML has opened up huge business opportunities: The
Economist even says that "VoiceXML could yet rescue telecoms
carriers from their folly in stringing so much optical fibre
around the world."

VoiceXML 1.0 was published by the VoiceXML Forum, a
consortium of over 500 companies, in March 2000. The Forum
then gave control of the standard to the World Wide Web
Consortium (W3C), and now concentrates on conformance,
education, and marketing. The W3C has just published
VoiceXML 2.0 as a Candidate Recommendation. Products based
on VoiceXML 2.0 are already widely available.

While HTML assumes a graphical web browser with display,
keyboard, and mouse, VoiceXML assumes a voice browser with
audio output, audio input, and keypad input. Audio input is
handled by the voice browser's speech recognizer. Audio
output consists of both recordings and speech synthesized by
the voice browser's text-to-speech system.
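
As a rough illustration of this input and output model, a
minimal VoiceXML 2.0 document might look like the sketch
below. The file name welcome.wav, the form and field names,
and the menu choices are hypothetical, chosen only for
illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <block>
      <!-- Audio output: play a recording (hypothetical file).
           The enclosed text is synthesized instead if the
           recording cannot be played. -->
      <audio src="welcome.wav">Welcome to the drinks line.</audio>
    </block>

    <field name="drink">
      <!-- Audio output: synthesized by the text-to-speech system -->
      <prompt>Would you like coffee or tea?</prompt>

      <!-- Audio input or keypad input: the caller may say the
           option text or press the key given in dtmf -->
      <option dtmf="1" value="coffee">coffee</option>
      <option dtmf="2" value="tea">tea</option>

      <filled>
        <prompt>You chose <value expr="drink"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

In this sketch the caller can answer the prompt either by
speaking one of the option names, which the speech recognizer
matches, or by pressing 1 or 2 on the telephone keypad.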

A voice browser typically runs on a specialized voice
gateway node that is connected both to the Internet and to
the public switched telephone network (see Figure 1). The
voice gateway can support hundreds or thousands of
simultaneous callers, and can be reached from any of the
world's estimated 1.5 billion phones, from antique black
candlestick phones up to the very latest mobiles.

VoiceXML takes advantage of several trends:
- The growth of the World Wide Web and of its capabilities.
- Improvements in computer-based speech recognition and
  text-to-speech synthesis.
- The spread of the Web beyond the desktop computer.