XHTML+Voice (commonly X+V) is an XML language for describing multimodal user interfaces. The two essential modalities are visual and auditory. Visual interaction is defined, like most current web pages, via XHTML. Auditory components are defined by a subset of VoiceXML. Interfacing the voice and visual components of X+V documents is accomplished through a combination of ECMAScript, JavaScript, and XML Events.
Voice input
Voice input, or speech recognition, is based on grammars that define the set of possible input text. In contrast to the probabilistic approach employed by popular software packages such as Dragon NaturallySpeaking, the grammar-based approach provides the recognizer with important contextual information that significantly boosts recognition accuracy. The specific formats for grammars include JSGF.
Voice output
Voice output, or speech synthesis, can read any string at virtually any time. Pitch, volume, and other characteristics can be customized using CSS and Speech Synthesis Markup Language (SSML); however, the Opera web browser doesn't currently support all these features.
MIME types
The previously recommended MIME type for any X+V document is application/xhtml+voice+xml, which is what the Opera browser uses. Opera will also interpret X+V documents served as text/xml. The current recommended MIME type for any X+V document is application/xv+xml. Since most web servers associate the .xml extension with text/xml, an .xml extension is a fairly safe way of making your static X+V document files browsable.
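When X+V documents are generated dynamically rather than served as static .xml files, the script has to pick the Content-Type itself. The following is a minimal sketch of one way to do that, not a prescribed method: the function name xv_content_type is hypothetical, and the policy of preferring application/xv+xml, then application/xhtml+voice+xml, and falling back to text/xml is an assumption based only on the three MIME types mentioned above.

```php
<?php
// Hypothetical helper (not part of any X+V specification or library):
// choose a Content-Type for an X+V response from the client's Accept header.
function xv_content_type(string $accept): string
{
    // Assumed preference order: current recommended type first,
    // then the older type that Opera advertises.
    $xvTypes = ['application/xv+xml', 'application/xhtml+voice+xml'];
    $accept = strtolower($accept);
    foreach ($xvTypes as $type) {
        if (strpos($accept, $type) !== false) {
            return $type;
        }
    }
    // Fallback: the generic XML type that Opera also interprets as X+V.
    return 'text/xml';
}
```

A script would typically call header('Content-Type: ' . xv_content_type($_SERVER['HTTP_ACCEPT'] ?? '')) before sending the document body.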
X+V-enabled browsers
The most commonly used X+V browser is the Opera browser. Users of the Opera browser can enable X+V support through the steps described at https://web.archive.org/web/20080516174104/http://www.opera.com/voice. Voice is not yet supported in Opera Mini or on platforms other than Windows.
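Detecting support for X+V is best done from the server by checking the HTTP "Accept" header for the MIME type application/xhtml+voice+xml. Here is some PHP code that prints "true" if and only if the requesting browser advertises support for XHTML+Voice:
<?php
// Check the Accept request header for the X+V MIME type.
if (strpos($_SERVER['HTTP_ACCEPT'] ?? '', 'application/xhtml+voice+xml') !== false) {
    echo 'true';
} else {
    echo 'false';
}
?>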
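A plain substring check on the Accept header also matches a type that the client has explicitly refused with q=0. The sketch below is a slightly stricter variant, offered as an illustration only: the function name accepts_xv and its default argument are hypothetical, and it handles just the media type and the q parameter rather than the full Accept grammar.

```php
<?php
// Hypothetical helper: true when the Accept header lists the given
// MIME type with a nonzero quality value (q defaults to 1 when absent).
function accepts_xv(string $accept, string $type = 'application/xhtml+voice+xml'): bool
{
    foreach (explode(',', $accept) as $range) {
        // Split "type/subtype;q=0.9" into the media type and its parameters.
        $params = array_map('trim', explode(';', $range));
        $media = strtolower(array_shift($params));
        if ($media !== $type) {
            continue;
        }
        $q = 1.0;
        foreach ($params as $param) {
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        return $q > 0;
    }
    return false;
}
```

It could be called as accepts_xv($_SERVER['HTTP_ACCEPT'] ?? '') in place of the substring test.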
Related technology
Speech Application Language Tags (SALT) is a very similar format developed by Microsoft in 2001 to compete with VoiceXML and XHTML+Voice. SALT also provides users with multimodal support, including grammar-based recognition and speech-synthesized output. The main differences are in the providers of support. Many different companies support VoiceXML and XHTML+Voice by providing various development tools, in particular IBM and Opera Software. SALT is supported almost exclusively by Microsoft, through products such as the Microsoft Speech Application SDK and Microsoft Speech Server.
External links
XHTML+Voice v1.2
Voice - Opera Developer Community
XHTML+Voice Programmer's Guide
Download Opera Web Browser
The SpeechWeb Project
RFC 4374 on MIME type
Video Demonstration of XHTML+Voice Page