XHTML+Voice (commonly X+V) is an

XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing data. It defines a set of rules for encoding electronic document, documents in a format that is both human-readable and Machine-r ...

language for describing multimodal user interfaces. The two essential modalities are visual and auditory. Visual interaction is defined like most current web pages via

XHTML Extensible HyperText Markup Language (XHTML) is part of the family of XML markup languages which mirrors or extends versions of the widely used HyperText Markup Language (HTML), the language in which Web pages are formulated. While HTML, pr ...

. Auditory components are defined by a subset of Voice XML. Interfacing the voice and visual components of X+V documents is accomplished through a combination of

ECMAScript ECMAScript (; ES) is a standard for scripting languages, including JavaScript, JScript, and ActionScript. It is best known as a JavaScript standard intended to ensure the interoperability of web pages across different web browsers. It is stan ...

JavaScript JavaScript (), often abbreviated as JS, is a programming language and core technology of the World Wide Web, alongside HTML and CSS. Ninety-nine percent of websites use JavaScript on the client side for webpage behavior. Web browsers have ...

, and

XML Events In computer science and web development, XML Events is a W3C standard for handling events that occur in an XML document. These events are typically caused by users interacting with the web page using a device, such as a web browser on a personal ...

Voice input

Voice input or

speech recognition Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also ...

is based on grammars that define the set of possible input text. In contrast to a probabilistic approach employed by popular software packages such as

Dragon Naturally Speaking Dragon NaturallySpeaking (also known as Dragon for PC, or DNS) is a speech recognition software package developed by Dragon Systems of Newton, Massachusetts, which was acquired in turn by Lernout & Hauspie Speech Products, Nuance Communications ...

, the grammar based approach provides the recognizer with important contextual information that significantly boosts recognition accuracy. The specific formats for grammars include JSGF.

Voice output

Voice output or

speech synthesis Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal langua ...

can read any string at virtually any time. Pitch, volume, and other characteristics can be customized using CSS and

Speech Synthesis Markup Language Speech Synthesis Markup Language (SSML) is an XML-based markup language for speech synthesis applications. It is a recommendation of the W3C's Voice Browser Working Group. SSML is often embedded in VoiceXML scripts to drive interactive telephony sy ...

(SSML) however the

Opera Opera is a form of History of theatre#European theatre, Western theatre in which music is a fundamental component and dramatic roles are taken by Singing, singers. Such a "work" (the literal translation of the Italian word "opera") is typically ...

web browser doesn't currently support all these features.

MIME types

The previously recommended MIME type for any X+V document is application/xhtml+voice+xml which is what the

Opera browser Opera is a form of History of theatre#European theatre, Western theatre in which music is a fundamental component and dramatic roles are taken by Singing, singers. Such a "work" (the literal translation of the Italian word "opera") is typically ...

uses. Opera will also interpret X+V documents served as text/xml. The current recommended MIME type for any X+V document is application/xv+xml. Since most web servers associate the .xml extension with text/xml, an xml extension is a fairly safe way of making your static X+V document files browsable.

X+V-enabled browsers

The most commonly used X+V browser is the

. Users of the

can enable X+V support through steps described a
https://web.archive.org/web/20080516174104/http://www.opera.com/voice
. Voice is not yet supported in Opera Mini or on platforms other than Windows. Detecting support for X+V is best done from the server by checking the HTTP header "Accept" for the MIME type application/xhtml+voice+xml. Here is some PHP code that returns "true" if and only if the requesting browser supports XHTML+Voice: HTTP_ACCEPT') else { echo 'false'; } ?>

Related technology

Speech Application Language Tags (SALT) is a very similar format developed by

Microsoft Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The ear ...

in 2001 to compete with

VoiceXML VoiceXML (VXML) is a digital document standard for specifying interactive media and voice dialogs between humans and computers. It is used for developing audio and voice response applications, such as banking systems and automated customer service ...

and XHTML+Voice. SALT also provides users with multimodal support including grammar based recognition and speech synthesized output. The main differences are in the providers of support. Many different companies support VoiceXML and XHTML+Voice by providing various development tools and in particular

IBM International Business Machines Corporation (using the trademark IBM), nicknamed Big Blue, is an American Multinational corporation, multinational technology company headquartered in Armonk, New York, and present in over 175 countries. It is ...

and

Opera Software Opera (formerly Opera Software AS) is a Norwegian multinational technology corporation headquartered in Oslo, Norway with additional offices in European Union, Europe, China, and Africa. Opera offers a range of products and services that inclu ...

. SALT is supported almost exclusively from Microsoft by products such as the Microsoft Speech Application SDK and Microsoft Speech Server.

Voice input

Voice output

MIME types

X+V-enabled browsers

Related technology

External links