Voice search, also called voice-enabled search, allows the user to use a voice command to search the Internet, a website, or an app. In a broader definition, voice search includes open-domain keyword queries on any information on the Internet, for example in Google Voice Search, Cortana, Siri, and Amazon Echo. Voice search is often interactive, involving several rounds of interaction that allow the system to ask for clarification. Voice search is a type of dialog system. It is not a replacement for typed search. Rather, the search terms, experience, and use cases can differ heavily depending on the input type.

Supported language

Language is the most essential factor for a system to understand the user and return accurate results for what the user searches. This applies across languages, dialects, and accents, as users want a voice assistant that both understands them and speaks to them understandably. Because spoken and written language differ, voice search should support natural spoken language rather than only transcribing voice into text with the help of speech recognition and then running a regular text search. For example, in typed search an eCommerce user can easily copy and paste an alphanumeric product code into the search field, but spoken search terms can be very different, such as "show me the new Bluetooth headphones by Samsung".
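The contrast between a pasted product code and a spoken request can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `PRODUCT_CODE` pattern, the `FILLER` word list, the function names, and the sample code `AB-1234X` are hypothetical, and a production system would rely on natural language understanding rather than a fixed word list.

```python
import re

# Hypothetical pattern: one token that mixes letters and digits,
# e.g. "AB-1234X". Real product codes vary by retailer.
PRODUCT_CODE = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9-]+$")

# Hypothetical filler words that often appear in spoken requests.
FILLER = {"show", "me", "the", "by", "a", "an", "please", "find", "for"}


def classify_query(query: str) -> str:
    """Return 'product_code' for a code-like token, else 'natural_language'."""
    if PRODUCT_CODE.match(query.strip()):
        return "product_code"
    return "natural_language"


def search_terms(query: str) -> list:
    """Lowercase the query and drop filler words, keeping content words."""
    words = re.findall(r"[a-z0-9-]+", query.lower())
    return [w for w in words if w not in FILLER]


if __name__ == "__main__":
    spoken = "show me the new Bluetooth headphones by Samsung"
    print(classify_query("AB-1234X"))  # typed, code-like input
    print(classify_query(spoken))      # spoken, natural-language input
    print(search_terms(spoken))        # content words left for the search
```

A typed code can be looked up as an exact match, while the spoken request first has to be reduced to the terms that carry meaning. That difference is why the two input types may need separate handling.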

How it works

The difference between text and voice search is not only the input type. The mechanism must include automatic speech recognition (ASR) for input, but it can also include natural language understanding for natural spoken search queries such as "What's the population for the United States". For output, it can use text-to-speech (TTS) or a regular display. Users might sometimes be required to activate the search by using a wake word. The search system then detects the language spoken by the user, followed by the keywords and context of the sentence. Lastly, the device returns results in a form that depends on its output modality. A device with a screen might display the results, while a device without a screen will speak them back to the searcher.
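The sequence described above (wake word, language detection, keyword extraction, output selection) can be sketched in Python. This is a toy illustration under stated assumptions, not how any particular assistant is implemented. It starts from a text transcript, so the ASR step is assumed to have already run, and the wake word `hey assistant`, the stopword lists, the `Device` class, and all function names are hypothetical.

```python
from dataclasses import dataclass

WAKE_WORD = "hey assistant"  # hypothetical wake word

# Tiny illustrative stopword lists; real systems use trained language models.
STOPWORDS = {
    "en": {"what's", "what", "is", "the", "for", "of", "a"},
    "es": {"cuál", "es", "la", "el", "de"},
}


@dataclass
class Device:
    has_screen: bool


def strip_wake_word(transcript: str):
    """Return the query that follows the wake word, or None if it is absent."""
    text = transcript.strip().lower()
    if not text.startswith(WAKE_WORD):
        return None
    return text[len(WAKE_WORD):].strip(" ,")


def detect_language(query: str) -> str:
    """Pick the language whose stopword list overlaps the query the most."""
    words = set(query.split())
    return max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))


def extract_keywords(query: str, lang: str) -> list:
    """Strip punctuation, then drop stopwords of the detected language."""
    tokens = [w.strip("?.!,") for w in query.split()]
    return [t for t in tokens if t and t not in STOPWORDS[lang]]


def respond(device: Device, results: str) -> str:
    """Display on devices with a screen; otherwise speak (stand-in for TTS)."""
    return f"DISPLAY: {results}" if device.has_screen else f"SPEAK: {results}"


def handle(transcript: str, device: Device):
    """Run the full sequence; return None when no wake word was heard."""
    query = strip_wake_word(transcript)
    if query is None:
        return None
    lang = detect_language(query)
    keywords = extract_keywords(query, lang)
    return respond(device, f"results for {' '.join(keywords)} [{lang}]")


if __name__ == "__main__":
    heard = "Hey assistant, what's the population for the United States"
    print(handle(heard, Device(has_screen=False)))
    print(handle(heard, Device(has_screen=True)))
```

The same transcript yields a spoken response on a screenless device and a displayed one otherwise. A transcript without the wake word is ignored.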
