Keyword spotting (or more simply, word spotting) is a problem that was historically first defined in the context of speech processing.
In speech processing, keyword spotting deals with the identification of keywords in utterances.
Keyword spotting is also defined as a separate, but related, problem in the context of document image processing.
In document image processing, keyword spotting is the problem of finding all instances of a query word in a scanned document image without fully recognizing the text of the document.
In speech processing
The first works in keyword spotting appeared in the late 1980s.
A special case of keyword spotting is wake word (also called hot word) detection, used by personal digital assistants such as Alexa or Siri to activate the dormant speaker, in other words to "wake up" when their name is spoken.
In the United States, the National Security Agency has made use of keyword spotting since at least 2006. This technology allows analysts to search through large volumes of recorded conversations and isolate mentions of suspicious keywords. Recordings can be indexed and analysts can run queries over the database to find conversations of interest.
IARPA funded research into keyword spotting in the Babel program.
Some algorithms used for this task are:
* Sliding window and garbage model
* K-best hypothesis
* Iterative Viterbi decoding
* Convolutional neural network on Mel-frequency cepstrum coefficients
* Transformer-based small-footprint keyword spotting
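As a rough illustration of the sliding window and garbage model approach listed above, the following sketch slides a fixed-length window over a sequence of frames and scores each window by the log-likelihood ratio between a keyword model and a garbage (filler) model. It is a toy under stated assumptions, not a reference implementation: frames are single numbers rather than real acoustic feature vectors, both models are plain Gaussians, and the names `spot_keyword`, `template`, `garbage_mean`, `garbage_var` and the threshold value are hypothetical.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log-density of a one-dimensional Gaussian evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def spot_keyword(frames, template, garbage_mean, garbage_var,
                 keyword_var=1.0, threshold=0.0):
    """Return (start, score) for each window that looks more like the keyword
    than like garbage.

    Assumption: the keyword model is one Gaussian per frame, centred on the
    corresponding template value; the garbage model is a single broad Gaussian
    shared by all frames. The score is the per-frame log-likelihood ratio.
    """
    n = len(template)
    hits = []
    for start in range(len(frames) - n + 1):
        window = frames[start:start + n]
        keyword_ll = sum(gaussian_loglik(x, m, keyword_var)
                         for x, m in zip(window, template))
        garbage_ll = sum(gaussian_loglik(x, garbage_mean, garbage_var)
                         for x in window)
        score = (keyword_ll - garbage_ll) / n
        if score > threshold:
            hits.append((start, score))
    return hits


# Toy "utterance": background frames near 0 with the keyword pattern embedded.
frames = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
template = [1.0, 3.0, 1.0]
hits = spot_keyword(frames, template, garbage_mean=0.0, garbage_var=4.0)
print(hits)
```

With these toy values, only the window starting at frame 3 scores above the threshold. In a real system the frames would typically be feature vectors such as Mel-frequency cepstrum coefficients, and both models would be trained rather than hand-set.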
In document image processing
Keyword spotting in document image processing can be seen as an instance of the more generic problem of content-based image retrieval (CBIR).
Given a query, the goal is to retrieve the most relevant instances of words in a collection of scanned documents.
The query may be a text string (query-by-string keyword spotting) or a word image (query-by-example keyword spotting).
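The query-by-example case can be pictured as a nearest-neighbour search. The sketch below assumes that every segmented word image has already been converted to a fixed-length feature vector by some descriptor that is not shown here; the identifiers (`page1_word07` and so on), the three-dimensional vectors and the function `rank_by_example` are all hypothetical and only serve to illustrate the ranking step.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_example(query_vec, word_vecs, top_k=3):
    """Return the ids of the top_k word images closest to the query vector."""
    ranked = sorted(word_vecs.items(),
                    key=lambda item: euclidean(query_vec, item[1]))
    return [word_id for word_id, _ in ranked[:top_k]]


# Hypothetical feature vectors for word images cut out of scanned pages.
collection = {
    "page1_word07": [0.90, 0.10, 0.40],
    "page1_word12": [0.20, 0.80, 0.50],
    "page2_word03": [0.85, 0.15, 0.35],
    "page3_word21": [0.10, 0.90, 0.60],
}
query = [0.88, 0.12, 0.38]  # features of the example word image
results = rank_by_example(query, collection, top_k=2)
print(results)
```

Query-by-string spotting would need an additional step that maps the text string into the same feature space as the word images, which this sketch does not attempt.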