A cache language model is a type of statistical language model. These models occur in the natural language processing subfield of computer science and assign probabilities to given sequences of words by means of a probability distribution. Statistical language models are key components of speech recognition systems and of many machine translation systems: they tell such systems which possible output word sequences are probable and which are improbable.

The particular characteristic of a cache language model is that it contains a cache component and assigns relatively high probabilities to words or word sequences that occur elsewhere in a given text. The primary, but by no means sole, use of cache language models is in speech recognition systems.

To understand why it is a good idea for a statistical language model to contain a cache component, consider someone who is dictating a letter about elephants to a speech recognition system. Standard (non-cache) N-gram language models will assign a very low probability to the word "elephant" because it is a very rare word in English. If the speech recognition system does not contain a cache component, the person dictating the letter may be annoyed: each time the word "elephant" is spoken, another sequence of words with a higher probability according to the N-gram language model may be recognized instead (e.g., "tell a plan"). These erroneous sequences have to be deleted manually and replaced in the text by "elephant" each time "elephant" is spoken. If the system has a cache language model, "elephant" will still probably be misrecognized the first time it is spoken and will have to be entered into the text manually; however, from this point on the system is aware that "elephant" is likely to occur again – its estimated probability of occurrence has been increased, making it more likely that it will be recognized correctly the next time it is spoken. Once "elephant" has occurred several times, the system is likely to recognize it correctly every time it is spoken until the letter has been completely dictated. This increase in the probability assigned to the occurrence of "elephant" is an example of a consequence of machine learning, and more specifically of pattern recognition.
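One simple way to realize this behavior, shown here only as an illustrative sketch rather than the formulation used by any particular system, is to interpolate the static N-gram probability with a "cache" probability estimated from the words already seen in the current document. In the minimal Python example below, the class name CacheLM, the interpolation weight lam, and the base_prob table are all invented for the illustration.

from collections import Counter, deque

class CacheLM:
    """Toy cache language model: a static word probability table interpolated
    with a unigram estimate over the most recently seen words."""

    def __init__(self, base_prob, cache_size=500, lam=0.9, vocab_size=50000):
        self.base_prob = base_prob             # word -> probability under the static model
        self.lam = lam                         # weight given to the static model
        self.vocab_size = vocab_size           # fallback for words unknown to the static model
        self.cache = deque(maxlen=cache_size)  # recent document history
        self.counts = Counter()                # word counts inside the cache window

    def observe(self, word):
        """Add a recognized/dictated word to the cache window."""
        if len(self.cache) == self.cache.maxlen:
            self.counts[self.cache[0]] -= 1    # the oldest word is about to be evicted
        self.cache.append(word)
        self.counts[word] += 1

    def prob(self, word):
        """Interpolated probability that the next word is `word`."""
        p_base = self.base_prob.get(word, 1 / self.vocab_size)
        if not self.cache:
            return p_base
        p_cache = self.counts[word] / len(self.cache)
        return self.lam * p_base + (1 - self.lam) * p_cache

lm = CacheLM(base_prob={"elephant": 1e-6})
print(lm.prob("elephant"))   # ~1e-06: "elephant" is rare under the static model
for w in "the elephant stood near another elephant".split():
    lm.observe(w)
print(lm.prob("elephant"))   # ~0.03: boosted because "elephant" is now in the cache

A bounded cache window and a fixed interpolation weight are only one option; published cache models differ in how the cache is sized, decayed, and weighted.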
There exist variants of the cache language model in which not only single words but also multi-word sequences that have occurred previously are assigned higher probabilities (e.g., if "San Francisco" occurred near the beginning of the text, subsequent instances of it would be assigned a higher probability).

The cache language model was first proposed in a paper published in 1990, after which the IBM speech-recognition group experimented with the concept. The group found that implementing a form of cache language model yielded a 24% drop in word-error rates once the first few hundred words of a document had been dictated. A detailed survey of language modeling techniques concluded that the cache language model was one of the few new techniques that yielded improvements over the standard N-gram approach: "Our caching results show that caching is by far the most useful technique for perplexity reduction at small and medium training data sizes".

The development of the cache language model has generated considerable interest among those concerned with computational linguistics in general and statistical natural language processing in particular; recently there has been interest in applying the cache language model in the field of statistical machine translation.

The success of the cache language model in improving word prediction rests on the human tendency to use words in a "bursty" fashion: when one is discussing a certain topic in a certain context, the frequency with which one uses certain words will be quite different from their frequencies when one is discussing other topics in other contexts. Traditional N-gram language models, which rely entirely on information from a very small number (four, three, or two) of words preceding the word to which a probability is to be assigned, do not adequately model this "burstiness".
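A representative way of writing such a model down (a common textbook-style formulation, not necessarily the exact equations of the systems discussed above) is a linear interpolation of the N-gram estimate with a recency-weighted count over the last K words of the document:

P(w_i \mid w_{i-n+1}^{i-1}) = \lambda\, P_{\text{ngram}}(w_i \mid w_{i-n+1}^{i-1}) + (1-\lambda)\, P_{\text{cache}}(w_i),
\qquad
P_{\text{cache}}(w_i) \propto \sum_{j=i-K}^{i-1} \mathbb{1}[w_j = w_i]\, e^{-\alpha (i-j)}

Here \lambda controls how much weight the static N-gram model keeps, and the optional decay rate \alpha makes recently used words count for more than words used long ago (\alpha = 0 gives a plain relative-frequency cache). The cache term is what captures the burstiness: a word's probability is temporarily raised for as long as it keeps reappearing in the recent history.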
Recently, the cache language model concept, originally conceived for the N-gram statistical language model paradigm, has been adapted for use in the neural paradigm. For instance, recent work on continuous cache language models in the recurrent neural network (RNN) setting has applied the cache concept to much larger contexts than before, yielding significant reductions in perplexity. Another recent line of research involves incorporating a cache component in a feed-forward neural language model (FN-LM) to achieve rapid domain adaptation.
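As a sketch of the general idea behind the continuous cache in the RNN setting (the notation here is indicative, not the exact formulation of any particular paper), the cache stores pairs of past hidden states and the words that followed them, and scores a candidate word by how similar the current hidden state is to the stored states:

p_{\text{cache}}(w \mid h_{1..t}) \propto \sum_{i=1}^{t-1} \mathbb{1}[w = x_{i+1}]\, \exp\!\left(\theta\, h_t^{\top} h_i\right),
\qquad
p(w) = (1-\lambda)\, p_{\text{vocab}}(w \mid h_t) + \lambda\, p_{\text{cache}}(w \mid h_{1..t})

Because the matching is done in the continuous space of hidden states rather than on exact word identities, such a cache can be made very large and can be attached to an already trained network, which helps explain how much longer contexts can be exploited than with the classical word-count caches.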


See also

* Artificial intelligence
* History of natural language processing
* History of machine translation
* Speech recognition
* Statistical machine translation


References


Further reading

*{{cite book , last=Jelinek , first=Frederick , author-link=Frederick Jelinek , title=Statistical Methods for Speech Recognition , publisher=The MIT Press , url=http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=7447 , year=1997 , isbn=0-262-10066-5 , access-date=2011-09-24 , archive-url=https://web.archive.org/web/20110805015427/http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=7447 , archive-date=2011-08-05 , url-status=dead