Distributional hypothesis

Distributional semantics is a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data. The basic idea of distributional semantics can be summed up in the so-called distributional hypothesis: ''linguistic items with similar distributions have similar meanings.''


Distributional hypothesis

The distributional hypothesis in linguistics is derived from the semantic theory of language usage, i.e. words that are used and occur in the same contexts tend to purport similar meanings. The underlying idea that "a word is characterized by the company it keeps" was popularized by Firth in the 1950s. The distributional hypothesis is the basis for statistical semantics. Although the distributional hypothesis originated in linguistics, it is now receiving attention in cognitive science, especially regarding the context of word use. In recent years, the distributional hypothesis has provided the basis for the theory of similarity-based generalization in language learning: the idea that children can figure out how to use words they have rarely encountered before by generalizing about their use from distributions of similar words. The distributional hypothesis suggests that the more semantically similar two words are, the more distributionally similar they will be in turn, and thus the more they will tend to occur in similar linguistic contexts. Whether or not this suggestion holds has significant implications both for the data-sparsity problem in computational modeling and for the question of how children are able to learn language so rapidly given relatively impoverished input (this is also known as the problem of the poverty of the stimulus).
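The hypothesis can be illustrated with a minimal sketch, assuming a toy corpus invented for the example: two words that occur in near-identical contexts ("cat" and "dog") come out as more similar than a pair that shares few contexts ("cat" and "milk").

```python
# Toy illustration of the distributional hypothesis: words that share
# many contexts are taken to be semantically similar. The corpus and
# window size are invented for this example.
corpus = ("the cat drinks milk . the dog drinks water . "
          "the cat chases mice . the dog chases cats .").split()

def context_set(word, tokens, window=1):
    """Collect the words appearing within `window` positions of `word`."""
    ctx = set()
    for i, tok in enumerate(tokens):
        if tok == word:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx.update(tokens[lo:i] + tokens[i + 1:hi])
    return ctx

def jaccard(a, b):
    """Overlap of two context sets (Jaccard index)."""
    return len(a & b) / len(a | b)

cat, dog, milk = (context_set(w, corpus) for w in ("cat", "dog", "milk"))
# 'cat' and 'dog' occur in near-identical contexts, so their overlap is
# high; 'cat' and 'milk' share much less.
print(jaccard(cat, dog) > jaccard(cat, milk))  # True
```

On this corpus "cat" and "dog" in fact have identical context sets, which is the limiting case the hypothesis describes.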


Distributional semantic modeling in vector spaces

Distributional semantics favors the use of linear algebra as a computational tool and representational framework. The basic approach is to collect distributional information in high-dimensional vectors, and to define distributional/semantic similarity in terms of vector similarity. Different kinds of similarities can be extracted depending on which type of distributional information is used to populate the vectors: topical similarities can be extracted by populating the vectors with information on which text regions the linguistic items occur in; paradigmatic similarities can be extracted by populating the vectors with information on which other linguistic items the items co-occur with. Note that the latter type of vectors can also be used to extract syntagmatic similarities by looking at the individual vector components. The basic idea of a correlation between distributional and semantic similarity can be operationalized in many different ways. There is a rich variety of computational models implementing distributional semantics, including latent semantic analysis (LSA), Hyperspace Analogue to Language (HAL), syntax- or dependency-based models, random indexing, semantic folding, and various variants of the topic model.

Distributional semantic models differ primarily with respect to the following parameters:
* Context type (text regions vs. linguistic items)
* Context window (size, extension, etc.)
* Frequency weighting (e.g. entropy, pointwise mutual information, etc.)
* Dimension reduction (e.g. random indexing, singular value decomposition, etc.)
* Similarity measure (e.g. cosine similarity, Minkowski distance, etc.)

Distributional semantic models that use linguistic items as context have also been referred to as word space models or vector space models.
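These parameters can be made concrete in a minimal word-space sketch, assuming a toy corpus invented for the example: linguistic items as context type, a context window of 1, positive pointwise mutual information (PPMI) as the frequency weighting, no dimension reduction, and cosine similarity as the similarity measure.

```python
import math
from collections import Counter

# Minimal word-space model: co-occurrence counts, PPMI weighting, cosine
# similarity. The corpus is invented for this example.
corpus = ("the cat drinks milk . the dog drinks water . "
          "the cat chases mice . the dog chases cats .").split()

window = 1
pair_counts = Counter()  # (target, context) co-occurrence counts
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            pair_counts[(w, corpus[j])] += 1

total = sum(pair_counts.values())
w_marg, c_marg = Counter(), Counter()   # marginal counts
for (w, c), n in pair_counts.items():
    w_marg[w] += n
    c_marg[c] += n

def ppmi(w, c):
    """Positive pointwise mutual information of a (word, context) pair."""
    n = pair_counts[(w, c)]
    if n == 0:
        return 0.0
    return max(0.0, math.log2(n * total / (w_marg[w] * c_marg[c])))

vocab = sorted(set(corpus))

def vector(w):
    """The PPMI-weighted context vector of word `w`."""
    return [ppmi(w, c) for c in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

print(cosine(vector("cat"), vector("dog")))   # high (identical contexts)
print(cosine(vector("cat"), vector("milk")))  # noticeably lower
```

Swapping in a different weighting scheme, window size, or distance function changes only the corresponding helper, which is exactly the sense in which the models in this family differ by parameter choice.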


Beyond Lexical Semantics

While distributional semantics typically has been applied to lexical items (words and multi-word terms) with considerable success, not least due to its applicability as an input layer for neurally inspired deep learning models, lexical semantics, i.e. the meaning of words, will only carry part of the semantics of an entire utterance. The meaning of a clause, e.g. ''"Tigers love rabbits."'', can only partially be understood from examining the meaning of the three lexical items it consists of. Distributional semantics can straightforwardly be extended to cover larger linguistic items such as constructions, with and without non-instantiated items, but some of the base assumptions of the model need to be adjusted somewhat. Construction grammar and its formulation of the lexical-syntactic continuum offers one approach for including more elaborate constructions in a distributional semantic model, and some experiments have been implemented using the random indexing approach. Compositional distributional semantic models extend distributional semantic models by explicit semantic functions that use syntactically based rules to combine the semantics of participating lexical units into a ''compositional model'' to characterize the semantics of entire phrases or sentences. This work was originally proposed by Stephen Clark, Bob Coecke, and Mehrnoosh Sadrzadeh of Oxford University in their 2008 paper, "A Compositional Distributional Model of Meaning". Different approaches to composition have been explored, including neural models, and are under discussion at established workshops such as SemEval.
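The simplest compositional function is element-wise addition of the word vectors, shown below as a minimal sketch. This additive baseline is not the Clark-Coecke-Sadrzadeh model itself, which instead treats relational words such as verbs as higher-order tensors applied to their arguments; the three-dimensional vectors here are invented purely for illustration.

```python
# Additive composition baseline: a phrase vector is the element-wise sum
# of its word vectors. The vectors are invented for this illustration.
vectors = {
    "tigers":  [0.9, 0.1, 0.0],
    "love":    [0.1, 0.8, 0.2],
    "rabbits": [0.7, 0.2, 0.1],
}

def compose(words):
    """Sum the word vectors component-wise to get a phrase vector."""
    dims = len(next(iter(vectors.values())))
    out = [0.0] * dims
    for w in words:
        for i, x in enumerate(vectors[w]):
            out[i] += x
    return out

sentence = compose(["tigers", "love", "rabbits"])
print(sentence)  # approximately [1.7, 1.1, 0.3]
```

Because addition is commutative, this baseline assigns ''"Tigers love rabbits."'' and ''"Rabbits love tigers."'' the same vector, which is precisely the word-order problem that syntax-driven compositional models are designed to fix.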


Applications

Distributional semantic models have been applied successfully to the following tasks:
* finding semantic similarity between words and multi-word expressions;
* word clustering based on semantic similarity;
* automatic creation of thesauri and bilingual dictionaries;
* word sense disambiguation;
* expanding search requests using synonyms and associations;
* defining the topic of a document;
* document clustering for information retrieval;
* data mining and named entity recognition;
* creating semantic maps of different subject domains;
* paraphrasing;
* sentiment analysis;
* modeling selectional preferences of words.


Software


* S-Space
* SemanticVectors
* Indra


See also

* Conceptual space
* Co-occurrence
* Distributional–relational database
* Gensim
* Phraseme
* Random indexing
* Sentence embedding
* Statistical semantics
* Word2vec
* Word embedding


People

* Scott Deerwester
* Susan Dumais
* J. R. Firth
* George Furnas
* Zellig Harris
* Thomas Landauer
* Magnus Sahlgren




External links


* Zellig S. Harris