HOME

TheInfoList



OR:

In linguistics, co-occurrence or cooccurrence is an above-chance frequency of ordered occurrence of two adjacent terms in a
text corpus In linguistics and natural language processing, a corpus (: corpora) or text corpus is a dataset, consisting of natively digital and older, digitalized, language resources, either annotated or unannotated. Annotated, they have been used in corp ...
. Co-occurrence in this
linguistic Linguistics is the scientific study of language. The areas of linguistic analysis are syntax (rules governing the structure of sentences), semantics (meaning), Morphology (linguistics), morphology (structure of words), phonetics (speech sounds ...
sense can be interpreted as an indicator of semantic proximity or an
idiom An idiom is a phrase or expression that largely or exclusively carries a Literal and figurative language, figurative or non-literal meaning (linguistic), meaning, rather than making any literal sense. Categorized as formulaic speech, formulaic ...
atic expression. Corpus linguistics and its statistic analyses reveal patterns of co-occurrences within a language and enable to work out typical
collocation In corpus linguistics, a collocation is a series of words or terms that co-occur more often than would be expected by chance. In phraseology, a collocation is a type of compositional phraseme, meaning that it can be understood from the words t ...
s for its lexical items. A ''co-occurrence restriction'' is identified when linguistic elements never occur together. Analysis of these restrictions can lead to discoveries about the
structure A structure is an arrangement and organization of interrelated elements in a material object or system, or the object or system so organized. Material structures include man-made objects such as buildings and machines and natural objects such as ...
and development of a language. Co-occurrence can be seen an extension of word counting in higher dimensions. Co-occurrence can be quantitatively described using measures like a massive
correlation In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistics ...
or
mutual information In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual Statistical dependence, dependence between the two variables. More specifically, it quantifies the "Information conten ...
.


See also

* Distributional hypothesis * Statistical semantics *
Idiom (language structure) An idiom (the quality of it being known as idiomaticness or idiomaticity) is a syntactical, grammatical, or phonological structure peculiar to a language that is actually realized, as opposed to possible but unrealized structures that could hav ...
* Co-occurrence matrix * Co-occurrence networks *
Similarity measure In statistics and related fields, a similarity measure or similarity function or similarity metric is a real-valued function that quantifies the similarity between two objects. Although no single definition of a similarity exists, usually such mea ...
* Dice coefficient


References


External links

* Corpus linguistics {{Ling-stub