Collocate

In corpus linguistics, a collocation is a series of words or terms that co-occur more often than would be expected by chance. In phraseology, a collocation is a type of compositional phraseme, meaning that it can be understood from the words that make it up. This contrasts with an idiom, where the meaning of the whole cannot be inferred from its parts, and may be completely unrelated.

Seven main types of collocation are commonly distinguished: adjective + noun, noun + noun (such as collective nouns), noun + verb, verb + noun, adverb + adjective, verb + prepositional phrase (phrasal verbs), and verb + adverb.

Collocation extraction is a computational technique that finds collocations in a document or corpus, using techniques from computational linguistics that resemble data mining.
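As a rough illustration of collocation extraction, the following Python sketch counts adjacent word pairs in a short text and keeps the pairs that recur. The tokenizer, the function names (`tokenize`, `candidate_bigrams`), the `min_count` threshold, and the sample text are all assumptions made for this example; real extraction systems typically add part-of-speech filtering and an association measure on top of raw counts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def candidate_bigrams(tokens, min_count=2):
    """Count adjacent word pairs and keep those seen at least min_count times."""
    counts = Counter(zip(tokens, tokens[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical sample text, invented for this illustration.
text = (
    "The answer was crystal clear. She made a decision quickly. "
    "It was crystal clear that he made a decision too."
)

tokens = tokenize(text)
candidates = candidate_bigrams(tokens)

# Print the recurring pairs, most frequent first (ties broken alphabetically).
for pair, n in sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0])):
    print(" ".join(pair), n)
```

Raw frequency alone also surfaces pairs such as "was crystal" and "made a", which is one reason extraction tools usually rank candidates with a statistical measure instead of a bare count.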


Expanded definition

Collocations are partly or fully fixed expressions that become established through repeated context-dependent use. Such terms as ''crystal clear'', ''middle management'', ''nuclear family'', and ''cosmetic surgery'' are examples of collocated pairs of words.

Collocations can be in a syntactic relation (such as verb–object: ''make'' and ''decision''), a lexical relation (such as antonymy), or in no linguistically defined relation at all. Knowledge of collocations is vital for the competent use of a language: a grammatically correct sentence will stand out as awkward if collocational preferences are violated. This makes collocation a common focus for language teaching.

Corpus linguists specify a key word in context (KWIC) and identify the words immediately surrounding it, to illustrate the way words are used in practice. The processing of collocations involves a number of parameters, the most important of which is the ''measure of association'', which evaluates whether the co-occurrence is purely by chance or statistically significant. Due to the non-random nature of language, most collocations are classed as significant, and the association scores are simply used to rank the results. Commonly used measures of association include mutual information, ''t''-scores, and log-likelihood.

Rather than select a single definition, Gledhill proposes that collocation involves at least three different perspectives: ''co-occurrence'', a statistical view, which sees collocation as the recurrent appearance in a text of a node and its collocates; ''construction'', which sees collocation either as a correlation between a lexeme and a lexical-grammatical pattern, or as a relation between a base and its collocative partners; and ''expression'', a pragmatic view of collocation as a conventional unit of expression, regardless of form. These different perspectives contrast with the usual way of presenting collocation in phraseological studies. Traditionally speaking, collocation is explained in terms of all three perspectives at once, in a continuum:

:Free combination ↔ bound collocation ↔ frozen idiom
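The KWIC view and the ranking by an association measure described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names (`kwic`, `pmi_scores`), the window size, and the toy token list are assumptions, and the measure shown is pointwise mutual information over adjacent word pairs, the per-pair form of mutual information commonly used to rank collocation candidates.

```python
import math
from collections import Counter


def kwic(tokens, node, window=2):
    """Return (left context, node, right context) for every occurrence of node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append((left, tok, right))
    return lines


def pmi_scores(tokens):
    """Score each adjacent pair with log2( P(w1 w2) / (P(w1) * P(w2)) )."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_bigrams = max(n - 1, 1)  # number of adjacent pairs in the token list
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_joint = count / n_bigrams
        p_w1 = unigrams[w1] / n
        p_w2 = unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_joint / (p_w1 * p_w2))
    return scores


# Hypothetical toy corpus: letters stand in for words.
toy_tokens = "a b a b c d".split()

concordance = kwic(toy_tokens, "b", window=1)
scores = pmi_scores(toy_tokens)

# Rank pairs from strongest to weakest association.
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus the pair seen only once scores higher than the pair seen twice, because its members occur nowhere else. Pointwise mutual information is known to favour rare pairs in this way, so it is often combined with a minimum frequency threshold.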


In dictionaries

In 1933, Harold Palmer's ''Second Interim Report on English Collocations'' highlighted the importance of collocation as a key to producing natural-sounding language, for anyone learning a foreign language. Thus from the 1940s onwards, information about recurrent word combinations became a standard feature of monolingual learner's dictionaries. As these dictionaries became "less word-centred and more phrase-centred", more attention was paid to collocation. This trend was supported, from the beginning of the 21st century, by the availability of large text corpora and intelligent corpus-querying software, making it possible to provide a more systematic account of collocation in dictionaries. Using these tools, dictionaries such as the ''Macmillan English Dictionary'' and the ''Longman Dictionary of Contemporary English'' included boxes or panels with lists of frequent collocations.

There are also a number of specialized dictionaries devoted to describing the frequent collocations in a language. These include (for Spanish) ''Redes: Diccionario combinatorio del español contemporáneo'' (2004), (for French) ''Le Robert: Dictionnaire des combinaisons de mots'' (2007), and (for English) the ''LTP Dictionary of Selected Collocations'' (1997) and the ''Macmillan Collocations Dictionary'' (2010).


Statistically significant collocation

Student's ''t''-test can be used to determine whether the occurrence of a collocation in a corpus is statistically significant. For a bigram w_1 w_2, let P(w_1) = \frac{\#w_1}{N} be the unconditional probability of occurrence of w_1 in a corpus of size N, and let P(w_2) = \frac{\#w_2}{N} be the unconditional probability of occurrence of w_2 in the corpus, where \#w denotes the number of occurrences of w. The ''t''-score for the bigram w_1 w_2 is calculated as:
: t = \frac{\bar{x} - \mu}{\sqrt{s^2 / N}},
where \bar{x} = \frac{\#w_1 w_2}{N} is the sample mean of the occurrence of w_1 w_2, \#w_1 w_2 is the number of occurrences of w_1 w_2, \mu = P(w_1)P(w_2) is the probability of w_1 w_2 under the null hypothesis that w_1 and w_2 appear independently in the text, and s^2 = \bar{x}(1-\bar{x}) \approx \bar{x} is the sample variance. With a large N, the ''t''-test is equivalent to a ''Z''-test.
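The ''t''-score calculation can be sketched directly from counts. In the Python example below, the function name `t_score`, its parameters, and the `approximate` flag are assumptions made for illustration, and the counts are hypothetical round numbers, not figures from a real corpus.

```python
import math


def t_score(count_w1, count_w2, count_bigram, n, approximate=True):
    """t-score of a bigram, computed from raw counts in a corpus of n tokens.

    x_bar is the observed bigram probability and mu is the probability
    expected if the two words occurred independently. With approximate=True
    the sample variance uses s^2 ~ x_bar; otherwise s^2 = x_bar * (1 - x_bar).
    """
    if count_bigram <= 0:
        raise ValueError("the bigram must occur at least once")
    x_bar = count_bigram / n
    mu = (count_w1 / n) * (count_w2 / n)
    s2 = x_bar if approximate else x_bar * (1.0 - x_bar)
    return (x_bar - mu) / math.sqrt(s2 / n)


# Hypothetical counts: w1 occurs 1,000 times, w2 occurs 2,000 times, and the
# bigram w1 w2 occurs 100 times in a corpus of 1,000,000 tokens.
t_approx = t_score(1000, 2000, 100, 1_000_000)
t_exact = t_score(1000, 2000, 100, 1_000_000, approximate=False)
print(round(t_approx, 3), round(t_exact, 3))
```

With these counts the expected probability under independence is 2 × 10⁻⁶ while the observed probability is 10⁻⁴, giving a ''t''-score of about 9.8. The two variance formulas differ only slightly because \bar{x} is very small.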


See also

* English collocations
* Agreement (linguistics)
* Cliché
* Collocational restriction
* Collostructional analysis
* Compound noun, adjective and verb
* Government (linguistics)
* Idiom (language structure)
* Irreversible binomial
* Isocolon
* Lexical item
* N-gram
* Phrasal verb
* Phraseology
* Phraseme
* Sketch Engine
* Statistically improbable phrase
* Word sketch



External links


* Ozdic Collocation Dictionary
* A Small System Storing Spanish Collocations (Igor A. Bolshakov & Sabino Miranda-Jiménez)
* Morphological characterization of collocations and semantic relationships in Spanish (Sabino Miranda-Jiménez & Igor A. Bolshakov)
* Example of collocations for the word "Surgery" at ''wordassociations.net''