Word Sketch

A word sketch is a one-page, automatic, corpus-derived summary of a word’s grammatical and collocational behaviour. Word sketches were first introduced by the British corpus linguist Adam Kilgarriff [Kilgarriff, Adam; Rychlý, Pavel; Smrž, Pavel; Tugwell, David (2004) The Sketch Engine. Information Technology, 2004] and exploited within the Sketch Engine corpus management system. They extend the general concept of collocation used in corpus linguistics by grouping collocations according to particular grammatical relations (e.g. subject, object, modifier). The collocation candidates in a word sketch are sorted either by their frequency or by a lexicographic association score such as Dice, T-score or MI-score. Since their introduction, word sketches have been used by lexicographers to develop modern corpus-based dictionaries for major publishers, including the Oxford English Dictionary and the Macmillan English Dictionary. They cover dozens of languages, including English, Chinese, Slovene, Japanese, Dutch, Romanian, Russian, Czech, Polish, Vietnamese, Turkish, Portuguese, Hindi, Spanish and others.
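The sorting step can be illustrated with a small Python sketch. It assumes the textbook definitions of the Dice coefficient, T-score and MI-score over simple co-occurrence counts; the exact formulas and counts used inside Sketch Engine may differ. The corpus size, the frequencies and the helper `rank` are hypothetical values chosen only for illustration.

```python
import math

def dice(f_xy, f_x, f_y):
    """Dice coefficient: 2 * f_xy / (f_x + f_y)."""
    return 2 * f_xy / (f_x + f_y)

def t_score(f_xy, f_x, f_y, n):
    """T-score: (observed - expected) / sqrt(observed)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)

def mi_score(f_xy, f_x, f_y, n):
    """MI-score: log2 of observed over expected co-occurrence."""
    return math.log2(f_xy * n / (f_x * f_y))

# Hypothetical counts for collocates of the headword "man" in one relation.
N = 1_000_000          # corpus size in tokens (assumed)
F_HEADWORD = 2_000     # frequency of the headword (assumed)
COLLOCATES = {         # collocate -> (co-occurrence frequency, collocate frequency)
    "young": (150, 3_000),
    "old": (200, 8_000),
    "wise": (20, 100),
}

def rank(score):
    """Return collocates sorted by score(f_xy, f_y), highest first."""
    return sorted(COLLOCATES, key=lambda c: score(*COLLOCATES[c]), reverse=True)

by_frequency = rank(lambda f_xy, f_y: f_xy)
by_dice = rank(lambda f_xy, f_y: dice(f_xy, F_HEADWORD, f_y))
by_t = rank(lambda f_xy, f_y: t_score(f_xy, F_HEADWORD, f_y, N))
by_mi = rank(lambda f_xy, f_y: mi_score(f_xy, F_HEADWORD, f_y, N))

for name, order in [("frequency", by_frequency), ("Dice", by_dice),
                    ("T-score", by_t), ("MI-score", by_mi)]:
    print(f"{name:>9}: {order}")
```

With these invented counts the four orderings differ: raw frequency and T-score favour the frequent collocate "old", whereas MI-score promotes the rare but strongly associated "wise". This is why the choice of score matters to a lexicographer.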


Formal account

A word sketch triple consists of (headword, grammatical relation, collocation), e.g. (man, modifier, young). Given an underlying text corpus, a word sketch quintuple consists of (headword, grammatical relation, collocation, position of headword in the corpus, position of collocation in the corpus), e.g. (man, modifier, young, 104, 103). A word sketch database is a set of such triples or quintuples, which may be generated either by querying a corpus using a corpus query language or by parsing the corpus using a natural language parser [Aleš Horák, Pavel Rychlý, Adam Kilgarriff (2009) Czech word sketch relations with full syntax parser. In After Half a Century of Slavonic Natural Language Processing].
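As a rough illustration of these definitions, the Python sketch below models triples and quintuples as named tuples and groups a tiny database by grammatical relation. Only the quintuple (man, modifier, young, 104, 103) comes from the text above. The other rows, the relation name `subject_of` and the helper `word_sketch` are hypothetical and do not reflect how Sketch Engine stores its data.

```python
from collections import defaultdict
from typing import NamedTuple

class Triple(NamedTuple):
    headword: str
    relation: str
    collocation: str

class Quintuple(NamedTuple):
    headword: str
    relation: str
    collocation: str
    headword_pos: int      # position of the headword in the corpus
    collocation_pos: int   # position of the collocation in the corpus

    def triple(self) -> Triple:
        """Drop the corpus positions, keeping only the triple."""
        return Triple(self.headword, self.relation, self.collocation)

# A toy word sketch database; only the first row is taken from the text.
database = [
    Quintuple("man", "modifier", "young", 104, 103),
    Quintuple("man", "modifier", "young", 982, 981),   # hypothetical
    Quintuple("man", "modifier", "old", 57, 56),       # hypothetical
    Quintuple("man", "subject_of", "walk", 104, 105),  # hypothetical
]

def word_sketch(db, headword):
    """Count collocations of `headword`, grouped by grammatical relation."""
    sketch = defaultdict(lambda: defaultdict(int))
    for q in db:
        if q.headword == headword:
            sketch[q.relation][q.collocation] += 1
    return {rel: dict(cols) for rel, cols in sketch.items()}

print(word_sketch(database, "man"))
```

Keeping the positions in the quintuple lets each count be traced back to its concordance lines in the corpus. The triple view is enough for computing frequencies and association scores.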




External links

* Word Sketch – word collocations, in the Sketch Engine User Manual