LOB Corpus

The Lancaster-Oslo/Bergen (LOB) Corpus is a one-million-word collection of British English texts, compiled in the 1970s in a collaboration between the University of Lancaster, the University of Oslo, and the Norwegian Computing Centre for the Humanities, Bergen. It was intended as a British counterpart to the Brown Corpus, which Henry Kučera and W. Nelson Francis compiled for American English in the 1960s. Its composition was designed to match the original Brown Corpus as closely as possible in size and genres, using documents published in the UK in 1961 by British authors. Both corpora consist of 500 samples of about 2,000 words each, drawn from the same range of genres. The chief compilers of the LOB Corpus were Geoffrey Leech (Lancaster University) and Stig Johansson (University of Oslo); see Leech & Johansson (2009). The corpus has also been tagged, i.e. part-of-speech categories have been assigned to every word.
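
As a rough illustration of what a part-of-speech-tagged text can look like to a program, the sketch below parses one line of `word_TAG` tokens into (word, tag) pairs and counts the tags. The sample sentence, the `word_TAG` layout, the tag labels, and the function names `parse_tagged` and `tag_counts` are all assumptions made for this example; they are not taken from the LOB distribution or its manual.

```python
from collections import Counter

# Hypothetical tagged line: the layout and the tag labels are illustrative
# assumptions, not copied from the LOB corpus files.
TAGGED_LINE = "the_AT compilers_NNS tagged_VBD the_AT corpus_NN"

# Nominal corpus size implied by the design: 500 samples of about 2,000 words.
NOMINAL_SIZE = 500 * 2000


def parse_tagged(line):
    """Split a whitespace-separated line of word_TAG tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the last underscore so words containing "_" stay intact.
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


def tag_counts(pairs):
    """Count how often each tag occurs in a list of (word, tag) pairs."""
    return Counter(tag for _, tag in pairs)


pairs = parse_tagged(TAGGED_LINE)
counts = tag_counts(pairs)
print(pairs[0])        # ('the', 'AT')
print(counts["AT"])    # 2
print(NOMINAL_SIZE)    # 1000000
```

Frequency counts of this kind, computed per tag or per word, are the sort of comparison that a matched pair of corpora makes possible.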


External links

* Johansson, Stig, Geoffrey N. Leech, and Helen Goodluck (1978). Manual of Information to Accompany the Lancaster-Oslo/Bergen Corpus of British English, for Use with Digital Computers. Department of English, University of Oslo.
* LOB Corpus from the Oxford Text Archive