The Bank of English (BoE) is a representative subset of the 4.5-billion-word COBUILD corpus, a collection of English texts. These are mainly British in origin, but content from North America, Australia, New Zealand, South Africa and other Commonwealth countries is also being included.
The majority of the texts are from written English, collected from websites, newspapers, magazines and books. There is also a large component of spoken data using material from radio, TV and informal conversations. The Bank of English totals 650 million running words.
Copies of the corpus are held both at HarperCollins Publishers and the University of Birmingham. The version at Birmingham can be accessed for academic research.
The Bank of English forms part of the ''Collins Word Web'' together with the French, German and Spanish corpora.
See also
* Corpus of Contemporary American English (COCA)
* British National Corpus (BNC)
External links
COBUILD Reference