German Reference Corpus
The German Reference Corpus (original: Deutsches Referenzkorpus; short: DeReKo) is an electronic archive of text corpora of contemporary written German. It was first created in 1964 and is hosted at the Leibniz Institute for the German Language (IDS) in Mannheim, Germany. The corpus archive is continuously updated and expanded. As of August 2010, it comprised more than 4.0 billion word tokens; it constitutes the largest linguistically motivated collection of contemporary German texts. Today, it is one of the major resources worldwide for the study of written German. Alternative names: The German Reference Corpus is often referred to by other names, such as "Mannheim corpora", "IDS corpora", "COSMAS corpora" and the corresponding German translations. The name "Deutsches Referenzkorpus (DeReKo)" was originally used for a specific portion of the current archive, which was collected between 1999 and 2002 by a number of institutio ...
Text Corpus
In linguistics, a corpus (plural: corpora) or text corpus is a language resource consisting of a large and structured set of texts (nowadays usually electronically stored and processed). In corpus linguistics, corpora are used for statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory. In search technology, a corpus is the collection of documents being searched. Overview: A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus). To make corpora more useful for linguistic research, they are often subjected to a process known as annotation. An example of annotating a corpus is part-of-speech tagging, or POS-tagging, in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of tags. Another example is indicating the lemma (b ...
Corpora
Corpus is Latin for "body". It may refer to:
Linguistics:
* Text corpus, in linguistics, a large and structured set of texts
* Speech corpus, in linguistics, a large set of speech audio files
* Corpus linguistics, a branch of linguistics
Music:
* Corpus (album), by Sebastian Santa Maria
* Corpus Delicti (band), also known simply as Corpus
Medicine:
* Corpus callosum, a structure in the brain
* Corpus cavernosum, a pair of structures in human genitals
* Corpus luteum, a temporary endocrine structure in mammals
* Corpus gastricum, the Latin term referring to the body of the stomach
* Corpus alienum, a foreign object originating outside the body
* Corpus albicans
* Corpora amylacea
* Corpora arenacea
Other uses:
* Corpus (Bernini), a 1650 sculpture of Christ by Gian Lorenzo Bernini
* Corpus (museum), a human-body-themed museum in the Netherlands
* Corpus Clock, a large sculptural clock
* Corpus (dance troupe), a ...
Germanic Philology
Germanic philology is the philological study of the Germanic languages, particularly from a comparative or historical perspective. Research into the Germanic languages began in the 16th century, with the discovery of literary texts in the earlier phases of the languages. Early modern publications dealing with Old Norse culture appeared in the 16th century, e.g. Historia de gentibus septentrionalibus (Olaus Magnus, 1555) and the first edition of the 13th-century Gesta Danorum (Saxo Grammaticus) in 1514. In 1603, Melchior Goldast made the first edition of Middle High German poetry, Tyrol and Winsbeck, including a commentary which focused on linguistic problems and set the tone for the approach to such works in the subsequent centuries. He later gave similar attention to the Old High German Benedictine Rule. In England, Cotton's studies of the manuscripts in his collection mark the beginning of work on the Old English language. The pace of publication in ...
Oxford English Corpus
The Oxford English Corpus (OEC) is a text corpus of 21st-century English, used by the makers of the Oxford English Dictionary and by Oxford University Press's language research programme. It is the largest corpus of its kind, containing nearly 2.1 billion words. It includes language from the UK, the United States, Ireland, Australia, New Zealand, the Caribbean, Canada, India, Singapore, and South Africa. The text is mainly collected from web pages; some printed texts, such as academic journals, have been collected to supplement particular subject areas. The sources are writings of all sorts, from "literary novels and specialist journals to everyday newspapers and magazines and from Hansard to the language of blogs, emails, and social media". This may be contrasted with similar databases that sample only a specific kind of writing. The corpus is generally available only to researchers at Oxford University Press, but other researchers who can demonstrate a strong need may apply ...
Corpus of Contemporary American English
The Corpus of Contemporary American English (COCA) is a one-billion-word corpus of contemporary American English. It was created by Mark Davies, retired professor of corpus linguistics at Brigham Young University (BYU). Content: The Corpus of Contemporary American English (COCA) is composed of one billion words as of November 2021. The corpus is constantly growing: in 2009 it contained more than 385 million words; in 2010 it grew to 400 million words; by March 2019, it had grown to 560 million words. As of November 2021, the Corpus of Contemporary American English is composed of 485,202 texts. According to the corpus website, the current corpus (November 2021) is composed of texts that include 24-25 million words for each year from 1990 to 2019. For each year contained in the corpus (1990-2019), the corpus is evenly divided between six registers/genres: TV/movies, spoken, fiction, magazine, newspaper, and academic (see Texts and Registers page of the COCA websi ...
Bank of English
The Bank of English is a representative subset of the 4.5-billion-word COBUILD corpus, a collection of English texts. These are mainly British in origin, but content from North America, Australia, New Zealand, South Africa and other Commonwealth countries is also included. The majority of the texts are from written English, collected from websites, newspapers, magazines and books. There is also a large component of spoken data using material from radio, TV and informal conversations. The Bank of English totals 650 million running words. Copies of the corpus are held both at HarperCollins Publishers and the University of Birmingham. The version at Birmingham can be accessed for academic research. The Bank of English forms part of the Collins Word Web together with the French, German and Spanish corpora. See also: * Corpus of Contemporary American English ...
American National Corpus
The American National Corpus (ANC) is a text corpus of American English containing 22 million words of written and spoken data produced since 1990. Currently, the ANC includes a range of genres, including emerging genres such as email, tweets, and web data that are not included in earlier corpora such as the British National Corpus. It is annotated for part of speech and lemma, shallow parse, and named entities. The ANC is available from the Linguistic Data Consortium. A fifteen-million-word subset of the corpus, called the Open American National Corpus (OANC), is freely available with no restrictions on its use from the ANC website. The corpus and its annotations are provided according to the specifications of ISO/TC 37 SC4's Linguistic Annotation Framework. By using a freely provided transduction tool (ANC2Go), the corpus and user-chosen annotations are provided in multiple formats, including CoNLL IOB format, the XML format conformant to the XML Corpus Encoding Standard (XCES) ...
Corpus Linguistics
Corpus linguistics is the study of a language as that language is expressed in its text corpus (plural: corpora), its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field, the natural context ("realia") of that language, with minimal experimental interference. The text-corpus method uses the body of texts written in any natural language to derive the set of abstract rules which govern that language. Those results can be used to explore the relationships between that subject language and other languages which have undergone a similar analysis. The first such corpora were manually derived from source texts, but now that work is automated. Corpora have been used not only for linguistic research but also to compile dictionaries (starting with The American Heritage Dictionary of the English Language in 1969) and grammar guides, such as A Comprehensive Grammar ...
Virtual Corpus
Virtual may refer to:
* Virtual (horse), a thoroughbred racehorse
* Virtual channel, a channel designation which differs from that of the actual radio channel (or range of frequencies) on which the signal travels
* Virtual function, a programming function or method whose behaviour can be overridden within an inheriting class by a function with the same signature
* Virtual machine, the virtualization of a computer system
* Virtual meeting, or web conferencing
* Virtual memory, a memory management technique that abstracts the memory address space in a computer
* Virtual particle, a type of short-lived particle of indeterminate mass
* Virtual reality (virtuality), computer programs with an interface that gives the user the impression that they are physically inside a simulated space
* Virtual world, a computer-based simulated environment populated by many users who can create a personal avatar, and simultaneously and independently explore the world, participate in its activities and co ...
German Language
German is a West Germanic language mainly spoken in Central Europe. It is the most widely spoken language, and an official or co-official language, in Germany, Austria, Switzerland, Liechtenstein, and the Italian province of South Tyrol. It is also a co-official language of Luxembourg and Belgium, as well as a national language in Namibia. Outside Germany, it is also spoken by German communities in France (Bas-Rhin), Czech Republic (North Bohemia), Poland (Upper Silesia), Slovakia (Bratislava Region), and Hungary (Sopron). German is most similar to other languages within the West Germanic language branch, including Afrikaans, Dutch, English, the Frisian languages, Low German, Luxembourgish, Scots, and Yiddish. It also shares close similarities in vocabulary with some languages in the North Germanic group, such as Danish, Norwegian, and Swedish. German is the second most widely spoken Germanic language after English, which is also a West Germanic language. German ...
Primordial Sample
Primordial may refer to:
* Primordial era, an era after the Big Bang. See Chronology of the universe
* Primordial sea (a.k.a. primordial ocean, ooze or soup). See Abiogenesis
* Primordial nuclides, nuclides (a few of them radioactive) that formed before the Earth existed and are stable enough to still occur on Earth
* Primordial elements, elements formed before the Earth came into existence
* Primordial narcissism, the psychological condition of prenatal existence
* Primordialism, the argument which contends that nations are ancient, natural phenomena
* Primordial (band), Irish heavy metal band
Religion and mythology:
* The Primordial Tradition, a school of religious philosophy
* Primordial Greek gods, a group of Greek deities born in the beginning of our universe
* Primordial Buddha, a self-emanating, self-originating Buddha
* Primordial covenant, God's covenant with humanity in Islam
See also:
* Primal
* Primeval
* Primitive
Primi ...