Scottish Corpus Of Texts And Speech
The Scottish Corpus of Texts & Speech (SCOTS) is an ongoing project to build a corpus of modern-day (post-1940) written and spoken texts in Scottish English and varieties of Scots. SCOTS has been available online since November 2004, and can be freely searched and browsed. It reached 4.7 million words by 2015. The project is a venture by the Department of English Language and the STELLA project at the University of Glasgow. SCOTS is grant-funded by the Arts and Humanities Research Council.

Language variety
SCOTS contains texts in Scottish English and varieties of broad Scots, including Doric, Lallans, urban varieties such as Glaswegian, and Insular Scots. SCOTS contains a geographical spread of texts as well as a demographic spread. Each text is accompanied by extensive metadata, including such information as author's decade of birth, gender, occupation, birthplace and place of residence, and details about the text such as publication information, audience, date and genre. Ge ...


Text Corpus
In linguistics, a corpus (plural ''corpora'') or text corpus is a language resource consisting of a large and structured set of texts (nowadays usually electronically stored and processed). In corpus linguistics, corpora are used for statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory. In search technology, a corpus is the collection of documents which is being searched.

Overview
A corpus may contain texts in a single language (''monolingual corpus'') or text data in multiple languages (''multilingual corpus''). To make corpora more useful for linguistic research, they are often subjected to a process known as annotation. An example of annotating a corpus is part-of-speech tagging, or ''POS-tagging'', in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of ''tags''. Another example is indicating the lemma (ba ...
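The part-of-speech annotation described above can be illustrated with a minimal sketch. It assumes a tiny hand-written lexicon (`TOY_LEXICON`, hypothetical and not part of any real tagger) and a `word/TAG` output convention chosen only for illustration; real corpora are normally tagged with trained statistical taggers rather than a lookup table.

```python
# Toy lexicon mapping lowercase word forms to part-of-speech tags.
# Hypothetical data for illustration only.
TOY_LEXICON = {
    "the": "DET",
    "dog": "NOUN",
    "barks": "VERB",
    "loud": "ADJ",
}


def annotate(tokens, lexicon=TOY_LEXICON, unknown="X"):
    """Pair each token with a tag; words missing from the lexicon get `unknown`."""
    return [(tok, lexicon.get(tok.lower(), unknown)) for tok in tokens]


def to_tagged_text(annotated):
    """Serialise (token, tag) pairs as 'token/TAG' separated by spaces."""
    return " ".join(f"{tok}/{tag}" for tok, tag in annotated)


if __name__ == "__main__":
    print(to_tagged_text(annotate("The dog barks".split())))
```

Storing the tags alongside the original tokens, rather than overwriting them, keeps the raw text recoverable from the annotated version.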


Metadata
Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including:
* Descriptive metadata – the descriptive information about a resource. It is used for discovery and identification. It includes elements such as title, abstract, author, and keywords.
* Structural metadata – metadata about containers of data and indicates how compound objects are put together, for example, how pages are ordered to form chapters. It describes the types, versions, relationships, and other characteristics of digital materials.
* Administrative metadata – the information to help manage a resource, like resource type, permissions, and when and how it was created.
* Reference metadata – the information about the contents and quality of statistical data.
* Statistical metadata – also called process data, may describe processes that collect, process, or produce s ...


Corpora
Corpus is Latin for "body". It may refer to:

Linguistics
* Text corpus, in linguistics, a large and structured set of texts
* Speech corpus, in linguistics, a large set of speech audio files
* Corpus linguistics, a branch of linguistics

Music
* ''Corpus'' (album), by Sebastian Santa Maria
* Corpus Delicti (band), also known simply as Corpus

Medicine
* Corpus callosum, a structure in the brain
* Corpus cavernosum (other), a pair of structures in human genitals
* Corpus luteum, a temporary endocrine structure in mammals
* Corpus gastricum, the Latin term referring to the body of the stomach
* Corpus alienum, a foreign object originating outside the body
* Corpus albicans
* Corpora amylacea
* Corpora arenacea

Other uses
* ''Corpus'' (Bernini), a 1650 sculpture of Christ by Gian Lorenzo Bernini
* Corpus (museum), a human body themed museum in the Netherlands
* Corpus Clock, a large sculptural clock
* Corpus (dance troupe), a Canadian dance troupe
* Corpus (typogr ...


Collocation
In corpus linguistics, a collocation is a series of words or terms that co-occur more often than would be expected by chance. In phraseology, a collocation is a type of compositional phraseme, meaning that it can be understood from the words that make it up. This contrasts with an idiom, where the meaning of the whole cannot be inferred from its parts, and may be completely unrelated. An example of a phraseological collocation is the expression ''strong tea''. While the same meaning could be conveyed by the roughly equivalent ''powerful tea'', this adjective does not modify ''tea'' frequently enough for English speakers to become accustomed to its co-occurrence and regard it as idiomatic or unmarked. (By way of counterexample, ''powerful'' is idiomatically preferred to ''strong'' when modifying a ''computer'' or a ''car''.) There are about six main types of collocations: adjective + noun, noun + noun (such as collective nouns), verb + noun, ad ...
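The corpus-linguistic idea of words that "co-occur more often than would be expected by chance" can be made concrete with a small sketch. It assumes pointwise mutual information (PMI) over adjacent word pairs as the association measure; that choice is an assumption made for illustration, since the text above does not name a specific statistic, and the function name `bigram_pmi` and the `min_count` threshold are likewise hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ).
    A positive score means the pair occurs together more often than
    independence of the two words would predict.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_pairs = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_pairs
        p_w1 = unigrams[w1] / n_tokens
        p_w2 = unigrams[w2] / n_tokens
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


if __name__ == "__main__":
    text = "strong tea and strong tea with powerful computer and strong tea"
    for pair, score in sorted(bigram_pmi(text.split()).items()):
        print(pair, round(score, 2))
```

PMI tends to overrate rare pairs, which is why the sketch drops pairs seen fewer than `min_count` times; other measures such as the log-likelihood ratio are also widely used.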




Key Word In Context
Key Word In Context (KWIC) is the most common format for concordance lines. The term KWIC was first coined by Hans Peter Luhn. The system was based on a concept called ''keyword in titles'' which was first proposed for Manchester libraries in 1864 by Andrea Crestadoro. A KWIC index is formed by sorting and aligning the words within an article title to allow each word (except the stop words) in titles to be searchable alphabetically in the index. It was a useful indexing method for technical manuals before computerized full text search became common. For example, a search query including all of the words in an example definition ("KWIC is an acronym for Key Word In Context, the most common format for concordance lines") and the Wikipedia slogan in English ("the free encyclopedia"), searched against a Wikipedia page, might yield a KWIC index as follows. A KWIC index usually uses a wide layout to allow the display of maximum 'in context' information (not shown in the following ex ...
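The construction of a KWIC index from titles, as described above, can be sketched as follows. The stop-word list, the function names `kwic_index` and `format_kwic`, and the column width are assumptions made for illustration; they do not reproduce any particular historical indexing system.

```python
import string

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}


def kwic_index(titles, stop_words=STOP_WORDS):
    """Return (left context, keyword, right context) triples sorted by keyword.

    Every word of every title becomes a keyword entry, except stop words.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip(string.punctuation).lower()
            if not key or key in stop_words:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            entries.append((key, left, word, right))
    entries.sort(key=lambda entry: entry[0])  # alphabetical by keyword
    return [(left, word, right) for _, left, word, right in entries]


def format_kwic(index, width=30):
    """Align keywords in a centre column, truncating context to `width`."""
    return [
        f"{left[-width:]:>{width}}  {word}  {right[:width]}"
        for left, word, right in index
    ]


if __name__ == "__main__":
    for line in format_kwic(kwic_index(["the free encyclopedia"])):
        print(line)
```

Right-aligning the left context is what lines the keywords up in a single column, so a reader can scan down the index alphabetically.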


Plain Text
In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects (floating-point numbers, images, etc.). It may also include a limited number of "whitespace" characters that affect simple arrangement of text, such as spaces, line breaks, or tabulation characters (although tab characters can "mean" many different things, so are hardly "plain"). Plain text is different from formatted text, where style information is included; from structured text, where structural parts of the document such as paragraphs, sections, and the like are identified; and from binary files in which some portions must be interpreted as binary objects (encoded integers, real numbers, images, etc.). The term is sometimes used quite loosely, to mean files that contain ''only'' "readable" content (or just files with nothing that the speaker doesn't prefer). For example, that could exclude any indica ...


Genre
Genre is any form or type of communication in any mode (written, spoken, digital, artistic, etc.) with socially-agreed-upon conventions developed over time. In popular usage, it normally describes a category of literature, music, or other forms of art or entertainment, whether written or spoken, audio or visual, based on some set of stylistic criteria, yet genres can be aesthetic, rhetorical, communicative, or functional. Genres form by conventions that change over time as cultures invent new genres and discontinue the use of old ones. Often, works fit into multiple genres by way of borrowing and recombining these conventions. Stand-alone texts, works, or pieces of communication may have individual styles, but genres are amalgams of these texts based on agreed-upon or socially inferred conventions. Some genres may have rigid, strictly adhered-to guidelines, while others may show great flexibility. Genre began as an absolute classification system for ancient Greek literature, a ...


Transcription (linguistics)
Transcription in the linguistic sense is the systematic representation of spoken language in written form. The source can either be utterances (''speech'' or ''sign language'') or preexisting text in another writing system. Transcription should not be confused with translation, which means representing the meaning of text from a source language in a target language (e.g. ''Los Angeles'', from the source language Spanish, means ''The Angels'' in the target language English); or with transliteration, which means representing the spelling of a text from one script to another. In the academic discipline of linguistics, transcription is an essential part of the methodologies of (among others) phonetics, conversation analysis, dialectology, and sociolinguistics. It also plays an important role for several subfields of speech technology. Common examples of transcription outside academia are the proceedings of a court hearing such as a criminal trial (by a court reporter) or a phy ...


Multimedia
Multimedia is a form of communication that combines different content forms such as text, audio, images, animations, or video into a single interactive presentation, in contrast to traditional mass media, such as printed material or audio recordings, which feature little to no interaction between users. Popular examples of multimedia include video podcasts, audio slideshows and animated videos. Multimedia also contains the principles and application of effective interactive communication such as the building blocks of software, hardware, and other technologies. Multimedia can be recorded for playback on computers, laptops, smartphones, and other electronic devices, either on demand or in real time (streaming). In the early years of multimedia, the term "rich media" was synonymous with interactive multimedia. Over time, hypermedia extensions brought multimedia to the World Wide Web.

Terminology
The term ''multimedia'' ...


Demographic
Demography is the statistical study of populations, especially human beings. Demographic analysis examines and measures the dimensions and dynamics of populations; it can cover whole societies or groups defined by criteria such as education, nationality, religion, and ethnicity. Educational institutions usually treat demography as a field of sociology, though there are a number of independent demography departments. These methods have primarily been developed to study human populations, but are extended to a variety of areas where researchers want to know how populations of social actors can change across time through processes of birth, death, and migration. In the context of human biological populations, demographic analysis uses administrative records to develop an independent estimate of the population. Demographic analysis estimates are often considered a reliable standard for judging the accuracy of the census information gathered at any time. In the labor ...


Scottish English
Scottish English (Scottish Gaelic: Beurla Albannach) is the set of varieties of the English language spoken in Scotland. The transregional, standardised variety is called Scottish Standard English or Standard Scottish English (SSE). Scottish Standard English may be defined as "the characteristic speech of the professional class [in Scotland] and the accepted norm in schools". The IETF language tag for "Scottish Standard English" is en-scotland. In addition to distinct pronunciation, grammar and expressions, Scottish English has distinctive vocabulary, particularly pertaining to Scottish institutions such as the Church of Scotland, local government and the education and legal systems. Scottish Standard English is at one end of a bipolar linguistic continuum, with focused broad Scots at the other. Scottish English may be influenced to varying degrees by Scots. Stuart-Smith J. ''Scottish English: Phonology'' in Varieties of English: The British Isles, Kortman & Upton (Eds), Mouton de Gruyt ...


Geographical
Geography (from Greek: γεωγραφία, ''geographia''; a combination of the Greek words ‘Geo’ (the Earth) and ‘Graphein’ (to describe), literally "earth description") is a field of science devoted to the study of the lands, features, inhabitants, and phenomena of Earth. The first recorded use of the word γεωγραφία was as a title of a book by Greek scholar Eratosthenes (276–194 BC). Geography is an all-encompassing discipline that seeks an understanding of Earth and its human and natural complexities: not merely where objects are, but also how they have changed and come to be. While geography is specific to Earth, many concepts can be applied more broadly to other celestial bodies in the field of planetary science. One such concept, the first law of geography, proposed by Waldo Tobler, is "everything is related to everything else, but near things are more related than distant things." Geography has been called "the world discipline" and "the bridge between the human and t ...