The Scottish Corpus of Texts & Speech (SCOTS) is an ongoing project to build a corpus of modern-day (post-1940) written and spoken texts in Scottish English and varieties of Scots. SCOTS has been available online since November 2004, and can be freely searched and browsed. It reached 4.7 million words by 2015. The project is a venture by the Department of English Language and the STELLA project at the University of Glasgow. SCOTS is grant-funded by the Arts and Humanities Research Council.


Language variety

SCOTS contains texts in Scottish English and varieties of broad Scots, including Doric, Lallans, urban varieties such as Glaswegian, and Insular Scots. The texts are spread both geographically and demographically. Each text is accompanied by extensive metadata, including such information as the author's decade of birth, gender, occupation, birthplace and place of residence, and details about the text such as publication information, audience, date and genre.


Genre and mode

SCOTS is a multimedia corpus containing both written and spoken texts. The spoken texts are available as orthographic transcriptions, accompanied by the source audio or video files. SCOTS includes a large number of genres and text types, including prose fiction, poetry, business and personal correspondence, religious texts, parliamentary and administrative documents, emails, conversations and interviews.


Search and analysis

SCOTS can be investigated in various ways, depending on the user's interest. The corpus can be browsed, for example by the author's name or date of the text, and all texts can be downloaded in plain text format. Transcriptions are synchronised with audio/video files, which are streamed and may also be downloaded. An Advanced Search facility allows the user to build up more complex queries, choosing from all the fields available in the metadata. Geographical results are plotted on an interactive map, so regional variation may be investigated. Advanced Search results can also be viewed as a KWIC concordance, which can be reordered to highlight collocational patterns.
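
To illustrate the KWIC (keyword in context) view mentioned above, the following minimal Python sketch builds a concordance from a plain-text string and reorders it by right-hand context, so that recurring collocates of the keyword end up next to each other. It is not the SCOTS implementation: the function names `kwic` and `sort_by_right`, the simple regular-expression tokenizer, and the sample sentence are all assumptions made for this example.

```python
import re
from typing import List, Tuple

# One concordance row: (left context, keyword, right context)
Row = Tuple[List[str], str, List[str]]


def kwic(text: str, keyword: str, width: int = 3) -> List[Row]:
    """Return every occurrence of `keyword` with up to `width` tokens of
    context on each side. Tokenization is a naive lowercase word split
    (an assumption; a real corpus tool would be more careful)."""
    tokens = re.findall(r"\w+(?:'\w+)?", text.lower())
    target = keyword.lower()
    rows: List[Row] = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            rows.append((left, tok, right))
    return rows


def sort_by_right(rows: List[Row]) -> List[Row]:
    """Reorder rows alphabetically by the words following the keyword,
    which groups identical right-hand collocates together."""
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the corpus.
    sample = "The wee dog ran. A wee cat sat. The wee dog slept."
    for left, kw, right in sort_by_right(kwic(sample, "wee", width=2)):
        print(f"{' '.join(left):>12} [{kw}] {' '.join(right)}")
```

Sorting on the left context instead (for example, keyed on the reversed list of preceding words) would highlight what typically comes before the keyword rather than after it.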

