Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others. Computational linguistics is closely related to mathematical linguistics.
Origins
The field has overlapped with artificial intelligence since the efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English. Since rule-based approaches were able to make arithmetic (systematic) calculations much faster and more accurately than humans, it was expected that lexicon, morphology, syntax and semantics could be learned using explicit rules as well. After the failure of rule-based approaches, David Hays coined the term in order to distinguish the field from AI and co-founded both the Association for Computational Linguistics (ACL) and the International Committee on Computational Linguistics (ICCL) in the 1970s and 1980s. What started as an effort to translate between languages evolved into a much wider field of natural language processing.
Annotated corpora
To study the English language in detail, an annotated text corpus was needed. The Penn Treebank was one of the most widely used corpora. It consisted of IBM computer manuals, transcribed telephone conversations, and other texts, together containing over 4.5 million words of American English, annotated using both part-of-speech tagging and syntactic bracketing.
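As a rough illustration of what part-of-speech tags and syntactic bracketing look like together, the following Python sketch reads one bracketed tree and recovers its word/tag pairs. The example sentence, the function names (`tokenize`, `parse`, `tagged_words`) and the simplified bracket format are assumptions made for illustration; this is not the Penn Treebank's own tooling, and real treebank files contain extra markers such as traces and function tags that this sketch does not handle.

```python
def tokenize(text):
    """Split a bracketed tree string into '(', ')' and symbol tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse tokens into nested (label, children) tuples; leaf words stay plain strings."""
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        pos += 1
        label = tokens[pos]  # syntactic category or part-of-speech tag
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())  # nested constituent
            else:
                children.append(tokens[pos])  # a word at a leaf
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    return node()


def tagged_words(tree):
    """Collect (word, tag) pairs from preterminal nodes such as (DT the)."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child))
    return pairs


# Hypothetical example sentence in a simplified bracketed format.
example = "(S (NP (DT the) (NN manual)) (VP (VBZ describes) (NP (DT the) (NN system))))"
tree = parse(tokenize(example))
print(tagged_words(tree))
```

The nested tuples keep the bracketing (which words group into which phrase), while `tagged_words` flattens the tree to the tagging layer alone, which is one way to see that the two annotation layers are stored in a single structure.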
Japanese sentence corpora have also been analyzed, and sentence length was found to follow a log-normal pattern.
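Log-normality of sentence length means that the logarithm of the length is approximately normally distributed. A minimal sketch of how such a pattern could be checked is shown below: it estimates the two log-normal parameters from the mean and standard deviation of the log-lengths. The data here are synthetic numbers drawn from Python's random generator, not a Japanese corpus, and the helper names `fit_lognormal` and `lognormal_pdf` are assumptions for illustration.

```python
import math
import random
import statistics


def fit_lognormal(lengths):
    """Estimate (mu, sigma) of a log-normal from the mean and std of log-lengths."""
    logs = [math.log(n) for n in lengths if n > 0]
    return statistics.fmean(logs), statistics.stdev(logs)


def lognormal_pdf(x, mu, sigma):
    """Density of a log-normal distribution with parameters mu and sigma at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


# Synthetic stand-in for sentence lengths (in words or characters); NOT real corpus data.
rng = random.Random(0)
lengths = [max(1, round(rng.lognormvariate(3.0, 0.5))) for _ in range(5000)]

mu, sigma = fit_lognormal(lengths)
print(f"mu={mu:.2f}, sigma={sigma:.2f}, median length ~ {math.exp(mu):.1f}")
```

With real data, the fitted density from `lognormal_pdf` would be compared against a histogram of observed lengths (or the log-lengths tested for normality) before claiming a log-normal pattern.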
Modeling language acquisition
During language acquisition, children are largely exposed only to positive evidence: they receive evidence for what is a correct form, but no evidence for what is incorrect. [Braine, M.D.S. (1971). On two types of models of the internalization of grammars. In D.I. Slobin (Ed.), The ontogenesis of grammar: A theoretical perspective. New York: Academic Press.] This was a limitation for the models of the time, because the deep learning models now available did not exist in the late 1980s. [Powers, D.M.W. & Turk, C.C.R. (1989). ''Machine Learning of Natural Language''. Springer-Verlag.]
It has been shown that languages can be learned from a combination of simple input presented incrementally as the child develops better memory and a longer attention span, which explains the long period of language acquisition in human infants and children.
Robots have been used to test linguistic theories. To let them learn as children might, models were built on an affordance model in which mappings between actions, perceptions, and effects were created and linked to spoken words. Crucially, these robots were able to acquire functioning word-to-meaning mappings without needing grammatical structure.
Using the Price equation and Pólya urn dynamics, researchers have created a system which not only predicts future linguistic evolution but also gives insight into the evolutionary history of modern-day languages.
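To give a feel for the Pólya urn dynamics mentioned above, the sketch below simulates a basic urn in which each draw of a linguistic variant makes that variant more likely to be drawn again. This is only the textbook urn process under assumed settings (two hypothetical variants, one extra copy added per draw); it is not the researchers' system and does not model the Price equation.

```python
import random


def polya_urn(initial_counts, steps, rng):
    """Run a Pólya urn: draw a variant in proportion to its count, then add one more copy."""
    counts = dict(initial_counts)  # copy so the caller's dict is not modified
    for _ in range(steps):
        variants = list(counts)
        weights = [counts[v] for v in variants]
        drawn = rng.choices(variants, weights=weights, k=1)[0]
        counts[drawn] += 1  # reinforcement: frequent variants become more frequent
    return counts


# Hypothetical competing variants, starting with one token each.
rng = random.Random(42)
start = {"variant_a": 1, "variant_b": 1}
final = polya_urn(start, steps=1000, rng=rng)

total = sum(final.values())
shares = {v: c / total for v, c in final.items()}
print(shares)
```

Running the simulation with different seeds gives different long-run shares, which illustrates the path dependence of such a rich-get-richer process: early random draws can shape which variant ends up dominant.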
Chomsky's theories
Noam Chomsky's theories have influenced computational linguistics, particularly in understanding how infants learn complex grammatical structures, such as those described in Chomsky normal form. Attempts have been made to determine how an infant learns a "non-normal grammar" as theorized by Chomsky normal form. Research in this area combines structural approaches with computational models to analyze large linguistic corpora like the Penn Treebank, helping to uncover patterns in language acquisition.
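For readers unfamiliar with Chomsky normal form (CNF): a context-free grammar is in CNF when every rule either rewrites a nonterminal as exactly two nonterminals (A → B C) or as a single terminal (A → a). One reason the form matters computationally is that it allows simple table-based recognition. The sketch below is a CYK-style recognizer over a toy grammar; the grammar, the vocabulary and the function name `cyk_recognize` are hypothetical and chosen only for illustration, not taken from any acquisition study.

```python
from itertools import product


def cyk_recognize(words, lexical_rules, binary_rules, start="S"):
    """Return True if `words` can be derived from `start` under a CNF grammar.

    lexical_rules: dict mapping a word to the set of nonterminals A with A -> word
    binary_rules:  dict mapping a pair (B, C) to the set of nonterminals A with A -> B C
    """
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(lexical_rules.get(word, ()))
    for span in range(2, n + 1):            # length of the substring
        for i in range(n - span + 1):       # start index
            j = i + span - 1                # end index
            for k in range(i, j):           # split point between left and right part
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary_rules.get((left, right), set())
    return start in table[0][n - 1]


# Toy CNF grammar (hypothetical): S -> NP VP, NP -> DT NN, VP -> VBZ NP
lexical_rules = {
    "the": {"DT"},
    "child": {"NN"},
    "word": {"NN"},
    "learns": {"VBZ"},
}
binary_rules = {
    ("NP", "VP"): {"S"},
    ("DT", "NN"): {"NP"},
    ("VBZ", "NP"): {"VP"},
}

accepted = cyk_recognize("the child learns the word".split(), lexical_rules, binary_rules)
rejected = cyk_recognize("child the learns".split(), lexical_rules, binary_rules)
print(accepted, rejected)
```

Because every rule has at most two symbols on its right-hand side, the recognizer only ever needs to consider a single split point per rule application, which is what keeps the table-filling loop simple.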
See also
* Artificial intelligence in fiction
* Collostructional analysis
* Computational lexicology
* ''Computational Linguistics'' (journal)
* Computational models of language acquisition
* Computational semantics
* Computational semiotics
* Computer-assisted reviewing
* Dialog systems
* Glottochronology
* Grammar induction
* Human speechome project
* Internet linguistics
* Lexicostatistics
* Natural language processing
* Natural language user interface
* Quantitative linguistics
* Semantic relatedness
* Semantometrics
* Systemic functional linguistics
* Translation memory
* Universal Networking Language
Further reading
* Steven Bird, Ewan Klein, and Edward Loper (2009). ''Natural Language Processing with Python''. O'Reilly Media.
* Daniel Jurafsky and James H. Martin (2008). ''Speech and Language Processing'', 2nd edition. Pearson Prentice Hall.
* Mohamed Zakaria Kurdi (2016). ''Natural Language Processing and Computational Linguistics: speech, morphology, and syntax'', Volume 1. ISTE-Wiley.
* Mohamed Zakaria Kurdi (2017). ''Natural Language Processing and Computational Linguistics: semantics, discourse, and applications'', Volume 2. ISTE-Wiley.
External links
* Association for Computational Linguistics (ACL)
* ACL Anthology of research papers
* ACL Wiki for Computational Linguistics
* CICLing annual conferences on Computational Linguistics
* Computational Linguistics – Applications workshop
* Language Technology World
* The Research Group in Computational Linguistics