Mathematical linguistics is the application of mathematics to model phenomena and solve problems in general linguistics and theoretical linguistics. Mathematical linguistics has a significant amount of overlap with computational linguistics.
Discrete Mathematics
Discrete mathematics is used in language modeling, including formal grammars, language representation, and historical linguistic trends.
Set Theory
Semantic classes, word classes, natural classes, and the allophonic variations of each phoneme in a language are all examples of applied set theory. Set theory and concatenation theory are used extensively in phonetics and phonology.
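As a minimal sketch of how natural classes can be treated as sets, the following Python fragment derives classes from a feature table and combines them with ordinary set operations. The feature table and the `natural_class` helper are hypothetical and heavily simplified; they are not taken from any particular phonological description.

```python
# Toy feature table (hypothetical and simplified): phoneme -> set of features.
FEATURES = {
    "p": {"consonant", "stop", "labial"},
    "b": {"consonant", "stop", "labial", "voiced"},
    "t": {"consonant", "stop", "coronal"},
    "d": {"consonant", "stop", "coronal", "voiced"},
    "s": {"consonant", "fricative", "coronal"},
    "z": {"consonant", "fricative", "coronal", "voiced"},
    "a": {"vowel", "voiced"},
}


def natural_class(*features):
    """Return the set of phonemes that carry all of the given features."""
    wanted = set(features)
    return {p for p, feats in FEATURES.items() if wanted <= feats}


stops = natural_class("stop")
voiced = natural_class("voiced")

voiced_stops = stops & voiced      # intersection of two classes
voiceless_stops = stops - voiced   # set difference
```

Under these assumptions, a class defined by several features is simply the intersection of the classes defined by each feature on its own.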
Combinatorics
In phonotactics, combinatorics is useful for determining which sequences of phonemes are permissible in a given language, and for calculating the total number of possible syllables or words, based on a given set of phonological constraints.
Combinatorics on words can reveal patterns within words, morphemes, and sentences.
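The counting of possible syllables can be illustrated with a short Python sketch. The inventory, the (C)V(C) syllable template, and the single constraint in `is_permissible` are all invented for illustration and do not describe any real language.

```python
from itertools import product

# Hypothetical inventory for a (C)V(C) syllable template.
ONSETS = ["", "p", "t", "k", "s"]   # "" means the onset is empty
NUCLEI = ["a", "i", "u"]
CODAS = ["", "n", "s"]              # "" means the coda is empty


def is_permissible(onset, nucleus, coda):
    """Hypothetical constraint: onset and coda may not be the same consonant."""
    return not (onset and onset == coda)


# Number of syllables if every combination were allowed.
unconstrained = len(ONSETS) * len(NUCLEI) * len(CODAS)

# Enumerate the combinations that survive the constraint.
syllables = [
    onset + nucleus + coda
    for onset, nucleus, coda in product(ONSETS, NUCLEI, CODAS)
    if is_permissible(onset, nucleus, coda)
]
```

With these toy values the unconstrained count is the product of the three inventory sizes, and each added constraint removes a countable subset of the combinations.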
Finite-State Transducers
Context-sensitive rewriting rules of the form ''a'' → ''b'' / ''c'' _ ''d'', used in linguistics to model phonological rules and sound change, are computationally equivalent to finite-state transducers, provided that application is nonrecursive, i.e. the rule is not allowed to rewrite the same substring twice.
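A rule of the form ''a'' → ''b'' / ''c'' _ ''d'' can be sketched in Python as follows. This is not an explicit transducer: it uses a regular expression with lookaround assertions from the standard `re` module as a stand-in, and the function name `apply_rule` is a hypothetical choice. Because `re.sub` scans the input once and never rescans its own output, application is nonrecursive in the sense described above.

```python
import re


def apply_rule(a, b, c, d, text):
    """Apply the rule a -> b / c _ d to text in a single left-to-right pass.

    The contexts c and d are literal strings and are checked against the
    original input, not against output that has already been rewritten.
    """
    left = f"(?<={re.escape(c)})" if c else ""    # left context, not consumed
    right = f"(?={re.escape(d)})" if d else ""    # right context, not consumed
    pattern = re.compile(left + re.escape(a) + right)
    # A function replacement avoids backslash processing in b.
    return pattern.sub(lambda match: b, text)


# Toy rule: t -> d between two a's.
voiced = apply_rule("t", "d", "a", "a", "atata")

# The output "bbac" still contains "a" between "b" and "c", but it is not
# rewritten again because application is nonrecursive.
single_pass = apply_rule("a", "ba", "b", "c", "bac")
```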
Weighted FSTs have found applications in natural language processing, including machine translation, and in machine learning.
An implementation for part-of-speech tagging can be found as one component of the OpenGrm library.
Algorithms
Optimality theory (OT) and maximum entropy (Maxent) phonotactics use algorithmic approaches when evaluating candidate forms (phoneme strings) for determining the phonotactic constraints of a language.
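Candidate evaluation in OT can be sketched as a comparison of violation profiles under a strict constraint ranking. In the Python fragment below, the constraint names (NoCoda, Max, Dep) are common in the OT literature, but their implementations here are crude length-based simplifications, and the input and candidate forms are invented. Maxent phonotactics, which weights constraints numerically instead of ranking them strictly, is not covered by this sketch.

```python
VOWELS = set("aeiou")


def no_coda(candidate, inp):
    """NoCoda: one violation per syllable that ends in a consonant."""
    return sum(1 for syl in candidate.split(".") if syl and syl[-1] not in VOWELS)


def max_io(candidate, inp):
    """Max (simplified): one violation per input segment that was deleted."""
    return max(0, len(inp) - len(candidate.replace(".", "")))


def dep_io(candidate, inp):
    """Dep (simplified): one violation per segment that was inserted."""
    return max(0, len(candidate.replace(".", "")) - len(inp))


def evaluate(inp, candidates, ranking):
    """Return the candidate whose violation profile is best under the ranking.

    Tuples compare lexicographically, so a violation of a higher-ranked
    constraint outweighs any number of violations of lower-ranked ones.
    """
    return min(candidates, key=lambda c: tuple(con(c, inp) for con in ranking))


candidates = ["pat", "pa", "pa.ta"]   # syllable boundaries marked with "."
deletion = evaluate("pat", candidates, [no_coda, dep_io, max_io])
epenthesis = evaluate("pat", candidates, [no_coda, max_io, dep_io])
faithful = evaluate("pat", candidates, [max_io, dep_io, no_coda])
```

Re-ranking the same three constraints selects a different winner each time, which is the mechanism OT uses to model differences between languages.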
Graph Theory
Trees have several applications in linguistics, including:
* Parsing trees
* Sentence diagrams
* Semantic networks
* Language family trees
* Etymology trees
Other graphs that are used in linguistics include:
* Weighted graphs, which are used to model the lexical similarity between different languages (after computing lexicostatistics).
* Lattice graphs, which can model optimality theory.
Formal linguistics
Formal linguistics is the branch of linguistics which uses formal languages, formal grammars and first-order logical expressions for the analysis of natural languages. Since the 1980s, the term has often been used to refer to Chomskyan linguistics. Generative models of formal linguistics, such as head-driven phrase structure grammar, have also been used in natural language processing.
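The idea of a formal grammar generating a formal language can be illustrated with a toy context-free grammar in Python. The grammar, its symbols, and the `expand` helper are hypothetical; the grammar is deliberately non-recursive so that the language it generates is finite and can be enumerated in full.

```python
from itertools import product

# Hypothetical toy grammar: nonterminal -> list of productions.
# Any symbol that is not a key of GRAMMAR is treated as a terminal.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V"], ["V", "NP"]],
    "N": [["dog"], ["cat"]],
    "V": [["sleeps"], ["sees"]],
}


def expand(symbol):
    """Yield every terminal string (as a list of words) derivable from symbol."""
    if symbol not in GRAMMAR:
        yield [symbol]
        return
    for production in GRAMMAR[symbol]:
        # Combine every expansion of each symbol on the right-hand side.
        for parts in product(*(list(expand(s)) for s in production)):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in expand("S")]
```

The grammar overgenerates (it licenses "the dog sleeps the cat", for example), which shows why practical grammars need richer categories or constraints than this sketch provides.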
Logic
Logic is used to model syntax, formal semantics, and pragmatics. Modal logic can model syntax that employs different grammatical moods. Most linguistic universals (e.g. Greenberg's linguistic universals) employ propositional logic. Lexical relations between words can be determined based on whether a pair of words satisfies conditional propositions.
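An implicational universal can be read as a material conditional and checked against language records, as in the Python sketch below. The statement tested is of the kind found among Greenberg's universals ("if a language has dominant VSO order, it is prepositional"); the language records A, B, C and D are hypothetical placeholders, not data about real languages.

```python
def implies(p, q):
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


# Hypothetical language records for illustration.
languages = [
    {"name": "A", "order": "VSO", "adposition": "preposition"},
    {"name": "B", "order": "SOV", "adposition": "postposition"},
    {"name": "C", "order": "SVO", "adposition": "preposition"},
]


def satisfies_universal(lang):
    """Check 'dominant VSO order implies prepositions' for one language."""
    return implies(lang["order"] == "VSO", lang["adposition"] == "preposition")


counterexamples = [lang["name"] for lang in languages if not satisfies_universal(lang)]

# A record that would falsify the universal if such a language were attested.
hypothetical_d = {"name": "D", "order": "VSO", "adposition": "postposition"}
```

Languages that are not VSO satisfy the conditional vacuously, so only a VSO language with postpositions counts as a counterexample.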
Semiotics
Methods of formal linguistics were introduced by semioticians such as Charles Sanders Peirce and Louis Hjelmslev. Building on the work of David Hilbert and Rudolf Carnap, Hjelmslev proposed the use of formal grammars to analyse, generate and explain language in his 1943 book ''Prolegomena to a Theory of Language''. In this view, language is regarded as arising from a mathematical relationship between meaning and form.
The formal description of language was further developed by linguists including J. R. Firth and Simon Dik, giving rise to modern grammatical frameworks such as systemic functional linguistics and functional discourse grammar. Computational methods have been developed within frameworks such as functional generative description, among others.
Dependency grammar, created by French structuralist Lucien Tesnière, has been used widely in natural language processing.
Differential Equations & Multivariate Calculus
The fast Fourier transform, Kalman filters, and autoencoding are all used in signal processing (advanced phonetics, speech recognition).
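To illustrate the kind of computation involved, the following Python sketch implements a textbook recursive radix-2 fast Fourier transform and applies it to a synthetic tone. The signal is invented (a sine wave completing 3 cycles over 16 samples), and real speech processing would normally rely on an optimized numerical library instead of this pure-Python version.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # transform of even-indexed samples
    odd = fft(x[1::2])    # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# Synthetic "signal": a pure tone with 3 cycles across 16 samples.
n = 16
signal = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]

spectrum = fft(signal)
magnitudes = [abs(c) for c in spectrum[: n // 2]]   # non-redundant half
peak_bin = max(range(n // 2), key=magnitudes.__getitem__)
```

The peak of the magnitude spectrum falls in the frequency bin that matches the number of cycles in the tone, which is the basic principle behind spectral analysis of speech sounds.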
Statistics
In linguistics, statistical methods are necessary to describe and validate research results, as well as to understand observations and trends within an area of study.
Corpus statistics
Student's ''t''-test can be used to determine whether the occurrence of a collocation in a corpus is statistically significant. For a bigram $w_1 w_2$, let $P(w_1) = \frac{\#w_1}{N}$ be the unconditional probability of occurrence of $w_1$ in a corpus with size $N$, and let $P(w_2) = \frac{\#w_2}{N}$ be the unconditional probability of occurrence of $w_2$ in the corpus. The t-score for the bigram $w_1 w_2$ is calculated as:
: $t = \frac{\bar{x} - \mu}{\sqrt{s^2 / N}}$
where $\bar{x} = \frac{\#w_1 w_2}{N}$ is the sample mean of the occurrence of $w_1 w_2$, $\#w_1 w_2$ is the number of occurrences of $w_1 w_2$, $\mu = P(w_1) P(w_2)$ is the probability of $w_1 w_2$ under the null hypothesis that $w_1$ and $w_2$ appear independently in the text, and $s^2 = \bar{x}(1 - \bar{x}) \approx \bar{x}$ is the sample variance. With a large $N$, the ''t''-test is equivalent to a ''Z''-test.
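A minimal Python sketch of the bigram t-score follows. It assumes the usual form of the statistic, t = (x̄ − μ) / sqrt(s² / N), with x̄ the relative frequency of the bigram, μ the product of the two unigram probabilities, and s² = x̄(1 − x̄) as the Bernoulli sample variance. The function name `t_score` and all counts are invented for illustration.

```python
import math


def t_score(count_w1, count_w2, count_bigram, n):
    """t-score of a bigram from raw counts in a corpus of n tokens.

    Assumes count_bigram > 0; with a zero count the variance estimate is
    zero and the statistic is undefined.
    """
    x_bar = count_bigram / n                  # observed bigram probability
    mu = (count_w1 / n) * (count_w2 / n)      # expected under independence
    s2 = x_bar * (1 - x_bar)                  # Bernoulli sample variance
    return (x_bar - mu) / math.sqrt(s2 / n)


# Invented counts: the bigram occurs exactly as often as independence predicts.
t_independent = t_score(1000, 2000, 2, 1_000_000)

# Invented counts: the bigram occurs far more often than independence predicts.
t_collocation = t_score(1000, 2000, 50, 1_000_000)
```

A score near zero is consistent with the two words co-occurring by chance, while a large positive score suggests a collocation; the threshold depends on the chosen significance level.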
Lexicostatistics
Lexicostatistics can model the lexical similarities between languages that share a language family, sprachbund, language contact, or other historical connections.
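One simple lexicostatistic measure is the proportion of meanings for which two languages have cognate words. The Python sketch below computes it for every language pair and stores the results as the edge weights of a similarity graph. The languages, meanings, and cognate-class labels are hypothetical; in practice the cognate judgements come from comparative analysis.

```python
from itertools import combinations

# Hypothetical data: for each language, meaning -> cognate class label.
# Two words are treated as cognate when their labels match.
WORDLISTS = {
    "Lang1": {"hand": 1, "water": 1, "fire": 1, "stone": 1},
    "Lang2": {"hand": 1, "water": 1, "fire": 2, "stone": 1},
    "Lang3": {"hand": 2, "water": 1, "fire": 3, "stone": 2},
}


def shared_cognate_ratio(a, b):
    """Fraction of shared meanings whose words fall in the same cognate class."""
    meanings = WORDLISTS[a].keys() & WORDLISTS[b].keys()
    same = sum(WORDLISTS[a][m] == WORDLISTS[b][m] for m in meanings)
    return same / len(meanings)


# Weighted graph as a dictionary: (language, language) -> similarity.
edges = {
    (a, b): shared_cognate_ratio(a, b)
    for a, b in combinations(WORDLISTS, 2)
}
```

In this toy data, Lang1 and Lang2 share most of their cognates and would be grouped together before either is linked to Lang3.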
Quantitative linguistics
Quantitative linguistics (QL) deals with language learning, language change, and application as well as structure of natural languages. QL investigates languages using statistical methods; its most demanding objective is the formulation of language laws and, ultimately, of a general theory of language in the sense of a set of interrelated language laws.
Synergetic linguistics was from its very beginning specifically designed for this purpose.
[Reinhard Köhler: ''Synergetic linguistics''. In: Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (eds.): ''Quantitative Linguistik - Quantitative Linguistics. Ein internationales Handbuch.'' de Gruyter, Berlin/New York 2005, pp. 760–774.]
QL is empirically based on the results of language statistics, a field which can be interpreted as statistics of languages or as statistics of any linguistic object. This field is not necessarily connected to substantial theoretical ambitions.
Corpus linguistics and computational linguistics are other fields which contribute important empirical evidence.
Quantitative comparative linguistics
Quantitative comparative linguistics is a subfield of quantitative linguistics which applies quantitative analysis to comparative linguistics. It makes use of lexicostatistics and glottochronology, and borrows methods from phylogenetics in biology.
See Also
* Computational linguistics
* International Linguistics Olympiad