
Mathematical linguistics is the application of mathematics to model phenomena and solve problems in general linguistics and theoretical linguistics. Mathematical linguistics has a significant amount of overlap with computational linguistics.


Discrete Mathematics

Discrete mathematics is used in language modeling, including formal grammars, language representation, and historical linguistic trends.


Set Theory

Semantic classes, word classes, natural classes, and the allophonic variations of each phoneme in a language are all examples of applied set theory. Set theory and concatenation theory are used extensively in phonetics and phonology.
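As a minimal sketch of the set-theoretic view, natural classes can be treated as sets of segments, so that combining features corresponds to set intersection and difference. The feature sets below are simplified, hypothetical assignments for a handful of consonants, not a complete inventory of any language.

```python
# Hypothetical, simplified feature sets for a few consonants (illustration only).
VOICED = {"b", "d", "g", "v", "z", "m", "n"}
STOPS = {"p", "t", "k", "b", "d", "g"}
NASALS = {"m", "n"}

# A natural class is the set of segments sharing a feature bundle,
# so combining two features is a set intersection.
voiced_stops = VOICED & STOPS

# Removing a feature's members is a set difference.
voiceless_stops = STOPS - VOICED

# The allophones of a phoneme also form a set; here an assumed set of
# realizations of /t/ (aspirated, plain, unreleased).
ALLOPHONES = {"t": {"tʰ", "t", "t̚"}}

print(sorted(voiced_stops))
print(sorted(voiceless_stops))
print(NASALS <= VOICED)  # subset test: every nasal here is voiced
```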


Combinatorics

In phonotactics, combinatorics is useful for determining which sequences of phonemes are permissible in a given language, and for calculating the total number of possible syllables or words, based on a given set of phonological constraints. Combinatorics on words can reveal patterns within words, morphemes, and sentences.
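The counting described above can be sketched by enumerating every onset-nucleus-coda combination and filtering by a constraint. The inventories and the single constraint below are invented for illustration and do not describe any particular language.

```python
from itertools import product

# Hypothetical inventories for a (C)V(C) syllable template; "" means "empty slot".
ONSETS = ["", "p", "t", "k", "s"]
NUCLEI = ["a", "i", "u"]
CODAS = ["", "n", "s"]

def is_permissible(onset, nucleus, coda):
    """Toy phonotactic constraint: onset and coda may not be the same consonant."""
    return onset == "" or onset != coda

# Without constraints, the count is simply the product of the inventory sizes.
unconstrained_count = len(ONSETS) * len(NUCLEI) * len(CODAS)

# With constraints, enumerate the Cartesian product and keep what passes.
syllables = [
    onset + nucleus + coda
    for onset, nucleus, coda in product(ONSETS, NUCLEI, CODAS)
    if is_permissible(onset, nucleus, coda)
]

print(unconstrained_count, len(syllables))
```

Here the constraint removes only the three syllables of the shape s-V-s, so 42 of the 45 combinations remain. For larger inventories the same count can be obtained arithmetically rather than by enumeration.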


Finite-State Transducers

Context-sensitive rewriting rules of the form ''a'' → ''b'' / ''c'' _ ''d'', used in linguistics to model phonological rules and sound change, are computationally equivalent to finite-state transducers (FSTs), provided that application is nonrecursive, i.e. the rule is not allowed to rewrite the same substring twice. Weighted FSTs have found applications in natural language processing, including machine translation, and in machine learning. An implementation for part-of-speech tagging can be found as one component of the OpenGrm library.
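The nonrecursive application of a rule ''a'' → ''b'' / ''c'' _ ''d'' can be illustrated with a single pass over the input. The sketch below does not build an explicit transducer. It simulates one with Python's standard `re` module, using lookaround assertions for the contexts. The function name `apply_rule` is hypothetical, and the contexts are assumed to be non-empty literal strings.

```python
import re

def apply_rule(a, b, left, right, text):
    """Rewrite a -> b / left _ right in one pass over `text`.

    Contexts are matched against the original input only, so the output
    of one rewrite is never rewritten again (nonrecursive application).
    Assumes `left` and `right` are non-empty literal strings.
    """
    pattern = (
        f"(?<={re.escape(left)})"   # left context, not consumed
        f"{re.escape(a)}"           # the segment to rewrite
        f"(?={re.escape(right)})"   # right context, not consumed
    )
    # A function replacement avoids backslash interpretation in `b`.
    return re.sub(pattern, lambda match: b, text)

# Intervocalic voicing as a toy rule: s -> z / a _ a
print(apply_rule("s", "z", "a", "a", "kasa"))
# A rule whose output would feed itself if it were applied recursively.
print(apply_rule("x", "ax", "a", "b", "axb"))
```

Because the contexts are lookarounds, a single symbol can serve as the right context of one rewrite and the left context of the next, as in `asasa`. The second call shows the nonrecursive restriction: the rule fires once, although its output still contains a matching environment.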


Algorithms

Optimality theory (OT) and maximum entropy (Maxent) phonotactics use algorithmic approaches when evaluating candidate forms (phoneme strings) for determining the phonotactic constraints of a language.
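A minimal sketch of candidate evaluation in OT follows, assuming that each constraint is a function returning a violation count and that a higher-ranked constraint strictly dominates all lower-ranked ones. The constraint definitions are deliberately crude (string lengths stand in for a proper input-output correspondence), and the candidate set is supplied by hand. Maxent evaluation is not shown.

```python
VOWELS = "aeiou"

def no_coda(form):
    """Markedness: one violation per syllable ending in a consonant.
    Syllables in a candidate are separated by '.'."""
    return sum(1 for syllable in form.split(".") if syllable[-1] not in VOWELS)

def make_max(underlying):
    """Faithfulness (MAX): penalize deleted segments, approximated by length."""
    def max_io(form):
        return max(0, len(underlying) - len(form.replace(".", "")))
    return max_io

def make_dep(underlying):
    """Faithfulness (DEP): penalize inserted segments, approximated by length."""
    def dep_io(form):
        return max(0, len(form.replace(".", "")) - len(underlying))
    return dep_io

def evaluate(candidates, ranked_constraints):
    """Return the optimal candidate. Violation profiles are compared as
    tuples, i.e. lexicographically, which models strict domination."""
    return min(candidates, key=lambda c: tuple(con(c) for con in ranked_constraints))

underlying = "pat"
candidates = ["pat", "pa", "pa.ta"]
max_io, dep_io = make_max(underlying), make_dep(underlying)

print(evaluate(candidates, [no_coda, max_io, dep_io]))  # codas repaired by insertion
print(evaluate(candidates, [max_io, dep_io, no_coda]))  # faithful candidate wins
```

Reranking the same three constraints changes the winner, which is how OT models differences between languages.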


Graph Theory

Trees have several applications in linguistics, including:
* Parsing trees
* Sentence diagrams
* Semantic networks
* Language family trees
* Etymology trees

Other graphs that are used in linguistics include:
* Weighted graphs, which are used to model the lexical similarity between different languages (after computing lexicostatistics).
* Lattice graphs, which can model optimality theory.


Formal linguistics

Formal linguistics is the branch of linguistics which uses formal languages, formal grammars and first-order logical expressions for the analysis of natural languages. Since the 1980s, the term has often been used to refer to Chomskyan linguistics. Generative models of formal linguistics, such as head-driven phrase structure grammar, have also been used in natural language processing.


Logic

Logic is used to model syntax, formal semantics, and pragmatics. Modal logic can model syntax that employs different grammatical moods. Most linguistic universals (e.g. Greenberg's linguistic universals) employ propositional logic. Lexical relations between words can be determined based on whether a pair of words satisfies conditional propositions.
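An implicational universal has the propositional form ''p'' → ''q'', which is false only when ''p'' holds and ''q'' does not. The sketch below checks one such statement, Greenberg's Universal 3 (languages with dominant VSO order are prepositional), against a small hand-coded sample. The data structure and function names are assumptions made for illustration.

```python
def implies(p, q):
    """Material implication: p -> q is equivalent to (not p) or q."""
    return (not p) or q

# A tiny hand-coded sample; a real study would use a typological database.
LANGUAGES = {
    "Welsh": {"order": "VSO", "adposition": "preposition"},
    "Japanese": {"order": "SOV", "adposition": "postposition"},
    "English": {"order": "SVO", "adposition": "preposition"},
}

def satisfies_vso_universal(properties):
    """VSO order -> prepositions. Non-VSO languages satisfy it vacuously."""
    return implies(
        properties["order"] == "VSO",
        properties["adposition"] == "preposition",
    )

for name, properties in LANGUAGES.items():
    print(name, satisfies_vso_universal(properties))
```

Only a VSO language with postpositions would falsify the statement. Languages with other word orders are consistent with it regardless of their adpositions.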


Semiotics

Methods of formal linguistics were introduced by semioticians such as Charles Sanders Peirce and Louis Hjelmslev. Building on the work of David Hilbert and Rudolf Carnap, Hjelmslev proposed the use of formal grammars to analyse, generate and explain language in his 1943 book ''Prolegomena to a Theory of Language''. In this view, language is regarded as arising from a mathematical relationship between meaning and form. The formal description of language was further developed by linguists including J. R. Firth and Simon Dik, giving rise to modern grammatical frameworks such as systemic functional linguistics and functional discourse grammar. Computational methods have been developed within frameworks such as functional generative description. Dependency grammar, created by French structuralist Lucien Tesnière, has been used widely in natural language processing.


Differential Equations & Multivariate Calculus

The fast Fourier transform, Kalman filters, and autoencoding are all used in signal processing (advanced phonetics, speech recognition).


Statistics

In linguistics, statistical methods are necessary to describe and validate research results, as well as to understand observations and trends within an area of study.


Corpus statistics

Student's ''t''-test can be used to determine whether the occurrence of a collocation in a corpus is statistically significant. For a bigram $w_1w_2$, let $P(w_1) = \frac{\#w_1}{N}$ be the unconditional probability of occurrence of $w_1$ in a corpus with size $N$, and let $P(w_2) = \frac{\#w_2}{N}$ be the unconditional probability of occurrence of $w_2$ in the corpus. The ''t''-score for the bigram $w_1w_2$ is calculated as:

: $t = \frac{\bar{x} - \mu}{\sqrt{s^2 / N}}$,

where $\bar{x} = \frac{\#w_1w_2}{N}$ is the sample mean of the occurrence of $w_1w_2$, $\#w_1w_2$ is the number of occurrences of $w_1w_2$, $\mu = P(w_1)P(w_2)$ is the probability of $w_1w_2$ under the null hypothesis that $w_1$ and $w_2$ appear independently in the text, and $s^2 = \bar{x}(1-\bar{x}) \approx \bar{x}$ is the sample variance. With a large $N$, the ''t''-test is equivalent to a ''Z''-test.
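The ''t''-score can be computed directly from four counts. The sketch below follows the definitions in the paragraph above. The function name, the illustrative counts, and the critical value are assumptions, not taken from any particular corpus tool.

```python
import math

def t_score(count_w1, count_w2, count_bigram, n):
    """t-score of the bigram w1 w2 in a corpus of n tokens."""
    if count_bigram <= 0:
        raise ValueError("the bigram must occur at least once")
    x_bar = count_bigram / n                # sample mean of bigram occurrence
    mu = (count_w1 / n) * (count_w2 / n)    # expected probability under independence
    s2 = x_bar * (1 - x_bar)                # sample variance (approximately x_bar)
    return (x_bar - mu) / math.sqrt(s2 / n)

# Illustrative counts: two fairly frequent words that rarely occur together.
t = t_score(count_w1=15828, count_w2=4675, count_bigram=8, n=14307668)

# Assumed threshold: 2.576 is the critical value for a significance level
# of 0.005 when n is large (the t-distribution approaches the normal).
CRITICAL_VALUE = 2.576
print(round(t, 4), t > CRITICAL_VALUE)
```

With these counts the score stays below the assumed threshold, so independence is not rejected and the bigram would not be treated as a collocation.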


Lexicostatistics

Lexicostatistics can model the lexical similarities between languages that share a language family, sprachbund, language contact, or other historical connections.
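One common lexicostatistical measure is the proportion of meanings on a shared word list for which two languages have cognate words. The sketch below assumes that cognate judgments have already been made and are encoded as class labels. The languages `A`, `B`, and `C` and all of their labels are invented for illustration.

```python
from itertools import combinations

def shared_cognate_ratio(lang_a, lang_b):
    """Fraction of jointly attested meanings whose words are cognate.

    Each language maps a meaning to a cognate-class label; two words
    count as cognate when their labels are equal.
    """
    meanings = lang_a.keys() & lang_b.keys()
    if not meanings:
        raise ValueError("the two word lists share no meanings")
    shared = sum(1 for meaning in meanings if lang_a[meaning] == lang_b[meaning])
    return shared / len(meanings)

# Hypothetical cognate-class labels for a four-item word list.
WORDLISTS = {
    "A": {"hand": 1, "water": 1, "two": 1, "dog": 1},
    "B": {"hand": 1, "water": 1, "two": 1, "dog": 2},
    "C": {"hand": 2, "water": 2, "two": 1, "dog": 3},
}

# The pairwise ratios are the edge weights of a weighted similarity graph.
similarity = {
    (x, y): shared_cognate_ratio(WORDLISTS[x], WORDLISTS[y])
    for x, y in combinations(sorted(WORDLISTS), 2)
}
print(similarity)
```

The resulting dictionary can be read as a weighted graph over the languages, in which `A` and `B` are closer to each other than either is to `C`.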


Quantitative linguistics

Quantitative linguistics (QL) deals with language learning, language change, and the application as well as the structure of natural languages. QL investigates languages using statistical methods; its most demanding objective is the formulation of language laws and, ultimately, of a general theory of language in the sense of a set of interrelated language laws. Synergetic linguistics was from its very beginning specifically designed for this purpose (Reinhard Köhler: ''Synergetic linguistics''. In: Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (eds.): ''Quantitative Linguistik - Quantitative Linguistics. Ein internationales Handbuch.'' de Gruyter, Berlin/New York 2005, pp. 760–774). QL is empirically based on the results of language statistics, a field which can be interpreted as statistics of languages or as statistics of any linguistic object. This field is not necessarily connected to substantial theoretical ambitions. Corpus linguistics and computational linguistics are other fields which contribute important empirical evidence.


Quantitative comparative linguistics

Quantitative comparative linguistics is a subfield of quantitative linguistics which applies quantitative analysis to comparative linguistics. It makes use of lexicostatistics and glottochronology, and borrows phylogenetic methods from biology.


See Also

* Computational linguistics
* International Linguistics Olympiad

