Data-oriented parsing (DOP, also data-oriented processing) is a probabilistic model in computational linguistics. DOP was conceived by Remko Scha in 1990 with the aim of developing a performance-oriented grammar framework. Unlike other probabilistic models, DOP takes into account all subtrees contained in a treebank rather than being restricted to, for example, 2-level subtrees (as PCFGs are), thus allowing for more context-sensitive information.
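
The contrast between "all subtrees" and 2-level subtrees can be illustrated with a small sketch. It assumes the usual tree-substitution notion of a fragment: a connected piece of a parse tree in which every node keeps either all of its children or none of them. The tuple encoding of trees and the function names (`fragments_rooted_at`, `all_fragments`, `depth`) are illustrative choices, not part of any DOP implementation.

```python
from itertools import product

# Assumed encoding: a tree is a tuple (label, child, child, ...), a word is a
# plain string, and a 1-tuple (label,) marks a cut-off nonterminal (a
# substitution site). Input trees are assumed to contain no 1-tuples.

def fragments_rooted_at(node):
    """Return every fragment whose root is `node`."""
    if isinstance(node, str):
        return []  # a bare word is not a fragment on its own
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])  # words are always kept
        else:
            # Either cut the child off, or expand it with any of its fragments.
            options.append([(child[0],)] + fragments_rooted_at(child))
    return [(label, *combo) for combo in product(*options)]

def all_fragments(tree):
    """Collect the fragments rooted at every nonterminal node of `tree`."""
    if isinstance(tree, str):
        return []
    result = fragments_rooted_at(tree)
    for child in tree[1:]:
        result.extend(all_fragments(child))
    return result

def depth(fragment):
    """Words and cut-off nonterminals have depth 0."""
    if isinstance(fragment, str) or len(fragment) == 1:
        return 0
    return 1 + max(depth(child) for child in fragment[1:])

tree = ("S", ("NP", "Mary"), ("VP", "sleeps"))
frags = all_fragments(tree)
rule_like = [f for f in frags if depth(f) == 1]

print(len(frags))      # 6 fragments in total
print(len(rule_like))  # 3 of them are 2-level, i.e. single PCFG-style rules
```

In this toy tree, the three fragments that a 2-level restriction would miss are the ones that tie "Mary", "sleeps", or both to the S node, which is the kind of larger context the paragraph above refers to. The number of fragments grows very quickly with tree size, so this exhaustive enumeration is only meant for small examples.
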
Several variants of DOP have been developed. The initial version, developed by Rens Bod in 1992, was based on tree-substitution grammar, [R. Bod, A computational model of language performance: Data oriented parsing, in: COLING 1992 Volume 3: The 15th International Conference on Computational Linguistics, https://www.aclweb.org/anthology/C92-3126.pdf] while more recently DOP has been combined with lexical-functional grammar (LFG). The resulting DOP-LFG finds an application in machine translation. Other work on learning and parameter estimation for DOP has also found its way into machine translation.
External links
Remko Scha: Research on DOP
Khalil Sima'an: Learning DOP models from treebanks; Computational Complexity
Andy Way (1999). A hybrid architecture for robust MT using LFG-DOP. Journal of Experimental and Theoretical Artificial Intelligence 11(3):441–471.