Machine-readable dictionary

A machine-readable dictionary (MRD) is a dictionary stored as machine (computer) data instead of being printed on paper. It is an electronic dictionary and lexical database. A machine-readable dictionary is a dictionary in an electronic form that can be loaded into a database and queried via application software. It may be a single-language explanatory dictionary, a multi-language dictionary that supports translation between two or more languages, or a combination of both. Translation software between multiple languages usually uses bidirectional dictionaries. An MRD may be a dictionary with a proprietary structure that is queried by dedicated software (for example, online via the Internet), or it can be a dictionary with an open structure that is available for loading into computer databases and can thus be used by various software applications. Conventional dictionaries contain a lemma with various descriptions. A machine-readable dictionary may have additional capabilities and is therefore sometimes called a smart dictionary. An example of a smart dictionary is the open-source Gellish English dictionary.
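As an illustration of an open-structure MRD that "can be loaded into a database and queried via application software", the following is a minimal sketch using Python's built-in `sqlite3` module. The table layout, column names, the functions `load_dictionary` and `translate`, and the three English-Dutch entries are all hypothetical and chosen only for illustration; real MRDs have far richer entry structures. Indexing both language columns of one table is one simple way to make a single bilingual table usable as a bidirectional dictionary.

```python
import sqlite3

# Hypothetical toy English-Dutch entries: (English lemma, part of speech, Dutch lemma).
ENTRIES = [
    ("dog", "noun", "hond"),
    ("house", "noun", "huis"),
    ("book", "noun", "boek"),
]


def load_dictionary(entries):
    """Load dictionary rows into an in-memory SQLite database (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry (lemma_en TEXT, pos TEXT, lemma_nl TEXT)")
    conn.executemany("INSERT INTO entry VALUES (?, ?, ?)", entries)
    # Index both language columns so lookups are efficient in either direction.
    conn.execute("CREATE INDEX idx_en ON entry (lemma_en)")
    conn.execute("CREATE INDEX idx_nl ON entry (lemma_nl)")
    return conn


def translate(conn, word, direction="en-nl"):
    """Look a word up in either direction against the same table."""
    if direction == "en-nl":
        sql = "SELECT lemma_nl FROM entry WHERE lemma_en = ?"
    elif direction == "nl-en":
        sql = "SELECT lemma_en FROM entry WHERE lemma_nl = ?"
    else:
        raise ValueError(f"unknown direction: {direction}")
    return [row[0] for row in conn.execute(sql, (word,))]


if __name__ == "__main__":
    conn = load_dictionary(ENTRIES)
    print(translate(conn, "dog"))            # ['hond']
    print(translate(conn, "huis", "nl-en"))  # ['house']
```

Because the data sits in an ordinary database table rather than behind dedicated software, any application that can issue SQL queries could use the same dictionary.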
The term dictionary is also used to refer to an electronic vocabulary or lexicon, as used for example in spelling checkers. If a dictionary is arranged in a subtype-supertype hierarchy of concepts (or terms), it is called a taxonomy. If it also contains other relations between the concepts, it is called an ontology. Search engines may use a vocabulary, a taxonomy or an ontology to optimise their search results. Specialised electronic dictionaries include morphological dictionaries and syntactic dictionaries. The term MRD is often contrasted with NLP dictionary: an MRD is the electronic form of a dictionary that was previously printed on paper, whereas the term NLP dictionary is preferred when the dictionary was built from scratch with NLP in mind, even though both kinds are used by programs. An ISO standard for MRD and NLP lexicons that is able to represent both structures is the Lexical Markup Framework (LMF, ISO 24613:2008). [Gil Francopoulo (ed.), LMF Lexical Markup Framework, ISTE / Wiley, 2013.]
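The distinction between a taxonomy and an ontology can be sketched with plain Python data structures. This is a minimal, hypothetical example, not the data model of LMF, WordNet or any other real resource: the concept names, the relation label "has part", and the helper functions `supertypes`, `is_subtype_of` and `related` are assumptions made for illustration. For simplicity it assumes each concept has at most one direct supertype (single inheritance); real taxonomies often allow several.

```python
# Hypothetical taxonomy: each concept maps to its single direct supertype.
SUPERTYPE = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "animal": "physical object",
}

# Hypothetical non-hierarchical relations as (subject, relation, object) triples.
# Adding relations like these to a taxonomy is what makes it an ontology.
OTHER_RELATIONS = [
    ("dog", "has part", "tail"),
    ("cat", "has part", "tail"),
]


def supertypes(concept, taxonomy):
    """Walk the subtype-supertype chain upward, nearest supertype first."""
    chain = []
    seen = {concept}
    while concept in taxonomy:
        concept = taxonomy[concept]
        if concept in seen:  # guard against an accidental cycle in the data
            raise ValueError(f"cycle at {concept!r}")
        seen.add(concept)
        chain.append(concept)
    return chain


def is_subtype_of(concept, candidate, taxonomy):
    """True if candidate appears anywhere above concept in the hierarchy."""
    return candidate in supertypes(concept, taxonomy)


def related(concept, relation, triples):
    """Query the extra relations that a pure taxonomy does not contain."""
    return [obj for subj, rel, obj in triples if subj == concept and rel == relation]


if __name__ == "__main__":
    print(supertypes("dog", SUPERTYPE))                 # ['mammal', 'animal', 'physical object']
    print(is_subtype_of("cat", "animal", SUPERTYPE))    # True
    print(related("dog", "has part", OTHER_RELATIONS))  # ['tail']
```

A search engine could, for instance, use the supertype chain to broaden a query from "dog" to "mammal"; this is offered only as one plausible use, not as how any particular engine works.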


History

The first widely distributed MRDs were the Merriam-Webster Seventh Collegiate (W7) and the Merriam-Webster New Pocket Dictionary (MPD). Both were produced by a government-funded project at System Development Corporation under the direction of John Olney. They were keyboarded manually because no typesetting tapes of either book were available. Originally, each was distributed on multiple reels of magnetic tape as card images, with each separate word of each definition on a separate punch card and numerous special codes indicating the details of its usage in the printed dictionary. Olney outlined a grand plan for the analysis of the definitions in the dictionary, but his project expired before the analysis could be carried out. Robert Amsler at the University of Texas at Austin resumed the analysis and completed a taxonomic description of the Pocket Dictionary with National Science Foundation funding; however, his project expired before the taxonomic data could be distributed. Roy Byrd et al. at IBM Yorktown Heights resumed analysis of the Webster's Seventh Collegiate following Amsler's work. Finally, starting in the 1980s with initial support from Bellcore and later funding from various U.S. federal agencies, including NSF, ARDA, DARPA, DTO and REFLEX, George Armitage Miller and Christiane Fellbaum at Princeton University completed the creation and wide distribution of a dictionary and its taxonomy in the WordNet project, which today stands as the most widely distributed computational lexicology resource.

