In digital lexicography, natural language processing, and digital humanities, a lexical resource is a language resource consisting of data regarding the lexemes of the lexicon of one or more languages, e.g., in the form of a database.
Characteristics
Several standards exist for machine-readable editions of lexical resources, e.g., the Lexical Markup Framework (LMF), an ISO standard for encoding lexical resources that comprises an abstract data model and an XML serialization, and OntoLex-Lemon, an RDF vocabulary for publishing lexical resources as knowledge graphs on the web, e.g., as Linguistic Linked Open Data.
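To make the idea of an XML serialization concrete, the following minimal sketch builds one lexical entry with Python's standard library. The element and attribute names (`LexicalResource`, `Lexicon`, `LexicalEntry`, `Lemma`, `Sense`, `feat`) are modelled loosely on LMF-style encodings and are an assumption of this example; the output has not been validated against the ISO standard, and the helper `build_entry` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_entry(lemma: str, pos: str, gloss: str) -> ET.Element:
    """Build one illustrative lexical entry as an XML element.

    The attribute/value "feat" pattern is an assumption borrowed from
    LMF-style examples, not a verified rendering of the standard.
    """
    entry = ET.Element("LexicalEntry")
    ET.SubElement(entry, "feat", att="partOfSpeech", val=pos)
    lemma_el = ET.SubElement(entry, "Lemma")
    ET.SubElement(lemma_el, "feat", att="writtenForm", val=lemma)
    sense = ET.SubElement(entry, "Sense")
    ET.SubElement(sense, "feat", att="gloss", val=gloss)
    return entry


# A resource holds one or more lexicons; each lexicon holds entries.
resource = ET.Element("LexicalResource")
lexicon = ET.SubElement(resource, "Lexicon")
ET.SubElement(lexicon, "feat", att="language", val="en")
lexicon.append(build_entry("dictionary", "noun", "a listing of lexemes"))

xml_text = ET.tostring(resource, encoding="unicode")
print(xml_text)
```

The same entry could equally be published as RDF triples; the XML form is shown only because it needs no third-party library.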
Depending on the languages addressed, a lexical resource may be qualified as monolingual, bilingual or multilingual. In bilingual and multilingual lexical resources, the words of one language may or may not be connected to those of another. When they are connected, the equivalence between languages is expressed either through a bilingual link (for bilingual lexical resources, e.g., using the relation ''vartrans:translatableAs'' in OntoLex-Lemon) or through multilingual notations (for multilingual lexical resources, e.g., by reference to the same ''ontolex:Concept'' in OntoLex-Lemon).
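The two ways of connecting languages can be sketched in plain Python. This is an illustration under stated assumptions, not OntoLex-Lemon itself: the `Entry` class is hypothetical, its `translatable_as` field merely mirrors the idea of ''vartrans:translatableAs'', and its `concept` field stands in for a reference to a shared ''ontolex:Concept''. The identifier `"c:dog"` is invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison; entries may reference each other
class Entry:
    lemma: str
    language: str
    # Direct bilingual links to entries in another language (assumption:
    # loosely analogous to vartrans:translatableAs).
    translatable_as: List["Entry"] = field(default_factory=list, repr=False)
    # Identifier of a language-independent concept (assumption: loosely
    # analogous to pointing at the same ontolex:Concept).
    concept: Optional[str] = None


def translations_via_links(entry: Entry) -> List[str]:
    """Follow explicit pairwise links from one entry."""
    return [target.lemma for target in entry.translatable_as]


def translations_via_concept(entry: Entry, lexicon: List[Entry]) -> List[str]:
    """Find entries in other languages that share the entry's concept."""
    if entry.concept is None:
        return []
    return [
        other.lemma
        for other in lexicon
        if other is not entry
        and other.concept == entry.concept
        and other.language != entry.language
    ]


dog = Entry("dog", "en", concept="c:dog")
chien = Entry("chien", "fr", concept="c:dog")
hund = Entry("Hund", "de", concept="c:dog")
dog.translatable_as.append(chien)  # one explicit English-French link
lexicon = [dog, chien, hund]

print(translations_via_links(dog))
print(translations_via_concept(dog, lexicon))
```

Pairwise links need one link per language pair, whereas a shared concept lets every new language join by adding a single reference, which is why the concept-based approach tends to suit multilingual resources.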
It is also possible to build and manage a lexical resource consisting of different lexicons of the same language, for instance, one dictionary for general words and one or several dictionaries for different specialized domains.
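A resource made of several lexicons for one language can be sketched as a layered lookup. The dictionaries, their contents, and the function `look_up` below are hypothetical and serve only to show the arrangement of one general lexicon plus domain-specific ones.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicons: headword -> short definition.
general = {"mouse": "small rodent", "table": "piece of furniture"}
computing = {"mouse": "hand-held pointing device"}


def look_up(
    word: str,
    domain_lexicons: Dict[str, Dict[str, str]],
    general_lexicon: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Return (lexicon name, definition) pairs, specialized lexicons first."""
    hits = []
    for name, lex in domain_lexicons.items():
        if word in lex:
            hits.append((name, lex[word]))
    if word in general_lexicon:
        hits.append(("general", general_lexicon[word]))
    return hits


print(look_up("mouse", {"computing": computing}, general))
```

Keeping the lexicons separate, rather than merging them, preserves the information about which domain each sense belongs to.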
Machine-readable dictionary vs. NLP dictionary
Lexical resources in digital lexicography are often referred to as machine-readable dictionaries (''MRD''), i.e., dictionaries stored as machine (computer) data instead of being printed on paper. An MRD is both an electronic dictionary and a lexical database. The term MRD is often contrasted with NLP dictionary: an MRD is the electronic form of a dictionary that was previously printed on paper, whereas the term NLP dictionary is preferred when the dictionary was built from scratch with NLP in mind. Both kinds are used by programs.
[Gil Francopoulo (ed.), LMF Lexical Markup Framework, ISTE / Wiley, 2013]
Lexical database
A lexical database is a lexical resource with an associated database software environment that permits access to its contents. The database may be custom-designed for the lexical information, or it may be a general-purpose database into which lexical information has been entered.
Information typically stored in a lexical database includes the spelling, lexical category and synonyms of words, as well as semantic and phonological relations between different words or sets of words.
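As a minimal sketch of the general-purpose case, the information listed above can be held in a small relational schema using Python's built-in `sqlite3` module. The table names (`word`, `relation`), column names, and the helper `synonyms` are assumptions made for this illustration and do not describe any particular lexical database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(
    """
    CREATE TABLE word (
        id               INTEGER PRIMARY KEY,
        spelling         TEXT NOT NULL,
        lexical_category TEXT NOT NULL
    );
    -- One row per directed relation between two words; "kind" could be
    -- 'synonym', or a semantic or phonological relation.
    CREATE TABLE relation (
        source_id INTEGER NOT NULL REFERENCES word(id),
        target_id INTEGER NOT NULL REFERENCES word(id),
        kind      TEXT NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO word VALUES (?, ?, ?)",
    [(1, "begin", "verb"), (2, "start", "verb"), (3, "commence", "verb")],
)
conn.executemany(
    "INSERT INTO relation VALUES (?, ?, ?)",
    [(1, 2, "synonym"), (1, 3, "synonym")],
)


def synonyms(connection: sqlite3.Connection, spelling: str) -> list:
    """Return spellings linked from the given word by a 'synonym' relation.

    Relations are stored in one direction only in this sketch, so the
    reverse lookup returns nothing unless the inverse rows are added.
    """
    rows = connection.execute(
        """
        SELECT t.spelling
        FROM word AS s
        JOIN relation AS r ON r.source_id = s.id
        JOIN word AS t ON t.id = r.target_id
        WHERE s.spelling = ? AND r.kind = 'synonym'
        ORDER BY t.spelling
        """,
        (spelling,),
    )
    return [row[0] for row in rows]


print(synonyms(conn, "begin"))
```

A custom-designed lexical database would typically go further, for example by grouping synonyms into sets instead of storing pairwise rows.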
See also
* Lexical Markup Framework (LMF), ISO standard for encoding lexical resources, comprising an abstract data model and an XML serialization
* OntoLex-Lemon, RDF vocabulary for publishing lexical resources on the web, e.g., as Linguistic Linked Open Data
* LREC conference series
* Machine-readable dictionary
* WordNet
* Arabic Ontology
External links
* Open English WordNet, an open-source fork of the Princeton WordNet
* Wordnets in the world, at the Global WordNet Association
* WordNet at Princeton University (no longer maintained)
* Arabic Ontology at Birzeit University