In digital lexicography, natural language processing, and digital humanities, a lexical resource is a language resource consisting of data regarding the lexemes of the lexicon of one or more languages, e.g., in the form of a database.
Characteristics
Different standards exist for the machine-readable edition of lexical resources, e.g., the Lexical Markup Framework (LMF), an ISO standard for encoding lexical resources, comprising an abstract data model and an XML serialization, and OntoLex-Lemon, an RDF vocabulary for publishing lexical resources as knowledge graphs on the web, e.g., as Linguistic Linked Open Data.
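As an illustration of what an XML serialization of a lexical entry can look like, the following minimal sketch builds a one-entry lexicon with Python's standard library. The element names (`LexicalResource`, `Lexicon`, `LexicalEntry`, `Lemma`, `feat`) and the feature names are assumptions modeled loosely on the general shape of LMF examples; the output has not been validated against the ISO standard.

```python
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach an attribute-value feature element to parent."""
    return ET.SubElement(parent, "feat", {"att": att, "val": val})


# Element and feature names are illustrative assumptions, not a
# validated instance of any standard schema.
root = ET.Element("LexicalResource")
lexicon = ET.SubElement(root, "Lexicon")
feat(lexicon, "language", "eng")

# One lexical entry with a part of speech and a lemma.
entry = ET.SubElement(lexicon, "LexicalEntry")
feat(entry, "partOfSpeech", "noun")
lemma = ET.SubElement(entry, "Lemma")
feat(lemma, "writtenForm", "cat")

# Serialize the tree to a string of XML.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Keeping every property as a generic attribute-value pair, rather than as a dedicated element, is one way to let the same structure carry different data categories.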
Depending on the number of languages addressed, a lexical resource may be qualified as monolingual, bilingual or multilingual. In bilingual and multilingual lexical resources, the words of one language may or may not be connected to those of another. When they are connected, the equivalence from one language to another is expressed through a bilingual link (for bilingual lexical resources, e.g., using the relation ''vartrans:translatableAs'' in OntoLex-Lemon) or through multilingual notations (for multilingual lexical resources, e.g., by reference to the same ''ontolex:Concept'' in OntoLex-Lemon).
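The difference between the two linking strategies can be sketched with plain Python dictionaries. This is a toy model under stated assumptions: the entries, the concept identifiers such as `concept:CAT`, and the function names are hypothetical and do not come from OntoLex-Lemon or any other vocabulary.

```python
# Hypothetical toy data; entries and identifiers are illustrative only.

# Bilingual resource: entries of two languages are linked directly,
# in the spirit of a translatable-as relation.
bilingual_links = {
    ("en", "cat"): [("fr", "chat")],
    ("en", "dog"): [("fr", "chien")],
}

# Multilingual resource: each entry points to a language-independent
# concept identifier; entries sharing a concept count as equivalent.
entry_to_concept = {
    ("en", "cat"): "concept:CAT",
    ("fr", "chat"): "concept:CAT",
    ("de", "Katze"): "concept:CAT",
    ("en", "dog"): "concept:DOG",
    ("fr", "chien"): "concept:DOG",
}


def translate_bilingual(lang, word):
    """Follow direct links from one entry to its equivalents."""
    return list(bilingual_links.get((lang, word), []))


def translate_via_concept(lang, word, target_lang):
    """Find entries in target_lang that share the source entry's concept."""
    concept = entry_to_concept.get((lang, word))
    if concept is None:
        return []
    return sorted(
        w
        for (l, w), c in entry_to_concept.items()
        if c == concept and l == target_lang
    )


print(translate_bilingual("en", "cat"))
print(translate_via_concept("en", "cat", "de"))
```

With direct links, every language pair needs its own set of links, whereas the shared-concept approach lets a newly added language become reachable from all others by attaching its entries to existing concepts.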
It is also possible to build and manage a lexical resource consisting of several lexicons of the same language, for instance, one dictionary for general vocabulary and one or more dictionaries for specialized domains.
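A resource that combines a general lexicon with specialized ones might be queried as in the following sketch. The lexicon names, the entries and the `look_up` helper are hypothetical and serve only to show the lookup order.

```python
# Hypothetical toy lexicons for one language (English).
general = {"mouse": "small rodent", "cell": "small room"}
computing = {"mouse": "hand-held pointing device"}
biology = {"cell": "basic structural unit of an organism"}

lexicons = {"general": general, "computing": computing, "biology": biology}


def look_up(word, domains=("general",)):
    """Return (domain, definition) pairs from the requested lexicons, in order."""
    return [
        (domain, lexicons[domain][word])
        for domain in domains
        if word in lexicons.get(domain, {})
    ]


# Prefer the computing sense, then fall back to the general one.
print(look_up("mouse", ("computing", "general")))
```

Listing the specialized domain first makes its sense take priority while the general lexicon still serves as a fallback.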
Machine-readable dictionary vs. NLP dictionary
Lexical resources in digital lexicography are often referred to as machine-readable dictionaries (''MRD''). An MRD is a dictionary stored as machine (computer) data instead of being printed on paper; it is both an electronic dictionary and a lexical database. The term MRD is often contrasted with NLP dictionary, in the sense that an MRD is the electronic form of a dictionary that was previously printed on paper. Although both are used by programs, the term NLP dictionary is preferred when the dictionary was built from scratch with NLP in mind.
[Gil Francopoulo (ed.), LMF Lexical Markup Framework, ISTE / Wiley, 2013]
Lexical database
A lexical database is a lexical resource with an associated software environment, a database, that permits access to its contents. The database may be custom-designed for the lexical information, or it may be a general-purpose database into which lexical information has been entered.
Information typically stored in a lexical database includes spelling, lexical category and synonyms of words, as well as semantic and phonological relations between different words or sets of words.
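The kind of information listed above can be held in a general-purpose database. The following is a minimal sketch using Python's built-in `sqlite3` module; the table names, column names and relation labels (`synonym`, `rhyme`) are assumptions chosen for illustration, not a schema used by any particular lexical database.

```python
import sqlite3

# In-memory database; the schema is a hypothetical illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (
    id INTEGER PRIMARY KEY,
    spelling TEXT NOT NULL,
    category TEXT NOT NULL
);
CREATE TABLE relation (
    source_id INTEGER NOT NULL REFERENCES entry(id),
    target_id INTEGER NOT NULL REFERENCES entry(id),
    kind TEXT NOT NULL
);
""")

# Spelling and lexical category of each word.
conn.executemany(
    "INSERT INTO entry (id, spelling, category) VALUES (?, ?, ?)",
    [(1, "begin", "verb"), (2, "start", "verb"), (3, "win", "verb")],
)
# One semantic relation (synonym) and one phonological relation (rhyme).
conn.executemany(
    "INSERT INTO relation (source_id, target_id, kind) VALUES (?, ?, ?)",
    [(1, 2, "synonym"), (1, 3, "rhyme")],
)


def related(spelling, kind):
    """Return spellings linked from the given word by a relation of this kind."""
    rows = conn.execute(
        """
        SELECT t.spelling
        FROM relation r
        JOIN entry s ON s.id = r.source_id
        JOIN entry t ON t.id = r.target_id
        WHERE s.spelling = ? AND r.kind = ?
        ORDER BY t.spelling
        """,
        (spelling, kind),
    ).fetchall()
    return [row[0] for row in rows]


print(related("begin", "synonym"))
print(related("begin", "rhyme"))
```

In this sketch relations are stored in one direction only, so a symmetric relation such as synonymy would need either a second row or a query that checks both columns.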
See also
* Lexical Markup Framework (LMF), ISO standard for encoding lexical resources, comprising an abstract data model and an XML serialization
* OntoLex-Lemon, RDF vocabulary for publishing lexical resources on the web, e.g., as Linguistic Linked Open Data
* LREC conference series
* Machine-readable dictionary
* WordNet
* Arabic Ontology
External links
* The WordNet Home Page
* Lexicographic Search Engine