HOME

TheInfoList



OR:

In digital
lexicography Lexicography is the study of lexicons and the art of compiling dictionaries. It is divided into two separate academic disciplines: * Practical lexicography is the art or craft of compiling, writing and editing dictionaries. * Theoretical le ...
,
natural language processing Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related ...
, and
digital humanities Digital humanities (DH) is an area of scholarly activity at the intersection of computing or Information technology, digital technologies and the disciplines of the humanities. It includes the systematic use of digital resources in the humanitie ...
, a lexical resource is a language resource consisting of
data Data ( , ) are a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted for ...
regarding the
lexeme A lexeme () is a unit of lexical meaning that underlies a set of words that are related through inflection. It is a basic abstract unit of meaning, a unit of morphological analysis in linguistics that roughly corresponds to a set of forms ta ...
s of the
lexicon A lexicon (plural: lexicons, rarely lexica) is the vocabulary of a language or branch of knowledge (such as nautical or medical). In linguistics, a lexicon is a language's inventory of lexemes. The word ''lexicon'' derives from Greek word () ...
of one or more
language Language is a structured system of communication that consists of grammar and vocabulary. It is the primary means by which humans convey meaning, both in spoken and signed language, signed forms, and may also be conveyed through writing syste ...
s e.g., in the form of a
database In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
.


Characteristics

Different standards for the machine-readable edition of lexical resources exist, e.g., Lexical Markup Framework (LMF) an
ISO standard The International Organization for Standardization (ISO ; ; ) is an independent, non-governmental, international standard development organization composed of representatives from the national standards organizations of member countries. Me ...
for encoding lexical resources, comprising an abstract data model and an
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing data. It defines a set of rules for encoding electronic document, documents in a format that is both human-readable and Machine-r ...
serialization, and OntoLex-Lemon, an RDF vocabulary for publishing lexical resources as
knowledge graph In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a Graph (discrete mathematics), graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interl ...
s on the web, e.g., as Linguistic Linked Open Data. Depending on the type of languages that are addressed, a lexical resource may be qualified as
monolingual Monoglottism ( Greek μόνος ''monos'', "alone, solitary", + γλῶττα , "tongue, language") or, more commonly, monolingualism or unilingualism, is the condition of being able to speak only a single language, as opposed to multilingualism. ...
,
bilingual Multilingualism is the use of more than one language, either by an individual speaker or by a group of speakers. When the languages are just two, it is usually called bilingualism. It is believed that multilingual speakers outnumber monolin ...
or
multilingual Multilingualism is the use of more than one language, either by an individual speaker or by a group of speakers. When the languages are just two, it is usually called bilingualism. It is believed that multilingual speakers outnumber monolin ...
. For bilingual and multilingual lexical resources, the words may be connected or not connected from one language to another. When connected, the equivalence from a language to another is performed through a bilingual link (for bilingual lexical resources, e.g., using the relation ''vartrans:translatableAs'' in OntoLex-Lemon) or through multilingual notations (for multilingual lexical resources, e.g., by reference to the same ''ontolex:Concept'' in OntoLex-Lemon). It is possible also to build and manage a lexical resource consisting of different lexicons of the same language, for instance, one dictionary for general words and one or several dictionaries for different specialized domains.


Machine-readable dictionary vs. NLP dictionary

Lexical resources in digital
lexicography Lexicography is the study of lexicons and the art of compiling dictionaries. It is divided into two separate academic disciplines: * Practical lexicography is the art or craft of compiling, writing and editing dictionaries. * Theoretical le ...
are often referred to as machine-readable dictionary (''MRD''), a
dictionary A dictionary is a listing of lexemes from the lexicon of one or more specific languages, often arranged Alphabetical order, alphabetically (or by Semitic root, consonantal root for Semitic languages or radical-and-stroke sorting, radical an ...
stored as machine (computer) data instead of being printed on paper. It is an
electronic dictionary An electronic dictionary is a dictionary whose data exists in digital form and can be accessed through a number of different media. Electronic dictionaries can be found in several forms, including software installed on tablet or desktop computer ...
and lexical database. The term MRD is often contrasted with NLP dictionary, in the sense that an MRD is the electronic form of a dictionary which was printed before on paper. Although being both used by programs, in contrast, the term NLP dictionary is preferred when the dictionary was built from scratch with NLP in mind.Gil Francopoulo (edited by) LMF Lexical Markup Framework, ISTE / Wiley 2013 ()


Lexical database

A lexical database is a lexical resource which has an associated software environment
database In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
which permits access to its contents. The database may be custom-designed for the lexical information or a general-purpose database into which lexical information has been entered. Information typically stored in a lexical database includes
spelling Spelling is a set of conventions for written language regarding how graphemes should correspond to the sounds of spoken language. Spelling is one of the elements of orthography, and highly standardized spelling is a prescriptive element. Spelli ...
, lexical category and
synonyms A synonym is a word, morpheme, or phrase that means precisely or nearly the same as another word, morpheme, or phrase in a given language. For example, in the English language, the words ''begin'', ''start'', ''commence'', and ''initiate'' are a ...
of words, as well as
semantic Semantics is the study of linguistic Meaning (philosophy), meaning. It examines what meaning is, how words get their meaning, and how the meaning of a complex expression depends on its parts. Part of this process involves the distinction betwee ...
and
phonological Phonology (formerly also phonemics or phonematics: "phonemics ''n.'' 'obsolescent''1. Any procedure for identifying the phonemes of a language from a corpus of data. 2. (formerly also phonematics) A former synonym for phonology, often prefer ...
relations between different words or sets of words.


See also

* Lexical Markup Framework (LMF),
ISO standard The International Organization for Standardization (ISO ; ; ) is an independent, non-governmental, international standard development organization composed of representatives from the national standards organizations of member countries. Me ...
for encoding lexical resources, comprising an abstract data model and an
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing data. It defines a set of rules for encoding electronic document, documents in a format that is both human-readable and Machine-r ...
serialization * OntoLex-Lemon, RDF vocabulary for publishing lexical resources on the web, e.g., as Linguistic Linked Open Data * LREC conference series * Machine-readable dictionary *
WordNet WordNet is a lexical database of semantic relations between words that links words into semantic relations including synonyms, hyponyms, and meronyms. The synonyms are grouped into ''synsets'' with short definitions and usage examples. It can thu ...
*
Arabic Ontology Arabic Ontology is a linguistic ontology for the Arabic language, which can be used as an Arabic WordNet with ontologically clean content. People use it also as a tree (i.e. classification) of the concepts/meanings of the Arabic terms. It is a f ...


References


External links


Open English WordNet
— Open source fork of the Princeton WordNet
Wordnets in the world
a
Global WordNet Association

WordNet
at
Princeton University Princeton University is a private university, private Ivy League research university in Princeton, New Jersey, United States. Founded in 1746 in Elizabeth, New Jersey, Elizabeth as the College of New Jersey, Princeton is the List of Colonial ...
(no longer maintained)
Arabic Ontology
at
Birzeit University Birzeit University () is a public university in the West Bank, Palestine, registered by the Palestinian Ministry of Social Affairs as a charitable organization. It is accredited by the Palestinian Ministry of Education and Higher Education, Mini ...
{{Natural language processing Lexis (linguistics) Translation databases Computational linguistics Dictionaries by type