In digital lexicography, natural language processing, and digital humanities, a lexical resource is a language resource consisting of data regarding the lexemes of the lexicon of one or more languages, e.g., in the form of a database.


Characteristics

Several standards exist for the machine-readable encoding of lexical resources, e.g., the Lexical Markup Framework (LMF), an ISO standard comprising an abstract data model and an XML serialization, and OntoLex-Lemon, an RDF vocabulary for publishing lexical resources as knowledge graphs on the web, e.g., as Linguistic Linked Open Data. Depending on the languages covered, a lexical resource may be described as monolingual, bilingual, or multilingual. In bilingual and multilingual lexical resources, the words of one language may or may not be linked to those of another. When they are linked, the equivalence between languages is expressed either through a bilingual link (in bilingual lexical resources, e.g., using the relation vartrans:translatableAs in OntoLex-Lemon) or through multilingual notations (in multilingual lexical resources, e.g., by reference to the same ontolex:Concept in OntoLex-Lemon). It is also possible to build and manage a lexical resource consisting of several lexicons of the same language, for instance one dictionary for general vocabulary and one or more dictionaries for specialized domains.
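
As a rough illustration of the two linking strategies described above, the following Python sketch models a direct bilingual link and a link through a shared concept using plain data classes. The class, field, and function names (`LexicalEntry`, `translatable_as`, `concept`, and the two lookup helpers) are assumptions made for this example; they loosely echo the OntoLex-Lemon terms but are not the vocabulary's actual RDF serialization.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False keeps identity comparison, which avoids recursive comparisons
# if entries ever link back to each other.
@dataclass(eq=False)
class LexicalEntry:
    """One word of one language (hypothetical, simplified model)."""
    lemma: str
    language: str
    # Direct bilingual links, loosely analogous to vartrans:translatableAs.
    translatable_as: List["LexicalEntry"] = field(default_factory=list)
    # Identifier of a language-independent concept, loosely analogous to
    # several entries referring to the same ontolex:Concept.
    concept: Optional[str] = None


def translations_via_links(entry: LexicalEntry) -> List[str]:
    """Follow explicit bilingual links from one entry."""
    return [target.lemma for target in entry.translatable_as]


def translations_via_concept(entry: LexicalEntry,
                             lexicon: List[LexicalEntry]) -> List[str]:
    """Find entries in other languages that share the same concept."""
    if entry.concept is None:
        return []
    return [
        other.lemma
        for other in lexicon
        if other.concept == entry.concept and other.language != entry.language
    ]


# Illustrative data only.
cat_en = LexicalEntry("cat", "en", concept="concept:cat")
chat_fr = LexicalEntry("chat", "fr", concept="concept:cat")
katze_de = LexicalEntry("Katze", "de", concept="concept:cat")
cat_en.translatable_as.append(chat_fr)  # bilingual link: en -> fr

lexicon = [cat_en, chat_fr, katze_de]
print(translations_via_links(cat_en))             # ['chat']
print(translations_via_concept(cat_en, lexicon))  # ['chat', 'Katze']
```

In this sketch, the bilingual link connects exactly two entries, while the shared concept identifier connects any number of languages without pairwise links.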


Machine-readable dictionary vs. NLP dictionary

Lexical resources in digital lexicography are often referred to as machine-readable dictionaries (MRDs). An MRD is a dictionary stored as machine (computer) data instead of being printed on paper; it is both an electronic dictionary and a lexical database. The term MRD is often contrasted with NLP dictionary: an MRD is the electronic form of a dictionary that was previously printed on paper, whereas the term NLP dictionary is preferred when the dictionary was built from scratch with NLP in mind, although both are used by programs (Gil Francopoulo, ed., LMF Lexical Markup Framework, ISTE/Wiley, 2013).


Lexical database

A lexical database is a lexical resource that has an associated software environment, a database, which permits access to its contents. The database may be custom-designed for the lexical information or a general-purpose database into which lexical information has been entered. Information typically stored in a lexical database includes the spelling, lexical category, and synonyms of words, as well as semantic and phonological relations between different words or sets of words.
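
To make this concrete, here is a minimal sketch of lexical information entered into a general-purpose database, using Python's standard `sqlite3` module. The table names, column names, and the helper `synonyms_of` are assumptions chosen for illustration, and the three sample words are example data only; an actual lexical database would typically have a much richer schema.

```python
import sqlite3

# In-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table for entries (spelling and lexical category)
# and one table for synonym relations between entries.
conn.executescript("""
CREATE TABLE entry (
    id       INTEGER PRIMARY KEY,
    spelling TEXT NOT NULL,
    category TEXT NOT NULL
);
CREATE TABLE synonym (
    entry_id   INTEGER NOT NULL REFERENCES entry(id),
    synonym_id INTEGER NOT NULL REFERENCES entry(id)
);
""")

conn.executemany(
    "INSERT INTO entry (id, spelling, category) VALUES (?, ?, ?)",
    [(1, "begin", "verb"), (2, "start", "verb"), (3, "commence", "verb")],
)
# Relations are stored in one direction only in this sketch.
conn.executemany(
    "INSERT INTO synonym (entry_id, synonym_id) VALUES (?, ?)",
    [(1, 2), (1, 3)],
)


def synonyms_of(connection, spelling):
    """Return the spellings recorded as synonyms of the given word."""
    rows = connection.execute(
        """
        SELECT s.spelling
        FROM entry AS e
        JOIN synonym AS l ON l.entry_id = e.id
        JOIN entry AS s ON s.id = l.synonym_id
        WHERE e.spelling = ?
        ORDER BY s.spelling
        """,
        (spelling,),
    ).fetchall()
    return [row[0] for row in rows]


print(synonyms_of(conn, "begin"))  # ['commence', 'start']
```

Semantic or phonological relations could be stored in the same way, for example by adding a column to the relation table that names the relation type.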


See also

* Lexical Markup Framework (LMF), ISO standard for encoding lexical resources, comprising an abstract data model and an XML serialization
* OntoLex-Lemon, RDF vocabulary for publishing lexical resources on the web, e.g., as Linguistic Linked Open Data
* LREC conference series
* Machine-readable dictionary
* WordNet
* Arabic Ontology


External links


The WordNet Home Page

Lexicographic Search Engine