
In information extraction, a named entity is a real-world object, such as a person, location, organization, or product, that can be denoted with a proper name. It can be abstract or have a physical existence. Examples of named entities include Barack Obama, New York City, Volkswagen Golf, or anything else that can be named. Named entities can simply be viewed as entity instances (e.g., New York City is an instance of a city).

From a historical perspective, the term ''Named Entity'' was coined during the MUC-6 evaluation campaign and covered ENAMEX (entity name expressions, e.g. persons, locations and organizations) and NUMEX (numerical expressions).

A more formal definition can be derived from the rigid designator of Saul Kripke. In the expression "Named Entity", the word "Named" aims to restrict the possible set of entities to only those for which one or many rigid designators stand for the referent. A designator is rigid when it designates the same thing in every possible world. In contrast, flaccid designators may designate different things in different possible worlds.

As an example, consider the sentence "Biden is the president of the United States". Both "Biden" and "United States" are named entities since they refer to specific objects (Joe Biden and the United States). However, "president" is not a named entity since it can refer to many different objects in different worlds (different persons in different presidential periods, or different people in different countries or organizations). Rigid designators usually include proper names as well as certain natural terms such as biological species and substances.

There is also general agreement in the named-entity recognition community to consider temporal and numerical expressions, such as amounts of money and other types of units, as named entities, even though these may violate the rigid designator perspective.

The task of recognizing named entities in text is called named-entity recognition, while the task of determining the identity of the named entities mentioned in text is called named-entity disambiguation. Both tasks require dedicated algorithms and resources.
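
The difference between recognizing a named entity and determining its identity can be illustrated with a minimal Python sketch. It is a toy dictionary lookup, not how production systems work: the `GAZETTEER` table, the type labels, the `ex:` identifiers, and the `Mention`, `recognize` and `disambiguate` names are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteer: surface form -> (entity type, made-up identifier).
GAZETTEER = {
    "Barack Obama": ("PERSON", "ex:Barack_Obama"),
    "New York City": ("LOCATION", "ex:New_York_City"),
    "Volkswagen Golf": ("PRODUCT", "ex:Volkswagen_Golf"),
}


@dataclass(frozen=True)
class Mention:
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    text: str    # surface form found in the text
    label: str   # entity type assigned by the recognizer


def recognize(text):
    """Recognition: locate gazetteer names in the text and assign each a type."""
    # Try longer names first so a long name is preferred over a shorter prefix.
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return [
        Mention(m.start(), m.end(), m.group(), GAZETTEER[m.group()][0])
        for m in pattern.finditer(text)
    ]


def disambiguate(mention):
    """Disambiguation: map a recognized mention to an entity identifier."""
    return GAZETTEER[mention.text][1]


sentence = "Barack Obama visited New York City."
mentions = recognize(sentence)
for mention in mentions:
    print(mention.label, mention.text, "->", disambiguate(mention))
```

In this sketch the common noun "visited" is never tagged because it has no entry in the lookup table. A real system would have to handle ambiguous names, unseen names and context, which is why both tasks need dedicated algorithms and resources.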


See also

* Named-entity recognition (also referred to as entity identification, entity chunking and entity extraction)
* Entity linking (also referred to as named entity linking (NEL), named entity disambiguation (NED), named entity recognition and disambiguation (NERD) or named entity normalization)
* Information extraction
* Knowledge extraction
* Text mining (also referred to as text data mining)
* Truecasing
* Apache OpenNLP
* spaCy
* General Architecture for Text Engineering
* Natural Language Toolkit

