In information retrieval, an index term (also known as subject term, subject heading, descriptor, or keyword) is a term that captures the essence of the topic of a document. Index terms make up a controlled vocabulary for use in bibliographic records. They are an integral part of bibliographic control, which is the function by which libraries collect, organize and disseminate documents. They are used as keywords to retrieve documents in an information system, for instance, a catalog or a search engine. A popular form of keyword on the web is the tag, which is directly visible and can be assigned by non-experts. Index terms can consist of a word, phrase, or alphanumerical term. They are created by analyzing the document either manually with subject indexing or automatically with automatic indexing or more sophisticated methods of keyword extraction. Index terms can either come from a controlled vocabulary or be freely assigned.

Keywords are stored in a search index. Common words such as articles (a, an, the) and conjunctions (and, or, but) are often not treated as keywords, because indexing them is inefficient: almost every English-language site on the Internet contains the article ''the'', so searching for it alone distinguishes almost nothing. The most popular search engine, Google, removed stop words such as "the" and "a" from its indexes for several years, but then reintroduced them, making certain types of precise search possible again.

The term "descriptor" was coined by Calvin Mooers in 1948. It is used in particular for a preferred term from a thesaurus.

The Simple Knowledge Organization System language (SKOS) provides a way to express index terms with the Resource Description Framework (RDF) for use in the context of the Semantic Web.
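The idea of storing keywords in a search index while filtering out stop words can be shown with a minimal inverted-index sketch. It assumes a simple lower-case, alphanumeric tokenizer and a deliberately tiny, hypothetical stop list (`STOP_WORDS`); the function names `tokenize` and `build_index` and the sample documents are illustrative only, not taken from any particular search engine.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "to"}

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each non-stop-word term to the IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            if term not in STOP_WORDS:  # stop words never reach the index
                index[term].add(doc_id)
    return dict(index)

# Hypothetical sample collection.
docs = {
    "d1": "The history of the printing press",
    "d2": "A history of controlled vocabularies",
}
index = build_index(docs)
print(sorted(index))             # ['controlled', 'history', 'press', 'printing', 'vocabularies']
print(sorted(index["history"]))  # ['d1', 'd2']
```

Because the stop words are dropped at indexing time, a query made only of them (for example ''the'') cannot be answered from this index at all, which illustrates why indexing stop words again made certain precise searches possible.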

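As a sketch of how SKOS expresses an index term in RDF, the function below writes one concept in Turtle syntax: the preferred term (the descriptor) becomes `skos:prefLabel` and non-preferred entry terms become `skos:altLabel`. The SKOS namespace URI is the one published by the W3C; the function name `skos_concept_turtle`, the concept URI under `example.org`, and the sample labels are hypothetical. A production system would more likely use an RDF library than build strings by hand.

```python
# SKOS core namespace, as published by the W3C.
SKOS = "http://www.w3.org/2004/02/skos/core#"

def skos_concept_turtle(concept_uri: str, pref_label: str,
                        alt_labels: list[str], lang: str = "en") -> str:
    """Serialize one index term as a skos:Concept in Turtle syntax."""
    def literal(text: str) -> str:
        # Escape backslashes and double quotes for a Turtle string literal.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"@{lang}'

    lines = [
        f"@prefix skos: <{SKOS}> .",
        "",
        f"<{concept_uri}> a skos:Concept ;",
        f"    skos:prefLabel {literal(pref_label)}",  # the preferred term
    ]
    for alt in alt_labels:
        lines[-1] += " ;"  # continue the predicate list for the same subject
        lines.append(f"    skos:altLabel {literal(alt)}")  # a non-preferred term
    lines[-1] += " ."  # close the statement
    return "\n".join(lines)

turtle = skos_concept_turtle(
    "http://example.org/terms/index-term",  # hypothetical concept URI
    "Index terms",
    ["Descriptors", "Subject headings"],
)
print(turtle)
```

Keeping the preferred and alternative labels on the same concept is what lets a retrieval system map a searcher's entry term onto the single descriptor used for indexing.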
In web search engines

Most web search engines are designed to search for words anywhere in a document: the title, the body, and so on. This being the case, a keyword can be any term that exists within the document. However, priority is given to words that occur in the title, words that recur numerous times, and words that are explicitly assigned as keywords in the page's markup. Index terms can be further refined using the Boolean operators AND, OR, and NOT. AND is normally unnecessary, as most search engines infer it. OR returns results that contain one search term, the other, or both. NOT excludes a word or phrase from the search, removing any results that include it. Multiple words can also be enclosed in quotation marks to turn the individual index terms into a specific index ''phrase''. These modifiers and methods all help to refine search terms and improve the accuracy of search results. (CLIO. ''Keyword search''. Columbia University Libraries. Retrieved from http://www.columbia.edu/cu/lweb/help/clio/keyword.html)
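The ranking priorities described above (title matches, recurring words, explicitly assigned keywords) can be sketched as a weighted sum of term frequencies per field. The field names and the weights in `FIELD_WEIGHTS` are hypothetical values chosen for illustration; real search engines use far more elaborate ranking functions whose details are not described here.

```python
import re
from collections import Counter

# Hypothetical field weights: a match in the title or in the explicitly
# assigned keywords counts more than a match in the body text.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(doc: dict[str, str], query: str) -> float:
    """Sum the weighted frequencies of the query terms across the document's fields."""
    terms = tokenize(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(tokenize(doc.get(field, "")))
        # Every occurrence adds to the score, so recurring words rank higher.
        total += weight * sum(counts[t] for t in terms)
    return total

# Hypothetical documents.
doc_a = {"title": "Subject indexing", "keywords": "indexing", "body": "..."}
doc_b = {"title": "Library history", "keywords": "", "body": "indexing indexing"}
print(score(doc_a, "indexing"), score(doc_b, "indexing"))  # 5.0 2.0
```

Here `doc_a` outranks `doc_b` even though `doc_b` repeats the word more often in its body, because the title and keyword matches carry higher weights.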

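The Boolean operators and quoted phrases described in this section can be sketched over a small positional index: AND, OR, and NOT map onto set intersection, union, and difference of matching document IDs, and a phrase matches only where its terms occur at consecutive positions. The function names (`build_positional_index`, `term_docs`, `phrase_docs`) and the sample documents are hypothetical; NOT is implemented here as "first term but not the second", one common reading of the operator.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_positional_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Map term -> {doc_id -> list of token positions where the term occurs}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    # Convert to plain dicts so later lookups cannot create empty entries.
    return {term: dict(postings) for term, postings in index.items()}

def term_docs(index, term: str) -> set[str]:
    """IDs of the documents that contain a single term."""
    return set(index.get(term, {}))

def phrase_docs(index, phrase: str) -> set[str]:
    """IDs of the documents in which the phrase's terms appear consecutively."""
    terms = tokenize(phrase)
    if not terms:
        return set()
    # Only documents containing every term can contain the phrase.
    candidates = set.intersection(*(term_docs(index, t) for t in terms))
    hits = set()
    for doc_id in candidates:
        # Possible start positions of the phrase, narrowed term by term.
        starts = set(index[terms[0]][doc_id])
        for offset, term in enumerate(terms[1:], start=1):
            starts &= {p - offset for p in index[term][doc_id]}
        if starts:
            hits.add(doc_id)
    return hits

# Hypothetical sample collection.
docs = {
    "d1": "subject indexing with a controlled vocabulary",
    "d2": "automatic indexing of subject terms",
    "d3": "a history of the library catalog",
}
idx = build_positional_index(docs)

print(sorted(term_docs(idx, "indexing") & term_docs(idx, "subject")))    # AND: ['d1', 'd2']
print(sorted(term_docs(idx, "indexing") | term_docs(idx, "catalog")))    # OR: ['d1', 'd2', 'd3']
print(sorted(term_docs(idx, "indexing") - term_docs(idx, "automatic")))  # NOT: ['d1']
print(sorted(phrase_docs(idx, "subject indexing")))                      # phrase: ['d1']
```

The AND query matches both `d1` and `d2`, while the quoted phrase matches only `d1`, where "subject" is immediately followed by "indexing". Phrase search is the reason this index keeps token positions rather than bare document IDs.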
Author keywords

Author keywords are an integral part of the scholarly literature. Many journals and databases provide access to index terms assigned by the authors of the respective articles. The quality of both indexer-provided and author-provided index terms depends on how qualified the provider is. The quality of these two types of index terms is of research interest, particularly in relation to information retrieval. In general, an author will have difficulty providing index terms that characterize his or her document ''relative'' to other documents in the database.

Examples

* Canadian Subject Headings (CSH)
* Library of Congress Subject Headings (LCSH)
* Medical Subject Headings (MeSH)
* Polythematic Structured Subject Heading System (PSH)
* Subject Headings Authority File (SWD)
