HOME

TheInfoList



OR:

Query understanding is the process of inferring the intent of a
search engine A search engine is a software system that provides hyperlinks to web pages, and other relevant information on World Wide Web, the Web in response to a user's web query, query. The user enters a query in a web browser or a mobile app, and the sea ...
user by extracting semantic meaning from the searcher’s keywords. Query understanding methods generally take place before the search engine retrieves and ranks results. It is related to
natural language processing Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related ...
but specifically focused on the understanding of search queries.


Methods


Stemming and lemmatization

Many languages inflect words to reflect their role in the utterance they appear in. The variation between various forms of a word is likely to be of little importance for the relatively coarse-grained model of meaning involved in a retrieval system, and for this reason the task of conflating the various forms of a word is a potentially useful technique to increase recall of a retrieval system. Stemming algorithms, also known as stemmers, typically use a collection of simple rules to remove
suffix In linguistics, a suffix is an affix which is placed after the stem of a word. Common examples are case endings, which indicate the grammatical case of nouns and adjectives, and verb endings, which form the conjugation of verbs. Suffixes can ca ...
es intended to model the language’s inflection rules. For some languages, there are simple lemmatisation methods to reduce a word in query to its lemma or
root In vascular plants, the roots are the plant organ, organs of a plant that are modified to provide anchorage for the plant and take in water and nutrients into the plant body, which allows plants to grow taller and faster. They are most often bel ...
form or its stem; for others, this operation involves non-trivial string processing and may require recognizing the word's part of speech or referencing a lexical database. The effectiveness of stemming and lemmatization varies across languages.


Query Segmentation

Query segmentation is a key component of query understanding, aiming to divide a query into meaningful segments. Traditional approaches, such as the bag-of-words model, treat individual words as independent units, which can limit interpretative accuracy. For languages like Chinese, where words are not separated by spaces, segmentation is essential, as individual characters often lack standalone meaning. Even in English, the BOW model may not capture the full meaning, as certain phrases—such as "New York"—carry significance as a whole rather than as isolated terms. By identifying phrases or entities within queries, query segmentation enhances interpretation, enabling search engines to apply proximity and ordering constraints, ultimately improving search accuracy and user satisfaction.


Entity recognition

Entity recognition is the process of locating and classifying entities within a text string. Named-entity recognition specifically focuses on named entities, such as names of people, places, and organizations. In addition, entity recognition includes identifying concepts in queries that may be represented by multi-word phrases. Entity recognition systems typically use grammar-based linguistic techniques or statistical
machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
models.


Query rewriting

Query rewriting is the process of automatically reformulating a search query to more accurately capture its intent. Query expansion adds additional query terms, such as synonyms, in order to retrieve more documents and thereby increase recall. Query relaxation removes query terms to reduce the requirements for a document to match the query, thereby also increasing recall. Other forms of query rewriting, such as automatically converting consecutive query terms into
phrase In grammar, a phrasecalled expression in some contextsis a group of words or singular word acting as a grammatical unit. For instance, the English language, English expression "the very happy squirrel" is a noun phrase which contains the adject ...
s and restricting query terms to specific fields, aim to increase precision.


Spelling Correction

Automatic spelling correction is a critical feature of modern search engines, designed to address common spelling errors in user queries. Such errors are especially frequent as users often search for unfamiliar topics. By correcting misspelled queries, search engines enhance their understanding of user intent, thereby improving the relevance and quality of search results and overall user experience.


External links


Proceedings of ACM SIGIR 2011 Workshop on Query Representation and Understanding

Query Understanding for Search Engines (Yi Chang and Hongbo Deng, Eds.)


References

{{reflist Information retrieval techniques Natural language processing