
Compound-term processing, in information retrieval, is search result matching on the basis of compound terms. Compound terms are built by combining two or more simple terms; for example, "triple" is a single-word term, but "triple heart bypass" is a compound term. Compound-term processing is a new approach to an old problem: how can one improve the relevance of search results while maintaining ease of use? Using this technique, a search for "survival rates following a triple heart bypass in elderly people" will locate documents about this topic even if no document contains this precise phrase. This can be performed by a concept search, which itself uses compound-term processing: it extracts the key concepts automatically (in this case "survival rates", "triple heart bypass" and "elderly people") and uses these concepts to select the most relevant documents.
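The matching idea described above can be illustrated with a minimal sketch. It is not the method of any particular product: the vocabulary `KNOWN_COMPOUND_TERMS` is hard-coded here purely for illustration (a real concept search would derive the concepts automatically), and the function names and the scoring rule (count of shared compound terms) are assumptions.

```python
import re

# Hypothetical vocabulary of compound terms, hard-coded for illustration only.
KNOWN_COMPOUND_TERMS = {
    ("survival", "rates"),
    ("triple", "heart", "bypass"),
    ("elderly", "people"),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_compound_terms(text, known_terms=KNOWN_COMPOUND_TERMS):
    """Return known compound terms found in text, preferring the longest match."""
    tokens = tokenize(text)
    max_len = max(len(term) for term in known_terms)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(max_len, 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in known_terms:
                found.append(" ".join(candidate))
                i += len(candidate)
                break
        else:
            # No compound term starts at this position; move on one token.
            i += 1
    return found


def rank_documents(query, documents):
    """Rank document ids by how many of the query's compound terms they contain."""
    query_terms = set(extract_compound_terms(query))
    scored = []
    for doc_id, text in documents.items():
        score = len(query_terms & set(extract_compound_terms(text)))
        if score:
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]
```

In this sketch, a document that mentions "elderly people", "survival rates" and "triple heart bypass" in a different order from the query still matches on all three concepts, while a document that merely contains the isolated words "triple", "heart" and "rates" does not match at all.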


Techniques

In August 2003, Concept Searching Limited introduced the idea of using statistical compound-term processing. CLAMOUR is a European collaborative project which aims to find a better way to classify industrial information and statistics when collecting and disseminating them. CLAMOUR appears to use a linguistic approach, rather than one based on statistical modelling.


History

Techniques for probabilistic weighting of single-word terms date back to at least 1976, in the landmark publication by Stephen E. Robertson and Karen Spärck Jones. Robertson stated that the assumption of word independence is not justified and exists as a matter of mathematical convenience. His objection to term independence was not a new idea: it dates back to at least 1964, when H. H. Williams stated that "[t]he assumption of independence of words in a document is usually made as a matter of mathematical convenience". In 2004, Anna Lynn Patterson filed patents on "phrase-based searching in an information retrieval system", to which Google subsequently acquired the rights ("Google Acquires Cuil Patent Applications").


Adaptability

Statistical compound-term processing is more adaptable than the process described by Patterson. Her process is targeted at searching the World Wide Web, where extensive statistical knowledge of common searches can be used to identify candidate phrases. Statistical compound-term processing is better suited to enterprise search applications, where such a priori knowledge is not available. Statistical compound-term processing is also more adaptable than the linguistic approach taken by the CLAMOUR project, which must consider the syntactic properties of the terms (i.e. part of speech, gender, number, etc.) and their combinations. CLAMOUR is highly language-dependent, whereas the statistical approach is language-independent.
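To make the contrast with a linguistic approach concrete, the following is a minimal sketch of how candidate compound terms might be found from corpus statistics alone, with no query logs, part-of-speech tags or grammar rules. The article does not say which statistic is used in practice; pointwise mutual information (PMI) over adjacent word pairs is swapped in here as an assumption, and the function name and the thresholds `min_count` and `min_pmi` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def candidate_compound_terms(documents, min_count=2, min_pmi=1.0):
    """Return adjacent word pairs that co-occur more often than chance.

    Uses pointwise mutual information as the association measure; this
    choice is an assumption for illustration, not the documented method.
    """
    unigrams = Counter()
    bigrams = Counter()
    for text in documents:
        tokens = tokenize(text)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # too rare to estimate reliably
        p_pair = count / total_bigrams
        p_w1 = unigrams[w1] / total_unigrams
        p_w2 = unigrams[w2] / total_unigrams
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        if pmi >= min_pmi:
            scored.append((pmi, f"{w1} {w2}"))

    # Highest-scoring pairs first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [term for _, term in scored]
```

Nothing in the sketch depends on the grammar of a particular language; the only language-related assumption is that words can be separated by the tokenizer. Longer compound terms could be found by repeating the same idea on triples, which is omitted here for brevity.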


Applications

Compound-term processing allows information retrieval applications, such as search engines, to perform their matching on the basis of multi-word concepts rather than on single words in isolation, which can be highly ambiguous. Early search engines looked for documents containing the words entered by the user into the search box. These are known as keyword search engines. Boolean search engines add a degree of sophistication by allowing the user to specify additional requirements. For example, "Tiger NEAR Woods AND (golf OR golfing) NOT Volkswagen" uses the operators "NEAR", "AND", "OR" and "NOT" to specify conditions that the words in a matching document must satisfy. A phrase search is simpler to use, but requires that the exact phrase specified appear in the results.
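The difference between the Boolean query and a phrase search can be shown with a small sketch. The example query is hard-coded as a Python expression rather than parsed, and the meaning of "NEAR" varies between search engines: the window of three token positions used here, like all function names, is an assumption made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(tokens, a, b, window=3):
    """True if a and b occur within `window` positions of each other, in either order."""
    pos_a = [i for i, tok in enumerate(tokens) if tok == a]
    pos_b = [i for i, tok in enumerate(tokens) if tok == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def contains_phrase(tokens, phrase):
    """True if the words of `phrase` appear consecutively and in order."""
    words = tokenize(phrase)
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def matches_example_query(text):
    """Evaluate: Tiger NEAR Woods AND (golf OR golfing) NOT Volkswagen."""
    tokens = tokenize(text)
    vocabulary = set(tokens)
    return (
        near(tokens, "tiger", "woods")
        and ("golf" in vocabulary or "golfing" in vocabulary)
        and "volkswagen" not in vocabulary
    )
```

Under these assumptions, a text such as "Woods, Tiger: a golfing biography" satisfies the Boolean query but would be missed by a phrase search for "Tiger Woods", which illustrates the trade-off between the two styles of search.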


See also

* Concept Searching Limited
* Enterprise search
* Information retrieval

