RetrievalWare

RetrievalWare is an enterprise search engine emphasizing natural language processing and semantic networks, which was commercially available from 1992 to 2007 and is especially known for its use by government intelligence agencies.


History

RetrievalWare was initially created by Paul Nelson, Kenneth Clark, and Edwin Addison as part of ConQuest Software. Development began in 1989, but the software was not commercially available on a wide scale until 1992. Early funding was provided by Rome Laboratory via a Small Business Innovation Research grant. On July 6, 1995, ConQuest Software merged with the NASDAQ-listed company Excalibur Technologies, and the product was rebranded as RetrievalWare. On December 21, 2000, Excalibur Technologies was combined with Intel Corporation's Interactive Media Services division to form the Convera Corporation. Finally, on April 9, 2007, the RetrievalWare software and business were purchased by Fast Search & Transfer, at which point the product was officially retired. Microsoft Corporation continues to maintain the product for its existing customer base. Annual revenues for RetrievalWare peaked in 2001 at around US$40 million.


Use of natural language techniques

RetrievalWare is a relevancy-ranking text search system with processing enhancements drawn from the fields of natural language processing (NLP) and semantic networks. NLP algorithms include dictionary-based stemming (also known as lemmatisation) and dictionary-based phrase identification. Semantic networks are used by RetrievalWare to expand the query words entered by the user to related terms, with term weights determined by the distance from the user's original terms. In addition to automatic expansion, a feedback mode was available in which users could choose the meaning of a word before the expansion was performed. The first semantic networks were built using WordNet. In addition, RetrievalWare implemented a form of n-gram search (branded as APRP, Adaptive Pattern Recognition Processing), designed to search over documents with OCR errors. Query terms are divided into sets of 2-grams, which are used to locate similar terms in the inverted index. The resulting matches are weighted based on similarity measures and then used to search for documents. All of these features were available no later than 1993 (Site Report for the Text REtrieval Conference by ConQuest Software Inc., TREC-2), and ConQuest Software has claimed that it was the first commercial text-search system to implement these techniques.
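
The two techniques described above can be illustrated with a small Python sketch. It is not RetrievalWare's implementation, whose internals are not given here. The toy network, the decay factor of 0.5 per step, the `#` word-boundary padding, and the Dice coefficient used as the similarity measure are all assumptions chosen for illustration, as are the function names.

```python
from collections import deque

# Hypothetical toy semantic network: each term maps to directly related terms.
SEMANTIC_NET = {
    "car": ["automobile", "vehicle"],
    "automobile": ["car"],
    "vehicle": ["car", "truck"],
    "truck": ["vehicle"],
}


def expand_query(term, network, max_distance=2, decay=0.5):
    """Expand a query term to related terms via breadth-first search.

    Each term's weight is decay ** distance from the original term
    (an assumed weighting scheme; the original term has weight 1.0).
    """
    weights = {term: 1.0}
    queue = deque([(term, 0)])
    while queue:
        current, distance = queue.popleft()
        if distance == max_distance:
            continue
        for neighbor in network.get(current, []):
            if neighbor not in weights:
                weights[neighbor] = decay ** (distance + 1)
                queue.append((neighbor, distance + 1))
    return weights


def bigrams(word):
    """Return the set of 2-grams of a word, padded with '#' at both ends."""
    padded = f"#{word.lower()}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def build_bigram_index(vocabulary):
    """Map each 2-gram to the set of indexed terms that contain it."""
    index = {}
    for word in vocabulary:
        for gram in bigrams(word):
            index.setdefault(gram, set()).add(word)
    return index


def fuzzy_matches(query_term, index, threshold=0.5):
    """Find indexed terms similar to query_term, scored by Dice coefficient."""
    query_grams = bigrams(query_term)
    candidates = set()
    for gram in query_grams:
        candidates |= index.get(gram, set())
    scored = []
    for word in candidates:
        word_grams = bigrams(word)
        shared = len(query_grams & word_grams)
        dice = 2 * shared / (len(query_grams) + len(word_grams))
        if dice >= threshold:
            scored.append((word, dice))
    # Best matches first; ties broken alphabetically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# "retrieva1" stands in for an OCR misreading of "retrieval".
VOCABULARY = ["retrieval", "retrieva1", "network", "semantic"]
BIGRAM_INDEX = build_bigram_index(VOCABULARY)
```

Calling `expand_query("car", SEMANTIC_NET)` gives nearer terms higher weights than more distant ones, and `fuzzy_matches("retrieval", BIGRAM_INDEX)` returns both the exact term and its OCR-damaged variant, each with a similarity score that could be used to weight the document search.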


Other notable features

Other notable features of RetrievalWare include distributed search servers, synchronizers for indexing external content management systems and relational databases, a heterogeneous security model, document categorization, real-time document-query matching (profiling), multi-lingual searches (queries containing terms from multiple languages searching for documents containing terms from multiple languages), and cross-lingual searches (queries in one language searching for documents in a different language).


Participation in TREC

RetrievalWare participated in the Text REtrieval Conference in 1992 (TREC-1), 1993 (TREC-2), and 1995 (TREC-4). In TREC-1 (Site Report for the Text REtrieval Conference by ConQuest Software Inc., TREC-1) and TREC-4 (The Excalibur TREC-4 System, Preparations, and Results), the RetrievalWare runs for manually entered queries produced the best results, based on the 11-point averages, of all search engines that participated in the ad hoc category, in which search engines are allowed a single opportunity to process previously unknown queries against an existing database.
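
For readers unfamiliar with the metric, the sketch below computes the commonly used 11-point interpolated average precision for a single query. It assumes this standard definition is what "11-point averages" refers to. It is not the official TREC evaluation code, and the function name and input format are hypothetical.

```python
def eleven_point_average(ranked_relevance, total_relevant):
    """Compute 11-point interpolated average precision for one query.

    ranked_relevance: list of booleans in rank order (True = relevant hit).
    total_relevant: number of relevant documents that exist for the query.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")

    # Record (recall, precision) at every rank where a relevant document appears.
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))

    # Interpolated precision at recall level r is the best precision
    # achieved at any recall >= r (0.0 if that recall is never reached).
    levels = [i / 10 for i in range(11)]
    interpolated = [
        max((precision for recall, precision in points if recall >= level),
            default=0.0)
        for level in levels
    ]
    return sum(interpolated) / len(levels)
```

For example, `eleven_point_average([True, False, True, False], total_relevant=2)` averages a precision of 1.0 over the recall levels 0.0 to 0.5 and 2/3 over the levels 0.6 to 1.0. In an evaluation across many queries, the per-query values would then be averaged.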


References


External links

* Marketing presentation on RetrievalWare semantic networks and adaptive pattern recognition algorithms