The Text REtrieval Conference (TREC) is an ongoing series of workshops focusing on a range of information retrieval (IR) research areas, or ''tracks.'' It is co-sponsored by the National Institute of Standards and Technology (NIST) and the Intelligence Advanced Research Projects Activity (part of the office of the Director of National Intelligence), and began in 1992 as part of the TIPSTER Text program. Its purpose is to support and encourage research within the information retrieval community by providing the infrastructure necessary for large-scale ''evaluation'' of text retrieval methodologies, and to increase the speed of lab-to-product transfer of technology.
TREC's evaluation protocols have improved many search technologies. A 2010 study estimated that "without TREC, U.S. Internet users would have spent up to 3.15 billion additional hours using web search engines between 1999 and 2009."
Hal Varian, the Chief Economist at Google, wrote that "The TREC data revitalized research on information retrieval. Having a standard, widely available, and carefully constructed set of data laid the groundwork for further innovation in this field."
Each track has a challenge in which NIST provides participating groups with data sets and test problems. Depending on the track, test problems might be questions, topics, or target extractable features. Uniform scoring is performed so that the systems can be fairly evaluated. After the results are evaluated, a workshop gives participants a place to gather thoughts and ideas and to present current and future research work.
The Text Retrieval Conference started in 1992, funded by DARPA (the US Defense Advanced Research Projects Agency) and run by NIST. Its purpose was to support research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies.
Goals
* Encourage research in information retrieval based on large text collections
* Increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas
* Speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems
* Increase the availability of appropriate evaluation techniques for use by industry and academia, including the development of new evaluation techniques more applicable to current systems
TREC is overseen by a program committee consisting of representatives from government, industry, and academia. For each TREC, NIST provides a set of documents and questions. Participants run their own retrieval systems on the data and return to NIST a list of the top-ranked documents they retrieved. NIST pools the individual results, judges the retrieved documents for correctness, and evaluates the results. The TREC cycle ends with a workshop that serves as a forum for participants to share their experiences.
Relevance judgments in TREC
TREC uses a binary relevance criterion: a document is either relevant or not relevant. Because the TREC collections are so large, it is impossible to judge every document and therefore impossible to calculate absolute recall for each query. Instead, TREC uses a method called pooling to calculate relative recall. For each query, the top 100 documents retrieved by each participating system are combined into a pool, and assessors judge the documents in that pool. The relevant documents found there form the pool of relevant documents. Recall is then the proportion of this pool of relevant documents that a single system retrieved for a query topic.
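The pooling procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not NIST's actual evaluation tooling:

* The system names, document IDs and relevance judgments (`qrels`) are hypothetical toy data.
* Each run is a plain ranked list of document IDs for a single topic.
* The pool depth is shrunk from 100 to 3 so the example is easy to follow.
* Relative recall is counted within the same cutoff used for pooling.

```python
from typing import Dict, List, Optional, Set


def build_pool(runs: Dict[str, List[str]], depth: int = 100) -> Set[str]:
    """Union of the top-`depth` documents from every system's ranked run for one topic."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def relative_recall(ranking: List[str], judged_relevant: Set[str],
                    depth: Optional[int] = None) -> float:
    """Fraction of the pooled relevant documents that one run retrieved.

    If `depth` is given, only the run's top `depth` documents count (an assumption
    made for this toy example); otherwise the whole ranking is used.
    """
    if not judged_relevant:
        return 0.0
    retrieved = set(ranking if depth is None else ranking[:depth])
    return len(retrieved & judged_relevant) / len(judged_relevant)


# Hypothetical ranked runs from three systems for one topic (best first).
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d3", "d5", "d1", "d6"],
    "sysC": ["d7", "d2", "d8", "d1"],
}

# Toy pool depth of 3 (the text describes a depth of 100).
pool = build_pool(runs, depth=3)

# Hypothetical binary judgments for the pooled documents: relevant or not.
# Documents outside the pool (d4, d6) are never judged and so count as non-relevant.
qrels = {"d1": True, "d2": False, "d3": True, "d5": False, "d7": True, "d8": False}
judged_relevant = {doc for doc in pool if qrels.get(doc, False)}

for name, ranking in runs.items():
    print(name, round(relative_recall(ranking, judged_relevant, depth=3), 3))
```

Here the pool holds six documents, three of which are judged relevant. `sysC` scores lower than the other two runs only because it ranked `d1` fourth, below the toy cutoff.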
Various TRECs
In 1992, TREC-1 was held at NIST. The first conference attracted 28 groups of researchers from academia and industry. It demonstrated a wide range of different approaches to the retrieval of text from large document collections. TREC-1 also showed that automatic construction of queries from natural-language query statements seems to work. It further showed that techniques based on natural language processing were no better and no worse than those based on vector or probabilistic approaches.
TREC-2 took place in August 1993, with 31 groups of researchers participating. Two types of retrieval were examined: retrieval using an 'ad hoc' query and retrieval using a 'routing' query.
In TREC-3, a small group of experiments worked with a Spanish-language collection, and others dealt with interactive query formulation in multiple databases.
In TREC-4, the topics were made even shorter to investigate the problems posed by very short user statements.
TREC-5 included both short and long versions of the topics. The goal was a deeper investigation into which types of techniques work well on topics of various lengths.
In TREC-6, three new tracks were introduced: speech, cross-language, and high-precision information retrieval. The goal of cross-language information retrieval is to facilitate research on systems that can retrieve relevant documents regardless of the language of the source document.
TREC-7 contained seven tracks, two of which were new: the Query track and the Very Large Corpus track. The goal of the Query track was to create a large query collection.
TREC-8 contained seven tracks, two of which (question answering and web) were new. The objective of the question answering (QA) track is to explore the possibilities of providing answers to specific natural-language queries.
TREC-9 included seven tracks.
In TREC-10, video tracks were introduced. The video tracks were designed to promote research in content-based retrieval from digital video.
In TREC-11, the Novelty track was introduced. The goal of the Novelty track is to investigate systems' abilities to locate relevant and new information within the ranked set of documents returned by a traditional document retrieval system.
TREC-12, held in 2003, added three new tracks: the Genome track, the Robust Retrieval track, and HARD (High Accuracy Retrieval from Documents).
Tracks
Current tracks
''New tracks are added as new research needs are identified; this list is current for TREC 2018.''
* CENTRE Track - Goal: to run in parallel with CLEF 2018, NTCIR-14, and TREC 2018 to develop and tune an IR reproducibility evaluation protocol (new track for 2018).
* Common Core Track - Goal: an ad hoc search task over news documents.
* Complex Answer Retrieval (CAR) Track - Goal: to develop systems capable of answering complex information needs by collating information from an entire corpus.
* Incident Streams Track - Goal: to research technologies to automatically process social media streams during emergency situations (new track for TREC 2018).
* News Track - Goal: a partnership with The Washington Post to develop test collections in a news environment (new for 2018).
* Precision Medicine Track - Goal: a specialization of the Clinical Decision Support track that focuses on linking oncology patient data to clinical trials.
* Real-Time Summarization (RTS) Track - Goal: to explore techniques for real-time update summaries from social media streams.
Past tracks
* Chemical Track - Goal: to develop and evaluate technology for large-scale search in chemistry-related documents, including academic papers and patents, to better meet the needs of professional searchers, specifically patent searchers and chemists.
* Clinical Decision Support Track - Goal: to investigate techniques for linking medical cases to information relevant for patient care.
* Contextual Suggestion Track - Goal: to investigate search techniques for complex information needs that are highly dependent on context and user interests.
* Crowdsourcing Track - Goal: to provide a collaborative venue for exploring crowdsourcing methods both for evaluating search and for performing search tasks.
* Genomics Track - Goal: to study the retrieval of genomic data, not just gene sequences but also supporting documentation such as research papers and lab reports. Last ran in TREC 2007.
* Dynamic Domain Track - Goal: to investigate domain-specific search algorithms that adapt to the dynamic information needs of professional users as they explore complex domains.
* Enterprise Track - Goal: to study search over the data of an organization to complete some task. Last ran in TREC 2008.
* Entity Track - Goal: to perform entity-related search on Web data. These search tasks (such as finding entities and properties of entities) address common information needs that are not well modeled as ad hoc document search.
* Cross-Language Track - Goal: to investigate the ability of retrieval systems to find documents topically regardless of source language. After 1999, this track spun off into CLEF.
* FedWeb Track - Goal: to select the best resources to forward a query to, and to merge the results so that the most relevant are on top.
* Federated Web Search Track - Goal: to investigate techniques for the selection and combination of search results from a large number of real online web search services.
* Filtering Track - Goal: to make a binary decision about whether to retrieve each new incoming document, given a stable information need.
* HARD Track - Goal: to achieve High Accuracy Retrieval from Documents by leveraging additional information about the searcher and/or the search context.
* Interactive Track - Goal: to study user interaction with text retrieval systems.
* Knowledge Base Acceleration (KBA) Track - Goal: to develop techniques to dramatically improve the efficiency of (human) knowledge base curators by having the system suggest modifications/extensions to the KB based on its monitoring of the data streams, created th… organized by Diffeo.
* Legal Track - Goal: to develop search technology that meets the needs of lawyers to engage in effective discovery in digital document collections.
* LiveQA Track - Goal: to generate answers to real questions originating from real users via a live question stream, in real time.
* Medical Records Track - Goal: to explore methods for searching unstructured information found in patient medical records.
*
Microblog