The Information Retrieval Facility (IRF), founded in 2006 and located in Vienna, Austria, was a research platform for networking and collaboration for professionals in the field of information retrieval. It ceased operations in 2012.
The IRF had members in the following categories:
* Researchers in information retrieval (IR) or related scientific areas
* Industrial/corporate information management professionals
* Patent authorities and governmental institutions
* Students of one of the above
Scientific Board
* Maristella Agosti, Professor, Department of Information Engineering, University of Padova
* Gerhard Budin, Director of the Center of Translation Studies at the University of Vienna; Director of the Department of Corpuslinguistics and Text Technology, Austrian Academy of Sciences
* Jamie Callan, Professor
* Yves Chiaramella, Professor Emeritus, Department of Computer Science and Applied Mathematics, Joseph Fourier University
* Kilnam Chon, Professor, Computer Science Department, Korea Advanced Institute of Science and Technology
* W. Bruce Croft, Distinguished Professor, Department of Computer Science, and Director, Center for Intelligent IR, University of Massachusetts Amherst
* Hamish Cunningham, Research Professor, Computer Science Department, University of Sheffield
* Norbert Fuhr, Chairman of the Scientific Board, Professor
* David Hawking, Science Leader, Project Leader, CSIRO ICT Centre
* Noriko Kando, Professor, Software Engineering Research, Software Research Division, National Institute of Informatics (NII)
* Arcot Desai Narasimhalu, Associate Dean, School of Information Systems, Singapore Management University
* John Tait, Chief Scientific Officer of the IRF; until July 2007 Professor of Intelligent Information Systems and Associate Dean of the School of Computing and Technology
* Benjamin T'sou, Director, Language Information Sciences Research Centre, City University of Hong Kong
* C. J. van Rijsbergen, Dept. of Computer Science at the University of Glasgow
Scientific goals
* Modeling innovative and specialized information retrieval systems for global patent document collections.
* Investigating and developing an adequate technical infrastructure that allows interactive experimentation with formal, mathematical retrieval concepts for very large-scale document collections.
* Studying the usability of multimodal user interfaces to very large-scale information retrieval systems.
* Integrating real users with actual information needs into the research process of modeling information retrieval systems to allow accurate performance evaluation.
* Ability to create different views of patent data depending on the focus of the information needed.
* Defining standardized methods for benchmarking the information retrieval process in patent document collections.
* Ability to handle text and non-text parts of a patent coherently.
* Designing, experimenting with, and evaluating search engines able to retrieve structured and semi-structured documents in very large-scale patent collections.
* Integrating the temporal dimension of patent documents in retrieval strategies.
* Improving effectiveness and precision of patent retrieval, based on ontologies and natural-language understanding techniques.
* Refining IR methods that allow unstructured querying by exploiting available structures within the patent documents.
* Formal (mathematical) identification and specification of relevant business information needs in the field of intellectual property information.
* Investigating efficient scaling mechanisms for information retrieval taking into account the characteristics of patent data.
* Investigating and experimenting with computing architectures for very high-capacity information management.
* Establishing an open eScience platform that enables a standardized and easy way of creating and performing IR experiments on a common research infrastructure.
* Discovering and investigating novel use cases and business applications deriving from intellectual property information.
* Enabling formal information retrieval, natural language and semantic processing research to grow into the field of applied sciences in the global, industrial context.
* Development and integration of different information access methods.
* Research on effective methods for interactive information retrieval.
Semantic supercomputing
Current technologies to extract concepts from unstructured documents are extremely computationally intensive. To allow interactive experimentation with rich and huge text corpora, the IRF has built a high-performance computing environment, into which the latest technological advances have been implemented:
* multi-node clusters (currently 80 cores, up to 1024)
* highest-speed interconnect technology
* single system image with large compound memory (currently 320 GB, up to 4 TB)
* fully integrated configurable computing (currently 4 FPGA cores, up to 256)
The combination of these HPC features to accelerate text mining represents the IRF implementation of semantic supercomputing.
The World Patent Corpus
The IRF aims to bring state-of-the-art information retrieval technology to the community of patent information professionals. The IRF expects information retrieval (IR) technology to become the focus of information technology very soon. All industry sectors can profit from applying modern and future text mining processes to the special requirements of patent research. Although all ideas and concepts are universally applicable to all sorts of intellectual property information, patents require the most sophistication, and confront us with challenging technical and organizational problems.
The entire body of patent-related documents possibly constitutes the largest corpus of compound documents, making it a rewarding target for text mining scientists and end-users alike. What’s more, patents have become a crucial issue, in particular for large global corporations and universities. The industrial users of patent data are among the most demanding and important information professionals. As a consequence, they could benefit the most from technology that relieves the burden of researching the large body of patent information.
Research collections
The IRF provides several test data collections that have either been developed by the IRF, by one of its members or by third parties. These data collections can be used freely for scientific experimentation.
The Matrixware Research Collection (MAREC) is the first standardized patent data corpus for research purposes. It consists of 19 million patent documents in different languages, normalized to a highly specific XML format. The collection has been developed by Matrixware for the IRF.
The ClueWeb09 collection is a 25 terabyte dataset of about 1 billion web pages crawled in January and February 2009. It has been created by the Language Technologies Institute at Carnegie Mellon University to support research on information retrieval and related human language technologies.
References
Patent medicine for information retrievers, Information World Review
External links
Official site: ir-facility.org
YouTube: The future of information retrieval Part1
YouTube: The future of information retrieval Part2