SMART Information Retrieval System
The SMART (System for the Mechanical Analysis and Retrieval of Text) Information Retrieval System is an information retrieval system developed at Cornell University in the 1960s. Many important concepts in information retrieval were developed as part of research on the SMART system, including the vector space model, relevance feedback, and Rocchio classification. Gerard Salton led the group that developed SMART. Other contributors included Mike Lesk. The SMART system also provides a set of corpora, queries and reference rankings, taken from different subjects, notably:
* ADI: publications from information science reviews
* Computer science
* Cranfield collection: publications from aeronautic reviews
* Forensic science: library science
* MEDLARS collection: publications from medical reviews
* Time magazine collection: archives of the generalist review ''Time'' in 1963

Part of the legacy of the SMART system is the so-called SMART triple notation, a mnemonic scheme for denoting tf-i ...

Information Retrieval
Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds. Automated information retrieval systems are used to reduce what has been called information overload. An IR system is a software system that provides access to books, journals and other documents, and stores and manages those documents. Web search engines are the most visible IR applications.

Overview
An information retrieval process begins when a user or searcher enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In ...
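
Full-text indexing is commonly built on an inverted index, which maps each term to the documents that contain it. The Python sketch below illustrates the idea under simplifying assumptions: whitespace tokenization, lowercase normalization, and Boolean AND matching. The function names `tokenize`, `build_index` and `search`, and the three toy documents, are hypothetical and not taken from any particular IR system.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Assumed normalization: lowercase, split on whitespace (no stemming or stop words).
    return text.lower().split()


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index mapping each term to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents that contain every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the first term's postings and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "d1": "vector space model for text retrieval",
    "d2": "relevance feedback improves retrieval",
    "d3": "medical literature retrieval system",
}
index = build_index(docs)
print(sorted(search(index, "retrieval system")))  # ['d3']
```

Ranked retrieval replaces the Boolean intersection with a similarity score, but an inverted index is still the usual way to find candidate documents quickly.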

Cornell University
Cornell University is a private statutory land-grant research university based in Ithaca, New York. It is a member of the Ivy League. Founded in 1865 by Ezra Cornell and Andrew Dickson White, Cornell was intended to teach and make contributions in all fields of knowledge, from the classics to the sciences and from the theoretical to the applied. These ideals, unconventional for the time, are captured in Cornell's founding principle, a popular 1868 quotation from founder Ezra Cornell: "I would found an institution where any person can find instruction in any study." Cornell is ranked among the top global universities. The university is organized into seven undergraduate colleges and seven graduate divisions at its main Ithaca campus, with each college and division defining its specific admission standards and academic programs in near autonomy. The university also administers three satellite campuses, two in New York City and one in Education Ci ...

Vector Space Model
Vector space model or term vector model is an algebraic model for representing text documents (and any objects, in general) as vectors of identifiers (such as index terms). It is used in information filtering, information retrieval, indexing and relevancy rankings. Its first use was in the SMART Information Retrieval System.

Definitions
Documents and queries are represented as vectors:
d_j = (w_{1,j}, w_{2,j}, \dotsc, w_{n,j})
q = (w_{1,q}, w_{2,q}, \dotsc, w_{n,q})
Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero. Several different ways of computing these values, also known as (term) weights, have been developed. One of the best known schemes is tf-idf weighting (see the example below). The definition of ''term'' depends on the application. Typically terms are single words, keywords, or longer phrases. If words are chosen to be the terms, the dimensionality of the vector is the number of words in the vocabulary (the number of d ...
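
The following Python sketch shows one common tf-idf variant (raw term count multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term) and ranks documents by cosine similarity to a query vector. The weighting choice, the helper names `build_idf`, `vectorize` and `cosine`, and the toy corpus are assumptions for illustration; many other weighting schemes exist.

```python
import math
from collections import Counter


def build_idf(docs: list[list[str]]) -> dict[str, float]:
    """idf(t) = log(N / df(t)), where df(t) counts documents containing term t."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def vectorize(tokens: list[str], idf: dict[str, float], vocab: list[str]) -> list[float]:
    """tf-idf weight per vocabulary term: raw count in `tokens` times idf."""
    tf = Counter(tokens)
    return [tf[term] * idf[term] for term in vocab]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


corpus = [
    "shipment of gold damaged in a fire".split(),
    "delivery of silver arrived in a silver truck".split(),
    "shipment of gold arrived in a truck".split(),
]
idf = build_idf(corpus)
vocab = sorted(idf)
doc_vectors = [vectorize(doc, idf, vocab) for doc in corpus]
# Only vocabulary terms get a dimension, so query words unseen in the corpus are ignored.
query_vector = vectorize("gold silver truck".split(), idf, vocab)

scores = [cosine(query_vector, d) for d in doc_vectors]
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [1, 2, 0]
```

Terms that occur in every document ("of", "in", "a") receive an idf of 0 under this variant, so they do not influence the ranking.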

Relevance Feedback
Relevance feedback is a feature of some information retrieval systems. The idea behind relevance feedback is to take the results that are initially returned from a given query, to gather user feedback, and to use information about whether or not those results are relevant to perform a new query. We can usefully distinguish between three types of feedback: explicit feedback, implicit feedback, and blind or "pseudo" feedback.

Explicit feedback
Explicit feedback is obtained from assessors of relevance indicating the relevance of a document retrieved for a query. This type of feedback is defined as explicit only when the assessors (or other users of a system) know that the feedback provided is interpreted as relevance judgments. Users may indicate relevance explicitly using a ''binary'' or ''graded'' relevance system. Binary relevance feedback indicates that a document is either relevant or irrelevant for a given query. Graded relevance feedback indicates the relevance of a documen ...
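
A classic way to apply explicit feedback in the vector space model is the Rocchio algorithm (named in the SMART entry), which moves the query vector toward the centroid of documents judged relevant (D_r) and away from the centroid of those judged non-relevant (D_{nr}):
q_m = \alpha q_0 + \beta \frac{1}{|D_r|} \sum_{d_j \in D_r} d_j - \gamma \frac{1}{|D_{nr}|} \sum_{d_j \in D_{nr}} d_j
The Python sketch below implements that update. The default weights (alpha = 1.0, beta = 0.75, gamma = 0.15) and clipping negative weights to zero are common choices assumed here, not values taken from this text; the function names are hypothetical.

```python
def centroid(vectors: list[list[float]], dim: int) -> list[float]:
    """Component-wise mean of `vectors`; the zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(
    query: list[float],
    relevant: list[list[float]],
    nonrelevant: list[list[float]],
    alpha: float = 1.0,   # weight on the original query (assumed default)
    beta: float = 0.75,   # weight on the relevant-document centroid (assumed default)
    gamma: float = 0.15,  # weight on the non-relevant centroid (assumed default)
) -> list[float]:
    """Return the modified query vector q_m from the Rocchio update."""
    dim = len(query)
    rel = centroid(relevant, dim)
    nonrel = centroid(nonrelevant, dim)
    updated = [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, nonrel)]
    # Negative term weights are clipped to zero (a common convention, assumed here).
    return [max(0.0, w) for w in updated]


q0 = [1.0, 0.0, 0.0]
judged_relevant = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
judged_nonrelevant = [[0.0, 0.0, 1.0]]
print(rocchio(q0, judged_relevant, judged_nonrelevant))  # [1.75, 0.375, 0.0]
```

The second term, absent from the original query, now has a positive weight because it appears in a relevant document; this is how feedback expands a query.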

Nearest Centroid Classifier
In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification model that assigns to observations the label of the class of training samples whose mean (centroid) is closest to the observation. When applied to text classification using word vectors containing tf*idf weights to represent documents, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback. An extended version of the nearest centroid classifier has found applications in the medical domain, specifically classification of tumors.

Algorithm

Training
Given labeled training samples \{(\vec{x}_1, y_1), \dotsc, (\vec{x}_n, y_n)\} with class labels y_i \in \mathbf{Y}, compute the per-class centroids
\vec{\mu}_\ell = \frac{1}{|C_\ell|} \sum_{i \in C_\ell} \vec{x}_i
where C_\ell is the set of indices of samples belonging to class \ell \in \mathbf{Y}.

Prediction
The class assigned to an observation \vec{x} is
\hat{y} = \arg\min_{\ell \in \mathbf{Y}} \|\vec{\mu}_\ell - \vec{x}\|.

See also
* Cluster ...
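
The training and prediction rules translate almost directly into code. This Python sketch computes per-class mean vectors and assigns a new observation to the class whose centroid is nearest in Euclidean distance; with tf-idf document vectors as input it acts as the Rocchio classifier. The function names `fit_centroids` and `predict` and the toy two-dimensional data are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(X: list[list[float]], y: list[str]) -> dict[str, list[float]]:
    """Training: the centroid of class l is the mean of the samples labelled l."""
    groups: dict[str, list[list[float]]] = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {
        label: [sum(column) / len(samples) for column in zip(*samples)]
        for label, samples in groups.items()
    }


def predict(centroids: dict[str, list[float]], x: list[float]) -> str:
    """Prediction: the label whose centroid has the smallest Euclidean distance to x."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


X_train = [[1.0, 1.0], [1.0, 2.0], [5.0, 5.0], [6.0, 5.0]]
y_train = ["a", "a", "b", "b"]
centroids = fit_centroids(X_train, y_train)
print(centroids)                       # {'a': [1.0, 1.5], 'b': [5.5, 5.0]}
print(predict(centroids, [2.0, 2.0]))  # a
print(predict(centroids, [5.0, 6.0]))  # b
```

Training is a single pass over the data and prediction compares against one prototype per class, which is why the method stays cheap even for large text collections.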

Gerard Salton
Gerard A. "Gerry" Salton (8 March 1927 in Nuremberg – 28 August 1995) was a Professor of Computer Science at Cornell University. Salton was perhaps the leading computer scientist working in the field of information retrieval during his time, and "the father of Information Retrieval". His group at Cornell developed the SMART Information Retrieval System, which he initiated when he was at Harvard. It was the very first system to use the now popular vector space model for Information Retrieval. Salton was born Gerhard Anton Sahlmann on March 8, 1927 in Nuremberg, Germany. He received a Bachelor's (1950) and Master's (1952) degree in mathematics from Brooklyn College, and a Ph.D. from Harvard in applied mathematics in 1958, the last of Howard Aiken's doctoral students, and taught there until 1965, when he joined Cornell University and co-founded its department of Computer Science. Salton was perhaps most well known for developing the now widely used vector space model for Inform ...

Mike Lesk
Michael E. Lesk (born 1945) is an American computer scientist.

Biography
In the 1960s, Michael Lesk worked for the SMART Information Retrieval System project, wrote much of its retrieval code and did many of the retrieval experiments, as well as obtaining a BA degree in Physics and Chemistry from Harvard College in 1964 and a PhD from Harvard University in Chemical Physics in 1969. From 1970 to 1984, Lesk worked at Bell Labs in the group that built Unix. Lesk wrote Unix tools for word processing (''tbl'', ''refer'', and the standard ''ms'' macro package, all for ''troff''), for compiling (''Lex''), and for networking (''uucp''). He also wrote the Portable I/O Library (the predecessor to stdio.h in C) and contributed significantly to the development of the C language preprocessor. In 1984, he left to work for Bellcore, where he managed the computer science research group. There, Lesk worked on s ...

Association For Information Science And Technology
The Association for Information Science and Technology (ASIS&T) is a nonprofit membership organization for information professionals that sponsors an annual conference as well as several serial publications, including the '' Journal of the Association for Information Science and Technology'' (JAsIST). The organization provides administration and communications support for its various divisions, known as special-interest groups or SIGs; provides administration for geographically defined chapters; connects job seekers with potential employers; and provides organizational support for continuing education programs for information professionals. Founded as the American Documentation Institute (ADI) in 1937, the group became the American Society for Information Science (ASIS) in 1968 to reflect the organization's interest in "all aspects of the information transfer process" such as, "designing, managing and using information systems and technology." Updating its name in 2000, the Am ...

Computer Science
Computer science is the study of computation, automation, and information. Computer science ranges from theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (including the design and implementation of hardware and software). Computer science is generally considered an area of academic research and distinct from computer programming. Algorithms and data structures are central to computer science. The theory of computation concerns abstract models of computation and general classes of problems that can be solved using them. The fields of cryptography and computer security involve studying the means for secure communication and for preventing security vulnerabilities. Computer graphics and computational geometry address the generation of images. Programming language theory considers different ways to describe computational processes, and database theory concerns the management of repositories o ...

Cranfield Experiments
The Cranfield experiments were a series of experimental studies in information retrieval conducted by Cyril W. Cleverdon at the College of Aeronautics, today known as Cranfield University, in the 1960s to evaluate the efficiency of indexing systems. The experiments were broken into two main phases, neither of which was computerized. The entire collection of abstracts, resulting indexes and results were later distributed in electronic format and were widely used for decades. In the first series of experiments, several existing indexing methods were compared to test their efficiency. The queries were generated by the authors of the papers in the collection and then translated into index lookups by experts in those systems. In this series, one method went from least efficient to most efficient after minor changes to how the data was arranged on the index cards. The conclusion appeared to be that the underlying methodology seemed less important than speci ...

Forensic Science
Forensic science, also known as criminalistics, is the application of science to criminal and civil laws, mainly (on the criminal side) during criminal investigation, as governed by the legal standards of admissible evidence and criminal procedure. Forensic science is a broad field that includes DNA analysis, fingerprint analysis, bloodstain pattern analysis, firearms examination and ballistics, tool mark analysis, serology, toxicology, hair and fiber analysis, entomology, questioned documents, anthropology, odontology, pathology, epidemiology, footwear and tire tread analysis, drug chemistry, paint and glass analysis, and digital audio, video and photo analysis. Forensic scientists collect, preserve, and analyze scientific evidence during the course of an investigation. While some forensic scientists travel to the scene of the crime to collect the evidence themselves, others occupy a laboratory role, performing analysis on objects brought to them by other individuals. Still ...

MEDLINE
MEDLINE (Medical Literature Analysis and Retrieval System Online, or MEDLARS Online) is a bibliographic database of life sciences and biomedical information. It includes bibliographic information for articles from academic journals covering medicine, nursing, pharmacy, dentistry, veterinary medicine, and health care. MEDLINE also covers much of the literature in biology and biochemistry, as well as fields such as molecular evolution. Compiled by the United States National Library of Medicine (NLM), MEDLINE is freely available on the Internet and searchable via PubMed and NLM's National Center for Biotechnology Information's Entrez system.

History
MEDLARS (Medical Literature Analysis and Retrieval System) is a computerised biomedical bibliographic retrieval system. It was launched by the National Library of Medicine in 1964 and was the first large scale, computer based, retrospective search service available to the general public.

Initial development of MEDLARS
Since 1 ...