Extended Boolean Model
The Extended Boolean model was described in a Communications of the ACM article appearing in 1983, by Gerard Salton, Edward A. Fox, and Harry Wu. The goal of the Extended Boolean model is to overcome the drawbacks of the Boolean model that has been used in information retrieval. The Boolean model doesn't consider term weights in queries, and the result set of a Boolean query is often either too small or too big. The idea of the extended model is to make use of partial matching and term weights as in the vector space model. It combines the characteristics of the Vector Space Model with the properties of Boolean algebra and ranks the similarity between queries and documents. This way a document may be somewhat relevant if it matches some of the queried terms and will be returned as a result, whereas in the Standard Boolean model it wasn't. Thus, the extended Boolean model can be considered as a generalization of both the Boolean and vector space models; those two are special cases if ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Information Retrieval
Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an Information needs, information need. The information need can be specified in the form of a search query. In the case of document retrieval, queries can be based on full-text search, full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds. Automated information retrieval systems are used to reduce what has been called information overload. An IR system is a software system that provides access to books, journals and other documents; it also stores and manages those documents. Web search engines are the most visible IR applications. Overview An information retrieval process begins when a user enters a query into the sys ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Vector Space Model
Vector space model or term vector model is an algebraic model for representing text documents (or more generally, items) as vector space, vectors such that the distance between vectors represents the relevance between the documents. It is used in information filtering, information retrieval, index (search engine), indexing and relevancy rankings. Its first use was in the SMART Information Retrieval System. Definitions In this section we consider a particular vector space model based on the Bag-of-words model, bag-of-words representation. Documents and queries are represented as vectors. :d_j = ( w_ ,w_ , \dotsc ,w_ ) :q = ( w_ ,w_ , \dotsc ,w_ ) Each Dimension (vector space), dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero. Several different ways of computing these values, also known as (term) weights, have been developed. One of the best known schemes is tf-idf weighting (see the example below). The definition of ' ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Boolean Algebra (logic)
In mathematics and mathematical logic, Boolean algebra is a branch of algebra. It differs from elementary algebra in two ways. First, the values of the variable (mathematics), variables are the truth values ''true'' and ''false'', usually denoted by 1 and 0, whereas in elementary algebra the values of the variables are numbers. Second, Boolean algebra uses logical operators such as Logical conjunction, conjunction (''and'') denoted as , disjunction (''or'') denoted as , and negation (''not'') denoted as . Elementary algebra, on the other hand, uses arithmetic operators such as addition, multiplication, subtraction, and division. Boolean algebra is therefore a formal way of describing logical operations in the same way that elementary algebra describes numerical operations. Boolean algebra was introduced by George Boole in his first book ''The Mathematical Analysis of Logic'' (1847), and set forth more fully in his ''An Investigation of the Laws of Thought'' (1854). According to ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Standard Boolean Model
The (standard) Boolean model of information retrieval (BIR) is a classical information retrieval (IR) model and, at the same time, the first and most-adopted one. The BIR is based on Boolean logic and classical set theory in that both the documents to be searched and the user's query are conceived as sets of terms (a bag-of-words model). Retrieval is based on whether or not the documents contain the query terms and whether they satisfy the boolean conditions described by the query. Definitions An ''index term'' is a word or expression'','' which may be stemmed, describing or characterizing a document, such as a keyword given for a journal article. LetT = \be the set of all such index terms. A ''document'' is any subset of T. LetD = \be the set of all documents. T is a series of words or small phrases (index terms). Each of those words or small phrases are named t_n, where n is the number of the term in the series/list. You can think of T as "Terms" and t_n as "index term ' ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Relevance Feedback
Relevance feedback is a feature of some information retrieval and recommender systems. The idea behind relevance feedback is to take the results that are initially returned from a given query, to gather user feedback, and to use information about whether or not those results are relevant to perform a new query. We can usefully distinguish between three types of feedback: explicit feedback, implicit feedback, and blind or "pseudo" feedback. Explicit feedback Explicit feedback is obtained from assessors of relevance indicating the relevance of a document retrieved for a query. This type of feedback is defined as explicit only when the assessors (or other users of a system) know that the feedback provided is interpreted as relevance judgments. Users may indicate relevance explicitly using a ''binary'' or ''graded'' relevance system. Binary relevance feedback indicates that a document is either relevant or irrelevant for a given query. Graded relevance feedback indicates the rele ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Query Expansion
Query expansion (QE) is the process of reformulating a given query to improve retrieval performance in information retrieval operations, particularly in the context of query understanding. In the context of search engines, query expansion involves evaluating a user's input (what words were typed into the search query area, and sometimes other types of data) and expanding the search query to match additional documents. Query expansion involves techniques such as: * Finding synonyms of words, and searching for the synonyms as well * Finding semantically related words (e.g. antonyms, meronyms, hyponyms, hypernyms) * Finding all the various morphological forms of words by stemming each word in the search query * Fixing spelling errors and automatically searching for the corrected form or suggesting it in the results * Re-weighting the terms in the original query Query expansion is a methodology studied in the field of computer science, particularly within the realm of natural lan ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Dimension (vector Space)
In mathematics, the dimension of a vector space ''V'' is the cardinality (i.e., the number of vectors) of a Basis (linear algebra), basis of ''V'' over its base Field (mathematics), field. p. 44, ยง2.36 It is sometimes called Hamel dimension (after Georg Hamel) or algebraic dimension to distinguish it from other types of dimension. For every vector space there exists a basis, and all bases of a vector space have equal cardinality; as a result, the dimension of a vector space is uniquely defined. We say V is if the dimension of V is wiktionary:finite, finite, and if its dimension is infinity, infinite. The dimension of the vector space V over the field F can be written as \dim_F(V) or as [V : F], read "dimension of V over F". When F can be inferred from context, \dim(V) is typically written. Examples The vector space \R^3 has \left\ as a standard basis, and therefore \dim_(\R^3) = 3. More generally, \dim_(\R^n) = n, and even more generally, \dim_(F^n) = n for any Field (mathe ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Term Frequency
Term may refer to: Language *Terminology, context-specific nouns or compound words **Technical term (or ''term of art''), used by specialists in a field ***Scientific terminology, used by scientists *Term (argumentation), part of an argument in debate theory Law and finance *Contractual term, a provision in a contract **Credit repayment terms **Payment terms, "net ''D''" on a trade invoice **Purchase order, invoice terms more generally * Term life insurance Lengths of time *Academic term, part of a year at school or university *Term of office, a set period a person serves in an elected office *Term of patent, the period of enforcement of patent rights *Term of a pregnancy *Prison sentence Mathematics and physics *Term (logic), a component of a logical or mathematical expression (not to be confused with term logic, or Aristotelian logic) **Ground term, a term with no variables *Term (arithmetic), or addend, an operand to the addition operator **Term of a summation, a polynomial, ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Inverse Document Frequency
Inverse or invert may refer to: Science and mathematics * Inverse (logic), a type of conditional sentence which is an immediate inference made from another conditional sentence * Additive inverse, the inverse of a number that, when added to the original number, yields zero * Compositional inverse, a function that "reverses" another function * Inverse element * Inverse function, a function that "reverses" another function **Generalized inverse, a matrix that has some properties of the inverse matrix but not necessarily all of them * Multiplicative inverse (reciprocal), a number which when multiplied by a given number yields the multiplicative identity, 1 ** Inverse matrix of an Invertible matrix Other uses * Invert level, the base interior level of a pipe, trench or tunnel * ''Inverse'' (website), an online magazine * An outdated term for an LGBT person; see Sexual inversion (sexology) See also * Inversion (other) * Inverter (other) * Opposite (disambiguation ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
P-norm
In mathematics, the spaces are function spaces defined using a natural generalization of the -norm for finite-dimensional vector spaces. They are sometimes called Lebesgue spaces, named after Henri Lebesgue , although according to the Bourbaki group they were first introduced by Frigyes Riesz . spaces form an important class of Banach spaces in functional analysis, and of topological vector spaces. Because of their key role in the mathematical analysis of measure and probability spaces, Lebesgue spaces are used also in the theoretical discussion of problems in physics, statistics, economics, finance, engineering, and other disciplines. Preliminaries The -norm in finite dimensions The Euclidean length of a vector x = (x_1, x_2, \dots, x_n) in the n-dimensional real vector space \Reals^n is given by the Euclidean norm: \, x\, _2 = \left(^2 + ^2 + \dotsb + ^2\right)^. The Euclidean distance between two points x and y is the length \, x - y\, _2 of the straight line between t ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Fuzzy Retrieval
Fuzzy retrieval techniques are based on the Extended Boolean model and the Fuzzy set theory. There are two classical fuzzy retrieval models: Mixed Min and Max (MMM) and the Paice model. Both models do not provide a way of evaluating query weights, however this is considered by the P-norms algorithm. Mixed Min and Max model (MMM) In fuzzy-set theory, an element has a varying degree of membership, say ''dA'', to a given set ''A'' instead of the traditional membership choice (is an element/is not an element). In MMM each index term has a fuzzy set associated with it. A document's weight with respect to an index term ''A'' is considered to be the degree of membership of the document in the fuzzy set associated with ''A''. The degree of membership for union and intersection are defined as follows in Fuzzy set theory: :d_= min(d_A, d_B) :d_= max(d_A,d_B) According to this, documents that should be retrieved for a query of the form ''A or B'', should be in the fuzzy set associated with ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Information Retrieval
Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an Information needs, information need. The information need can be specified in the form of a search query. In the case of document retrieval, queries can be based on full-text search, full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds. Automated information retrieval systems are used to reduce what has been called information overload. An IR system is a software system that provides access to books, journals and other documents; it also stores and manages those documents. Web search engines are the most visible IR applications. Overview An information retrieval process begins when a user enters a query into the sys ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |