Extended Boolean Model
The Extended Boolean model was described in a Communications of the ACM article appearing in 1983, by Gerard Salton, Edward A. Fox, and Harry Wu. The goal of the Extended Boolean model is to overcome the drawbacks of the Boolean model that has been used in information retrieval. The Boolean model does not consider term weights in queries, and the result set of a Boolean query is often either too small or too big. The idea of the extended model is to make use of partial matching and term weights, as in the vector space model. It combines the characteristics of the vector space model with the properties of Boolean algebra and ranks documents by their similarity to the query. This way, a document that matches only some of the query terms can be judged somewhat relevant and returned as a result, whereas in the standard Boolean model it would not be. Thus, the extended Boolean model can be considered a generalization of both the Boolean and vector space models; those two are special cases if ...
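The summary above breaks off before giving the similarity formulas. The sketch below uses the p-norm formulas usually given for this model, assuming document term weights in [0, 1] and unweighted query terms; the function names `sim_or` and `sim_and` are hypothetical. With p = 1 both operators reduce to the average term weight, which behaves like a vector-space inner product, and as p grows they approach the max/min of fuzzy Boolean retrieval (strict Boolean retrieval when all weights are 0 or 1).

```python
import math
from typing import Sequence


def sim_or(weights: Sequence[float], p: float) -> float:
    """p-norm similarity of a document to the query (t1 OR ... OR tm).

    `weights` holds the document's weights for the m query terms, each in [0, 1];
    query terms are assumed to be unweighted.
    """
    if math.isinf(p):
        return max(weights)  # limit p -> infinity: fuzzy OR (maximum)
    m = len(weights)
    # Distance from the origin (0, ..., 0), normalized to [0, 1].
    return (sum(w ** p for w in weights) / m) ** (1.0 / p)


def sim_and(weights: Sequence[float], p: float) -> float:
    """p-norm similarity of a document to the query (t1 AND ... AND tm)."""
    if math.isinf(p):
        return min(weights)  # limit p -> infinity: fuzzy AND (minimum)
    m = len(weights)
    # Normalized distance from the ideal point (1, ..., 1), subtracted from 1.
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / m) ** (1.0 / p)


# A document containing the first query term but not the second.
doc = [1.0, 0.0]
for p in (1, 2, math.inf):
    print(p, round(sim_or(doc, p), 4), round(sim_and(doc, p), 4))
```

For this document both scores are 0.5 at p = 1, while at p = ∞ the OR query scores 1.0 and the AND query 0.0, exactly as a strict Boolean match would; intermediate p values give intermediate, partial-match scores.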
Information Retrieval
Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds. Automated information retrieval systems are used to reduce what has been called information overload. An IR system is a software system that provides access to books, journals and other documents, and that stores and manages those documents. Web search engines are the most visible IR applications.
Overview: An information retrieval process begins when a user or searcher enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In ...
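Full-text indexing is commonly implemented with an inverted index, which maps each term to the documents that contain it. The following is a minimal sketch, assuming naive whitespace tokenization and lower-casing; the names `build_index` and `search` and the toy documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: each lower-cased token maps to the ids of documents containing it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():  # naive whitespace tokenization
            index[token].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the ids of documents that contain every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))  # copy so the index itself is not mutated
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {
    "d1": "Information retrieval systems",
    "d2": "Web search engines",
    "d3": "Retrieval of images and sounds",
}
index = build_index(docs)
print(sorted(search(index, "retrieval")))              # ['d1', 'd3']
print(sorted(search(index, "information retrieval")))  # ['d1']
```

Answering a query then touches only the posting sets of the query terms rather than scanning every document.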
Vector Space Model
Vector space model or term vector model is an algebraic model for representing text documents (and, in general, any objects) as vectors of identifiers, such as index terms. It is used in information filtering, information retrieval, indexing and relevance ranking. Its first use was in the SMART Information Retrieval System.
Definitions: Documents and queries are represented as vectors:
$d_j = (w_{1,j}, w_{2,j}, \dotsc, w_{n,j})$
$q = (w_{1,q}, w_{2,q}, \dotsc, w_{n,q})$
Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero. Several different ways of computing these values, also known as (term) weights, have been developed. One of the best known schemes is tf-idf weighting. The definition of term depends on the application. Typically terms are single words, keywords, or longer phrases. If words are chosen to be the terms, the dimensionality of the vector is the number of words in the vocabulary (the number of ...
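A minimal sketch of tf-idf weighting with cosine-similarity ranking, assuming one common variant in which a term's weight is its raw count times ln(N / df), where N is the number of documents and df the number of documents containing the term; many other weighting schemes exist. The toy documents and function names are hypothetical.

```python
import math
from collections import Counter


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    """Inverse document frequency ln(N / df_t) for every term in the collection."""
    n = len(docs)
    vocab = {t for d in docs for t in d}
    return {t: math.log(n / sum(1 for d in docs if t in d)) for t in vocab}


def vectorize(tokens: list[str], idf: dict[str, float]) -> list[float]:
    """tf-idf vector: raw term count times idf, one dimension per vocabulary term."""
    tf = Counter(tokens)
    # Fixed (sorted) term order; tokens outside the vocabulary are ignored.
    return [tf[t] * idf[t] for t in sorted(idf)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is the zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    ["car", "insurance", "auto"],
    ["car", "engine"],
    ["best", "insurance", "policy"],
]
idf = idf_table(docs)
doc_vectors = [vectorize(d, idf) for d in docs]
query_vector = vectorize(["auto", "insurance"], idf)
scores = [cosine(query_vector, dv) for dv in doc_vectors]
ranking = sorted(range(len(docs)), key=lambda j: scores[j], reverse=True)
print(ranking)  # [0, 2, 1]
```

The rare term "auto" receives a higher idf than the more common "insurance", so the first document, which contains both, ranks well above the third, which shares only "insurance".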
Boolean Algebra (logic)
In mathematics and mathematical logic, Boolean algebra is a branch of algebra. It differs from elementary algebra in two ways. First, the values of the variables are the truth values true and false, usually denoted 1 and 0, whereas in elementary algebra the values of the variables are numbers. Second, Boolean algebra uses logical operators such as conjunction ("and"), denoted ∧; disjunction ("or"), denoted ∨; and negation ("not"), denoted ¬. Elementary algebra, on the other hand, uses arithmetic operators such as addition, multiplication, subtraction and division. So Boolean algebra is a formal way of describing logical operations, in the same way that elementary algebra describes numerical operations. Boolean algebra was introduced by George Boole in his first book, The Mathematical Analysis of Logic (1847), and set forth more fully in his An Investigation of the Laws of Thought (1854). According to Huntington, the term "Boolean algebra ...
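To make the contrast with elementary algebra concrete, the three operators can be written as ordinary arithmetic restricted to the truth values 1 and 0. This encoding is one standard choice, shown here as a sketch; the function names `conj`, `disj` and `neg` are hypothetical. It prints the truth table.

```python
from itertools import product


def conj(x: int, y: int) -> int:
    """Conjunction x ∧ y on truth values 0/1."""
    return x * y


def disj(x: int, y: int) -> int:
    """Disjunction x ∨ y on truth values 0/1."""
    return x + y - x * y


def neg(x: int) -> int:
    """Negation ¬x on truth values 0/1."""
    return 1 - x


print("x y | x∧y x∨y ¬x")
for x, y in product((0, 1), repeat=2):
    print(f"{x} {y} |  {conj(x, y)}   {disj(x, y)}   {neg(x)}")
```

Because every variable ranges over only two values, identities such as De Morgan's law ¬(x ∧ y) = ¬x ∨ ¬y can be checked exhaustively over these four rows.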
Standard Boolean Model
The (standard) Boolean model of information retrieval (BIR) is a classical information retrieval (IR) model and, at the same time, the first and most widely adopted one. It is used by many IR systems to this day. The BIR is based on Boolean logic and classical set theory in that both the documents to be searched and the user's query are conceived as sets of terms (a bag-of-words model). Retrieval is based on whether or not the documents contain the query terms.
Definitions: An index term is a word or expression, which may be stemmed, describing or characterizing a document, such as a keyword given for a journal article. Let $T = \{t_1, t_2, \dotsc, t_k\}$ be the set of all such index terms. A document is any subset of $T$. Let $D = \{D_1, \dotsc, D_n\}$ be the set of all documents. A query is a Boolean expression $Q$ in conjunctive normal form:
$Q = (W_1 \lor W_2 \lor \cdots) \land \cdots \land (W_i \lor W_{i+1} \lor \cdots)$
where $W_i$ is true for $D_j$ when $t_i \in D_j$. (Equivalently, $Q$ could be expressed in disjunctive normal form.) We ...
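A query in the conjunctive normal form above can be represented as a list of OR-clauses, each clause a set of terms; a document, itself a set of index terms, matches when every clause shares at least one term with it. The sketch below uses hypothetical names and toy documents. Note that the result is only match or no match, with no ranking.

```python
def matches(document: frozenset[str], query: list[set[str]]) -> bool:
    """True if every OR-clause of the CNF query has at least one term in the document."""
    return all(any(term in document for term in clause) for clause in query)


def retrieve(documents: dict[str, frozenset[str]], query: list[set[str]]) -> list[str]:
    """Ids of all matching documents (the model returns a set; sorted here only for display)."""
    return sorted(doc_id for doc_id, terms in documents.items() if matches(terms, query))


documents = {
    "D1": frozenset({"boolean", "retrieval", "model"}),
    "D2": frozenset({"vector", "space", "model"}),
    "D3": frozenset({"boolean", "algebra"}),
}
# Q = (boolean ∨ vector) ∧ (model)
query = [{"boolean", "vector"}, {"model"}]
print(retrieve(documents, query))  # ['D1', 'D2']
```

D3 contains "boolean" but fails the second clause, so it is excluded entirely rather than ranked lower.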
Relevance Feedback
Relevance feedback is a feature of some information retrieval systems. The idea behind relevance feedback is to take the results initially returned for a given query, gather user feedback on them, and use information about whether or not those results are relevant to perform a new query. Three types of feedback can usefully be distinguished: explicit feedback, implicit feedback, and blind or "pseudo" feedback.
Explicit feedback: This is obtained from relevance assessors who indicate the relevance of a document retrieved for a query. Feedback counts as explicit only when the assessors (or other users of a system) know that the feedback they provide is interpreted as relevance judgments. Users may indicate relevance explicitly using a binary or graded relevance system. Binary relevance feedback indicates that a document is either relevant or irrelevant for a given query. Graded relevance feedback indicates the relevance of a docume ...
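The summary does not say how the new query is formed. One classic way to do it with explicit binary judgments in a vector space model is the Rocchio algorithm, sketched below as a swapped-in illustration rather than the method of any particular system. The parameter values α = 1.0, β = 0.75, γ = 0.15 are commonly quoted defaults and are assumptions here, as are the function names and toy vectors.

```python
def centroid(vectors: list[list[float]], dim: int) -> list[float]:
    """Component-wise mean of the vectors (zero vector if there are none)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(
    query: list[float],
    relevant: list[list[float]],
    nonrelevant: list[list[float]],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.15,
) -> list[float]:
    """Move the query vector toward relevant documents and away from non-relevant ones."""
    dim = len(query)
    rel = centroid(relevant, dim)
    non = centroid(nonrelevant, dim)
    # Negative term weights are clipped to zero.
    return [max(0.0, alpha * q + beta * r - gamma * n) for q, r, n in zip(query, rel, non)]


# Term order: [retrieval, boolean, image]
query = [1.0, 0.0, 0.0]
judged_relevant = [[1.0, 1.0, 0.0]]
judged_nonrelevant = [[0.0, 0.0, 1.0]]
print(rocchio(query, judged_relevant, judged_nonrelevant))  # [1.75, 0.75, 0.0]
```

The modified query picks up the term "boolean" from the document judged relevant, which is how feedback can retrieve documents the original query missed.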
Query Expansion
Query expansion (QE) is the process of reformulating a given query to improve retrieval performance in information retrieval operations, particularly in the context of query understanding. In the context of search engines, query expansion involves evaluating a user's input (the words typed into the search query area, and sometimes other types of data) and expanding the search query to match additional documents. Query expansion involves techniques such as:
* Finding synonyms of words, and searching for the synonyms as well
* Finding semantically related words (e.g. antonyms, meronyms, hyponyms, hypernyms)
* Finding all the various morphological forms of words by stemming each word in the search query
* Fixing spelling errors and automatically searching for the corrected form or suggesting it in the results
* Re-weighting the terms in the original query
Query expansion is a methodology studied in the field of computer science, particularly within the realm of natural la ...
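Two of the techniques listed, synonym expansion and re-weighting, can be combined in a few lines. This is a minimal sketch assuming a small hand-written thesaurus (hypothetical; a real system might draw on a resource such as WordNet or on query logs) and an arbitrary lower weight of 0.5 for added synonyms.

```python
# Hypothetical thesaurus used only for this example.
SYNONYMS: dict[str, list[str]] = {
    "car": ["automobile", "auto"],
    "cheap": ["inexpensive"],
}


def expand_query(
    query: str,
    synonyms: dict[str, list[str]],
    synonym_weight: float = 0.5,
) -> dict[str, float]:
    """Return term -> weight: original terms get 1.0, added synonyms a lower weight."""
    weights = {term: 1.0 for term in query.lower().split()}
    for term in list(weights):  # iterate over the original terms only
        for synonym in synonyms.get(term, []):
            weights.setdefault(synonym, synonym_weight)  # never downgrade an original term
    return weights


print(expand_query("Cheap car", SYNONYMS))
# {'cheap': 1.0, 'car': 1.0, 'inexpensive': 0.5, 'automobile': 0.5, 'auto': 0.5}
```

Down-weighting the added terms lets the expanded query match documents that use only a synonym while still preferring documents that use the user's own words.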
Dimension (vector space)
In mathematics, the dimension of a vector space $V$ is the cardinality (i.e., the number of vectors) of a basis of $V$ over its base field (p. 44, §2.36). It is sometimes called Hamel dimension (after Georg Hamel) or algebraic dimension to distinguish it from other types of dimension. For every vector space there exists a basis, and all bases of a vector space have equal cardinality; as a result, the dimension of a vector space is uniquely defined. We say $V$ is finite-dimensional if the dimension of $V$ is finite, and infinite-dimensional if its dimension is infinite. The dimension of the vector space $V$ over the field $F$ can be written as $\dim_F(V)$ or as $[V : F]$, read "dimension of $V$ over $F$". When $F$ can be inferred from context, $\dim(V)$ is typically written.
Examples: The vector space $\mathbb{R}^3$ has $\{(1,0,0), (0,1,0), (0,0,1)\}$ as a standard basis, and therefore $\dim_{\mathbb{R}}(\mathbb{R}^3) = 3$. More generally, $\dim_{\mathbb{R}}(\mathbb{R}^n) = n$, and even more generally, $\dim_F(F^n) = n$ for any field $F$. The complex numbers $\mathbb{C}$ are both a real and complex vector space; we ...
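For finitely many vectors with rational coordinates, the dimension of the subspace they span equals the rank of the matrix having those vectors as rows, which Gaussian elimination computes. The sketch below uses exact `Fraction` arithmetic to avoid floating-point rounding; restricting to rational entries as a stand-in for $\mathbb{R}^n$ is an assumption of the example, and the function name `rank` is hypothetical.

```python
from fractions import Fraction


def rank(vectors: list[list[int]]) -> int:
    """Dimension of the span of `vectors`, by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots (independent rows) found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear column c in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


standard_basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(rank(standard_basis))                     # 3
print(rank([[1, 2, 3], [2, 4, 6], [0, 1, 1]]))  # 2
```

The standard basis spans a space of dimension 3, while in the second set (2, 4, 6) is twice (1, 2, 3), so those three vectors span only a 2-dimensional subspace and do not form a basis.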
Term Frequency
Term may refer to:
* Terminology, or term, a noun or compound word used in a specific context, in particular:
  * Technical term, part of the specialized vocabulary of a particular field, specifically:
    * Scientific terminology, terms used by scientists
Law:
* Contractual term, a legally binding provision
  * Payment (or credit) terms, the part of an invoice stating when payment is due and what discount applies for paying early, e.g. "2/10 net 30"
Lengths of time:
* Academic term, a division of the academic year in which classes are held. For English-speaking university academic terms, see:
  * Easter term
  * Hilary term
  * Lent term
  * Michaelmas term
  * Summer term
  * Trinity term
* Term of office, the length of time a person serves in a particular office
* Term of patent, the maximum period during which a patent can be maintained in force
* Term of a pregnancy
* Prison sentence, or term, a time served in a prison
Mathematics and physics:
* Term (logic), a component of a logical or mathematical ...
Inverse Document Frequency
Inverse or invert may refer to:
Science and mathematics:
* Inverse (logic), a type of conditional sentence which is an immediate inference made from another conditional sentence
* Additive inverse (negation), the inverse of a number that, when added to the original number, yields zero
* Compositional inverse, a function that "reverses" another function
* Inverse element
* Inverse function, a function that "reverses" another function
  * Generalized inverse, a matrix that has some properties of the inverse matrix but not necessarily all of them
* Multiplicative inverse (reciprocal), a number which when multiplied by a given number yields the multiplicative identity, 1
  * Inverse matrix of an invertible matrix
Other uses:
* Invert level, the base interior level of a pipe, trench or tunnel
* Inverse (website), an online magazine
* An outdated term for an LGBT person; see Sexual inversion (sexology)
See also:
* Inversion (other)
* Inverter (other)
* Opposite (dis ...
P-norm
In mathematics, the $L^p$ spaces are function spaces defined using a natural generalization of the $p$-norm for finite-dimensional vector spaces. They are sometimes called Lebesgue spaces, named after Henri Lebesgue, although according to the Bourbaki group they were first introduced by Frigyes Riesz. $L^p$ spaces form an important class of Banach spaces in functional analysis, and of topological vector spaces. Because of their key role in the mathematical analysis of measure and probability spaces, Lebesgue spaces are also used in the theoretical discussion of problems in physics, statistics, economics, finance, engineering, and other disciplines.
Applications in statistics: In statistics, measures of central tendency and statistical dispersion, such as the mean, median, and standard deviation, are defined in terms of $L^p$ metrics, and measures of central tendency can be characterized as solutions to variational problems. In penalized regression, "L1 penalty" and "L2 penalty" refer to pe ...
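The finite-dimensional $p$-norm that $L^p$ spaces generalize, $\|x\|_p = (\sum_i |x_i|^p)^{1/p}$, is straightforward to compute. This is a minimal sketch (the function name `p_norm` is hypothetical); $p = 1$ and $p = 2$ are the norms behind the L1 and L2 penalties, and $p = \infty$ is the limiting maximum norm.

```python
import math


def p_norm(x: list[float], p: float) -> float:
    """The p-norm (sum |x_i|^p)^(1/p) of a finite-dimensional vector, for p >= 1 or p = inf."""
    if p == math.inf:
        return max(abs(v) for v in x)  # limiting case: maximum norm
    if p < 1:
        raise ValueError("p must be >= 1; for p < 1 the triangle inequality fails")
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


x = [3.0, -4.0]
print(p_norm(x, 1), p_norm(x, 2), p_norm(x, math.inf))  # 7.0 5.0 4.0
```

The same vector has a different size under each norm, and the value never increases as $p$ grows; this choice of $p$ is also the knob the P-norm model of information retrieval turns.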
Fuzzy Retrieval
Fuzzy retrieval techniques are based on the Extended Boolean model and fuzzy set theory. There are two classical fuzzy retrieval models: Mixed Min and Max (MMM) and the Paice model. Neither model provides a way of evaluating query weights; this is addressed by the P-norms algorithm.
Mixed Min and Max model (MMM): In fuzzy set theory, an element has a varying degree of membership, say $d_A$, to a given set $A$, instead of the traditional binary membership choice (is an element / is not an element). In MMM each index term has a fuzzy set associated with it. A document's weight with respect to an index term $A$ is considered to be the degree of membership of the document in the fuzzy set associated with $A$. In fuzzy set theory, the degrees of membership for intersection and union are defined as follows:
$d_{A \cap B} = \min(d_A, d_B)$
$d_{A \cup B} = \max(d_A, d_B)$
According to this, documents that should be retrieved for a query of the form A or B should be in the fuzzy set associated with ...
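The min/max definitions translate directly into code. MMM, as it is usually described, softens them by mixing the two: $Sim(Q_{or}, D) = C_{or1} \cdot \max + C_{or2} \cdot \min$ and $Sim(Q_{and}, D) = C_{and1} \cdot \min + C_{and2} \cdot \max$, with $C_{or2} = 1 - C_{or1}$ and $C_{and2} = 1 - C_{and1}$. The summary breaks off before this point, so treat that formulation, the coefficient value 0.7, and the function names below as assumptions of this sketch.

```python
def fuzzy_or(memberships: list[float]) -> float:
    """Pure fuzzy union: d_{A ∪ B} = max(d_A, d_B)."""
    return max(memberships)


def fuzzy_and(memberships: list[float]) -> float:
    """Pure fuzzy intersection: d_{A ∩ B} = min(d_A, d_B)."""
    return min(memberships)


def mmm_or(memberships: list[float], c_or1: float = 0.7) -> float:
    """MMM similarity for an OR query: mostly the max, softened by the min."""
    return c_or1 * max(memberships) + (1.0 - c_or1) * min(memberships)


def mmm_and(memberships: list[float], c_and1: float = 0.7) -> float:
    """MMM similarity for an AND query: mostly the min, softened by the max."""
    return c_and1 * min(memberships) + (1.0 - c_and1) * max(memberships)


doc = [0.2, 0.9]  # the document's membership degrees for index terms A and B
print(fuzzy_or(doc), fuzzy_and(doc))                   # 0.9 0.2
print(round(mmm_or(doc), 2), round(mmm_and(doc), 2))  # 0.69 0.41
```

Setting either coefficient to 1.0 recovers the pure max or min, so the coefficients control how much the weaker (or stronger) term still influences the score.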