Vector Space Model
The vector space model (or term vector model) is an algebraic model for representing text documents (or more generally, items) as vectors, such that the distance between vectors represents the relevance between the documents. It is used in information filtering, information retrieval, indexing, and relevancy rankings. Its first use was in the SMART Information Retrieval System.

Definitions
In this section we consider a particular vector space model based on the bag-of-words representation. Documents and queries are represented as vectors:

d_j = ( w_{1,j}, w_{2,j}, \dotsc, w_{t,j} )
q = ( w_{1,q}, w_{2,q}, \dotsc, w_{t,q} )

Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero. Several different ways of computing these values, also known as (term) weights, have been developed. One of the best known schemes is tf-idf weighting (see the example below). The definition of a term depends on the application; typically terms are single words, keywords, or longer phrases.
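As an illustration, the following is a minimal, self-contained sketch in Python of tf-idf weighting and cosine ranking over a toy corpus; the corpus, tokenization, and function names are illustrative assumptions, not part of any particular retrieval system.

import math
from collections import Counter

def build_index(docs):
    """Collect the vocabulary and document frequencies needed for tf-idf weighting."""
    vocab = sorted({term for doc in docs for term in doc})
    df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    return vocab, df

def tfidf_vector(tokens, vocab, df, n_docs):
    """Weight each vocabulary term by term frequency times inverse document frequency."""
    counts = Counter(tokens)
    return [counts[t] * math.log(n_docs / df[t]) if counts[t] else 0.0 for t in vocab]

def cosine(a, b):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "the quick brown fox".split(),
    "the lazy brown dog".split(),
    "quick foxes leap over lazy dogs".split(),
]
vocab, df = build_index(docs)
doc_vecs = [tfidf_vector(d, vocab, df, len(docs)) for d in docs]
query_vec = tfidf_vector("quick fox".split(), vocab, df, len(docs))
ranking = sorted(range(len(docs)), key=lambda i: cosine(doc_vecs[i], query_vec), reverse=True)
print(ranking)  # document indices, most relevant to the query first

Note that query terms absent from the corpus vocabulary are simply ignored, and terms appearing in every document receive a weight of zero under this idf definition.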



Vector Space
In mathematics and physics, a vector space (also called a linear space) is a set whose elements, often called ''vectors'', can be added together and multiplied ("scaled") by numbers called ''scalars''. The operations of vector addition and scalar multiplication must satisfy certain requirements, called ''vector axioms''. Real vector spaces and complex vector spaces are kinds of vector spaces based on different kinds of scalars: real numbers and complex numbers. Scalars can also be, more generally, elements of any field. Vector spaces generalize Euclidean vectors, which allow the modeling of physical quantities (such as forces and velocity) that have not only a magnitude but also a direction. The concept of vector spaces is fundamental for linear algebra, together with the concept of matrices.



Orthogonal
In mathematics, orthogonality is the generalization of the geometric notion of ''perpendicularity''. Although many authors use the two terms ''perpendicular'' and ''orthogonal'' interchangeably, the term ''perpendicular'' is more specifically used for lines and planes that intersect to form a right angle, whereas ''orthogonal'' is used in generalizations, such as ''orthogonal vectors'' or ''orthogonal curves''. ''Orthogonality'' is also used with various meanings that are often weakly related or not related at all to the mathematical meanings.

Etymology
The word comes from the Ancient Greek ''orthos'', meaning "upright", and ''gonia'', meaning "angle". The Ancient Greek and Classical Latin ancestor terms originally denoted a rectangle; later, they came to mean a right triangle. In the 12th century, the post-classical Latin word ''orthogonalis'' came to mean a right angle or something related to a right angle.



Search Engine Optimization
Search engine optimization (SEO) is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid search traffic (usually referred to as "organic" results) rather than direct traffic, referral traffic, social media traffic, or paid traffic. Unpaid search engine traffic may originate from a variety of kinds of searches, including image search, video search, academic search, news search, and industry-specific vertical search engines. As an Internet marketing strategy, SEO considers how search engines work, the computer-programmed algorithms that dictate search engine results, what people search for, the actual search queries or keywords typed into search engines, and which search engines are preferred by a target audience. SEO is performed because a website will receive more visitors from a search engine when it ranks higher on the search engine results page.


Random Indexing
Random indexing is a dimensionality reduction method and computational framework for distributional semantics. It is based on the insight that very-high-dimensional vector space model implementations are impractical, that models need not grow in dimensionality when new items (e.g. new terminology) are encountered, and that a high-dimensional model can be projected into a space of lower dimensionality without compromising L2 distance metrics if the resulting dimensions are chosen appropriately. This is the original point of the random projection approach to dimension reduction, first formulated as the Johnson–Lindenstrauss lemma; locality-sensitive hashing has some of the same starting points. Random indexing, as used in representation of language, originates from the work of Pentti Kanerva on sparse distributed memory, and can be described as an incremental formulation of a random projection.
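The following is a minimal sketch, in Python, of the incremental scheme described above: each term receives a fixed sparse random index vector, and a term's context vector accumulates the index vectors of its neighbors. The dimensionality, sparsity, and window size below are illustrative assumptions, not values prescribed by the method.

import random

DIM = 512        # reduced dimensionality (illustrative)
NONZEROS = 6     # number of +1/-1 entries in each sparse index vector

def index_vector(rng):
    """Sparse ternary random vector: mostly zeros, a few randomly signed ones."""
    vec = [0] * DIM
    for pos in rng.sample(range(DIM), NONZEROS):
        vec[pos] = rng.choice((1, -1))
    return vec

def train(tokenized_docs, window=2, seed=0):
    """Incrementally accumulate context vectors from co-occurrences."""
    rng = random.Random(seed)
    index = {}    # term -> fixed random index vector
    context = {}  # term -> accumulated context vector
    for doc in tokenized_docs:
        for i, word in enumerate(doc):
            index.setdefault(word, index_vector(rng))
            ctx = context.setdefault(word, [0] * DIM)
            for j in range(max(0, i - window), min(len(doc), i + window + 1)):
                if j == i:
                    continue
                neighbor_iv = index.setdefault(doc[j], index_vector(rng))
                for k in range(DIM):
                    ctx[k] += neighbor_iv[k]
    return context

vectors = train([["random", "indexing", "is", "an", "incremental", "projection"]])
print(len(vectors["random"]))  # 512: the dimensionality stays fixed as new terms arrive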


Term Model
In first-order logic, a Herbrand structure S is a structure over a vocabulary \sigma (also sometimes called a ''signature'') that is defined solely by the syntactical properties of \sigma. The idea is to take the symbol strings of terms as their values, e.g. the denotation of a constant symbol c is just "c" (the symbol). It is named after Jacques Herbrand. Herbrand structures play an important role in the foundations of logic programming.

Herbrand universe
Definition: the ''Herbrand universe'' serves as the universe in a ''Herbrand structure''.
Example: let L^\sigma be a first-order language with the vocabulary
* constant symbols: c
* function symbols: f(\cdot), \, g(\cdot)
Then the Herbrand universe H of L^\sigma (or of \sigma) is the set of all ground terms: H = \{ c, f(c), g(c), f(f(c)), f(g(c)), g(f(c)), g(g(c)), \dotsc \}. The relation symbols are not relevant for a Herbrand universe, since formulas involving only relations do not correspond to elements of the universe.
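As a concrete illustration, here is a short Python sketch that enumerates the ground terms of the example signature above up to a chosen nesting depth; the cutoff is an assumption of the sketch (the full Herbrand universe is infinite once a function symbol is present), not part of the definition.

from itertools import product

def herbrand_universe(constants, functions, max_depth):
    """Enumerate ground terms (as strings) over the given constants and
    function symbols (a mapping from symbol to arity), up to max_depth."""
    terms = set(constants)
    for _ in range(max_depth):
        new_terms = set()
        for f, arity in functions.items():
            for args in product(sorted(terms), repeat=arity):
                new_terms.add(f + "(" + ", ".join(args) + ")")
        if new_terms <= terms:
            break
        terms |= new_terms
    return terms

# the signature from the example: one constant c, two unary functions f and g
print(sorted(herbrand_universe({"c"}, {"f": 1, "g": 1}, max_depth=2)))
# ['c', 'f(c)', 'f(f(c))', 'f(g(c))', 'g(c)', 'g(f(c))', 'g(g(c))']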




Latent Semantic Analysis
Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text (the distributional hypothesis). A matrix containing word counts per document (rows represent unique words and columns represent each document) is constructed from a large piece of text, and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by the cosine similarity between any two columns: values close to 1 represent very similar documents, while values close to 0 represent very dissimilar documents. An information retrieval technique using latent semantic structure was patented in 1988 by Scott Deerwester and colleagues.
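A brief sketch of the pipeline just described, using NumPy's SVD on a toy term-document count matrix; the matrix values and the choice of k = 2 latent dimensions are illustrative assumptions only.

import numpy as np

def lsa_document_similarity(term_doc, k):
    """Cosine similarities between documents after rank-k truncation of an
    (n_terms x n_docs) count matrix via the singular value decomposition."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T           # one k-dimensional row per document
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    unit = doc_vecs / np.where(norms == 0, 1, norms)
    return unit @ unit.T                                 # (n_docs x n_docs) cosine matrix

# toy 5-term x 4-document count matrix (illustrative numbers only)
X = np.array([
    [2, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 3, 0, 1],
    [0, 1, 0, 2],
    [1, 0, 0, 0],
], dtype=float)
print(np.round(lsa_document_similarity(X, k=2), 2))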


Generalized Vector Space Model
The generalized vector space model (GVSM) is a generalization of the vector space model used in information retrieval. Wong ''et al.'' presented an analysis of the problems that the pairwise orthogonality assumption of the vector space model (VSM) creates, and from there extended the VSM to the generalized vector space model.

Definitions
GVSM introduces term-to-term correlations, which relax the pairwise orthogonality assumption. More specifically, a new space is considered in which each term vector ''t_i'' is expressed as a linear combination of 2^n vectors ''m_r'', where ''r'' = 1, ..., 2^n. For a document ''d_k'' and a query ''q'' the similarity function now becomes:

sim(d_k, q) = \frac{\sum_{j=1}^{n} \sum_{i=1}^{n} w_{i,k}\, w_{j,q}\, (t_i \cdot t_j)}{\sqrt{\sum_{i=1}^{n} w_{i,k}^2} \, \sqrt{\sum_{i=1}^{n} w_{i,q}^2}}

where ''t_i'' and ''t_j'' are now vectors of a 2^n-dimensional space. The term correlation t_i \cdot t_j can be implemented in several ways; for example, Wong et al. use the term occurrence frequency matrix obtained from automatic indexing as input to their algorithm.
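A small sketch of this similarity in Python: the term-term correlations enter through a matrix C whose entry C[i, j] plays the role of t_i . t_j, and with C equal to the identity the expression reduces to the ordinary cosine similarity of the plain VSM. The weights and correlation values below are illustrative assumptions.

import numpy as np

def gvsm_similarity(d, q, C):
    """GVSM-style similarity of document weights d and query weights q
    (both length n) under an n x n term correlation matrix C."""
    return (d @ C @ q) / (np.linalg.norm(d) * np.linalg.norm(q))

d = np.array([1.0, 0.0, 2.0])   # document term weights
q = np.array([0.0, 1.0, 1.0])   # query term weights
C = np.array([                  # terms 0 and 1 are correlated
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
print(gvsm_similarity(d, q, C))           # correlated terms contribute to the score
print(gvsm_similarity(d, q, np.eye(3)))   # identity C recovers the plain cosine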



WordNet
WordNet is a lexical database of semantic relations between words that links words into semantic relations including synonyms, hyponyms, and meronyms. The synonyms are grouped into ''synsets'' with short definitions and usage examples. WordNet can thus be seen as a combination and extension of a dictionary and thesaurus. Its primary use is in automatic text analysis and artificial intelligence applications. It was first created in the English language, and the English WordNet database and software tools have been released under a BSD-style license and are freely available for download. The latest official release from Princeton was released in 2011, and Princeton currently has no plans to release any new versions due to staffing and funding issues. New versions are still being released annually through the Open English WordNet website. Until about 2024, an online version was available through wordnet.princeton.edu.
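For a quick look at the relations mentioned above, the following Python sketch uses NLTK's WordNet interface; it assumes the nltk package is installed and the WordNet corpus has been downloaded, which are assumptions about the reader's environment rather than part of WordNet itself.

import nltk
nltk.download("wordnet", quiet=True)   # fetch the corpus if not already present
from nltk.corpus import wordnet as wn

# synsets: groups of synonymous lemmas, each with a short definition
for syn in wn.synsets("car")[:3]:
    print(syn.name(), "-", syn.definition())

dog = wn.synset("dog.n.01")
print("synonyms:", dog.lemma_names())    # lemmas grouped in the same synset
print("hypernyms:", dog.hypernyms())     # more general concepts
print("meronyms:", dog.part_meronyms())  # parts of the whole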


Lexical Database
In digital lexicography, natural language processing, and digital humanities, a lexical resource is a language resource consisting of data regarding the lexemes of the lexicon of one or more languages, e.g., in the form of a database.

Characteristics
Different standards for the machine-readable edition of lexical resources exist, e.g., the Lexical Markup Framework (LMF), an ISO standard for encoding lexical resources comprising an abstract data model and an XML serialization, and OntoLex-Lemon, an RDF vocabulary for publishing lexical resources as knowledge graphs on the web, e.g., as Linguistic Linked Open Data. Depending on the types of languages that are addressed, a lexical resource may be qualified as monolingual, bilingual, or multilingual. For bilingual and multilingual lexical resources, the words may or may not be connected from one language to another; when connected, the equivalence from one language to another is established through a bilingual link.



Singular Value Decomposition
In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix into a rotation, followed by a rescaling, followed by another rotation. It generalizes the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any m \times n matrix, and it is related to the polar decomposition. Specifically, the singular value decomposition of an m \times n complex matrix \mathbf{M} is a factorization of the form \mathbf{M} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^*, where \mathbf{U} is an m \times m complex unitary matrix, \mathbf{\Sigma} is an m \times n rectangular diagonal matrix with non-negative real numbers on the diagonal, \mathbf{V} is an n \times n complex unitary matrix, and \mathbf{V}^* is the conjugate transpose of \mathbf{V}. Such a decomposition always exists for any complex matrix. If \mathbf{M} is real, then \mathbf{U} and \mathbf{V} can be guaranteed to be real orthogonal matrices.
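A quick numerical check of the factorization in Python with NumPy, on a small random real matrix (the shape and seed are illustrative only):

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))

U, s, Vh = np.linalg.svd(M, full_matrices=True)  # U: 4x4, s: 3 singular values, Vh: 3x3
Sigma = np.zeros(M.shape)
np.fill_diagonal(Sigma, s)                       # rectangular diagonal matrix

print(np.allclose(M, U @ Sigma @ Vh))            # True: M = U Sigma V*
print(np.allclose(U.T @ U, np.eye(4)))           # U is orthogonal in the real case
print(np.allclose(Vh @ Vh.T, np.eye(3)))         # V is orthogonal in the real case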



Hypercube
In geometry, a hypercube is an ''n''-dimensional analogue of a square (n = 2) and a cube (n = 3); the special case for n = 4 is known as a ''tesseract''. It is a closed, compact, convex figure whose 1-skeleton consists of groups of opposite parallel line segments aligned in each of the space's dimensions, perpendicular to each other and of the same length. A unit hypercube's longest diagonal in ''n'' dimensions is equal to \sqrt{n}. An ''n''-dimensional hypercube is more commonly referred to as an ''n''-cube or sometimes as an ''n''-dimensional cube. The term ''measure polytope'' (originally from Elte, 1912) is also used, notably in the work of H. S. M. Coxeter, who also labels the hypercubes the \gamma_n polytopes. The hypercube is the special case of a hyperrectangle (also called an ''n''-orthotope). A ''unit hypercube'' is a hypercube whose side has length one unit. Often, the hypercube whose corners (or ''vertices'') are the 2^n points in R^n with each coordinate equal to 0 or 1 is taken as the unit hypercube.
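A tiny Python check of the vertex count and the \sqrt{n} diagonal mentioned above, with n = 4 chosen purely for illustration:

import math
from itertools import product

def unit_hypercube_vertices(n):
    """All 2**n corners of the unit n-cube: every 0/1 coordinate tuple."""
    return list(product((0, 1), repeat=n))

n = 4
vertices = unit_hypercube_vertices(n)
print(len(vertices))                       # 16 = 2**4 vertices
diagonal = math.dist((0,) * n, (1,) * n)   # corner-to-opposite-corner distance
print(diagonal, math.sqrt(n))              # both 2.0 = sqrt(4)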