HOME





Key Word In Context
Key Word In Context (KWIC) is the most common format for concordance lines. The term KWIC was coined by Hans Peter Luhn. The system was based on a concept called ''keyword in titles'', which was first proposed for Manchester libraries in 1864 by Andrea Crestadoro. A KWIC index is formed by sorting and aligning the words within an article title to allow each word (except the stop words) in titles to be searchable alphabetically in the index. It was a useful indexing method for technical manuals before computerized full text search became common. For example, a search query including all of the words in an example definition ("KWIC is an acronym for Key Word In Context, the most common format for concordance lines") and the Wikipedia slogan in English ("the free encyclopedia"), searched against a Wikipedia page, might yield a KWIC index as follows. A KWIC index usually uses a wide layout to allow the display of maximum 'in context' information (not shown in the following example). ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Concordance (publishing)
A concordance is an alphabetical list of the principal words used in a book or body of work, listing every instance of each word with its immediate context (language use)#Verbal context, context. Historically, concordances have been compiled only for works of special importance, such as the Vedas, Bible, Qur'an or the works of William Shakespeare, Shakespeare, James Joyce or classical Latin and Greek authors, because of the time, difficulty, and expense involved in creating a concordance in the pre-computer era. A concordance is more than an Subject indexing, index, with additional material such as commentary, definitions and topical cross-indexing which makes producing one a labor-intensive process even when assisted by computers. In the precomputing era, search engine technology, search technology was unavailable, and a concordance offered readers of long works such as the Bible something comparable to search results for every word that they would have been likely to search fo ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Hans Peter Luhn
Hans Peter Luhn (July 1, 1896 – August 19, 1964) was a German-American researcher in the field of computer science and Library & Information Science for IBM, and creator of the Luhn algorithm, KWIC (Key Words In Context) indexing, and selective dissemination of information ("SDI"). His inventions have found applications in diverse areas like computer science, the textile industry, linguistics, and information science. He was awarded over 80 patents. He created one of the earliest practical hash functions in the 1950s. Early life and education Luhn was born in Barmen, Germany (now part of Wuppertal) on July 1, 1896. After he completed secondary school, Luhn moved to Switzerland to learn the printing trade so he could join the family business. His career in printing was halted by his service as a communications officer in the German Army during World War I. Career After the war, Luhn entered the textile field, which eventually led him to the United States, where he inv ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Andrea Crestadoro
Dr. Andrea Crestadoro (1808–1879) was a bibliographer who became Chief Librarian of Manchester Free Library, 1864–1879. He is credited with being the first person to propose that books could be catalogued by using keywords that did not occur in the title of the book. His ideas also included a metallic balloon, reform of the tax system, and improvements to a railway locomotive – the '' Impulsoria'' – that was powered by four horses on a treadmill. Biography Andrea Crestadoro was born in Genoa in 1808 and was educated there before he studied for his doctorate in philosophy at the University of Turin. He came to notice in 1849 when he left his position as Professor of Philosophy at the University of Turin to come to England to further his interest in mechanical devices. In England he took out a number of patents including improvements to the ''Impulsoria''. Crestadoro improved the design of an unusual device called the '' Impulsoria'', which was a mobile treadmill-powered l ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Stop Words
Stop words are the words in a stop list (or ''stoplist'' or ''negative dictionary'') which are filtered out ("stopped") before or after processing of natural language data (i.e. text) because they are deemed to have little semantic value or are otherwise insignificant for the task at hand. There is no single universal list of stop words used by all natural language processing (NLP) tools, nor any agreed upon rules for identifying stop words, and indeed not all tools even use such a list. Therefore, any group of words can be chosen as the stop words for a given purpose. The "general trend in nformation retrievalsystems over time has been from standard use of quite large stop lists (200–300 terms) to very small stop lists (7–12 terms) to no stop list whatsoever". History of stop words A predecessor concept was used in creating some concordances. For example, the first Hebrew concordance, Isaac Nathan ben Kalonymus's , contained a one-page list of unindexed words, with nonsu ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Full Text Search
In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts represented in databases (such as titles, abstracts, selected sections, or bibliographical references). In a full-text search, a search engine examines all of the words in every stored document as it tries to match search criteria (for example, text specified by a user). Full-text-searching techniques appeared in the 1960s, for example IBM STAIRS from 1969, and became common in online bibliographic databases in the 1990s. Many websites and application programs (such as word processing software) provide full-text-search capabilities. Some web search engines, such as the former AltaVista, employ full-text-search techniques, while others index only a portion of the web pages examined by their indexing systems. Indexing When dealing with a sm ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Cyclic Permutation
In mathematics, and in particular in group theory, a cyclic permutation is a permutation consisting of a single cycle. In some cases, cyclic permutations are referred to as cycles; if a cyclic permutation has ''k'' elements, it may be called a ''k''-cycle. Some authors widen this definition to include permutations with fixed points in addition to at most one non-trivial cycle. In cycle notation, cyclic permutations are denoted by the list of their elements enclosed with parentheses, in the order to which they are permuted. For example, the permutation (1 3 2 4) that sends 1 to 3, 3 to 2, 2 to 4 and 4 to 1 is a 4-cycle, and the permutation (1 3 2)(4) that sends 1 to 3, 3 to 2, 2 to 1 and 4 to 4 is considered a 3-cycle by some authors. On the other hand, the permutation (1 3)(2 4) that sends 1 to 3, 3 to 1, 2 to 4 and 4 to 2 is not a cyclic permutation because it separately permutes the pairs and . For the wider definition of a cyclic permutation, allowing fixed points, these fixe ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Manual Page (Unix)
A man page (short for manual page) is a form of software documentation found on Unix and Unix-like operating systems. Topics covered include programs, system libraries, system calls, and sometimes local system details. The local host administrators can create and install manual pages associated with the specific host. A manual end user may invoke a documentation page by issuing the man command followed by the name of the item for which they want the documentation. These manual pages are typically requested by end users, programmers and administrators doing real time work but can also be formatted for printing. By default, man typically uses a formatting program such as nroff with a macro package or mandoc, and also a terminal pager program such as more or less to display its output on the user's screen. Man pages are often referred to as an ''online'' form of software documentation, even though the man command does not require internet access. The environment variable MAN ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




David Parnas
David Lorge Parnas (born February 10, 1941) is a Canadian early pioneer of software engineering, who developed the concept of information hiding in modular programming, which is an important element of object-oriented programming today. He is also noted for his advocacy of precise documentation. Life Parnas earned his PhD at Carnegie Mellon University in electrical engineering. Parnas also earned a professional engineering license in Canada and was one of the first to apply traditional engineering principles to software design. He worked there as a professor for many years. He also taught at the University of North Carolina at Chapel Hill (U.S.), at the Department of Computer Science of the Technische Universität Darmstadt (Germany), the University of Victoria (British Columbia, Canada), Queen's University in Kingston, Ontario, McMaster University in Hamilton, Ontario, and University of Limerick (Ireland). David Parnas received a number of awards and honors: * ACM "Best ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Lemma (morphology)
In morphology and lexicography, a lemma (: lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of word forms. In English, for example, ''break'', ''breaks'', ''broke'', ''broken'' and ''breaking'' are forms of the same lexeme, with ''break'' as the lemma by which they are indexed. ''Lexeme'', in this context, refers to the set of all the inflected or alternating forms in the paradigm of a single word, and ''lemma'' refers to the particular form that is chosen by convention to represent the lexeme. Lemmas have special significance in highly inflected languages such as Arabic, Turkish, and Russian. The process of determining the ''lemma'' for a given lexeme is called lemmatisation. The lemma can be viewed as the chief of the principal parts, although lemmatisation is at least partly arbitrary. Morphology The form of a word that is chosen to serve as the lemma is usually the least marked form, but there are several exceptions such as the use ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Ptx (Unix)
ptx is a Unix utility, named after the '' permuted index'' algorithm which it uses to produce a search or concordance report in the Keyword in Context (KWIC) format. It is available on most Unix and Unix-like operating systems (e.g. Linux, FreeBSD). The GNU implementation uses extensions that are more powerful than the older SysV implementation. The command is available as a separate package for Microsoft Windows as part of the UnxUtils collection of native Win32 ports of common GNU Unix-like utilities. There is also a corresponding IBM mainframe utility which performs the same function. Permuted indexes are often used in such places as bibliographic or medical databases, documentation, thesauri A thesaurus (: thesauri or thesauruses), sometimes called a synonym dictionary or dictionary of synonyms, is a reference work which arranges words by their meanings (or in simpler terms, a book where one can find different words with similar me ..., or web sites to aid in locati ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Concordancer
A concordancer is a computer program that automatically constructs a concordance. The output of a concordancer may serve as input to a translation memory system for computer-assisted translation, or as an early step in machine translation. Concordancers are also used in corpus linguistics to retrieve alphabetically or otherwise sorted lists of linguistic data from the corpus in question, which the corpus linguist then analyzes. A number of concordancers have been published, notably Oxford Concordance Program (OCP), a concordancer first released in 1981 by Oxford University Computing Services, which claims to be used in over 200 organisations worldwide.
The Oxford Concordance Program Version 2 S. Hockey J. Martin Literary and Linguistic Computing, Volume 2, Issue 2, 1 January 1987, Pages 125–131, https://doi.org/10.1093/llc/2 ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Burrows–Wheeler Transform
The Burrows–Wheeler transform (BWT) rearranges a character string into runs of similar characters, in a manner that can be reversed to recover the original string. Since compression techniques such as move-to-front transform and run-length encoding are more effective when such runs are present, the BWT can be used as a preparatory step to improve the efficiency of a compression algorithm, and is used this way in software such as bzip2. The algorithm can be implemented efficiently using a suffix array thus reaching linear time complexity. It was invented by David Wheeler in 1983, and later published by him and Michael Burrows in 1994. Their paper included a compression algorithm, called the Block-sorting Lossless Data Compression Algorithm or BSLDCA, that compresses data by using the BWT followed by move-to-front coding and Huffman coding or arithmetic coding. Description The transform is done by constructing a matrix (known as the Burrows-Wheeler Matrix) whose rows are the ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]