HOME

TheInfoList



OR:

agrep (approximate
grep grep is a command-line utility for searching plaintext datasets for lines that match a regular expression. Its name comes from the ed command g/re/p (global regular expression search and print), which has the same effect. grep was originally de ...
) is an
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use and view the source code, design documents, or content of the product. The open source model is a decentrali ...
approximate string matching In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly). The problem of approximate string matching ...
program, developed by
Udi Manber Udi Manber () is an Israeli computer scientist. He is one of the authors of agrep and GLIMPSE. After a career in engineering and management, he worked on medical research. Education He earned both his bachelor's degree in 1975 in mathematics an ...
and Sun Wu between 1988 and 1991, for use with the
Unix Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
operating system. It was later ported to
OS/2 OS/2 is a Proprietary software, proprietary computer operating system for x86 and PowerPC based personal computers. It was created and initially developed jointly by IBM and Microsoft, under the leadership of IBM software designer Ed Iacobucci, ...
,
DOS DOS (, ) is a family of disk-based operating systems for IBM PC compatible computers. The DOS family primarily consists of IBM PC DOS and a rebranded version, Microsoft's MS-DOS, both of which were introduced in 1981. Later compatible syste ...
, and
Windows Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...
. It selects the best-suited algorithm for the current query from a variety of the known fastest (built-in)
string searching algorithm A string-searching algorithm, sometimes called string-matching algorithm, is an algorithm that searches a body of text for portions that match by pattern. A basic example of string searching is when the pattern and the searched text are arrays of ...
s, including Manber and Wu's bitap algorithm based on
Levenshtein distance In information theory, linguistics, and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. The Levenshtein distance between two words is the minimum number of single-character edits (i ...
s. agrep is also the
search engine A search engine is a software system that provides hyperlinks to web pages, and other relevant information on World Wide Web, the Web in response to a user's web query, query. The user enters a query in a web browser or a mobile app, and the sea ...
in the indexer program GLIMPSE. agrep is under a free ISC License.


Alternative implementations

A more recent agrep is the command-line tool provided with the TRE regular expression library. TRE agrep is more powerful than Wu-Manber agrep since it allows weights and total costs to be assigned separately to individual groups in the pattern. It can also handle Unicode. Unlike Wu-Manber agrep, TRE agrep is licensed under a 2-clause BSD-like license. FREJ (Fuzzy Regular Expressions for Java) open-source library provides command-line interface which could be used in the way similar to agrep. Unlike agrep or TRE it could be used for constructing complex substitutions for matched text. However its syntax and matching abilities differs significantly from ones of ordinary
regular expression A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
s.


See also

* Bitap algorithm * TRE (computing)


References

{{Reflist


External links

* Wu-Manber agrep
AGREP home page

For Unix
(To compile under OSX 10.8, add -Wno-return-type to the CFLAGs = -O line in the Makefile) *See also
TRE regexp matching package

cgrep a defunct command line approximate string matching tool

nrgrep
a command line approximate string matching tool

Information retrieval systems Unix text processing utilities Software using the ISC license