Trigram Search

Trigram search is a method of searching for text when the exact syntax or spelling of the target object is not precisely known, or when queries may be regular expressions. It finds the objects that match the largest number of three-consecutive-character strings (i.e. trigrams) in the entered search terms, which are generally near matches. Two strings with many shared trigrams can be expected to be very similar. Trigrams also make it possible to build indexes efficiently for searches that are regular expressions or that match the text inexactly, and such indexes can significantly accelerate searches. A threshold on the number of trigram matches can be specified as a cutoff, below which a result is no longer considered a match. Trigram-accelerated search is used in some code search systems, where regular-expression queries may be useful, in search engines such as Elasticsearch, and in databases such as PostgreSQL ("F.33. pg_trgm", PostgreSQL Documentation, 2022-05-12, https://www.postgresql.org/docs/14/pgtrgm.html, accessed 2022-05-28).
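The ranking-with-cutoff idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the `min_shared` parameter are hypothetical, the score is a plain count of shared trigrams, and no padding or case folding is applied, so it does not reproduce the behaviour of PostgreSQL or Elasticsearch.

```python
def trigrams(text):
    """Return the set of three-consecutive-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def shared_trigrams(query, candidate):
    """Count the trigrams that query and candidate have in common."""
    return len(trigrams(query) & trigrams(candidate))


def search(query, candidates, min_shared=2):
    """Rank candidates by shared trigram count, dropping those below the cutoff.

    min_shared is a hypothetical threshold chosen for illustration.
    Ties are broken alphabetically so the result order is deterministic.
    """
    scored = [(shared_trigrams(query, c), c) for c in candidates]
    kept = [(n, c) for n, c in scored if n >= min_shared]
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [c for _, c in kept]


if __name__ == "__main__":
    print(search("alice", ["alicia", "malice", "bob", "slice"]))
```

Raising `min_shared` makes the search stricter; lowering it admits weaker near matches.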


Examples

Consider the string "alice". The trigrams of the string are "ali", "lic", and "ice", not including spaces. Searching for this string in a database with a trigram-based index would involve finding which objects contain as many of the three trigrams as possible.

As a concrete example of using trigram search for a regular expression query, consider searching for the pattern ab[cd]e, where the brackets denote that the third character in the string being searched for could be c or d. In this situation, one could query the index for objects that have the two trigrams abc and bce or the two trigrams abd and bde. Thus, answering this query need not involve string matching against every object; the index can be queried directly, which can be faster in practice.
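The regular expression example can be sketched as follows. The index here is a plain in-memory dictionary from trigram to document ids, and all names (`build_index`, `containing_all`, `search_ab_cd_e`) are hypothetical. The final confirmation with `re` is an assumption added for this sketch and is not described in the text above: an object can contain both trigrams of a pair without containing them adjacently, so the index lookup alone yields candidates rather than guaranteed matches.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of three-consecutive-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(documents):
    """Map each trigram to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def containing_all(index, grams):
    """Return the ids of documents that contain every trigram in grams."""
    sets = [index.get(gram, set()) for gram in grams]
    return set.intersection(*sets) if sets else set()


def search_ab_cd_e(documents, index):
    """Find documents matching ab[cd]e using the index, then confirm them."""
    # (abc AND bce) OR (abd AND bde), answered from the index alone.
    candidates = (containing_all(index, ["abc", "bce"])
                  | containing_all(index, ["abd", "bde"]))
    # Assumed confirmation step: run the regex only on the candidates.
    pattern = re.compile(r"ab[cd]e")
    confirmed = sorted(d for d in candidates if pattern.search(documents[d]))
    return candidates, confirmed


if __name__ == "__main__":
    docs = {1: "xxabcexx", 2: "abdexyz", 3: "abc then bce", 4: "nothing here"}
    print(search_ab_cd_e(docs, build_index(docs)))
```

In the sample data, document 3 contains both abc and bce but not the string abce, which is why the sketch checks candidates before reporting them.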


See also

* Search engine indexing
* Approximate string matching
* Trigram
* N-gram
* Regular expression
* Google Code Search

