A factor oracle is a
finite state automaton
A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
that can efficiently search for factors (
substring
In formal language theory and computer science, a substring is a contiguous sequence of characters within a string. For instance, "''the best of''" is a substring of "''It was the best of times''". In contrast, "''Itwastimes''" is a subsequence ...
s) in a body of text. Older techniques, such as
suffix tree
In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow parti ...
s, were time-efficient but required significant amounts of memory. Factor oracles, by contrast, can be constructed in linear time and space in an incremental fashion.
Overview
Older techniques for matching strings include:
suffix array
In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used in, among others, full-text indices, data-compression algorithms, and the field of bibliometrics.
Suffix arrays were introduced by a ...
s,
suffix tree
In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow parti ...
s,
suffix automata or
directed acyclic word graphs, and factor automata (Allauzen, Crochemore, Raffinot, 1999). In 1999, Allauzen, Crochemore, and Raffinot, presented the factor oracle algorithm as a memory efficient improvement upon these older techniques for string matching and compression. Starting in the mid-2000s, factor oracles have found application in computer music, as well.
[Assayag G., Dubnov S., ''Using Factor Oracles for Machine Improvisation.'' Soft Computing - A Fusion of Foundations, Methodologies and Applications. 2004-09-01. Springer Berlin / Heidelberg]
Implementations
Th
Computer Audition Laboratoryprovides a Matlab implementation of the factor oracle algorithm.
See also
*
Suffix array
In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used in, among others, full-text indices, data-compression algorithms, and the field of bibliometrics.
Suffix arrays were introduced by a ...
*
Generalised suffix tree
In computer science, a generalized suffix tree is a suffix tree for a set of strings. Given the set of strings D=S_1,S_2,\dots,S_d of total length n, it is a Patricia tree containing all n suffixes of the strings. It is mostly used in bioinformati ...
References
Automata (computation)
Substring indices