HOME

TheInfoList



OR:

In
data compression In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compressi ...
and the theory of
formal language In logic, mathematics, computer science, and linguistics, a formal language is a set of strings whose symbols are taken from a set called "alphabet". The alphabet of a formal language consists of symbols that concatenate into strings (also c ...
s, the smallest grammar problem is the problem of finding the smallest
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the fo ...
that generates a given
string String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
of characters (but no other string). The size of a grammar is defined by some authors as the number of symbols on the right side of the production rules. Others also add the number of rules to that. A grammar that generates only a single string, as required for the solution to this problem, is called a
straight-line grammar A straight-line grammar (sometimes abbreviated as SLG) is a formal grammar that generates exactly one string.Florian Benz and Timo Kötzing, “An effective heuristic for the smallest grammar problem,” Proceedings of the fifteenth annual conferen ...
. Every binary string of length n has a grammar of length O(n/\log n), as expressed using
big O notation Big ''O'' notation is a mathematical notation that describes the asymptotic analysis, limiting behavior of a function (mathematics), function when the Argument of a function, argument tends towards a particular value or infinity. Big O is a memb ...
. For binary
de Bruijn sequence In combinatorics, combinatorial mathematics, a de Bruijn sequence of order ''n'' on a size-''k'' alphabet (computer science), alphabet ''A'' is a cyclic sequence in which every possible length-''n'' String (computer science)#Formal theory, stri ...
s, no better length is possible. The (decision version of the) smallest grammar problem is
NP-complete In computational complexity theory, NP-complete problems are the hardest of the problems to which ''solutions'' can be verified ''quickly''. Somewhat more precisely, a problem is NP-complete when: # It is a decision problem, meaning that for any ...
. It can be approximated in
polynomial time In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations p ...
to within a logarithmic
approximation ratio In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable guarantees on the distance of the returned sol ...
; more precisely, the ratio is O(\log\tfrac) where n is the length of the given string and g is the size of its smallest grammar. It is hard to approximate to within a constant approximation ratio. An improvement of the approximation ratio to o(\log n/\log\log n) would also improve certain algorithms for approximate addition chains.


See also

*
Grammar-based code Grammar-based codes or grammar-based compression are Data compression, compression algorithms based on the idea of constructing a context-free grammar (CFG) for the string to be compressed. Examples include universal lossless data compression algo ...
*
Kolmogorov complexity In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that prod ...
*
Lossless data compression Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits Redundanc ...


References


External links

* Formal languages Data compression NP-complete problems {{algorithm-stub