Straight-line Grammar
   HOME

TheInfoList



OR:

A straight-line grammar (sometimes abbreviated as SLG) is a
formal grammar A formal grammar is a set of Terminal and nonterminal symbols, symbols and the Production (computer science), production rules for rewriting some of them into every possible string of a formal language over an Alphabet (formal languages), alphabe ...
that generates exactly one string.Florian Benz and Timo Kötzing, “An effective heuristic for the smallest grammar problem,” Proceedings of the fifteenth annual conference on Genetic and evolutionary computation conference - GECCO ’13, 2013. , p. 488 Consequently, it does not branch (every non-terminal has only one associated production rule) nor loop (if non-terminal ''A'' appears in a derivation of ''B'', then ''B'' does not appear in a derivation of ''A'').


Areas of usefulness

Straight-line grammars are widely used in the development of algorithms that execute directly on compressed structures (without prior decompression). SLGs are of interest in fields like
Kolmogorov complexity In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that prod ...
,
Lossless data compression Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits Redundanc ...
, Structure discovery and
Compressed data structure The term compressed data structure arises in the computer science subfields of algorithms, data structures, and theoretical computer science. It refers to a data structure whose operations are roughly as fast as those of a conventional data structu ...
s. The problem of finding a context-free grammar (equivalently: an SLG) of minimal size that generates a given string is called the smallest grammar problem. Straight-line grammars (more precisely: straight-line context-free string grammars) can be generalized to ''Straight-line context-free tree grammars''. The latter can be used conveniently to compress
trees In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only p ...
.


Formal definition

A
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the fo ...
''G'' is an SLG if: 1. for every non-terminal ''N'', there is at most one production rule that has ''N'' as its left-hand side, and 2. the directed graph ''G''=<''V'',''E''>, defined by ''V'' being the set of non-terminals and (''A'',''B'') ∈ ''E'' whenever ''B'' appears at the right-hand side of a production rule for ''A'', is acyclic. A mathematical definition of the more general formalism of straight-line context-free tree grammars can be found in Lohrey et al. An SLG in Chomsky normal form is equivalent to a
straight-line program In computer science, a straight-line program is, informally, a program that does not contain any loop or any test, and is formed by a sequence of steps that apply each an operation to previously computed elements. This article is devoted to the c ...
.


A list of algorithms using SLGs

* The
Sequitur algorithm Sequitur (or Nevill-Manning–Witten algorithm) is a recursive algorithm developed by Craig Nevill-Manning and Ian H. Witten in 1997 that infers a hierarchical structure (context-free grammar) from a sequence of discrete symbols. The algorithm ope ...
constructs a straight-line grammar for a given string. * The Lempel-Ziv-Welch algorithm creates a context-free grammar in such a deterministic way that it is necessary to store only the start rule of the generated grammar. *
Byte pair encoding Byte-pair encoding (also known as BPE, or digram coding) is an algorithm, first described in 1994 by Philip Gage, for encoding strings of text into smaller strings by creating and using a translation table. A slightly modified version of the algori ...


See also

* * Non-recursive grammar - a grammar that does not loop, but may branch; generating a finite rather than a singleton language


References

{{reflist Formal languages