Re-Pair
Re-Pair (short for Recursive Pairing) is a grammar-based compression algorithm that, given an input text, builds a straight-line program, i.e. a context-free grammar generating a single string: the input text. The grammar is built by recursively replacing the most frequent pair of adjacent symbols in the text with a new nonterminal symbol. Once no pair of symbols occurs twice, the resulting string is used as the axiom of the grammar. Therefore, in the output grammar every rule except the axiom has exactly two symbols on the right-hand side. How it works: Re-Pair was first introduced by N. J. Larsson and A. Moffat in 1999 (Larsson, N. J., & Moffat, A. (2000). Off-line dictionary-based compression. Proceedings of the IEEE, 88(11), 1722-1732). In their paper the algorithm is presented together with a detailed description of the data structures required to implement it with linear time and space complexity. The experiments showed that Re-Pair achieves high compression ratios and offers good perf ...
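The pairing loop described above can be sketched in a few lines of Python. This is a naive illustration that recounts all pairs on every round, so it runs in quadratic time; it is not the linear-time implementation from the Larsson and Moffat paper. The function names (`count_pairs`, `replace_pair`, `repair`, `expand`) and the choice of integers as nonterminals are assumptions made for this example.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair."""
    counts = Counter()
    i = 0
    while i < len(seq) - 1:
        counts[(seq[i], seq[i + 1])] += 1
        # In a run such as "aaa" the two (a, a) pairs overlap, so count one.
        if i + 2 < len(seq) and seq[i] == seq[i + 1] == seq[i + 2]:
            i += 2
        else:
            i += 1
    return counts


def replace_pair(seq, pair, symbol):
    """Replace occurrences of `pair` with `symbol`, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Return (axiom, rules); nonterminals are ints, terminals are characters."""
    seq = list(text)
    rules = {}  # nonterminal -> (left symbol, right symbol)
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:  # no pair occurs twice: seq becomes the axiom
            break
        symbol = len(rules)
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the text it generates."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


axiom, rules = repair("abracadabra")
print(axiom, rules)
print("".join(expand(s, rules) for s in axiom))
```

Ties between equally frequent pairs are broken here by first occurrence, which is an arbitrary choice of this sketch.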


Grammar-based Code
Grammar-based codes, or grammar-based compression methods, are compression algorithms based on the idea of constructing a context-free grammar (CFG) for the string to be compressed. Examples include universal lossless data compression algorithms. To compress a data sequence x = x_1 \cdots x_n, a grammar-based code transforms x into a context-free grammar G. The problem of finding a smallest grammar for an input sequence (the smallest grammar problem) is known to be NP-hard, so many grammar-transform algorithms have been proposed from both theoretical and practical viewpoints. Generally, the produced grammar G is further compressed by statistical encoders such as arithmetic coding. Examples and characteristics: The class of grammar-based codes is very broad. It includes block codes, the multilevel pattern matching (MPM) algorithm, variations of the incremental parsing Lempel-Ziv code, and many other new universal lossless compression algorithms. Grammar-based codes are universal in the sense that they can ac ...




Byte Pair Encoding
Byte pair encoding, or digram coding, is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data. The algorithm was first described publicly by Philip Gage in a February 1994 article "A New Algorithm for Data Compression" in the ''C Users Journal''. A variant of the technique has been shown to be useful in several natural language processing (NLP) applications, such as Google's SentencePiece and OpenAI's GPT-3. Here, the goal is not data compression, but encoding text in a given language as a sequence of 'tokens', using a fixed vocabulary of different tokens. Typically, most words will be encoded as a single token, while rare words will be encoded as a sequence of a few tokens, where these tokens represent meaningful word parts. This translation of text into tokens can be found by a variant of byte pair encoding. Byte pair ...
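As a rough illustration of the compression variant described above, the following Python sketch repeatedly replaces the most common pair of consecutive bytes with a byte value that does not occur in the data, and records each replacement in a table so the original can be rebuilt. The function names and the table layout are assumptions of this example, not Gage's original code.

```python
from collections import Counter


def bpe_compress(data: bytes):
    """Return (compressed bytes, replacement table)."""
    seq = list(data)
    present = set(seq)
    unused = [b for b in range(256) if b not in present]
    table = []  # (new byte, (left byte, right byte)), in order of creation
    while unused and len(seq) >= 2:
        # Pairs are counted with overlaps, so a run like "aaa" is slightly
        # over-counted; that only affects efficiency, not correctness.
        pair, freq = Counter(zip(seq, seq[1:])).most_common(1)[0]
        if freq < 2:
            break
        new = unused.pop()
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(new)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        table.append((new, pair))
    return bytes(seq), table


def bpe_decompress(data: bytes, table):
    """Undo the replacements in reverse order of creation."""
    seq = list(data)
    for new, (left, right) in reversed(table):
        out = []
        for b in seq:
            if b == new:
                out.extend((left, right))
            else:
                out.append(b)
        seq = out
    return bytes(seq)


original = b"the theme of the thesis"
packed, table = bpe_compress(original)
print(len(original), len(packed), len(table))
```

The tokenizer variants used in NLP merge pairs in a similar way but grow a vocabulary of tokens rather than reusing spare byte values; that difference is not modelled here.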


Compression Algorithms
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder. The process of reducing the size of a data file is often referred to as data compression. In the context of data transmission, it is called source coding: encoding done at the source of the data before it is stored or transmitted. Source coding should not be confused with channel coding, which is used for error detection and correction, or with line coding, the means for mapping data onto a signal. Co ...


Straight-line Program
In mathematics, more specifically in computational algebra, a straight-line program (SLP) for a finite group ''G'' = ⟨''S''⟩ is a finite sequence ''L'' of elements of ''G'' such that every element of ''L'' either belongs to ''S'', is the inverse of a preceding element, or the product of two preceding elements. An SLP ''L'' is said to ''compute'' a group element ''g'' ∈ ''G'' if ''g'' ∈ ''L'', where ''g'' is encoded by a word in ''S'' and its inverses. Intuitively, an SLP computing some ''g'' ∈ ''G'' is an ''efficient'' way of storing ''g'' as a group word over ''S''; observe that if ''g'' is constructed in ''i'' steps, the word length of ''g'' may be exponential in ''i'', but the length of the corresponding SLP is linear in ''i''. This has important applications in computational group theory, by using SLPs to efficiently encode group elements as words over a given generating set. Straight-line programs were introduced by Bab ...
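To make the definition concrete, here is a small Python sketch that evaluates a straight-line program over a permutation group. Each step is either a generator, the inverse of an earlier step, or the product of two earlier steps. The tuple encoding of steps (`"gen"`, `"inv"`, `"mul"`) and the function names are assumptions chosen for this illustration.

```python
def compose(p, q):
    """Product of two permutations given as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    """Inverse of a permutation given as a tuple."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def evaluate_slp(program, generators):
    """Return the list of group elements computed by each step."""
    values = []
    for step in program:
        kind = step[0]
        if kind == "gen":      # an element of the generating set S
            values.append(generators[step[1]])
        elif kind == "inv":    # inverse of a preceding element
            values.append(inverse(values[step[1]]))
        elif kind == "mul":    # product of two preceding elements
            values.append(compose(values[step[1]], values[step[2]]))
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return values


# A 5-cycle c sending i to i + 1 (mod 5).
c = (1, 2, 3, 4, 0)
program = [
    ("gen", "c"),   # step 0: c
    ("mul", 0, 0),  # step 1: c^2
    ("mul", 1, 1),  # step 2: c^4
    ("mul", 2, 2),  # step 3: c^8, a word of length 8 reached in 4 steps
    ("inv", 0),     # step 4: c^-1
]
values = evaluate_slp(program, {"c": c})
print(values[3], values[4])
```

Repeated squaring shows the point made in the text: the word length doubles at every step while the program grows by only one line.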


Context-free Grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form A \to \alpha, with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empty). A formal grammar is "context-free" if its production rules can be applied regardless of the context of a nonterminal. No matter which symbols surround it, the single nonterminal on the left hand side can always be replaced by the right hand side. This is what distinguishes it from a context-sensitive grammar. A formal grammar is essentially a set of production rules that describe all possible strings in a given formal language. Production rules are simple replacements. For example, the first rule in the picture, \langle\text{Stmt}\rangle \to \langle\text{Id}\rangle = \langle\text{Expr}\rangle ; replaces \langle\text{Stmt}\rangle with \langle\text{Id}\rangle = \langle\text{Expr}\rangle ;. There can be multiple replacement rules for a given nonterminal symbol. The ...


Re Pair Example
Re or RE may refer to: Geography * Re, Norway, a former municipality in Vestfold county, Norway * Re, Vestland, a village in Gloppen municipality, Vestland county, Norway * Re, Piedmont, an Italian municipality * Île de Ré, an island off the west coast of France ** Le Bois-Plage-en-Ré, a commune on that island * Re di Anfo, a torrent (seasonal stream) in Italy * Re di Gianico, Re di Niardo, Re di Sellero, and Re di Tredenus, torrents in the Val Camonica * Réunion (ISO 3166-1 code), a French overseas department and island in the Indian Ocean Music * Re, the second syllable of the scale in solfège ** Re, or D (musical note), the second note of the musical scale in ''fixed do'' solfège * Re: (band), a musical duo based in Canada and the United States Albums * ''Re'' (Café Tacuba album) * ''Re'' (Les Rita Mitsouko album) * ''Re.'' (Aya Ueto album) * ''Re:'' (Kard EP) Other media * Resident Evil, popular video game franchise of survival horror * ''...Re'' (film), a 2016 ...


Structure Repair
A structure is an arrangement and organization of interrelated elements in a material object or system, or the object or system so organized. Material structures include man-made objects such as buildings and machines and natural objects such as biological organisms, minerals and chemicals. Abstract structures include data structures in computer science and musical form. Types of structure include a hierarchy (a cascade of one-to-many relationships), a network featuring many-to-many links, or a lattice featuring connections between components that are neighbors in space. Load-bearing Buildings, aircraft, skeletons, anthills, beaver dams, bridges and salt domes are all examples of load-bearing structures. The results of construction are divided into buildings and non-building structures, and make up the infrastructure of a human society. Built structures are broadly divided by their varying design approaches and standards, into categories including building structures, archi ...


Slide00
Slide or Slides may refer to: Places * Slide, California, former name of Fortuna, California Arts, entertainment, and media Music Albums * ''Slide'' (Lisa Germano album), 1998 * ''Slide'' (George Clanton album), 2018 *''Slide'', by Patrick Gleeson, 2007 * ''Slide'' (Luna EP), 1993 * ''Slide'' (Madeline Merlo EP), 2022 Songs * "Slide" (The Big Dish song), 1986 * "Slide" (Goo Goo Dolls song), 1998 * "Slide" (Calvin Harris song), 2017 * "Slide" (French Montana song), 2019 * "Slide" (H.E.R. song), 2019 * "Slide" (Slave song), 1977 * "Step Back"/"Slide", by Superheist, 2001 *"Slide", by Dido from ''No Angel'' *"Slide", by Madeline Merlo from ''Slide'', 2022 *"The Slide", by Cowboy Junkies from ''One Soul Now'' Other uses in music * Slide (musical ornament), a musical embellishment found particularly in Baroque music *Slide (tune type), a tune type in Irish traditional music, common to the Sliabh Luachra area *Slide, a 1970s disco side project of Rod McKuen's *''The Slide'', a jukebo ...






Variable-length Code
In coding theory, a variable-length code is a code which maps source symbols to a ''variable'' number of bits. Variable-length codes can allow sources to be compressed and decompressed with ''zero'' error (lossless data compression) and still be read back symbol by symbol. With the right coding strategy, an independent and identically distributed source may be compressed almost arbitrarily close to its entropy. This is in contrast to fixed-length coding methods, for which data compression is only possible for large blocks of data, and any compression beyond the logarithm of the total number of possibilities comes with a finite (though perhaps arbitrarily small) probability of failure. Some examples of well-known variable-length coding strategies are Huffman coding, Lempel–Ziv coding, arithmetic coding, and context-adaptive variable-length coding. Codes and their extensions: The extension of a code is the mapping of finite length source sequences to finite length bit strings ...
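A minimal Python sketch of a variable-length code that can be read back symbol by symbol: a prefix code in which no codeword is a prefix of another. The four-symbol code table below is a made-up example, and the function names are assumptions of this illustration.

```python
# Hypothetical prefix code: frequent symbols get shorter codewords.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(message, code=CODE):
    """Concatenate the codeword of every source symbol."""
    return "".join(code[symbol] for symbol in message)


def decode(bits, code=CODE):
    """Read bits left to right, emitting a symbol whenever a codeword completes."""
    inverse = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # safe because no codeword prefixes another
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


bits = encode("abacad")
print(bits, len(bits))  # 11 bits, versus 12 for a fixed 2-bit code
print(decode(bits))
```

The saving depends on the symbol frequencies: a message dominated by "c" and "d" would come out longer than with a fixed 2-bit code.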


Sequitur Algorithm
Sequitur (or Nevill-Manning algorithm) is a recursive algorithm developed by Craig Nevill-Manning and Ian H. Witten in 1997 that infers a hierarchical structure (context-free grammar) from a sequence of discrete symbols. The algorithm operates in linear space and time. It can be used in data compression software applications. Constraints: The Sequitur algorithm constructs a grammar by substituting repeating phrases in the given sequence with new rules and therefore produces a concise representation of the sequence. For example, if the sequence is S→abcab, the algorithm will produce S→AcA, A→ab. While scanning the input sequence, the algorithm follows two constraints for generating its grammar efficiently: digram uniqueness and rule utility. Digram uniqueness: Whenever a new symbol is scanned from the sequence, it is paired with the last scanned symbol to form a new digram. If this digram has been formed earlier then a new rule is made to replace both occurrences of th ...
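The example grammar above can be checked with a short Python sketch. It does not implement Sequitur itself; it only expands a grammar back into its sequence and reports digrams that occur more than once across the right-hand sides, which is the condition the digram uniqueness constraint removes. The dictionary representation and the function names are assumptions of this example.

```python
from collections import Counter


def expand(symbol, grammar):
    """Expand a symbol into the terminal string it derives."""
    if symbol not in grammar:
        return symbol  # terminal
    return "".join(expand(s, grammar) for s in grammar[symbol])


def repeated_digrams(grammar):
    """Digrams occurring more than once across all right-hand sides.

    Overlapping occurrences (as in "aaa") are counted naively here.
    """
    seen = Counter()
    for rhs in grammar.values():
        seen.update(zip(rhs, rhs[1:]))
    return {digram for digram, n in seen.items() if n > 1}


flat = {"S": list("abcab")}                              # S -> abcab
compact = {"S": ["A", "c", "A"], "A": ["a", "b"]}        # S -> AcA, A -> ab

print(expand("S", flat), expand("S", compact))
print(repeated_digrams(flat), repeated_digrams(compact))
```

In the flat grammar the digram `ab` repeats, while the grammar with rule A has no repeated digram and still derives the same sequence.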