Smallest Grammar Problem
In data compression and the theory of formal languages, the smallest grammar problem is the problem of finding the smallest context-free grammar that generates a given string of characters (but no other string). The size of a grammar is defined by some authors as the number of symbols on the right side of the production rules; others also add the number of rules to that count. A grammar that generates only a single string, as required for a solution to this problem, is called a straight-line grammar. Every binary string of length n has a grammar of length O(n/\log n), as expressed using big O notation. For binary de Bruijn sequences, no better length is possible. The (decision version of the) smallest grammar problem is NP-complete. It can be approximated in polynomial time to within a logarithmic approximation ratio; more precisely, the ratio is O(\log\tfrac{n}{g}), where n is the length of the given string and g is the size of its smallest grammar. It is hard to approximate to within a small constant factor unless P = NP.
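For illustration (this example is not part of the source text): a straight-line grammar can be much smaller than the string it derives. The following minimal Python sketch encodes a straight-line grammar for the string abcabcabcabc as a dictionary, expands it back into the string, and measures its size as the number of right-hand-side symbols.

    # Illustration only: a straight-line grammar for "abcabcabcabc", represented as a
    # dict mapping nonterminals to their right-hand sides. Its size is the total
    # number of symbols appearing on the right-hand sides (here 2 + 2 + 3 = 7).
    grammar = {
        "S": ["A", "A"],          # S -> A A
        "A": ["B", "B"],          # A -> B B
        "B": ["a", "b", "c"],     # B -> a b c
    }

    def expand(symbol, grammar):
        """Expand a symbol of a straight-line grammar into the unique string it derives."""
        if symbol not in grammar:                      # terminal symbol
            return symbol
        return "".join(expand(s, grammar) for s in grammar[symbol])

    size = sum(len(rhs) for rhs in grammar.values())   # 7 symbols
    assert expand("S", grammar) == "abc" * 4           # derives exactly this 12-character string
    print("derived:", expand("S", grammar), "grammar size:", size)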
Data Compression
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder. The process of reducing the size of a data file is often referred to as data compression. In the context of data transmission, it is called source coding: encoding is done at the source of the data before it is stored or transmitted. Source coding should not be confused with channel coding, for error detection and correction, or line coding, the means for mapping data onto a signal.
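As a toy illustration (not part of the source text, and far simpler than practical compressors): run-length encoding is a lossless scheme in which the encoder replaces runs of repeated symbols by (symbol, count) pairs and the decoder reconstructs the original data exactly.

    # Toy encoder/decoder pair (run-length encoding); the decoder reproduces
    # the input exactly, i.e. the compression is lossless.
    from itertools import groupby

    def rle_encode(text):
        return [(ch, len(list(run))) for ch, run in groupby(text)]

    def rle_decode(pairs):
        return "".join(ch * count for ch, count in pairs)

    data = "aaaaabbbcccccccd"
    encoded = rle_encode(data)            # [('a', 5), ('b', 3), ('c', 7), ('d', 1)]
    assert rle_decode(encoded) == data    # no information is lost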
Polynomial Time
In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to be related by a constant factor. Since an algorithm's running time may vary among different inputs of the same size, one commonly considers the worst-case time complexity, which is the maximum amount of time required for inputs of a given size. Less common, and usually specified explicitly, is the average-case complexity, which is the average of the time taken on inputs of a given size (this makes sense because there are only a finite number of possible inputs of a given size). In both cases, the time complexity is generally expressed as a function of the size of the input.
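As a small illustration (not part of the source text): counting the elementary operations of a linear search shows why its worst-case time complexity is O(n); when the target is absent, the loop performs one comparison per input element.

    # Counting elementary operations (comparisons) of a linear search.
    def linear_search(items, target):
        comparisons = 0
        for x in items:
            comparisons += 1
            if x == target:
                return True, comparisons
        return False, comparisons

    found, ops = linear_search(list(range(1000)), -1)   # worst case: target absent
    assert not found and ops == 1000                    # n comparisons for n = 1000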
Formal Languages
In logic, mathematics, computer science, and linguistics, a formal language is a set of strings whose symbols are taken from a set called an "alphabet". The alphabet of a formal language consists of symbols that concatenate into strings (also called "words"). Words that belong to a particular formal language are sometimes called ''well-formed words''. A formal language is often defined by means of a formal grammar such as a regular grammar or context-free grammar. In computer science, formal languages are used, among others, as the basis for defining the grammar of programming languages and formalized versions of subsets of natural languages, in which the words of the language represent concepts that are associated with meanings or semantics. In computational complexity theory, decision problems are typically defined as formal languages, and complexity classes are defined as the sets of the formal languages that can be parsed by machines with limited computational power.
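For a concrete illustration (not part of the source text): the set of all strings of the form a^n b^n with n ≥ 1 is a formal language over the alphabet {a, b}; it is generated by the context-free grammar S → aSb | ab. A short Python membership check:

    # Membership test for the formal language { a^n b^n : n >= 1 } over {a, b}.
    def in_anbn(word):
        n = len(word) // 2
        return n >= 1 and len(word) == 2 * n and word == "a" * n + "b" * n

    assert in_anbn("ab") and in_anbn("aaabbb")
    assert not in_anbn("abab") and not in_anbn("aab") and not in_anbn("")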
Lossless Data Compression
Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistical redundancy. By contrast, lossy compression permits reconstruction only of an approximation of the original data, though usually with greatly improved compression rates (and therefore reduced media sizes). By operation of the pigeonhole principle, no lossless compression algorithm can shrink the size of all possible data: some data will get longer by at least one symbol or bit. Compression algorithms are usually effective for human- and machine-readable documents and cannot shrink the size of random data that contain no redundancy. Different algorithms exist that are designed either with a specific type of input data in mind or with specific assumptions about what kinds of redundancy the uncompressed data are likely to contain.
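The pigeonhole argument can be made concrete by counting (illustration only): there are 2^n bit strings of length n but only 2^n − 1 bit strings of length strictly less than n, so no injective (lossless) code can map every length-n input to a shorter output.

    # Counting check behind the pigeonhole argument for n = 8.
    n = 8
    inputs = 2 ** n                                   # 256 strings of length n
    shorter_outputs = sum(2 ** k for k in range(n))   # 1 + 2 + ... + 128 = 255
    assert shorter_outputs == 2 ** n - 1
    assert inputs > shorter_outputs                   # some input cannot be shortened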
Kolmogorov Complexity
In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that produces the object as output. It is a measure of the computational resources needed to specify the object, and is also known as algorithmic complexity, Solomonoff–Kolmogorov–Chaitin complexity, program-size complexity, descriptive complexity, or algorithmic entropy. It is named after Andrey Kolmogorov, who first published on the subject in 1963, and is a generalization of classical information theory. The notion of Kolmogorov complexity can be used to state and prove impossibility results akin to Cantor's diagonal argument, Gödel's incompleteness theorem, and Turing's halting problem. In particular, no program ''P'' computing a lower bound for each text's Kolmogorov complexity can return a value essentially larger than ''P'''s own length.
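Kolmogorov complexity itself is uncomputable, but any short program that outputs a string witnesses an upper bound on that string's complexity. A small illustration (not part of the source text):

    # A highly repetitive string of one million characters is produced by a
    # program of about twenty characters, so its Kolmogorov complexity is far
    # smaller than its length (up to the additive constant of the chosen language).
    program = 'print("ab" * 500000)'   # a short description of a long string
    output = "ab" * 500000
    print(len(output), len(program))   # 1000000 versus ~20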
Grammar-based Code
Grammar-based codes or grammar-based compression are compression algorithms based on the idea of constructing a context-free grammar (CFG) for the string to be compressed. Examples include universal lossless data compression algorithms. To compress a data sequence x = x_1 \cdots x_n, a grammar-based code transforms x into a context-free grammar G. The problem of finding a smallest grammar for an input sequence (the smallest grammar problem) is known to be NP-hard, so many grammar-transform algorithms have been proposed from theoretical and practical viewpoints. Generally, the produced grammar G is further compressed by statistical encoders like arithmetic coding.
Examples and characteristics
The class of grammar-based codes is very broad. It includes block codes, the multilevel pattern matching (MPM) algorithm, variations of the incremental parsing Lempel–Ziv code (LZ77 and LZ78), and many other new universal lossless compression algorithms. Grammar-based codes are universal in the sense that they can asymptotically achieve the entropy rate of any stationary, ergodic source with a finite alphabet.
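As a minimal sketch (illustration only, not any particular published algorithm): pair-replacement grammar transforms in the spirit of Re-Pair repeatedly replace the most frequent adjacent pair of symbols with a fresh nonterminal; a real grammar-based coder would then compress the resulting grammar G with a statistical encoder such as arithmetic coding, a step omitted here.

    # Sketch of a pair-replacement grammar transform: the returned start sequence
    # together with the rules forms a straight-line grammar for the input string.
    from collections import Counter

    def grammar_transform(text):
        sequence = list(text)
        rules = {}                                    # nonterminal -> replaced pair
        next_id = 0
        while True:
            # Count adjacent pairs (for simplicity, overlapping occurrences are counted too).
            pairs = Counter(zip(sequence, sequence[1:]))
            if not pairs:
                break
            pair, count = pairs.most_common(1)[0]
            if count < 2:                             # stop when no pair repeats
                break
            nonterminal = "R%d" % next_id
            next_id += 1
            rules[nonterminal] = pair
            new_seq, i = [], 0
            while i < len(sequence):                  # left-to-right, non-overlapping replacement
                if i + 1 < len(sequence) and (sequence[i], sequence[i + 1]) == pair:
                    new_seq.append(nonterminal)
                    i += 2
                else:
                    new_seq.append(sequence[i])
                    i += 1
            sequence = new_seq
        return sequence, rules

    start, rules = grammar_transform("abcabcabcabc")
    print(start)   # right-hand side of the start rule, e.g. ['R2', 'R2']
    print(rules)   # pair-replacement rules, e.g. {'R0': ('a', 'b'), ...}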
Approximation Ratio
In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable guarantees on the distance of the returned solution to the optimal one. Approximation algorithms naturally arise in the field of theoretical computer science as a consequence of the widely believed P ≠ NP conjecture. Under this conjecture, a wide class of optimization problems cannot be solved exactly in polynomial time. The field of approximation algorithms, therefore, tries to understand how closely it is possible to approximate optimal solutions to such problems in polynomial time. In an overwhelming majority of the cases, the guarantee of such algorithms is a multiplicative one expressed as an approximation ratio or approximation factor, i.e., the optimal solution is always guaranteed to be within a (predetermined) multiplicative factor of the returned solution. However, there are also approximation algorithms that provide an additive guarantee on the quality of the returned solution.
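A standard textbook illustration of such a multiplicative guarantee (not specific to the source text) is the maximal-matching algorithm for minimum vertex cover: the cover it returns is always at most twice the size of an optimal cover, i.e., its approximation ratio is 2.

    # 2-approximation for minimum vertex cover: for every uncovered edge, take both endpoints.
    def vertex_cover_2approx(edges):
        cover = set()
        for u, v in edges:
            if u not in cover and v not in cover:
                cover.update((u, v))
        return cover

    # A 4-cycle: the optimum cover has 2 vertices; the algorithm may return all 4 (ratio 2).
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    cover = vertex_cover_2approx(edges)
    assert all(u in cover or v in cover for u, v in edges)   # valid cover
    assert len(cover) <= 2 * 2                               # within factor 2 of the optimum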
NP-complete
In computational complexity theory, NP-complete problems are the hardest of the problems to which ''solutions'' can be verified ''quickly''. Somewhat more precisely, a problem is NP-complete when:
# It is a decision problem, meaning that for any input to the problem, the output is either "yes" or "no".
# When the answer is "yes", this can be demonstrated through the existence of a short (polynomial length) ''solution''.
# The correctness of each solution can be verified quickly (namely, in polynomial time), and a brute-force search algorithm can find a solution by trying all possible solutions.
# The problem can be used to simulate every other problem for which we can verify quickly that a solution is correct.
Hence, if we could find solutions of some NP-complete problem quickly, we could quickly find the solutions of every other problem to which a given solution can be easily verified. The name "NP-complete" is short for "nondeterministic polynomial-time complete". In this name, "nondeterministic" refers to nondeterministic Turing machines, a way of mathematically formalizing the idea of a brute-force search algorithm.
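As a small illustration of verifying a solution quickly (not part of the source text): checking that a proposed truth assignment satisfies a Boolean formula in conjunctive normal form takes time linear in the size of the formula, even though no polynomial-time method is known for finding such an assignment.

    # Polynomial-time verifier for Boolean satisfiability.
    # A clause is a list of signed variable indices: 1 means x1, -2 means "not x2".
    formula = [[1, -2], [2, 3]]                      # (x1 or not x2) and (x2 or x3)

    def verify(formula, assignment):
        return all(
            any((lit > 0) == assignment[abs(lit)] for lit in clause)
            for clause in formula
        )

    certificate = {1: True, 2: False, 3: True}       # a short (polynomial-length) solution
    assert verify(formula, certificate)              # verified quickly: one pass over the formula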
De Bruijn Sequence
In combinatorial mathematics, a de Bruijn sequence of order ''n'' on a size-''k'' alphabet ''A'' is a cyclic sequence in which every possible length-''n'' string on ''A'' occurs exactly once as a substring (i.e., as a ''contiguous'' subsequence). Such a sequence is denoted by B(k, n) and has length k^n, which is also the number of distinct strings of length ''n'' on ''A''. Each of these distinct strings, when taken as a substring of B(k, n), must start at a different position, because substrings starting at the same position are not distinct. Therefore, B(k, n) must have ''at least'' k^n symbols. And since B(k, n) has ''exactly'' k^n symbols, de Bruijn sequences are optimally short with respect to the property of containing every string of length ''n'' at least once. The number of distinct de Bruijn sequences B(k, n) is
:\dfrac{\left(k!\right)^{k^{n-1}}}{k^n}.
For a binary alphabet this is 2^{2^{n-1}-n}, leading to the following sequence for positive n: 1, 1, 2, 16, 2048, ...
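A de Bruijn sequence can be constructed and checked directly; the sketch below (illustration only) uses the standard Lyndon-word construction and then verifies that every length-n string over the alphabet occurs exactly once as a cyclic substring.

    # Lyndon-word construction of a cyclic de Bruijn sequence B(k, n),
    # followed by a check of the defining property.
    from itertools import product

    def de_bruijn(k, n):
        a = [0] * (k * n)
        sequence = []

        def db(t, p):
            if t > n:
                if n % p == 0:
                    sequence.extend(a[1:p + 1])
            else:
                a[t] = a[t - p]
                db(t + 1, p)
                for j in range(a[t - p] + 1, k):
                    a[t] = j
                    db(t + 1, t)

        db(1, 1)
        return sequence

    k, n = 2, 4
    seq = de_bruijn(k, n)
    assert len(seq) == k ** n                          # length k^n
    wrapped = seq + seq[:n - 1]                        # unroll the cycle
    windows = sorted(tuple(wrapped[i:i + n]) for i in range(len(seq)))
    assert windows == sorted(product(range(k), repeat=n))   # every length-n string exactly once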
Big O Notation
Big ''O'' notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. Big O is a member of a family of notations invented by the German mathematicians Paul Bachmann, Edmund Landau, and others, collectively called Bachmann–Landau notation or asymptotic notation. The letter O was chosen by Bachmann to stand for ''Ordnung'', meaning the order of approximation. In computer science, big O notation is used to classify algorithms according to how their run time or space requirements grow as the input size grows. In analytic number theory, big O notation is often used to express a bound on the difference between an arithmetical function and a better understood approximation; one well-known example is the remainder term in the prime number theorem.
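As a small worked example (illustration only, not part of the source text): the function f(n) = 3n^2 + 5n satisfies f(n) = O(n^2), because for every n \ge 1
:3n^2 + 5n \le 3n^2 + 5n^2 = 8n^2,
so the constants c = 8 and n_0 = 1 witness the defining inequality f(n) \le c \cdot n^2 for all n \ge n_0.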