Data Compression
In information theory, data compression, source coding, or bitrate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder. The process of reducing the size of a data file is often referred to as data compression. In the context of data transmission, it is called source coding; encoding done at the source of the data before it is stored or transmitted. Source coding should not be confused with channel coding, for error detection and correction or line coding, the means for mapping data onto a signal. ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] 

Information Theory
Information theory is the scientific study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley, in the 1920s, and Claude Shannon in the 1940s. The field is at the intersection of probability theory, statistics, computer science, statistical mechanics, information engineering, and electrical engineering. A key measure in information theory is entropy. Entropy quantifies the amount of uncertainty involved in the value of a random variable or the outcome of a random process. For example, identifying the outcome of a fair coin flip (with two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a die (with six equally likely outcomes). Some other important measures in information theory are mutual information, channel capacity, error exponents, and relative entropy. Important subfields of information theory include sourc ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] 

Runlength Encoding
Runlength encoding (RLE) is a form of lossless data compression in which ''runs'' of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most efficient on data that contains many such runs, for example, simple graphic images such as icons, line drawings, Conway's Game of Life, and animations. For files that do not have many runs, RLE could increase the file size. RLE may also be used to refer to an early graphics file format supported by CompuServe for compressing black and white images, but was widely supplanted by their later Graphics Interchange Format (GIF). RLE also refers to a littleused image format in Windows 3.x, with the extension rle, which is a runlength encoded bitmap, used to compress the Windows 3.x startup screen. Example Consider a screen containing plain black text on a solid white background. There will be many long runs of white pi ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] 

Probabilistic Model
A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population). A statistical model represents, often in considerably idealized form, the datagenerating process. A statistical model is usually specified as a mathematical relationship between one or more random variables and other nonrandom variables. As such, a statistical model is "a formal representation of a theory" ( Herman Adèr quoting Kenneth Bollen). All statistical hypothesis tests and all statistical estimators are derived via statistical models. More generally, statistical models are part of the foundation of statistical inference. Introduction Informally, a statistical model can be thought of as a statistical assumption (or set of statistical assumptions) with a certain property: that the assumption allows us to calculate the probability of any event. As an example, consider a pair of ordinary sixsi ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] 

Burrows–Wheeler Transform
The Burrows–Wheeler transform (BWT, also called blocksorting compression) rearranges a character string into runs of similar characters. This is useful for compression, since it tends to be easy to compress a string that has runs of repeated characters by techniques such as movetofront transform and runlength encoding. More importantly, the transformation is ''reversible'', without needing to store any additional data except the position of the first original character. The BWT is thus a "free" method of improving the efficiency of text compression algorithms, costing only some extra computation. The Burrows–Wheeler transform is an algorithm used to prepare data for use with data compression techniques such as bzip2. It was invented by Michael Burrows and David Wheeler in 1994 while Burrows was working at DEC Systems Research Center in Palo Alto, California. It is based on a previously unpublished transformation discovered by Wheeler in 1983. The algorithm can be imp ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] 

Prediction By Partial Matching
Prediction by partial matching (PPM) is an adaptive statistical data compression technique based on context modeling and prediction. PPM models use a set of previous symbols in the uncompressed symbol stream to predict the next symbol in the stream. PPM algorithms can also be used to cluster data into predicted groupings in cluster analysis. Theory Predictions are usually reduced to symbol rankings. Each symbol (a letter, bit or any other amount of data) is ranked before it is compressed, and the ranking system determines the corresponding codeword (and therefore the compression rate). In many compression algorithms, the ranking is equivalent to probability mass function estimation. Given the previous letters (or given a context), each symbol is assigned with a probability. For instance, in arithmetic coding the symbols are ranked by their probabilities to appear after previous symbols, and the whole sequence is compressed into a single fraction that is computed accordin ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] 

Randomized Algorithm
A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic or procedure. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the "average case" over all possible choices of random determined by the random bits; thus either the running time, or the output (or both) are random variables. One has to distinguish between algorithms that use the random input so that they always terminate with the correct answer, but where the expected running time is finite ( Las Vegas algorithms, for example Quicksort), and algorithms which have a chance of producing an incorrect result ( Monte Carlo algorithms, for example the Monte Carlo algorithm for the MFAS problem) or fail to produce a result either by signaling a failure or failing to terminate. In some cases, probabilistic algorithms are the only practical means of solving a problem. In common practice, randomized a ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] 

RePair
RePair (short for Recursive Pairing) is a grammarbased compression algorithm that, given an input text, builds a straightline program, i.e. a contextfree grammar In formal language theory, a contextfree grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be em ... generating a single string: the input text. The grammar is built by recursively replacing the most frequent pair of characters occurring in the text. Once there is no pair of characters occurring twice, the resulting string is used as the axiom of the grammar. Therefore, the output grammar is such that all rules but the axiom have two symbols on the righthand side. How it works RePair was first introduced by NJ. Larsson and A. MoffatLarsson, N. J., & Moffat, A. (2000). Offline dictionarybased compression. Proceedings of the IEEE, 88(11), 17221732. in 1999. ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] 

Sequitur Algorithm
Sequitur (or NevillManning algorithm) is a recursive algorithm developed by Craig NevillManning and Ian H. Witten in 1997 that infers a hierarchical structure (contextfree grammar) from a sequence of discrete symbols. The algorithm operates in linear space and time. It can be used in data compression software applications. Constraints The sequitur algorithm constructs a grammar by substituting repeating phrases in the given sequence with new rules and therefore produces a concise representation of the sequence. For example, if the sequence is : S→abcab, the algorithm will produce : S→AcA, A→ab. While scanning the input sequence, the algorithm follows two constraints for generating its grammar efficiently: digram uniqueness and rule utility. Digram uniqueness Whenever a new symbol is scanned from the sequence, it is appended with the last scanned symbol to form a new digram. If this digram has been formed earlier then a new rule is made to replace both occurrences of ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] 

Grammarbased Codes
Grammarbased codes or Grammarbased compression are compression algorithms based on the idea of constructing a contextfree grammar (CFG) for the string to be compressed. Examples include universal lossless data compression algorithms. To compress a data sequence x = x_1 \cdots x_n, a grammarbased code transforms x into a contextfree grammar G. The problem of finding a smallest grammar for an input sequence (smallest grammar problem) is known to be NPhard, so many grammartransform algorithms are proposed from theoretical and practical viewpoints. Generally, the produced grammar G is further compressed by statistical encoders like arithmetic coding. Examples and characteristics The class of grammarbased codes is very broad. It includes block codes, the multilevel pattern matching (MPM) algorithm, variations of the incremental parsing LempelZiv code, and many other new universal lossless compression algorithms. Grammarbased codes are universal in the sense that they ca ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] 

Huffman Coding
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of MinimumRedundancy Codes". The output from Huffman's algorithm can be viewed as a variablelength code table for encoding a source symbol (such as a character in a file). The algorithm derives this table from the estimated probability or frequency of occurrence (''weight'') for each possible value of the source symbol. As in other entropy encoding methods, more common symbols are generally represented using fewer bits than less common symbols. Huffman's method can be efficiently implemented, finding a code in time linear to the number of input weights if these weights are sorted. However, althou ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] 

Graphics Interchange Format
The Graphics Interchange Format (GIF; or , see pronunciation) is a bitmap image format that was developed by a team at the online services provider CompuServe led by American computer scientist Steve Wilhite and released on 15 June 1987. It is in widespread usage on the World Wide Web due to its wide support and portability between applications and operating systems. The format supports up to 8 bits per pixel for each image, allowing a single image to reference its own palette of up to 256 different colors chosen from the 24bit RGB color space. It also supports animations and allows a separate palette of up to 256 colors for each frame. These palette limitations make GIF less suitable for reproducing color photographs and other images with color gradients, but wellsuited for simpler images such as graphics or logos with solid areas of color. GIF images are compressed using the Lempel–Ziv–Welch (LZW) lossless data compression technique to reduce the file siz ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] 

Lempel–Ziv–Welch
Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978. The algorithm is simple to implement and has the potential for very high throughput in hardware implementations. It is the algorithm of the widely used Unix file compression utility compress and is used in the GIF image format. Algorithm The scenario described by Welch's 1984 paper encodes sequences of 8bit data as fixedlength 12bit codes. The codes from 0 to 255 represent 1character sequences consisting of the corresponding 8bit character, and the codes 256 through 4095 are created in a dictionary for sequences encountered in the data as it is encoded. At each stage in compression, input bytes are gathered into a sequence until the next character would make a sequence with no code yet in the dictionary. The c ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] 