Matt Mahoney (computer Scientist)

	Matt Mahoney (computer Scientist) PAQ is a series of lossless data compression archivers that have gone through collaborative development to top rankings on several benchmarks measuring compression ratio (although at the expense of speed and memory usage). Specialized versions of PAQ have won the Hutter Prize and the Calgary Challenge. PAQ is free software distributed under the GNU General Public License. Algorithm PAQ uses a context mixing algorithm. Context mixing is related to prediction by partial matching (PPM) in that the compressor is divided into a predictor and an arithmetic coder, but differs in that the next-symbol prediction is computed using a weighted combination of probability estimates from a large number of models conditioned on different contexts. Unlike PPM, a context doesn't need to be contiguous. Most PAQ versions collect next-symbol statistics for the following contexts: * ''n''-grams; the context is the last bytes before the predicted symbol (as in PPM); * whole-word ''n''-grams, ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Noisy-channel Coding Theorem In information theory, the noisy-channel coding theorem (sometimes Shannon's theorem or Shannon's limit), establishes that for any given degree of noise contamination of a communication channel, it is possible to communicate discrete data (digital information) nearly error-free up to a computable maximum rate through the channel. This result was presented by Claude Shannon in 1948 and was based in part on earlier work and ideas of Harry Nyquist and Ralph Hartley. The Shannon limit or Shannon capacity of a communication channel refers to the maximum rate of error-free data that can theoretically be transferred over the channel if the link is subject to random data transmission errors, for a particular noise level. It was first described by Shannon (1948), and shortly after published in a book by Shannon and Warren Weaver entitled '' The Mathematical Theory of Communication'' (1949). This founded the modern discipline of information theory. Overview Stated by Claude Shan ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Prediction By Partial Matching Prediction by partial matching (PPM) is an adaptive statistical data compression technique based on context modeling and prediction. PPM models use a set of previous symbols in the uncompressed symbol stream to predict the next symbol in the stream. PPM algorithms can also be used to cluster data into predicted groupings in cluster analysis. Theory Predictions are usually reduced to symbol rankings. Each symbol (a letter, bit or any other amount of data) is ranked before it is compressed, and the ranking system determines the corresponding codeword (and therefore the compression rate). In many compression algorithms, the ranking is equivalent to probability mass function estimation. Given the previous letters (or given a context), each symbol is assigned with a probability. For instance, in arithmetic coding the symbols are ranked by their probabilities to appear after previous symbols, and the whole sequence is compressed into a single fraction that is computed according to ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit of memory in many computer architectures. To disambiguate arbitrarily sized bytes from the common 8-bit definition, network protocol documents such as The Internet Protocol () refer to an 8-bit byte as an octet. Those bits in an octet are usually counted with numbering from 0 to 7 or 7 to 0 depending on the bit endianness. The first bit is number 0, making the eighth bit number 7. The size of the byte has historically been hardware-dependent and no definitive standards existed that mandated the size. Sizes from 1 to 48 bits have been used. The six-bit character code was an often-used implementation in early encoding systems, and computers using six-bit and nine-bit bytes were common in the 1960s. These systems often had memory wo ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Nanosecond A nanosecond (ns) is a unit of time in the International System of Units (SI) equal to one billionth of a second, that is, of a second, or 10 seconds. The term combines the SI prefix ''nano-'' indicating a 1 billionth submultiple of an SI unit (e.g. nanogram, nanometre, etc.) and ''second'', the primary unit of time in the SI. A nanosecond is equal to 1000 picoseconds or microsecond. Time units ranging between 10 and 10 seconds are typically expressed as tens or hundreds of nanoseconds. Time units of this granularity are commonly found in telecommunications, pulsed lasers, and related aspects of electronics. Common measurements * 0.001 nanoseconds – one picosecond * 0.5 nanoseconds – the half-life of beryllium-13. * 0.96 nanoseconds – 100 Gigabit Ethernet Interpacket gap * 1.0 nanosecond – cycle time of an electromagnetic wave with a frequency of 1 GHz (1 hertz). * 1.0 nanosecond – electromagnetic wavelength of 1 light-nanosecond. ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	English Wikipedia The English Wikipedia is, along with the Simple English Wikipedia, one of two English-language editions of Wikipedia, an online encyclopedia. It was founded on January 15, 2001, as Wikipedia's first edition, and, as of , has the most articles of any edition, at . As of , of articles in all Wikipedias belong to the English-language edition; this share was more than 50% in 2003. The edition's one-billionth edit was made on January 13, 2021. Articles The English Wikipedia has pioneered some ideas as conventions, policies or features which were later adopted by Wikipedia editions in some of the other languages. These ideas include "featured articles", the neutral-point-of-view policy, navigation templates, the sorting of short "stub" articles into sub-categories, dispute resolution mechanisms such as mediation and arbitration, and weekly collaborations. It surpassed six million articles on 23 January 2020. In November 2022, the total volume of the compressed texts of it ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Gigabyte The gigabyte () is a multiple of the unit byte for digital information. The prefix '' giga'' means 109 in the International System of Units (SI). Therefore, one gigabyte is one billion bytes. The unit symbol for the gigabyte is GB. This definition is used in all contexts of science (especially data science), engineering, business, and many areas of computing, including storage capacities of hard drives, solid state drives, and tapes, as well as data transmission speeds. However, the term is also used in some fields of computer science and information technology to denote (10243 or 230) bytes, particularly for sizes of RAM. Thus, prior to 1998, some usage of ''gigabyte'' has been ambiguous. To resolve this difficulty, IEC 80000-13 clarifies that a ''gigabyte'' (GB) is 109 bytes and specifies the term ''gibibyte'' (GiB) to denote 230 bytes. These differences are still readily seen for example, when a 400 GB drive's capacity is displayed by Microsoft Windows as 372&nbs ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Hutter Prize The Hutter Prize is a cash prize funded by Marcus Hutter which rewards data compression improvements on a specific 1 GB English text file, with the goal of encouraging research in artificial intelligence (AI). Launched in 2006, the prize awards 5000 euros for each one percent improvement (with 500,000 euros total funding) in the compressed size of the file ''enwik9'', which is the larger of two files used in the Large Text Compression Benchmark; enwik9 consists of the first 1,000,000,000 characters of a specific version of English Wikipedia. The ongoing competition is organized by Hutter, Matt Mahoney, and Jim Bowery. Goals The goal of the Hutter Prize is to encourage research in artificial intelligence (AI). The organizers believe that text compression and AI are equivalent problems. Hutter proved that the optimal behavior of a goal-seeking agent in an unknown but computable environment is to guess at each step that the environment is probably controlled by one of the shortest ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Lookup Table In computer science, a lookup table (LUT) is an array that replaces runtime computation with a simpler array indexing operation. The process is termed as "direct addressing" and LUTs differ from hash tables in a way that, to retrieve a value v with key k, a hash table would store the value v in the slot h(k) where h is a hash function i.e. k is used to compute the slot, while in the case of LUT, the value v is stored in slot k, thus directly addressable. The savings in processing time can be significant, because retrieving a value from memory is often faster than carrying out an "expensive" computation or input/output operation. The tables may be precalculated and stored in static program storage, calculated (or "pre-fetched") as part of a program's initialization phase (memoization), or even stored in hardware in application-specific platforms. Lookup tables are also used extensively to validate input values by matching against a list of valid (or invalid) items in an array an ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Hash Table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', also called a ''hash code'', into an array of ''buckets'' or ''slots'', from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored. Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash '' collisions'' where the hash function generates the same index for more than one key. Such collisions are typically accommodated in some way. In a well-dimensioned hash table, the average time complexity for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key–value pa ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Nonlinear In mathematics and science, a nonlinear system is a system in which the change of the output is not proportional to the change of the input. Nonlinear problems are of interest to engineers, biologists, physicists, mathematicians, and many other scientists because most systems are inherently nonlinear in nature. Nonlinear dynamical systems, describing changes in variables over time, may appear chaotic, unpredictable, or counterintuitive, contrasting with much simpler linear systems. Typically, the behavior of a nonlinear system is described in mathematics by a nonlinear system of equations, which is a set of simultaneous equations in which the unknowns (or the unknown functions in the case of differential equations) appear as variables of a polynomial of degree higher than one or in the argument of a function which is not a polynomial of degree one. In other words, in a nonlinear system of equations, the equation(s) to be solved cannot be written as a linear combination of t ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Root Mean Square In mathematics and its applications, the root mean square of a set of numbers x_i (abbreviated as RMS, or rms and denoted in formulas as either x_\mathrm or \mathrm_x) is defined as the square root of the mean square (the arithmetic mean of the squares) of the set. The RMS is also known as the quadratic mean (denoted M_2) and is a particular case of the generalized mean. The RMS of a continuously varying function (denoted f_\mathrm) can be defined in terms of an integral of the squares of the instantaneous values during a cycle. For alternating electric current, RMS is equal to the value of the constant direct current that would produce the same power dissipation in a resistive load. In estimation theory, the root-mean-square deviation of an estimator is a measure of the imperfection of the fit of the estimator to the data. Definition The RMS value of a set of values (or a continuous-time waveform) is the square root of the arithmetic mean of the squares of the values, ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]