Weissman Score

The Weissman score is a performance metric for lossless compression applications. It was developed by Tsachy Weissman, a professor at Stanford University, and Vinith Misra, a graduate student, at the request of producers for HBO's television series ''Silicon Valley'', a television show about a fictional tech start-up working on a data compression algorithm. It compares both the required time and the compression ratio of measured applications with those of a ''de facto'' standard according to the data type.

The formula is as follows, where ''r'' is the compression ratio, ''T'' is the time required to compress, the overlined symbols are the same metrics for a standard compressor, and ''α'' is a scaling constant:

<math>W = \alpha \frac{r}{\overline{r}} \frac{\log \overline{T}}{\log T}</math>

The Weissman score has been used by Daniel Reiter Horn and Mehant Baid of Dropbox to explain real-world work on lossless compression. According to the authors it "favors compression speed over ratio in most cases."
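The following is a minimal sketch of how the score could be computed, assuming the usual form of the formula, ''W'' = ''α'' · (''r'' / ''r̄'') · (log ''T̄'' / log ''T''). The function name, parameter names, and sample figures are hypothetical and chosen only for illustration; they do not come from the authors.

```python
import math


def weissman_score(r, t, r_std, t_std, alpha=1.0):
    """Illustrative Weissman score.

    r, t         -- compression ratio and compression time of the candidate
    r_std, t_std -- the same two metrics for the standard compressor
    alpha        -- scaling constant
    """
    if t <= 0 or t_std <= 0:
        raise ValueError("times must be positive")
    if t == 1:
        # log(1) == 0, so the denominator vanishes and the score is undefined.
        raise ValueError("score is undefined when t == 1")
    # The base of the logarithm cancels in the quotient, so any base works.
    return alpha * (r / r_std) * (math.log(t_std) / math.log(t))


# Hypothetical figures: the candidate compresses better (3.0 vs 2.0)
# and faster (10 vs 100 time units) than the standard.
score = weissman_score(r=3.0, t=10, r_std=2.0, t_std=100)
print(score)  # approximately 3.0
```

A compressor that matches the standard on both metrics scores ''α''; a better ratio or a shorter time raises the score.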


Example

This example shows the score for the data of the Hutter Prize, using paq8f as the standard and 1 as the scaling constant.
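As a quick sanity check on these settings, and assuming the usual form of the score, ''W'' = ''α'' · (''r'' / ''r̄'') · (log ''T̄'' / log ''T''), the standard compressor scores exactly 1 against itself whatever its own ratio and time, provided the numeric value of its time is not 1:

```latex
W_{\text{paq8f}}
  = 1 \cdot \frac{\overline{r}}{\overline{r}} \cdot \frac{\log \overline{T}}{\log \overline{T}}
  = 1
```

Scores above 1 therefore indicate a compressor that the metric ranks ahead of paq8f, and scores below 1 one that it ranks behind.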


Limitations

Although the value is relative to the standards against which it is compared, the unit used to measure the times changes the score (see examples 1 and 2). This follows from the requirement that the argument of the logarithmic function be dimensionless, which a time expressed in a unit is not. The numeric value of a time also cannot be 1 or less, because the logarithm of 1 is 0 (examples 3 and 4), and the logarithm of any value less than 1 is negative (examples 5 and 6); that would result in a score of 0 (whatever the other values), an undefined score, or a negative score (even for a better-performing compressor).
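These limitations can be reproduced with a short sketch. It assumes the usual form of the score, ''W'' = ''α'' · (''r'' / ''r̄'') · (log ''T̄'' / log ''T''); the helper name and all figures are hypothetical and are not the values used in the examples referred to above.

```python
import math


def weissman_score(r, t, r_std, t_std, alpha=1.0):
    # Illustrative helper: alpha * (r / r_std) * (log t_std / log t)
    return alpha * (r / r_std) * (math.log(t_std) / math.log(t))


r, r_std = 3.0, 2.0  # hypothetical compression ratios

# 1. Unit dependence: the same pair of measurements, in minutes and in seconds.
in_minutes = weissman_score(r, 2.0, r_std, 4.0)
in_seconds = weissman_score(r, 2.0 * 60, r_std, 4.0 * 60)
print(in_minutes, in_seconds)  # the two scores differ

# 2. A standard time of exactly 1 gives log(1) = 0, so the score is 0
#    no matter what the other values are.
zero = weissman_score(r, 10.0, r_std, 1.0)

# 3. A candidate time of exactly 1 puts log(1) = 0 in the denominator.
try:
    weissman_score(r, 1.0, r_std, 10.0)
    undefined = False
except ZeroDivisionError:
    undefined = True

# 4. A standard time below 1 has a negative logarithm, so the score is negative.
negative = weissman_score(r, 10.0, r_std, 0.5)
print(zero, undefined, negative)
```

Because only the quotient of the two logarithms enters the score, changing the logarithm base has no effect, but rescaling the time unit does.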


See also

* Benchmark
* Coding theory
* Information theory
* Phred quality score

