Weissman score
The Weissman score is a performance metric for lossless compression applications. It was developed by Tsachy Weissman, a professor at Stanford University, and Vinith Misra, a graduate student, at the request of producers for HBO's television series Silicon Valley, a television show about a fictional tech start-up working on a data compression algorithm.[1][2][3][4] It compares both required time and compression ratio of measured applications, with those of a de facto standard according to the data type.
The formula is the following; where r is the compression ratio, T is the time required to compress, the overlined ones are the same metrics for a standard compressor, and alpha is a scaling constant.[1]
Weissman score has been used by Daniel Reiter Horn and Mehant Baid of Dropbox to explain real-world work on lossless compression. According to the authors it "favors compression speed over ratio in most cases."[5]
Example
This example shows the score for the data of the Hutter Prize,[6] using the paq8f as a standard and 1 as the scaling constant.
Application | Compression ratio | Compression time [min] | Weissman score |
---|---|---|---|
paq8f | 5.467600 | 300 | 1.000000 |
raq8g | 5.514990 | 420 | 0.720477 |
paq8hkcc | 5.682593 | 300 | 1.039321 |
paq8hp1 | 5.692566 | 300 | 1.041145 |
paq8hp2 | 5.750279 | 300 | 1.051701 |
paq8hp3 | 5.800033 | 300 | 1.060801 |
paq8hp4 | 5.868829 | 300 | 1.073826 |
paq8hp5 | 5.917719 | 300 | 1.082325 |
paq8hp6 | 5.976643 | 300 | 1.093102 |
paq8hp12 | 6.104276 | 540 | 0.620247 |
decomp8 | 6.261574 | 540 | 0.63623 |
decomp8 | 6.276295 | 540 | 0.637726 |
Limitations
Although the value is relative to the standards against which it is compared, the unit used to measure the times changes the score (see examples 1 and 2). This is a consequence of the requirement that the argument of the logarithmic function must be dimensionless. The multiplier also can't have a numeric value of 1 or less, because the logarithm of 1 is 0 (examples 3 and 4), and the logarithm of any value less than 1 is negative (examples 5 and 6); that would result in scores of value 0 (even with changes), undefined, or negative (even if better than positive).
Examples
# | Standard compressor | Scored compressor | Weissman score | Observations | ||||
---|---|---|---|---|---|---|---|---|
Compression ratio | Compression time | Log (compression time) | Compression ratio | Compression time | Log (compression time) | |||
1 | 2.1 | 2 min | 0.30103 | 3.4 | 3 min | 0.477121 | 1×(3.4/2.1)×(0.30103/0.477121)=1.021506 | Change in unit or scale, changes the result. |
2 | 2.1 | 2 min | 2.079181 | 3.4 | 3 min | 2.255273 | 1×(3.4/2.1)×(2.079181/2.255273)=1.492632 | |
3 | 2.2 | 1 min | 0 | 3.3 | 1.5 min | 0.176091 | 1×(3.3/2.2)×(0/0.176091)=0 | If time is 1, its log is 0; then the score can be 0 or infinity. |
4 | 2.2 | 0.667 min | −0.176091 | 3.3 | 1 min | 0 | 1×(3.3/2.2)×(−0.176091/0)=infinity | |
5 | 1.6 | 0.5 h | −0.30103 | 2.9 | 1.1 h | 0.041393 | 1×(2.9/1.6)×(−0.30103/0.041393)=−13.18138 | If time is less than 1, its log is negative; then the score can be negative. |
6 | 1.6 | 1.1 h | 0.041393 | 1.6 | 0.9 h | −0.045757 | 1×(1.6/1.6)×(0.041393/−0.045757)=−0.904627 |
References
- Perry, Tekla (July 28, 2014). "A Fictional Compression Metric Moves Into the Real World". Retrieved January 25, 2016.
- Perry, Tekla (July 25, 2014). "A Made-For-TV Compression Algorithm". Retrieved January 25, 2016.
- Sandberg, Elise (April 12, 2014). "HBO's 'Silicon Valley' Tech Advisor on Realism, Possible Elon Musk Cameo". The Hollywood Reporter. Retrieved June 10, 2014.
- Jurgensen, John; Rusli, Evelyn M. (April 3, 2014). "There's a New Geek in Town: HBO's 'Silicon Valley'". The Wall Street Journal. Retrieved June 10, 2014.
- "Lossless compression with Brotli in Rust for a bit of Pied Piper on the backend". Dropbox Tech Blog. Retrieved 2017-06-24.
- Hutter, Marcus (July 2016). "Contestants". Retrieved January 25, 2016.