By the end of 2024, will there be open source software that I can run on my computer that will find the optimal DEFLATE encoding for any piece of data up to 1000000 bytes that I test it with in under an hour? (i.e. no naive brute force scripts)
https://www.w3.org/Graphics/PNG/RFC-1951
🏅 Top traders
# | Name | Total profit |
---|---|---|
1 | Ṁ33 | |
2 | Ṁ12 | |
3 | Ṁ9 | |
4 | Ṁ6 | |
5 | Ṁ5 |
People are also trading
@ThisProfileDoesntExist Update: It won't let me resolve N/A, so I'm resolving to current probability (10%) instead.
There's past efforts to find better compression
https://developers.googleblog.com/2013/02/compress-data-more-densely-with-zopfli.html
or reason about optimality
https://www.sciencedirect.com/science/article/pii/S1570866713000257
but sticking to DEFLATE compatibility is fundamentally limiting.
On a Hutter prize benchmark (http://mattmahoney.net/dc/text.html) the closest DEFLATE-compatible (pigz in `-11` Zopfli mode) is 2x-3x larger than state of the art. It's ~30% larger than compatibility-breaking LZ77 implementations, or other algorithms with very low cost decompression.
It thus seems like any cloud provider motivated to chase the size-speed tradeoff is more likely to bolt new compression standards onto HTTP2 or play with self-extracting WASM formats than chase diminishing returns (size-bounded, but still) on DEFLATE optimality.
@AdamTreat This is about a specific compression format, not Kolmogorov complexity. The optimal representation is the one with the fewest number of bytes among all possible representations for a given data string in the DEFLATE format.
@ThisProfileDoesntExist I don’t know enough about the DEFLATE format but if it is capable of encoding any arbitrary piece of data just like a bit string and you are asking for an algorithm on a Turing machine to find the optimal representation that sounds an awful lot like you are asking for an algorithm to procure the optimal Kolmogorov complexity of a bit string. Again, I don’t know enough about the DEFLATE so maybe I’m missing something.
@AdamTreat The difference is that DEFLATE is not Turing complete. There's a limited and well defined number of ways that any given piece of data can be encoded. You can trivially write a script to brute force this, it just won't run in a reasonable amount of time.
@AdamTreat For comparison, consider a compression format that is just "use a single byte-by-byte binary prefix code to encode all the data". There are well known algorithms to solve this optimally (https://en.wikipedia.org/wiki/Huffman_coding). DEFLATE is more complicated than that, but it's still simple enough that it could plausibly by solved.