Dictionary compression is a well-known technique, promising to solve the problem of compressing small inputs. However, it has only become available to implementers relatively recently, as newer compression algorithms shipped dictionary builders alongside their main codec. Because of this short history, the complexities of deploying this solution at larger scales are only starting to be appreciated. Understanding these difficulties, and finding ways to overcome them, is key to the target system's performance and reliability. Yet the prize is big: dictionary compression not only improves compression ratios, it also offers the potential to redesign systems around its capabilities.
We'll cover the benefits, trade-offs and operational difficulties of dictionary compression, as well as its important second-order impacts on systems adopting it.
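To make the core idea concrete, here is a minimal sketch of preset-dictionary compression using Python's standard-library zlib (not Zstandard, whose dictionary builder and API the talk actually concerns, but the principle is the same): a small input compresses poorly on its own because the codec has no history to match against, while a shared dictionary supplies that history up front. The sample record and dictionary contents below are invented for illustration.

```python
import zlib

# A small input: too short for the compressor to find internal repetition.
sample = b'{"user": "carol", "event": "login", "status": "ok"}'

# A preset dictionary seeded with substrings common across such records.
# In practice (e.g. with Zstandard), a dictionary is trained from many samples.
dictionary = b'{"user": "", "event": "login", "status": "ok"}'

# Without a dictionary: header overhead can exceed any savings.
plain = zlib.compress(sample)

# With a preset dictionary: common substrings match from the first byte.
c = zlib.compressobj(zdict=dictionary)
with_dict = c.compress(sample) + c.flush()

# Decompression must be given the exact same dictionary.
d = zlib.decompressobj(zdict=dictionary)
assert d.decompress(with_dict) == sample

print(len(sample), len(plain), len(with_dict))
```

The dictionary-aided output is markedly smaller than the plain one, but note the operational coupling this introduces: every reader must hold the same dictionary as the writer, which is one of the deployment difficulties discussed in the talk.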
Yann Collet is a software engineer at Facebook and the tech lead of its Data Compression team. Several of his open-source projects have achieved mainstream status, such as Zstandard (https://www.zstandard.org), a modern compression algorithm with a wide range of operating trade-offs; LZ4 (https://www.lz4.org), a light and very fast data compression library; xxHash (https://www.xxhash.com), an extremely fast non-cryptographic hash algorithm; and Finite State Entropy (https://github.com/Cyan4973/FiniteStateEntropy), the first known efficient ANS entropy coder (a modern replacement for Huffman coding). His primary research interest is applied data compression, with a focus on building solutions that are highly deployable, for cloud data centers, terminals and applications.