Continuing on the same note as the Greenwald-Khanna post, we discuss another novel data structure for order statistics called Q-Digest. Compared to GK, Q-Digest is much simpler, easier to implement and extends to a distributed solution very easily. The data structure was designed with sensor networks in mind, where minimizing radio transmission overhead is of paramount importance. It originally appeared in this paper by Shrivastava et al
Q-Digest is simply a complete binary tree over the range of values, where the leaf nodes are the values themselves. In the example below, we have a digest over the values 1 to 4 inclusive, and we have seen the value 1 once and the value 4 seven times.
The novelty is the compression algorithm which allows values with low frequencies to propagate up the tree and be combined. In the example below, we have some information loss. We know that there are 10 values between 1 and 2 inclusive, but we don’t know the exact counts of each