On Github shovon / cmpt469daala
Get better quality, in fewer bytes
A video codec created to either match or exceed the quality of current-generation of video codecs, such a VP9 and HEVC. Currently an ongoing effort by Xiph and Mozilla to introduce a royalty-free video codec.
Under Shannon Entropy theory, the performance of lossless compression is bounded by the entropy of the input
LZW, RLE, ZIP, etc. work great...
(Behind, you're seeing a 512 by 512 pixel image, not just a white background)
Image Size (bytes) BMP 786554 PNG 1529 JPG 3598Except not so well for more complex images
Image Size (bytes) BMP 786554 PNG 476235 JPG 408932Even worse for noise
Image Size (bytes) BMP 786554 PNG 788485 JPG 770446Humans are very forgiving for loss of some quality
Before DCT
After DCT
The block prior to the DCT: spatial domain. the block after: frequency domain
N.B. The top-left pixel in the frequency domain is called DC, and the rest are called AC
Before Quantization
After Quantization
Encoding f:A→Bf:A \to Bf:A→B
Decoding f:B→Cf:B \to Cf:B→C
Where BBB<>AAA
Where C⊆AC \subseteq AC⊆A
Before Loss
After Loss
However, most modern encoders pretty much use a variation of JPEG
Before we go ahead and use the DCT, we apply a pre-filter
We then overlap the results of the pre-filter onto the DCT
Let's go back to the DCT
Instead of applying an O(nlog(n))O(nlog(n))O(nlog(n)) algorithm, use an O(1)O(1)O(1) approximation
We use lifting
Paper on lifting
Same idea for the prefilter
Result
Paper regarding the lapped transform
A type of vector quantization
Instead of quantizing every scalar elements individually, group adjecent ones and quantize collectively
Problem: lost information + squandered range
All quantized regions are used
We have a function GGG to get k=G(g)k = G(g)k=G(g), k∈Nk \in \mathbb{N}k∈N
With kkk, we get a W={w∈ZN∣∑i=1N∣vi∣=k}W = \{\mathbf{w} \in \mathbb{Z}^N | \sum\limits_{i = 1}^N |\mathbf{v}_i| = k\}W={w∈ZN∣i=1∑N∣vi∣=k}
We then have a function QQQ, such that we can compute Q(w∣k)=qQ(\mathbf{w}|k) = \mathbf{q}Q(w∣k)=q, q∈W\mathbf{q} \in Wq∈W
Fischer, T.R. "A pyramid vector quantizer." IEEE Transactions on Information Theory. Issue 4 Volume 32 (1986): 568—583. Print.
Prediction
We compute a αu\alpha_uαu, αv\alpha_vαv, βu\beta_uβuβv\beta_vβv, by performing a linear regression on the U and V channels
We can then send αu\alpha_uαu, αv\alpha_vαv, βu\beta_uβuβv\beta_vβv, and the decoder should be able to infer the final colour
Algorithm on finding the direction
Enough fun; let's ask "to paint or not to paint"
w=min(1,αQ212σ2)w = \min(1, \alpha\frac{Q^2}{12\sigma^2})w=min(1,α12σ2Q2)
The end-result
https://people.xiph.org/~xiphmont/demo/daala/update1-tool2b.shtml