Using Daala Intra Frames for Still Picture Coding

Get better quality, in fewer bytes

What Is Daala?

A video codec designed to match or exceed the quality of current-generation video codecs, such as VP9 and HEVC. It is an ongoing effort by Xiph and Mozilla to introduce a royalty-free video codec.

Under Shannon's entropy theory, the performance of lossless compression is bounded by the entropy of the input.
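To make the entropy bound concrete, here is a minimal sketch that estimates the zeroth-order (per-byte) entropy of some data. The function name and the two sample inputs are made up for illustration, and a per-byte estimate is only a true bound for sources whose symbols are independent of each other.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Empirical zeroth-order entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = -sum(p * log2(p)) over the observed byte values
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


flat = bytes([255]) * 1024      # a single repeated value: nothing to encode
mixed = bytes(range(256)) * 4   # every byte value equally likely

print(entropy_bits_per_byte(flat))
print(entropy_bits_per_byte(mixed))
```

The flat input needs essentially no bits per byte, while the uniform input needs the full 8, which is why some inputs shrink dramatically and others do not shrink at all.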

Problem: Compressing Images

LZW, RLE, ZIP, etc. work great...

(Behind this text is a 512 by 512 pixel image, not just a white background)

Image  Size (bytes)
BMP    786554
PNG    1529
JPG    3598

Except not so well for more complex images

Image  Size (bytes)
BMP    786554
PNG    476235
JPG    408932

Even worse for noise

Image  Size (bytes)
BMP    786554
PNG    788485
JPG    770446
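The same trend can be reproduced with any general-purpose lossless compressor. The sketch below uses Python's `zlib` (DEFLATE, the same family PNG builds on) on two synthetic 512 by 512 RGB buffers; the buffers are stand-ins made up for this example, not the images from the slides, so the exact byte counts will differ from the tables.

```python
import random
import zlib

SIDE = 512
RAW_BYTES = SIDE * SIDE * 3  # 24-bit RGB, no header

# A perfectly flat white image.
flat = bytes([255]) * RAW_BYTES

# Pure noise: every byte drawn independently (seeded for repeatability).
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(RAW_BYTES))

flat_size = len(zlib.compress(flat, 9))
noise_size = len(zlib.compress(noise, 9))

print("raw bytes:       ", RAW_BYTES)
print("flat compressed: ", flat_size)
print("noise compressed:", noise_size)
```

The flat buffer collapses to a tiny fraction of its raw size, while the noise buffer does not shrink at all, matching the pattern in the three tables.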

Solution: Fool the Human

Humans are very forgiving of some loss of quality

https://youtu.be/Sp7HiqULakk

Let's Look At JPEG

What Does JPEG Do?

Spatial to frequency conversion
- similar to audio but for images, and instead of 1D, we use 2D
Fourier's Theorem
- “A wave is the sum of many sinusoidal waves.”
DFT is often used to extract the sine waves of a wave (FFT for the fast variant)
But we're going to use the DCT
- Benefit over DFT
  - no complex numbers involved
  - coefficients are packed closer to the lower frequencies
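As a rough illustration of the points above, here is a direct (slow, unoptimized) 2D DCT-II written in plain Python. The orthonormal scaling and the helper names are choices made for this sketch; real encoders use fast factorizations rather than this direct sum.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence, computed directly from the definition."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


# An 8x8 block of constant brightness and a smooth diagonal gradient.
flat_block = [[100.0] * 8 for _ in range(8)]
ramp_block = [[float(8 * (x + y)) for x in range(8)] for y in range(8)]

flat_coeffs = dct_2d(flat_block)
ramp_coeffs = dct_2d(ramp_block)
print(round(flat_coeffs[0][0], 3))  # all the energy sits in the top-left entry
```

For the flat block, everything except the top-left coefficient is zero (up to rounding), and for the smooth ramp most of the energy stays in the first row and column. Only real numbers are involved throughout.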

Before DCT

After DCT

The block before the DCT is in the spatial domain; the block after it is in the frequency domain.

N.B. The top-left coefficient in the frequency domain is called DC, and the rest are called AC

Before Quantization

After Quantization

Quantization in a nutshell

Encoding: $f: A \to B$

Decoding: $f: B \to C$

Where $B \neq A$

Where $C \subseteq A$
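A minimal sketch of scalar quantization in these terms, assuming a uniform step size: the encoder maps a real-valued coefficient (set A) to an integer index (set B), and the decoder maps the index back to a multiple of the step (set C, a subset of A). The function names, the step size of 16, and the sample coefficients are made up for illustration.

```python
def quantize(coeff: float, step: float) -> int:
    """Encoder side: real coefficient -> integer index."""
    return round(coeff / step)


def dequantize(index: int, step: float) -> float:
    """Decoder side: integer index -> reconstructed coefficient."""
    return index * step


step = 16.0
coeffs = [-415.4, 30.2, 7.9, -2.1, 0.6]

indices = [quantize(c, step) for c in coeffs]
recon = [dequantize(i, step) for i in indices]

print(indices)
print(recon)
```

The small coefficients all collapse to index 0, which is where the loss comes from and also what makes the data cheap to send.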

Run-Length-Encoding (RLE)
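After quantization, long runs of identical values (mostly zeros) are common, and run-length encoding stores each run as a value and a count. This is a generic sketch of the idea, not the exact symbol format JPEG uses; the function names and the sample data are illustrative.

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


quantized = [-26, 2, 0, 0, 0, 0, 0, 0]
encoded = rle_encode(quantized)
print(encoded)
```

Decoding the pairs gives back exactly the input, so this step is lossless; all of the loss happens in the quantization before it.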

We Send This

Before Loss

After Loss

However, most modern encoders pretty much use a variation of JPEG

A contender: Daala

Lapped Transform

Before we go ahead and use the DCT, we apply a pre-filter

We then overlap the results of the pre-filter onto the DCT

Prefilter through lifting

Let's go back to the DCT

Instead of applying an $O(n \log n)$ algorithm, use an $O(1)$ approximation

We use lifting
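A lifting step updates one value using a function of the others, so it can be undone exactly by subtracting the same quantity, even when the update is rounded to an integer. As a hedged sketch of the idea, the code below builds a plane rotation (a common building block of DCT factorizations) out of three lifting steps. The angle, the function names, and the use of `round` are choices made for this example and are not Daala's actual coefficients.

```python
import math


def lift_rotate(x: int, y: int, theta: float):
    """Approximate rotation by theta using three rounded lifting steps."""
    p = (math.cos(theta) - 1) / math.sin(theta)
    u = math.sin(theta)
    x += round(p * y)  # step 1: update x from y
    y += round(u * x)  # step 2: update y from the new x
    x += round(p * y)  # step 3: update x from the new y
    return x, y


def lift_unrotate(x: int, y: int, theta: float):
    """Exact inverse: the same steps in reverse order with the sign flipped."""
    p = (math.cos(theta) - 1) / math.sin(theta)
    u = math.sin(theta)
    x -= round(p * y)
    y -= round(u * x)
    x -= round(p * y)
    return x, y


theta = math.pi / 4
rotated = lift_rotate(100, 50, theta)
restored = lift_unrotate(*rotated, theta)
print(rotated, restored)
```

The forward result is within rounding error of the true rotation, and the inverse recovers the original integers exactly, which is the property that makes lifting attractive for integer transforms.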

Prefilter and the lifting cont'd

Paper on lifting

http://www.sfu.ca/~jiel/papers/c003-bindct-ieee.pdf

Same idea for the prefilter

Result

Paper regarding the lapped transform

http://thanglong.ece.jhu.edu/Tran/Pub/prepost.pdf

Gain-Shape Quantization

A type of vector quantization

Vector quantization

Instead of quantizing every scalar element individually, group adjacent ones and quantize them collectively

Problem: lost information + squandered range

All quantized regions are used

Gain-Shape Quantization

Given a block, we treat it like a vector
- Let's call it $\mathbf{v}$
- $\mathbf{v} \in \mathbb{R}^N$
Given $\mathbf{v}$, we get
- The length (gain), which we will call $g$
- The direction (shape), which we will call $\mathbf{w}$
We then quantize $g$ and $\mathbf{w}$
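The split into gain and shape can be sketched in a few lines, assuming the gain is the Euclidean length of the vector and the shape is the unit vector pointing the same way. The function name and the zero-vector convention are assumptions made for this example.

```python
import math


def gain_shape(v):
    """Split a vector into its length (gain) and unit direction (shape)."""
    g = math.sqrt(sum(x * x for x in v))
    if g == 0:
        # A zero vector has no direction; return an all-zero shape by convention.
        return 0.0, [0.0] * len(v)
    return g, [x / g for x in v]


g, w = gain_shape([3.0, 4.0])
print(g, w)

# Multiplying the shape by the gain gives back the original vector.
rebuilt = [g * x for x in w]
print(rebuilt)
```

Quantizing the two parts separately means the energy of the block (the gain) is preserved independently of how coarsely the direction is coded.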

Pyramid Vector Quantization

Instead of look-up tables, arithmetically group vectors
Saves space

We have a function $G$ to get $k = G(g)$, $k \in \mathbb{N}$

With $k$, we get a set $W = \{\mathbf{w} \in \mathbb{Z}^N \mid \sum_{i=1}^{N} |w_i| = k\}$

We then have a function $Q$, such that we can compute $Q(\mathbf{w} \mid k) = \mathbf{q}$, $\mathbf{q} \in W$
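One simple way to realize a function like Q is to scale the input so its absolute values sum to k, round down, and then hand the leftover "pulses" to the entries with the largest remainders. This is only an illustrative projection with made-up names; Daala's actual search is more involved, and the sketch does not try to handle ties or floating-point edge cases in any special way.

```python
def pvq_quantize(w, k):
    """Map a real vector to an integer vector whose absolute values sum to k."""
    n = len(w)
    if k == 0:
        return [0] * n
    l1 = sum(abs(x) for x in w)
    if l1 == 0:
        # No direction to follow: put every pulse on the first entry.
        return [k] + [0] * (n - 1)
    scaled = [abs(x) * k / l1 for x in w]
    pulses = [int(s) for s in scaled]  # floor, since every s is non-negative
    remaining = k - sum(pulses)
    # Give the leftover pulses to the entries with the largest fractional parts.
    order = sorted(range(n), key=lambda i: scaled[i] - pulses[i], reverse=True)
    for i in order[:remaining]:
        pulses[i] += 1
    # Restore the original signs.
    return [p if x >= 0 else -p for p, x in zip(pulses, w)]


q = pvq_quantize([4.0, -3.0, 2.0, 1.0], 4)
print(q, sum(abs(x) for x in q))
```

Because every output lies on the same "pyramid" of integer vectors with absolute sum k, the decoder only needs k and an index into that set, with no look-up table.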

Fischer, T. R. "A pyramid vector quantizer." IEEE Transactions on Information Theory 32.4 (1986): 568-583.

IETF Internet-Draft: https://tools.ietf.org/html/draft-valin-videocodec-pvq

Removing Pixels

Prediction

Compressing Colours

Humans are less sensitive to colour detail than to brightness
And even then, colours are 3D; if you send RGB, you're practically sending the same image three times, so it is better to separate chroma from luma
Use YUV
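The slides do not say which conversion matrix is used, so the sketch below assumes the full-range BT.601 weights that JFIF/JPEG uses (often written YCbCr and loosely called YUV). The function name is made up, and the outputs are left as unclamped floats for clarity.

```python
def rgb_to_yuv(r: float, g: float, b: float):
    """Full-range BT.601 conversion (assumed here; as used by JFIF)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                # luma
    u = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b      # blue-difference chroma
    v = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b      # red-difference chroma
    return y, u, v


print(rgb_to_yuv(128, 128, 128))  # a grey pixel: chroma sits at the neutral value
print(rgb_to_yuv(255, 0, 0))      # a saturated red pixel
```

For any grey pixel the two chroma channels stay at the neutral midpoint, so all of the detail lands in the luma channel and the chroma channels can be coded far more cheaply.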

Chroma From Luma Prediction

We compute $\alpha_u$, $\alpha_v$, $\beta_u$, and $\beta_v$ by performing a linear regression on the U and V channels

We can then send $\alpha_u$, $\alpha_v$, $\beta_u$, and $\beta_v$, and the decoder should be able to infer the final colour

$DC_u = \alpha_u + \beta_u DC_y$
$AC_u[x, y] = \beta_u AC_y[x, y]$
$DC_v = \alpha_v + \beta_v DC_y$
$AC_v[x, y] = \beta_v AC_y[x, y]$
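The regression step can be sketched as an ordinary least-squares line fit of one chroma channel against luma. The function name and the sample values are made up for illustration, and which samples Daala actually feeds into the fit is not covered here.

```python
def fit_chroma_from_luma(luma, chroma):
    """Least-squares fit of chroma ~ alpha + beta * luma."""
    n = len(luma)
    mean_l = sum(luma) / n
    mean_c = sum(chroma) / n
    var_l = sum((l - mean_l) ** 2 for l in luma)
    if var_l == 0:
        # Flat luma carries no slope information; predict the chroma mean.
        return mean_c, 0.0
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(luma, chroma))
    beta = cov / var_l
    alpha = mean_c - beta * mean_l
    return alpha, beta


luma = [10.0, 20.0, 30.0, 40.0]
chroma_u = [7.0, 12.0, 17.0, 22.0]  # exactly 2 + 0.5 * luma

alpha_u, beta_u = fit_chroma_from_luma(luma, chroma_u)
predicted_u = [alpha_u + beta_u * l for l in luma]
print(alpha_u, beta_u, predicted_u)
```

With only two numbers per chroma channel, the decoder can rebuild a prediction of the whole channel from luma it already has, and only the residual error needs further coding.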

Paint Deringing

Direction search
Boundary pixel
Painting

Algorithm on finding the direction

http://jmvalin.ca/notes/intra_paint.pdf

Some Fun

Enough fun; let's ask "to paint or not to paint"

$w = \min\left(1, \alpha \frac{Q^2}{12\sigma^2}\right)$

$Q$ is the quantization amount (quality)
$\alpha$ is a tunable value between 0 and 1
$\sigma^2$ is the mean squared distance between the decoded image and the painted image
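The weight formula translates directly into code. In the sketch below, the function name and the handling of a zero distance are assumptions for this example; the term Q^2 / 12 is the variance of uniform quantization noise for a step of size Q, so the ratio compares expected quantization error against how far the painted image strays from the decoded one.

```python
def paint_weight(q: float, alpha: float, sigma2: float) -> float:
    """w = min(1, alpha * Q^2 / (12 * sigma^2)), following the formula above."""
    if sigma2 == 0:
        # Painted and decoded images are identical, so the ratio is unbounded.
        return 1.0
    return min(1.0, alpha * q * q / (12 * sigma2))


print(paint_weight(6, 0.5, 6))   # painted image differs a lot: small weight
print(paint_weight(12, 0.5, 6))  # coarser quantization: weight saturates at 1
```

Coarser quantization or a closer match between the two images both push the weight towards 1.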

The end-result

https://people.xiph.org/~xiphmont/demo/daala/update1-tool2b.shtml

shovon

cmpt469daala
