Using Daala Intra Frames for Still Picture Coding – Solution: Fool the Human – Let's Look At JPEG





cmpt469daala

Presentation about Daala for CMPT 469

On GitHub: shovon/cmpt469daala

Using Daala Intra Frames for Still Picture Coding

Get better quality, in fewer bytes

What Is Daala?

A video codec created to match or exceed the quality of current-generation video codecs, such as VP9 and HEVC. It is an ongoing effort by Xiph and Mozilla to introduce a royalty-free video codec.

Under Shannon Entropy theory, the performance of lossless compression is bounded by the entropy of the input
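To make the entropy bound concrete, here is a minimal sketch that estimates the order-0 (per-byte, context-free) entropy of some data. The function name and the sample inputs are illustrative assumptions, not part of the presentation. Real compressors also exploit context, so treat this as a rough estimate only.

```python
import math
from collections import Counter

def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over the observed byte values.
    return sum((c / total) * math.log2(total / c) for c in Counter(data).values())

# Hypothetical inputs: a "blank" buffer and one where every byte value is equally likely.
flat = bytes([255]) * 1024
varied = bytes(range(256)) * 4

print(shannon_entropy_bits_per_byte(flat))
print(shannon_entropy_bits_per_byte(varied))
```

A buffer holding a single repeated value has nothing left to encode per byte, while uniformly distributed bytes need the full 8 bits each.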

Problem: Compressing Images

LZW, RLE, ZIP, etc. work great...

(Behind, you're seeing a 512 by 512 pixel image, not just a white background)

Image  Size (bytes)
BMP    786554
PNG    1529
JPG    3598

Except not so well for more complex images

Image  Size (bytes)
BMP    786554
PNG    476235
JPG    408932

Even worse for noise

Image  Size (bytes)
BMP    786554
PNG    788485
JPG    770446

Solution: Fool the Human

Humans are very forgiving of some loss in quality

Let's Look At JPEG

What Does JPEG Do?

  • Spatial to frequency conversion
    • similar to audio but for images, and instead of 1D, we use 2D
  • Fourier's Theorem
    • “A wave is the sum of many sinusoidal waves.”
  • DFT is often used to extract the sine waves of a wave (FFT for the fast variant)
  • But we're going to use the DCT
    • Benefits over the DFT
      • no complex numbers involved
      • coefficients are packed closer to the lower frequencies
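The spatial-to-frequency step can be sketched with a direct (slow) orthonormal DCT-II applied to rows and then to columns. This is an illustrative textbook formulation, not JPEG's or Daala's optimized implementation, and the function names are assumptions.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence (direct O(n^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * acc)
    return out

def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]   # cols[u][v]
    return [list(r) for r in zip(*cols)]          # transpose back to [v][u]

# A perfectly flat 8x8 block: all of the energy should land in the top-left coefficient.
flat_block = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat_block)
print(round(coeffs[0][0], 6))
```

For a flat block, everything except `coeffs[0][0]` comes out as (numerically) zero, which is the extreme case of the energy being packed towards the low frequencies.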

Before DCT

After DCT

The block prior to the DCT is in the spatial domain; the block after it is in the frequency domain

N.B. The top-left coefficient in the frequency domain is called DC, and the rest are called AC

Before Quantization

After Quantization

Quantization in a nutshell

Encoding $f: A \to B$

Decoding $f: B \to C$

Where $B \neq A$

Where $C \subseteq A$
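As a minimal sketch of the mapping above, uniform scalar quantization with a step size illustrates all three sets: real-valued coefficients (A) are mapped to small integer indices (B), and decoding lands on multiples of the step (C), which is a subset of A. The step size and sample values are made up for illustration.

```python
def quantize(coefficient: float, step: float) -> int:
    """Encoder side: map a real coefficient to an integer index."""
    return round(coefficient / step)

def dequantize(index: int, step: float) -> float:
    """Decoder side: map the index back to a representative value."""
    return index * step

step = 10.0                              # hypothetical quantizer step
coefficients = [37.0, -12.0, 3.0, 0.4]   # hypothetical DCT coefficients
indices = [quantize(c, step) for c in coefficients]
reconstructed = [dequantize(i, step) for i in indices]

print(indices)
print(reconstructed)
```

The small coefficients collapse to zero, which is where the loss happens and also what makes the following entropy coding stage effective.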

Run-Length-Encoding (RLE)

We Send This
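A generic run-length encoder can be sketched as follows. Note that this is plain RLE over (value, run length) pairs; JPEG itself codes runs of zeros together with entropy coding, which this sketch does not attempt. The input list is a hypothetical row of quantized coefficients.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

quantized = [4, -1, 0, 0, 0, 0, 0, 0]   # hypothetical quantized coefficients
encoded = rle_encode(quantized)
print(encoded)
assert rle_decode(encoded) == quantized
```

The long run of zeros produced by quantization shrinks to a single pair.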

Before Loss

After Loss

However, most modern encoders essentially use a variation of JPEG

A contender: Daala

Lapped Transform

Before we go ahead and use the DCT, we apply a pre-filter

The pre-filter is applied across block boundaries, so its output overlaps the neighbouring DCT blocks

Prefilter through lifting

Let's go back to the DCT

Instead of applying an $O(n \log n)$ algorithm, use an $O(1)$ approximation

We use lifting
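To show what a lifting step looks like, here is a small sketch of a plane rotation factored into three lifting (shear) steps with integer rounding. Each step only adds a rounded function of one value to the other, so the whole thing is exactly invertible on integers. This is a generic illustration of lifting; the angle and coefficients are assumptions and are not Daala's actual DCT or pre-filter coefficients.

```python
import math

def lifting_rotate(x: int, y: int, theta: float):
    """Approximate rotation by theta using three integer lifting steps (sin(theta) != 0)."""
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    x += round(p * y)
    y += round(s * x)
    x += round(p * y)
    return x, y

def lifting_unrotate(x: int, y: int, theta: float):
    """Exact inverse: undo the same steps in reverse order."""
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    x -= round(p * y)
    y -= round(s * x)
    x -= round(p * y)
    return x, y

theta = math.pi / 4
rotated = lifting_rotate(10, 3, theta)
print(rotated, lifting_unrotate(*rotated, theta))
```

The forward result is only an approximation of the true rotation, but the round trip recovers the input exactly, which is the property that makes lifting attractive for integer transforms.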

Prefilter and the lifting cont'd

Paper on lifting

http://www.sfu.ca/~jiel/papers/c003-bindct-ieee.pdf

Same idea for the prefilter

Result

Paper regarding the lapped transform

http://thanglong.ece.jhu.edu/Tran/Pub/prepost.pdf

Gain-Shape Quantization

A type of vector quantization

Vector quantization

Instead of quantizing every scalar element individually, group adjacent ones and quantize them collectively

Problem: lost information + squandered range

All quantized regions are used

Gain-Shape Quantization

  • Given a block, we treat it like a vector
    • Let's call it $\mathbf{v}$
    • $\mathbf{v} \in \mathbb{R}^N$
  • Given $\mathbf{v}$, we get
    • The length (gain), which we will call $g$
    • The direction (shape), which we will call $\mathbf{w}$
  • We then quantize $g$ and $\mathbf{w}$
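The gain/shape split described above can be sketched in a few lines: the gain is the Euclidean length of the vector and the shape is the unit vector pointing in the same direction. The function names and the zero-vector handling are assumptions made for this illustration.

```python
import math

def gain_shape(v):
    """Split a vector into its gain (Euclidean norm) and shape (unit vector)."""
    g = math.sqrt(sum(x * x for x in v))
    if g == 0.0:
        return 0.0, [0.0] * len(v)   # assumption: a zero vector has an all-zero shape
    return g, [x / g for x in v]

def reconstruct(g, w):
    """Recombine a gain and a shape into a vector."""
    return [g * x for x in w]

g, w = gain_shape([3.0, 4.0])
print(g, w)
print(reconstruct(g, w))
```

Quantizing `g` and `w` separately means the energy of the block (the gain) can be preserved even when the shape is coded coarsely.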

Pyramid Vector Quantization

  • Instead of look-up tables, arithmetically group vectors
  • Saves space

We have a function $G$ to get $k = G(g)$, $k \in \mathbb{N}$

With $k$, we get a $W = \{\mathbf{w} \in \mathbb{Z}^N \mid \sum\limits_{i = 1}^N |\mathbf{w}_i| = k\}$

We then have a function $Q$, such that we can compute $Q(\mathbf{w}|k) = \mathbf{q}$, $\mathbf{q} \in W$
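One simple way to realise a quantizer like $Q$ is sketched below: scale the shape so its absolute values sum to $k$, round down, and hand the leftover pulses to the entries with the largest fractional parts. This largest-remainder projection is an assumption chosen for brevity; it is not necessarily the search used by Daala or described by Fischer. The second function counts how many vectors the set $W$ contains, using the standard recurrence for pyramid codebooks, which shows why no lookup table is needed.

```python
from functools import lru_cache

def pvq_quantize(w, k):
    """Project w onto integer vectors q with sum(|q_i|) == k (largest-remainder sketch)."""
    l1 = sum(abs(x) for x in w)
    if l1 == 0:
        return [k] + [0] * (len(w) - 1)   # arbitrary choice for an all-zero input
    scaled = [abs(x) * k / l1 for x in w]
    mags = [int(s) for s in scaled]        # round magnitudes down
    leftover = k - sum(mags)
    by_fraction = sorted(range(len(w)), key=lambda i: scaled[i] - mags[i], reverse=True)
    for i in by_fraction[:leftover]:       # distribute the remaining pulses
        mags[i] += 1
    return [m if x >= 0 else -m for m, x in zip(mags, w)]

@lru_cache(maxsize=None)
def codebook_size(n, k):
    """Number of integer vectors of dimension n whose absolute values sum to k."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return codebook_size(n - 1, k) + codebook_size(n, k - 1) + codebook_size(n - 1, k - 1)

q = pvq_quantize([3.0, -1.0, 0.5, 0.0], 4)
print(q, sum(abs(x) for x in q), codebook_size(4, 4))
```

Because every codeword can be enumerated arithmetically, the encoder only needs to send an index into this set rather than store the codebook.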

Fischer, T.R. "A pyramid vector quantizer." IEEE Transactions on Information Theory. Volume 32, Issue 4 (1986): 568-583. Print.

Internet-Draft: https://tools.ietf.org/html/draft-valin-videocodec-pvq

Removing Pixels

Prediction

Compressing Colours

  • Humans are less sensitive to colour than to brightness
  • Colours are also 3D: if you send RGB, you're practically sending the same image three times, so it is better to separate chroma from luma
  • Use YUV
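Separating luma from chroma can be sketched with the full-range YCbCr conversion used by JFIF/JPEG. The slides only say "YUV", so the exact matrix here is an assumption, and Daala's own conversion may differ.

```python
def rgb_to_ycbcr(r: float, g: float, b: float):
    """Full-range BT.601 / JFIF RGB -> YCbCr for 8-bit sample values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

# Grey pixels carry no colour information: both chroma channels sit at the midpoint.
print(rgb_to_ycbcr(128, 128, 128))
# A saturated red pixel pushes Cr far from the midpoint.
print(rgb_to_ycbcr(255, 0, 0))
```

For neutral pixels all of the information ends up in Y, which is why the chroma planes tend to be cheap to code and can tolerate coarser treatment.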

Chroma From Luma Prediction

We compute $\alpha_u$, $\alpha_v$, $\beta_u$, and $\beta_v$ by performing a linear regression of the U and V channels against luma

We can then send $\alpha_u$, $\alpha_v$, $\beta_u$, and $\beta_v$, and the decoder should be able to infer the final colour

  • $DC_u = \alpha_u + \beta_u DC_y$
  • $AC_u[x, y] = \beta_u AC_y[x, y]$
  • $DC_v = \alpha_v + \beta_v DC_y$
  • $AC_v[x, y] = \beta_v AC_y[x, y]$
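The prediction equations above can be sketched as an ordinary least-squares fit followed by the DC/AC prediction. Which samples Daala actually feeds into the regression is not stated in the slides, so the paired sample lists below are purely hypothetical, as are the function names.

```python
def fit_alpha_beta(luma, chroma):
    """Least-squares fit of chroma ~ alpha + beta * luma over paired samples."""
    n = len(luma)
    mean_y = sum(luma) / n
    mean_c = sum(chroma) / n
    var_y = sum((y - mean_y) ** 2 for y in luma)
    if var_y == 0:
        return mean_c, 0.0               # flat luma: fall back to a constant prediction
    cov = sum((y - mean_y) * (c - mean_c) for y, c in zip(luma, chroma))
    beta = cov / var_y
    return mean_c - beta * mean_y, beta

def predict_chroma(dc_y, ac_y, alpha, beta):
    """Apply DC_c = alpha + beta * DC_y and AC_c[x, y] = beta * AC_y[x, y]."""
    dc_c = alpha + beta * dc_y
    ac_c = [[beta * value for value in row] for row in ac_y]
    return dc_c, ac_c

luma = [10.0, 20.0, 30.0, 40.0]          # hypothetical luma samples
chroma_u = [7.0, 12.0, 17.0, 22.0]       # hypothetical U samples (2 + 0.5 * luma)
alpha_u, beta_u = fit_alpha_beta(luma, chroma_u)
print(alpha_u, beta_u)
print(predict_chroma(100.0, [[8.0, -4.0], [2.0, 0.0]], alpha_u, beta_u))
```

The same two functions would be used for the V channel with its own pair of parameters.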

Paint Deringing

  • Direction search
  • Boundary pixel
  • Painting

Algorithm for finding the direction

http://jmvalin.ca/notes/intra_paint.pdf

Some Fun

Enough fun; let's ask "to paint or not to paint"

$w = \min\left(1, \alpha\frac{Q^2}{12\sigma^2}\right)$

  • $Q$ is the quantization amount (quality)
  • $\alpha$ is a tunable value between 0 and 1
  • $\sigma^2$ is the mean squared distance between decoded image and painted image
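The weight formula can be evaluated directly, as in the sketch below. Using $w$ to blend the painted and decoded pixels as `(1 - w) * decoded + w * painted` is an assumption made here for illustration (the slides only give the formula for $w$), and the sample numbers are made up.

```python
def paint_weight(q: float, alpha: float, sigma2: float) -> float:
    """w = min(1, alpha * Q^2 / (12 * sigma^2))."""
    if sigma2 <= 0.0:
        return 1.0                       # painted and decoded images agree exactly
    return min(1.0, alpha * q * q / (12.0 * sigma2))

def blend(decoded, painted, w):
    """Assumed use of w: linear mix of the decoded and painted pixel values."""
    return [(1.0 - w) * d + w * p for d, p in zip(decoded, painted)]

w = paint_weight(q=12.0, alpha=0.5, sigma2=24.0)   # hypothetical values
print(w)
print(blend([100.0, 50.0], [120.0, 50.0], w))
```

When the painted image is far from the decoded one (large $\sigma^2$) the weight drops towards zero, and when the quantizer is coarse relative to that distance the weight saturates at 1.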

The end-result

https://people.xiph.org/~xiphmont/demo/daala/update1-tool2b.shtml