
Digital Signal Processing Demystified

Amy Boyle @amylouboyle

Digital Signal Processing = DSP

  • Who am I?
  • What is DSP?
  • Sound
  • Images
  • Compression

We'll learn how to filter out unwanted sounds from audio files, and how to extract features from images.

Who Am I?

  • Software Engineer
  • Auditory Neuroscience Research

Why am I telling you this?

  • Signal "literacy"
  • Give you a starting point to dive further

I'll be using Python because ...

What is DSP?

Digital signal processing (DSP) is the mathematical manipulation of an information signal to modify or improve it in some way. It is characterized by representing signals in the discrete-time domain, the discrete-frequency domain, or another discrete domain, as a sequence of numbers or symbols, and by processing those signals.

If you try to Google DSP, you will be confronted with barely comprehensible pages of authors barfing math at you. There will be no equations in this talk.

Take a real-world continuous measurement. Turn it into a discrete series of data points. Use an algorithm to manipulate those samples to do cool things.

Doing Stuff to Points

(of data)

What is a Signal?

A type of physical data that conveys information; a description of how one parameter depends on another.

  • Sound
  • Light
  • Temperature
  • Heart beats
  • Hiking trail elevation


A digital signal is a signal turned into values at points in time or space

In order to hold the signal in computer memory, we need to discretize it.

Sampling gets values at regular intervals

  • We need to pick a time or space interval at which to collect data
  • This gives us our points along the x-axis
  • E.g. every ms, or every mm

Quantization buckets values

  • We need to put each sample point into a value bucket
  • E.g. if we are storing our data in an 8-bit int, we divide the y-axis range into 256 levels and ask which one the sample is closest to
We have to do both: a low resolution in either sampling or quantization can distort the signal.
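
A rough sketch of those two steps (the 1 kHz tone, 8 kHz sampling rate, and 8-bit range here are made up for illustration):

	import numpy as np

	# sampling: grab a value every 1/8000th of a second, for 10 ms
	fs = 8000
	t = np.arange(0.0, 0.01, 1./fs)
	analog = np.sin(2 * np.pi * 1000 * t)  # stand-in for the continuous signal

	# quantization: squeeze each sample into one of 256 buckets (8-bit int)
	quantized = np.round((analog + 1) / 2 * 255).astype(np.uint8)
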
Any time you do anything to that signal, that's DSP.
  • Mastering all of DSP is an unreasonable goal
  • Get an overview, and focus on an area of interest

All signals are just sums of sine and cosine waves

	import numpy as np

	t = np.arange(0.0, 1.0, 0.01)
	signal = np.zeros_like(t)
	for i in range(2, 100, 2):
	    signal += np.sin(i * 2 * t * np.pi) / i

All signals can be composed from scaled sine waves. This is a trivial example, but we can do this for *any* signal, if we know which waves to add together.

Sound Example


Fourier Transform converts time series data into frequency information

  import numpy as np
  import matplotlib.pyplot as plt

  fs = 44100
  step_size = 1./fs
  t = np.arange(0.0, 1.0, step_size)
  twotone = np.sin(t * 750 * 2 * np.pi) + \
            np.sin(t * 1500 * 2 * np.pi)

  spectrum = np.fft.rfft(twotone)
  freqs = np.fft.rfftfreq(len(twotone), 1./fs)

  plt.plot(freqs, abs(spectrum))

Running the FFT on time chunks gives a spectrogram

  import numpy as np
  import matplotlib.pyplot as plt

  fs = 44100
  step_size = 1./fs
  t = np.arange(0.0, 1.0, step_size)
  twotone = np.sin(t * 750 * 2 * np.pi) + \
            np.sin(t * 1500 * 2 * np.pi)

  plt.specgram(twotone, Fs=fs, NFFT=512,
               noverlap=0, cmap='gnuplot')

Speech is more complex

	import numpy as np
	from scipy.io import wavfile
	from scipy import signal

	fs, audio = wavfile.read('audio/distorted.wav')

	passband = 10000. # Hz
	stopband = 11000. # Hz
	nyq = 0.5 * fs

	order, normalized_cutoff = signal.buttord(passband/nyq, stopband/nyq, 3, 40)
	b, a = signal.butter(order, normalized_cutoff, btype='low')

	low_passed = np.zeros_like(audio)

	low_passed[:,0] = signal.lfilter(b, a, audio[:,0])
	low_passed[:,1] = signal.lfilter(b, a, audio[:,1])

	wavfile.write('filtered.wav', fs, low_passed)

Filtered

Before
After

Inadequate sampling rate can cause aliasing

  fs = 500
  step_size = 1./fs
  t = np.arange(0.0, 1.0, step_size)
  freq0 = 120
  freq1 = 110
  twotone = np.sin(t * freq0 * 2 * np.pi) + \
  		  np.sin(t * freq1 * 2 * np.pi)

  spectrum = np.fft.rfft(twotone)
  freqs = np.fft.rfftfreq(len(twotone), 1./fs)

  plt.plot(freqs, abs(spectrum))
  plt.xlabel('Frequency (Hz)')
  plt.ylabel('Amplitude')

Inadequate sampling rate can cause aliasing

  fs = 50
  step_size = 1./fs
  t = np.arange(0.0, 1.0, step_size)
  freq0 = 120
  freq1 = 110
  twotone = np.sin(t * freq0 * 2 * np.pi) + \
  		  np.sin(t * freq1 * 2 * np.pi)

  spectrum = np.fft.rfft(twotone)
  freqs = np.fft.rfftfreq(len(twotone), 1./fs)

  plt.plot(freqs, abs(spectrum))
  plt.xlabel('Frequency (Hz)')
  plt.ylabel('Amplitude')
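
A quick check of where those tones land (not from the talk; a tone above half the sampling rate folds back to its distance from the nearest multiple of fs). At fs = 500 both tones sit below 250 Hz and appear correctly; at fs = 50 they don't:

  fs = 50
  for f in (110, 120):
      alias = abs(f - fs * round(f / fs))
      print(f, "Hz folds back to", alias, "Hz")  # 110 -> 10 Hz, 120 -> 20 Hz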

CDs are sampled at 44.1 kHz...

An Image is a 2D signal in the spatial domain

46  34  77  96  97
 0   0  44  92 112
 0   0  52  77  92
 0   0  60  71  75
24  43  61  66  63
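
As a small illustration (assuming the values above read row by row), that grid can be held in a 2D NumPy array and displayed directly:

    import numpy as np
    import matplotlib.pyplot as plt

    # the 5x5 grid of pixel brightness values above, as a 2D array
    tiny_image = np.array([[46, 34, 77, 96,  97],
                           [ 0,  0, 44, 92, 112],
                           [ 0,  0, 52, 77,  92],
                           [ 0,  0, 60, 71,  75],
                           [24, 43, 61, 66,  63]])

    plt.imshow(tiny_image, cmap=plt.cm.Greys_r, vmin=0, vmax=255)
    plt.show()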

Brightness shifts pixel values up or down


    import numpy as np
    import matplotlib.pyplot as plt

    image = plt.imread('img/ptown.jpg')
    imshape = image.shape
    flat_image = image.flatten()

    # increase brightness: shift every pixel value up by 100, clamp at 255
    brighter = np.arange(256)
    brighter += 100
    brighter[brighter > 255] = 255

    bright_adjusted = brighter[flat_image]
    bright_adjusted = bright_adjusted.reshape(*imshape)

    # decrease brightness: shift every pixel value down by 100, clamp at 0
    darker = np.arange(256)
    darker -= 100
    darker[darker < 0] = 0

    dark_adjusted = darker[flat_image]
    dark_adjusted = dark_adjusted.reshape(*imshape)

Contrast stretches pixel values


    def adjust_contrast(image, factor, brightness):
        # build a lookup table that scales pixel values by `factor`,
        # pivoting around the `brightness` midpoint, then clamp to 0-255
        contrast_map = np.arange(256)
        contrast_map = (contrast_map * factor) - (brightness * (factor - 1))

        contrast_map[contrast_map > 255] = 255
        contrast_map[contrast_map < 0] = 0

        return contrast_map[image], contrast_map

    # increase contrast
    high_contrast, contrast_map = adjust_contrast(flat_image, factor=3, brightness=128)

    # decrease contrast
    low_contrast, contrast_map = adjust_contrast(flat_image, factor=1./3, brightness=128)

Histogram equalization stretches & shifts pixel values according to concentration

This allows us to achieve a custom, non-linear mapping

Histogram Equalization

	import numpy as np
	import matplotlib.pyplot as plt

	image = plt.imread('img/ninja_bw.jpg')
	imshape = image.shape
	flat_image = image.flatten()

	bin_counts, bin_numbers = np.histogram(flat_image, bins=256,
										   range=(0,256))
	npx = len(flat_image)
	histo_map = (np.cumsum(bin_counts)/float(npx))*256

	contrast_adjusted = histo_map[flat_image]
	contrast_adjusted = contrast_adjusted.reshape(*imshape)
	plt.imshow(contrast_adjusted, cmap=plt.cm.Greys_r)

COLOR!
	from PIL import Image, ImageOps
	import matplotlib.pyplot as plt

	fig = plt.figure(figsize=(6,12))
	image = Image.open("img/foggy_headshot.jpg")
	plt.subplot(2,1,1)
	plt.imshow(image)

	equalized = ImageOps.equalize(image)
	plt.subplot(2,1,2)
	plt.imshow(equalized)

Compression with JPEG

  • Joint Photographic Experts Group
  • A compression algorithm, not actually an image format
  • The most common way to store images (according to Wikipedia)

JPEG Compression Steps

  • Convert RGB to YCbCr
  • Downsample
  • DCT
  • Quantize
  • Encode

Convert RGB to YCbCr

	from PIL import Image
	pimage = Image.open(image_file)
	ycbcr = pimage.convert('YCbCr')
	image_array = np.array(ycbcr)

Downsample

Keep every luminance (Y) pixel, but share one chroma (Cb/Cr) sample among, say, 4 luminance pixels; the eye is less sensitive to color detail than to brightness.
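
A minimal sketch of that subsampling, assuming the image_array from the YCbCr conversion above (a real encoder averages neighboring chroma values rather than simply dropping them):

	# keep every luminance (Y) pixel, but only every other row and column
	# of the chroma channels, so 4 Y pixels share one Cb/Cr sample (4:2:0)
	y  = image_array[:, :, 0]
	cb = image_array[::2, ::2, 1]
	cr = image_array[::2, ::2, 2]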

Divide image into 8x8 chunks

	ylen, xlen = image_array.shape[0], image_array.shape[1]
	for row in range(0, ylen, 8):
	    for col in range(0, xlen, 8):
	        chunk = image_array[row:row+8, col:col+8, 0]
	        ...

Shift pixel values to be centered around 0
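
In code that's just a subtraction on the chunk from the loop above (the cast to a signed type is my addition, to avoid unsigned wraparound):

	# pixel values 0-255 become -128 to 127, centered around 0
	shifted_grid = chunk.astype(np.int16) - 128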

Discrete Cosine Transform (DCT)

	from scipy.fftpack import dct

	# 2D DCT: apply the transform along the rows, then along the columns
	dct_coefficients = dct(dct(shifted_grid, axis=0), axis=1)

	print(dct_coefficients.astype(int))

Zig-zag scanning of the DCT coefficients (image: https://loaighoraba.wordpress.com/2011/07/10/zigzag-scanning-source-code/ http://judge.u-aizu.ac.jp/onlinejudge/description.jsp?id=1007)

Quantization

Divide each frequency-domain component by a constant from a quantization table and round to the nearest integer. Higher-frequency components get larger constants, so most of them round to zero.
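
A sketch of that step, with a single made-up constant standing in for the standard JPEG quantization tables:

	# divide each DCT coefficient by its table entry and round;
	# small coefficients (mostly high frequencies) become zero
	quantization_table = np.full((8, 8), 16)
	quantized = np.round(dct_coefficients / quantization_table).astype(int)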

Apply Huffman/run-length encoding

http://www.ual.es/~vruiz/Docencia/Apuntes/Coding/Image/03-JPEG/index.html
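
A toy version of the run-length half (the zig-zag readout and Huffman coding are skipped; this just flattens the block and records each nonzero value together with the count of zeros before it):

	def run_length_encode(values):
	    # (zeros_before, value) pairs; long runs of zeros become cheap to store
	    pairs = []
	    zero_run = 0
	    for v in values:
	        if v == 0:
	            zero_run += 1
	        else:
	            pairs.append((zero_run, int(v)))
	            zero_run = 0
	    return pairs

	encoded = run_length_encode(quantized.flatten())
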
Remember...
  • Doing Stuff to Points
  • Signals are composed of sine waves
  • Adequate sampling rate is important
  • Understand the underlying process

Presentation on amyboyle.ninja

Source at github.com/boylea/dsp-talk

@amylouboyle
