Surveillance Event Detection

Austin Blanton

Boston marathon bombings

Goals

Prevention
Expedite apprehension process ("human in the loop")

Outline

1. Background
2. Basics of pattern recognition
3. A simple solution
4. Future

1. Background

Automatic detection of observable events of interest in surveillance video

TRECVid

Large dataset (Gatwick airport)
Event annotations

Event detection is hard

Gatwick dataset: 240 GB
- Dimensionality reduction
Most pixels not useful
- Feature selection
Events are rare
- Good features and/or learning algorithm
Subtle semantics of human interaction
- ???

KTH Human Motion Dataset

Slightly over 1 GB
One event per video
All frames part of event
Human activity recognition is less hard
...but still hard

2. Basics of pattern recognition

A tiny bit of history

Chess is easy for computers to understand
- Deep Blue beat Garry Kasparov, world champion, in 1997
- Evaluated 200 million positions per second
Classifying spam emails, detecting music beats, recognizing faces: less easy
Instead, learn from data

What is pattern recognition?

Use statistics and probability to detect patterns
Methods often domain-independent
...but not really (informative features are a prerequisite)
Approximate the hidden mathematical model of a complex domain

Example: text classification

Amazon wants to guess star ratings (1-5) based on review text
Simplest method: Bag of Words (BoW)
- Create dictionary of words in all documents
- Count number of occurrences of each word in each document
- Loses grammatical structure

document        cats   are   cool
War and Peace      5   400     10
Moby Dick          0   621      3
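A minimal Python sketch of the counting step, assuming documents are plain strings and that lower-casing plus splitting on whitespace is good enough tokenization (real pipelines also strip punctuation). The function name `bag_of_words` is hypothetical.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of text documents."""
    # Dictionary of every word that appears in any document, in a fixed order.
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        # One column per dictionary word; Counter returns 0 for absent words.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


vocab, vectors = bag_of_words(["Cats are cool", "cool are cats cats"])
print(vocab)    # ['are', 'cats', 'cool']
print(vectors)  # [[1, 1, 1], [1, 2, 1]]
```

Reordering the words of a document leaves its vector unchanged, which is exactly the loss of grammatical structure noted above.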

Ready to learn!

Bunch of labelled and unlabelled data
Use labelled data to train model
- Naive Bayes popular for text classification
Predict labels
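One way to sketch this in Python is multinomial Naive Bayes over word counts with add-one (Laplace) smoothing. This is a from-scratch illustration with hypothetical function names and made-up toy reviews, not any particular library's API; words never seen during training are simply ignored at prediction time.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha smoothing; returns per-class log-probabilities."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = {word for words in tokenized for word in words}
    word_counts = defaultdict(Counter)  # label -> word -> count
    for words, label in zip(tokenized, labels):
        word_counts[label].update(words)
    model = {}
    for label, n_docs in Counter(labels).items():
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(docs))
        log_likelihood = {w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab}
        model[label] = (log_prior, log_likelihood)
    return model


def predict_nb(model, doc):
    """Return the label with the highest posterior log-probability."""
    def log_posterior(label):
        log_prior, log_likelihood = model[label]
        # "Naive" independence assumption: add one log-probability per known word.
        return log_prior + sum(log_likelihood[w] for w in doc.lower().split()
                               if w in log_likelihood)
    return max(model, key=log_posterior)


reviews = ["great product love it", "love love great",
           "terrible waste of money", "awful terrible broke"]
stars = [5, 5, 1, 1]
model = train_nb(reviews, stars)
print(predict_nb(model, "love it"))         # 5
print(predict_nb(model, "terrible money"))  # 1
```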

Evaluation and Overfitting

Evaluate
- Split labelled data into training and test sets
Overfitting
- Model too specific to training data
- Performs poorly on unseen data (generalization error)
- Cross validation can help
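A minimal k-fold cross-validation sketch in Python, with hypothetical helper names. The model plugged in is a deliberately overfit lookup table: it scores perfectly on its own training data and fails on every held-out example, which is the generalization error described above.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, train, predict, k=5, seed=0):
    """Mean accuracy over k train/test splits; each fold is the test set once."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        if not test_idx:
            continue  # more folds than examples
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = train([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(predict(model, xs[i]) == ys[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# A deliberately overfit "model": memorize every training example.
def train_lookup(xs, ys):
    return dict(zip(xs, ys))


def predict_lookup(model, x):
    return model.get(x)  # None for anything it has not memorized


xs = list(range(10))
ys = ["even" if x % 2 == 0 else "odd" for x in xs]
memo = train_lookup(xs, ys)
print(sum(predict_lookup(memo, x) == y for x, y in zip(xs, ys)) / len(xs))  # 1.0 on training data
print(cross_validate(xs, ys, train_lookup, predict_lookup))                # 0.0 on held-out data
```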

Example 2: text clustering

Amazon wants to automatically group novels by genre
Unsupervised (no labels)
Divide the novels into a number of clusters based on their BoW vectors
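The slide does not name a clustering algorithm; k-means is one common choice for BoW vectors and is assumed here. This is a small sketch with hypothetical names, using deterministic farthest-point initialization so results are repeatable (k-means++ randomizes this step). In practice BoW vectors are usually normalized (e.g. TF-IDF) first, which this sketch skips.

```python
import math


def kmeans(points, k, max_iters=100):
    """Cluster points (lists of numbers) into k groups; returns (labels, centroids)."""
    # Farthest-point initialization: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy BoW vectors over a two-word dictionary, e.g. ["whale", "war"].
books = [[5, 0], [6, 1], [0, 5], [1, 6]]
labels, _ = kmeans(books, 2)
print(labels)  # [0, 0, 1, 1]
```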

3. A simple solution

SIFT (Scale Invariant Feature Transform)

Interest points
Keypoint descriptors

MoSIFT (Motion SIFT)

Keep only SIFT points that have sufficient optical flow (i.e. are moving)
Append a motion descriptor to the SIFT descriptor
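A toy Python sketch of these two rules, assuming the SIFT descriptors and the optical-flow vectors around each keypoint have already been computed elsewhere. The published MoSIFT descriptor builds its motion part as a grid of optical-flow histograms mirroring SIFT's layout; this sketch simplifies that to a single 8-bin, magnitude-weighted orientation histogram. The threshold `min_motion` and both function names are hypothetical.

```python
import math


def flow_histogram(flows, bins=8):
    """Magnitude-weighted orientation histogram of (dx, dy) flow vectors."""
    hist = [0.0] * bins
    for dx, dy in flows:
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
        hist[int(angle / (2 * math.pi) * bins) % bins] += math.hypot(dx, dy)
    return hist


def mosift_descriptors(keypoints, min_motion=1.0):
    """keypoints: list of (sift_descriptor, flows) pairs.
    Drops points without enough motion; returns SIFT + motion descriptors."""
    out = []
    for sift_desc, flows in keypoints:
        if not flows:
            continue
        mean_magnitude = sum(math.hypot(dx, dy) for dx, dy in flows) / len(flows)
        if mean_magnitude < min_motion:
            continue  # static point: not enough optical flow to be a MoSIFT point
        out.append(list(sift_desc) + flow_histogram(flows))
    return out


still = ([0.0] * 128, [(0.0, 0.0)] * 4)   # e.g. a corner of a parked car
moving = ([1.0] * 128, [(2.0, 0.0)] * 4)  # e.g. a point on a walking person
descriptors = mosift_descriptors([still, moving])
print(len(descriptors), len(descriptors[0]))  # 1 136
```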

Optical flow
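The slides do not say which optical-flow method is used; as an illustration, here is a minimal Lucas-Kanade estimate at a single pixel, in pure Python with hypothetical names. It assumes brightness constancy and that all pixels in a small window move together, then solves the resulting 2x2 least-squares system. Averaging the spatial gradients over both frames is a small variant of the textbook formulation.

```python
def lucas_kanade(prev, curr, row, col, radius=2):
    """Estimate the flow (u, v) at pixel (row, col) between two grayscale frames
    (lists of rows). Returns None when the window has too little texture."""
    sxx = sxy = syy = sxt = syt = 0.0
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            # Central-difference gradients, averaged over the two frames.
            ix = (prev[r][c + 1] - prev[r][c - 1] + curr[r][c + 1] - curr[r][c - 1]) / 4.0
            iy = (prev[r + 1][c] - prev[r - 1][c] + curr[r + 1][c] - curr[r - 1][c]) / 4.0
            it = curr[r][c] - prev[r][c]  # temporal derivative
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Least squares: [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt]
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None
    u = (sxy * syt - syy * sxt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


# A quadratic "bowl" image whose content moves by (u, v) = (0.5, -0.3) pixels.
prev = [[(c - 5) ** 2 + (r - 5) ** 2 for c in range(11)] for r in range(11)]
curr = [[(c - 5.5) ** 2 + (r - 4.7) ** 2 for c in range(11)] for r in range(11)]
print(lucas_kanade(prev, curr, 5, 5))  # approximately (0.5, -0.3)
```

On a flat, textureless patch every gradient is zero, the 2x2 system is singular, and the function returns None.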

MoSIFT points

MoSIFT accuracy

Method          Accuracy
MoSIFT + SVM    95.83%

Foreground/background segmentation

Gaussian Mixture Model
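A simplified per-pixel sketch in the spirit of Stauffer and Grimson's adaptive mixture model, in pure Python. Each pixel keeps a few Gaussians over its intensity; the heaviest, tightest ones are treated as background, and a value that matches none of them is foreground. The class name, parameter values, and the fixed learning rate `rho = alpha` are simplifying assumptions; OpenCV's `createBackgroundSubtractorMOG2` is a production implementation of a related method.

```python
import math


class PixelGMM:
    """Hypothetical per-pixel background model (simplified Stauffer-Grimson)."""

    def __init__(self, k=3, alpha=0.05, init_var=225.0, min_var=4.0,
                 match_sigmas=2.5, bg_ratio=0.7):
        self.k = k                    # max Gaussians per pixel
        self.alpha = alpha            # learning rate
        self.init_var = init_var      # variance given to a new Gaussian
        self.min_var = min_var        # floor so variances cannot collapse
        self.match_sigmas = match_sigmas
        self.bg_ratio = bg_ratio      # weight fraction explained by background
        self.comps = []               # each entry: [weight, mean, variance]

    def update(self, x):
        """Classify intensity x, then adapt the model. Returns True for foreground."""
        # Most reliable Gaussians first: high weight, low standard deviation.
        ranked = sorted(self.comps, key=lambda g: g[0] / math.sqrt(g[2]), reverse=True)
        background, covered = [], 0.0
        for g in ranked:
            background.append(g)
            covered += g[0]
            if covered > self.bg_ratio:
                break
        match = next((g for g in ranked
                      if abs(x - g[1]) <= self.match_sigmas * math.sqrt(g[2])), None)
        is_foreground = match is None or not any(g is match for g in background)

        # Decay all weights, reinforcing the matched Gaussian.
        for g in self.comps:
            g[0] = (1 - self.alpha) * g[0] + (self.alpha if g is match else 0.0)
        if match is not None:
            rho = self.alpha
            match[1] = (1 - rho) * match[1] + rho * x
            match[2] = max(self.min_var, (1 - rho) * match[2] + rho * (x - match[1]) ** 2)
        elif len(self.comps) < self.k:
            self.comps.append([self.alpha, float(x), self.init_var])
        else:
            ranked[-1][:] = [self.alpha, float(x), self.init_var]  # replace least reliable
        total = sum(g[0] for g in self.comps)
        for g in self.comps:
            g[0] /= total
        return is_foreground


pixel = PixelGMM()
for t in range(200):
    pixel.update(99 if t % 2 else 101)  # static, slightly noisy background
print(pixel.update(200))  # True: something new in front of the camera
print(pixel.update(100))  # False: back to the learned background
```

A full segmenter would run one such model per pixel and mark the foreground pixels in each frame.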

Bag of Visual Words (BoVW)

Cluster interest points
Frequency count of each cluster in video
Loses spatial and temporal information
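A minimal sketch of the counting step in Python, assuming the cluster centres (the visual "codebook") have already been learned, e.g. by k-means over interest-point descriptors from training videos. Each descriptor in a video votes for its nearest centre. The function name and toy data are hypothetical.

```python
import math


def bovw_histogram(descriptors, codebook, normalize=True):
    """Count how many descriptors fall nearest to each codebook centre."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda j: math.dist(d, codebook[j]))
        counts[nearest] += 1
    if normalize and descriptors:
        # Relative frequencies, so long and short videos are comparable.
        return [c / len(descriptors) for c in counts]
    return counts


codebook = [[0.0, 0.0], [10.0, 10.0]]          # two toy "visual words"
video = [[1.0, 1.0], [9.0, 9.0], [0.0, 2.0]]   # descriptors from one video
print(bovw_histogram(video, codebook, normalize=False))        # [2, 1]
print(bovw_histogram(video[::-1], codebook, normalize=False))  # [2, 1]
```

Reversing the order of the descriptors gives the same histogram, which is the loss of spatial and temporal information noted on the slide.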

4. Future

How do humans do it?

Johansson (1973)
LEDs on key points of human body
Humans recognize human actions from the motion of the LEDs alone

Biologically inspired approach

Detect human
Estimate pose
Track joints
Compute action descriptor (over lifetime of action)

Or maybe not

Recent approaches
- Good accuracy
- Hard datasets
- Not human specific

Speaking of hard datasets...

Turns out KTH isn't very hard
Newer, more challenging datasets
- More action classes
- Challenging backgrounds
- Varying viewpoints
- BIGGER

Thanks!