Surveillance Event Detection – 1. Background – 2. Basics of pattern recognition



Surveillance event detection presentation

On GitHub: imaus10/sed-pres

Surveillance Event Detection

Austin Blanton

Boston marathon bombings

Goals

  • Prevention
  • Expedite apprehension process ("human in the loop")

Outline

  1. Background
  2. Basics of pattern recognition
  3. A simple solution
  4. Future

1. Background

Automatic detection of observable events of interest in surveillance video

TRECVid

  • Large dataset (Gatwick airport)
  • Event annotations

Event detection is hard

  • Gatwick dataset: 240 GB
    • Dimensionality reduction
  • Most pixels not useful
    • Feature selection
  • Events are rare
    • Good features and/or learning algorithm
  • Subtle semantics of human interaction
    • ???

KTH Human Motion Dataset

  • Slightly over 1 GB
  • One event per video
  • All frames part of event
  • Human activity recognition is less hard
  • ...but still hard

2. Basics of pattern recognition

A tiny bit of history

  • Chess is easy for computers to understand
    • Deep Blue beat Garry Kasparov, world champion, in 1997
    • Evaluated 200 million positions per second
  • Classifying spam emails, detecting musical beats, and recognizing faces are less easy
  • Instead, learn from data

What is pattern recognition?

  • Use statistics and probability to detect patterns
  • Methods often invariant to domain
  • ...but not really (informative features are a prereq)
  • Approximating the hidden mathematical model of complex domains

Example: text classification

  • Amazon wants to guess star ratings (1-5) based on review text
  • Simplest method: Bag of Words (BoW)
    • Create dictionary of words in all documents
    • Count number of occurrences of each word in each document
    • Loses grammatical structure
document        cats   are  cool
War and Peace      5   400    10
Moby Dick          0   621     3
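The counting steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the talk: the `tokenize` and `bag_of_words` helpers and the two toy documents are made up for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(documents):
    """Return (vocabulary, count vectors) for a dict of {name: text}."""
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    # Dictionary of every word seen in any document, in a fixed (sorted) order.
    vocabulary = sorted(set().union(*counts.values()))
    # One count per vocabulary word; words absent from a document count as 0.
    vectors = {name: [c[word] for word in vocabulary] for name, c in counts.items()}
    return vocabulary, vectors

# Toy documents, invented for illustration.
docs = {
    "review_a": "Cats are cool. Cats are great.",
    "review_b": "Whales are not cool.",
}
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
print(vectors)
```

Each document becomes a fixed-length vector of counts, so word order (and with it grammatical structure) is gone, as the last bullet notes.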

Ready to learn!

  • Bunch of labelled and unlabelled data
  • Use labelled data to train model
    • Naive Bayes popular for text classification
  • Predict labels
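As a rough sketch of the train-then-predict loop, here is a small multinomial Naive Bayes classifier with add-one smoothing. The function names and the four toy reviews are assumptions made for the example; a real system would more likely use a library implementation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labelled_docs):
    """labelled_docs: list of (list_of_words, label) pairs."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocabulary = set()
    for words, label in labelled_docs:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary

def predict(model, words):
    """Return the label with the highest log posterior for the given words."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocabulary:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy labelled reviews (star rating 1 or 5), invented for illustration.
training = [
    ("great book loved it".split(), 5),
    ("loved the characters great plot".split(), 5),
    ("boring book hated it".split(), 1),
    ("hated the plot boring characters".split(), 1),
]
model = train_naive_bayes(training)
prediction_good = predict(model, "loved it great".split())
prediction_bad = predict(model, "boring plot hated".split())
```

The input to `predict` is an unlabelled document; the output is the label the model considers most probable.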

Evaluation and Overfitting

  • Evaluate
    • Split labelled data into training and test sets
  • Overfitting
    • Model too specific to training data
    • Does not handle unseen data (high generalization error)
    • Cross validation can help
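One way to picture cross-validation: split the labelled data into k folds, hold each fold out in turn as the test set, and average the scores. The sketch below assumes hypothetical `train_fn` and `accuracy_fn` callbacks and uses a trivial majority-label "model" as a stand-in for a real classifier.

```python
import random

def k_fold_splits(items, k, seed=0):
    """Yield (train, test) lists so that every item is tested exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

def cross_validate(items, k, train_fn, accuracy_fn):
    """Average the test accuracy over k train/test splits."""
    scores = []
    for train, test in k_fold_splits(items, k):
        model = train_fn(train)
        scores.append(accuracy_fn(model, test))
    return sum(scores) / len(scores)

# Stand-in "model": always predict the most common training label.
def train_majority(train):
    labels = [label for _, label in train]
    return max(sorted(set(labels)), key=labels.count)

def accuracy(model, test):
    return sum(label == model for _, label in test) / len(test)

# Toy labelled data, invented for illustration.
labelled = [(f"doc{i}", 5 if i % 3 else 1) for i in range(12)]
mean_accuracy = cross_validate(labelled, 4, train_majority, accuracy)
```

Because the model never sees the held-out fold during training, a large gap between training accuracy and this averaged score is a sign of overfitting.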

Example 2: text clustering

  • Amazon wants to automatically choose novel genre
  • Unsupervised (no labels)
  • Divide into a number of clusters based on BoW input
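The slides do not name a clustering algorithm; k-means is one common choice and is used here as an assumption. The sketch below clusters four made-up BoW vectors, and the naive initialisation (the first k points) is chosen only to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]

def k_means(points, k, iterations=20):
    """Return (assignments, centroids) after a fixed number of iterations."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignments, centroids

# Toy BoW vectors with columns (whale, ship, love, heart), invented for illustration.
novels = [
    [9, 7, 0, 1],
    [8, 9, 1, 0],
    [0, 1, 8, 9],
    [1, 0, 9, 7],
]
assignments, centroids = k_means(novels, k=2)
```

No labels are involved: the algorithm only groups documents with similar word counts, and a human still has to decide what genre each cluster represents.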

3. A simple solution

SIFT (Scale Invariant Feature Transform)

  • Interest points
  • Keypoint descriptors

MoSIFT (Motion SIFT)

  • SIFT points must have optical flow
  • Append motion descriptor to SIFT descriptor
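The two bullets above can be sketched as a filter followed by a concatenation. This is not the MoSIFT implementation: the keypoint dictionaries, the `min_motion` threshold, and the 8-bin flow histogram used as the motion descriptor are all assumptions for illustration, and the SIFT descriptors and flow vectors are taken as precomputed inputs.

```python
import math

def flow_histogram(flow_vectors, bins=8):
    """Histogram of flow directions, each vote weighted by flow magnitude."""
    hist = [0.0] * bins
    for dx, dy in flow_vectors:
        magnitude = math.hypot(dx, dy)
        angle = math.atan2(dy, dx) % (2 * math.pi)  # direction in [0, 2*pi)
        hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    return hist

def mosift_like(keypoints, min_motion=1.0):
    """Keep keypoints that move, then append a motion descriptor to each."""
    descriptors = []
    for kp in keypoints:
        dx, dy = kp["flow_at_point"]
        if math.hypot(dx, dy) < min_motion:
            continue  # static interest point: discard it
        descriptors.append(kp["sift"] + flow_histogram(kp["flow_patch"]))
    return descriptors

# Hypothetical precomputed keypoints (real SIFT descriptors have 128 values).
keypoints = [
    {"sift": [0.1, 0.2, 0.3, 0.4], "flow_at_point": (3.0, 1.0),
     "flow_patch": [(3.0, 1.0), (-1.0, 2.0)]},
    {"sift": [0.5, 0.1, 0.0, 0.2], "flow_at_point": (0.0, 0.1),
     "flow_patch": [(0.0, 0.1), (0.1, 0.0)]},
]
descriptors = mosift_like(keypoints)
```

Only the first keypoint survives the motion test, and its descriptor is the appearance part followed by the motion part.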

Optical flow

MoSIFT points

MoSIFT accuracy

Method          Accuracy
MoSIFT + SVM    95.83%

Foreground/background segmentation

  • Gaussian Mixture Model
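A full Gaussian mixture keeps several Gaussians per pixel. The sketch below deliberately simplifies this to a single running Gaussian per pixel, which is enough to show the idea: a pixel is foreground when it falls far outside the background distribution learned so far. The class name, parameters, and default values are assumptions, and frames are flat lists of grayscale values.

```python
class RunningGaussianBackground:
    """Single-Gaussian-per-pixel background model (a simplification of a GMM)."""

    def __init__(self, first_frame, learning_rate=0.05, threshold=2.5,
                 initial_variance=25.0, min_variance=4.0):
        self.mean = [float(v) for v in first_frame]
        self.var = [initial_variance] * len(first_frame)
        self.lr = learning_rate
        self.threshold = threshold        # in standard deviations
        self.min_variance = min_variance  # keeps the model from collapsing

    def apply(self, frame):
        """Return a foreground mask and update the background statistics."""
        mask = []
        for i, value in enumerate(frame):
            diff = value - self.mean[i]
            is_foreground = diff * diff > (self.threshold ** 2) * self.var[i]
            mask.append(is_foreground)
            if not is_foreground:
                # Only background pixels update the model.
                self.mean[i] += self.lr * diff
                updated = (1 - self.lr) * self.var[i] + self.lr * diff * diff
                self.var[i] = max(updated, self.min_variance)
        return mask

# Toy 3-pixel "video": a static scene, then a bright object at pixel 1.
background = RunningGaussianBackground([100, 100, 100])
for _ in range(10):
    background.apply([100, 101, 99])
mask = background.apply([100, 200, 100])
```

A mixture model extends this by letting each pixel keep several such Gaussians, so that repetitive background changes (flickering screens, swaying objects) are not flagged as foreground.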

Bag of Visual Words (BoVW)

  • Cluster interest points
  • Frequency count of each cluster in video
  • Loses spatial and temporal information
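BoVW mirrors the text example: cluster centres play the role of words and each video becomes a histogram over them. A minimal sketch, assuming a hypothetical `codebook` of centroids already produced by clustering and 2-dimensional toy descriptors in place of real ones:

```python
def nearest(descriptor, codebook):
    """Index of the codebook centroid closest to the descriptor."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[i])),
    )

def bovw_histogram(descriptors, codebook):
    """Normalised frequency of each visual word among a video's descriptors."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest(descriptor, codebook)] += 1
    total = sum(counts) or 1  # avoid dividing by zero for an empty video
    return [c / total for c in counts]

# Toy codebook and descriptors, invented for illustration.
codebook = [[0, 0], [10, 10], [0, 10]]
video_descriptors = [[1, 1], [9, 9], [0, 1], [1, 9]]
histogram = bovw_histogram(video_descriptors, codebook)
```

The histogram has the same length for every video regardless of how many interest points were found, which makes it usable as classifier input, but where and when each point occurred is discarded.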

4. Future

How do humans do it?

  • Johansson (1973)
  • LEDs on key points of human body
  • Humans recognize human actions from motion of LEDs

Biologically inspired approach

  1. Detect human
  2. Pose estimation
  3. Track joints
  4. Compute action descriptor (over lifetime of action)

Or maybe not

  • Recent approaches
    • Good accuracy
    • Hard datasets
    • Not human specific

Speaking of hard datasets...

  • Turns out KTH isn't very hard
  • Other new challenging datasets
    • More action classes
    • Challenging backgrounds
    • Varying viewpoints
    • BIGGER

Thanks!