Trend Estimation in Time Series Signals





pydata-seattle-2015

Pydata Seattle 2015 Trend Estimation in Time Series Signals Deck + Notebooks

On Github bugra / pydata-seattle-2015


Hi!

Bugra Akyildiz

Data Scientist at Axial

@bugraa

Machine Learning Newsletter | mln.io

bugra@nyu.edu
http://bit.ly/pydata-seattle-2015

Axial

A network that brings private companies and investors together

Gives business owners access to private capital markets

We are hiring! | axial.net

Trend Estimation

A family of methods for detecting and predicting tendencies and regularities in time series signals

  • The right method depends on the problem and domain
  • Targets the medium- to long-term trend
  • Mitigates seasonality (cycles) in the data

Why?

  • Trends are very interpretable
  • Trends are easier to work with when the original signal is not directly useful for processing

Trend Estimation Methods

  • Moving average filtering
  • Exponential Weighted Moving Average (EWMA)
  • Median filtering
  • Bandpass filtering
  • Hodrick-Prescott (HP) filter
  • l1 trend filtering

Data

  • The S&P 500, or the Standard & Poor's 500, is an American stock market index based on the market capitalizations of 500 large companies having common stock listed on the NYSE or NASDAQ.
  • The S&P 500 index components and their weightings are determined by S&P Dow Jones Indices.
  • The National Bureau of Economic Research has classified common stocks as a leading indicator of business cycles.
  • We will come to these cycles later.
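import pandas as pd
# _SNP_500_PATH is the path to the S&P 500 CSV file
df = pd.read_csv(_SNP_500_PATH, parse_dates=['Date'])
df = df.sort_values('Date')  # DataFrame.sort was removed from pandas; sort_values replaces it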

S&P 500 Data

Moving Average Filtering

  • Average the signal over a window

$$y(t) = \frac{1}{w} \sum_{i=-w/2}^{w/2} x(t+i)$$

In Python

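import pandas as pd
window = 11
# pd.rolling_mean was removed from pandas; use the .rolling() accessor instead.
# This is a trailing window; pass center=True to center it on each sample.
averaged_signal = df.Close.rolling(window).mean()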

Good to Know

  • Linear
  • Not really a trend estimation method, but provides baseline
  • If the window size is small, it removes the high-volatility part of the signal
  • If the window size is large, it exposes the long-term trend
  • Not robust to outliers and abrupt changes, especially for small and medium window sizes
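To make the outlier bullet concrete, here is a minimal sketch of a centered moving average in NumPy. The helper name `centered_moving_average` and the toy signal are assumptions made for illustration; they are not part of the talk's notebooks.

```python
import numpy as np

def centered_moving_average(x, window):
    """Average each sample with its neighbours in an odd-length centered window."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centered")
    kernel = np.ones(window) / window
    # mode="valid" keeps only the positions where the full window fits,
    # so the output is window - 1 samples shorter than the input.
    return np.convolve(x, kernel, mode="valid")

# Toy signal (hypothetical): a flat line at 10 with a single outlier of 120.
signal = np.full(21, 10.0)
signal[10] = 120.0

smoothed = centered_moving_average(signal, window=11)
```

Every window that contains the outlier is pulled away from 10, so a single bad sample contaminates `window` consecutive outputs. That is the sense in which the moving average is not robust.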

Median Filtering

$y(t) = \operatorname{median}\{\, x[t - w/2,\; t + w/2] \,\}$, where $w$ is the size of the window whose median replaces the original data point

In Python

from scipy import signal as sp_signal
window = 11  # kernel size must be odd
median_filtered_signal = sp_signal.medfilt(df.Close.values, window)

Good to Know

  • Nonlinear
  • Very robust to noise
  • If the window size is very large, it can mask mid-term changes
  • The trend signal may not be smooth (in practice it rarely is)
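The robustness claim can be checked on a toy signal. The sketch below is a plain-Python median filter, not `scipy.signal.medfilt`: the helper name `median_filter` is hypothetical, and it repeats the edge values at the borders, whereas `medfilt` pads with zeros.

```python
from statistics import median

def median_filter(x, window):
    """Replace each sample by the median of the odd-length window centered on it."""
    half = window // 2
    # Repeat the first and last values so the window is defined at the borders.
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(x))]

# Toy signal (hypothetical): a flat line at 10 with a one-sample spike.
signal = [10.0] * 21
signal[10] = 120.0

filtered = median_filter(signal, window=5)
```

The spike never reaches the middle of any sorted window, so it disappears completely instead of being smeared across its neighbours.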

EWMA

In Python

import pandas as pd
span = 20
# pd.stats.moments.ewma was removed from pandas; use the .ewm() accessor instead.
ewma_signal = df.Close.ewm(span=span).mean()

Good to Know

  • Linear
  • Can give a better estimate than a simple moving average because the weights decay smoothly instead of being cut off at the window edge
  • Not robust to outliers and abrupt changes
  • Very flexible in terms of weights; puts more emphasis on the most recent samples in the signal
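The weighting can be written out directly. This is a minimal sketch of the recursive form of the EWMA, using the span-to-alpha convention that pandas documents (alpha = 2 / (span + 1)). It corresponds to `ewm(span=span, adjust=False)`; the default `adjust=True` used in the slide's snippet differs for the first few samples. The helper names are assumptions for illustration.

```python
def ewma(x, span):
    """Recursive EWMA: y[t] = alpha * x[t] + (1 - alpha) * y[t-1], with y[0] = x[0]."""
    alpha = 2.0 / (span + 1)
    out = [x[0]]
    for value in x[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out

def ewma_weights(span, n):
    """Weight given to the sample k steps in the past, for k = 0 .. n-1."""
    alpha = 2.0 / (span + 1)
    return [alpha * (1 - alpha) ** k for k in range(n)]

weights = ewma_weights(span=3, n=4)             # alpha = 0.5
smoothed = ewma([0.0, 0.0, 8.0, 8.0], span=3)   # response to a step from 0 to 8
```

With `span=3` each older sample counts half as much as the one after it, and the filter closes half of the remaining gap to a step on every sample.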

Bandpass Filtering

A bandpass filter works on the frequency content of the signal. It attenuates the very low frequencies (long term) and the very high frequencies (short term, volatility), exposing the mid-term trend in the signal.

In Python

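from scipy import signal as sp_signal

# Filter construction
filter_order = 2
# Cutoffs are normalized to the Nyquist frequency (0 < f < 1)
low_cutoff_frequency = 0.001
high_cutoff_frequency = 0.15
b, a = sp_signal.butter(filter_order, [low_cutoff_frequency, high_cutoff_frequency],
                        btype='bandpass', output='ba')
# filtfilt runs the filter forward and backward, so the output has no phase lag
bandpass_filtered = sp_signal.filtfilt(b, a, df.Close.values)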

Good to Know

  • Passes the frequencies between the low and high cutoff frequencies and attenuates the rest.
  • This provides a flexible way to remove or attenuate the low-frequency (very long-term) and high-frequency (short-term) parts of the signal.
  • A different filter can be designed to stop a particular band instead (called a band-stop filter).
  • Similar to the Hodrick-Prescott filter, it extracts the mid-term trend by removing the very low-frequency content (bias) and the short-term changes (cycle).
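A synthetic signal makes the pass/stop behaviour easy to see. The sketch below uses the same `butter` and `filtfilt` calls as the slide, but the signal, its three components, and the cutoff values are all made up for illustration.

```python
import numpy as np
from scipy import signal as sp_signal

n = 2000
t = np.arange(n)
level = 100.0                                  # constant level (zero frequency)
mid = np.sin(2 * np.pi * t / 100.0)            # mid-term oscillation, period 100 samples
fast = 0.5 * np.sin(2 * np.pi * t / 4.0)       # short-term wiggle, period 4 samples
x = level + mid + fast

# Cutoffs are fractions of the Nyquist frequency (0.5 cycles/sample),
# so a period of 100 samples sits at 0.02, inside the [0.005, 0.1] band,
# while a period of 4 samples sits at 0.5, well above it.
b, a = sp_signal.butter(2, [0.005, 0.1], btype='bandpass', output='ba')
filtered = sp_signal.filtfilt(b, a, x)

# Compare away from the edges, where start-up transients have died out.
center = slice(500, 1500)
max_error = np.max(np.abs(filtered[center] - mid[center]))
```

Away from the edges, what remains is essentially the mid-term component: the constant level and the fast wiggle are both removed.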

Hodrick-Prescott(HP) Filter

  • Decomposes the time-series signal $y_t$ into a trend $x_t$ (mid-term growth) and a cyclical component $c_t$ (recurring and seasonal signal).

$y_t = x_t + c_t$

HP Minimization Function
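
In its standard form, as commonly stated in the literature, the objective trades fidelity to the data $y_t$ against the smoothness of the trend $x_t$, with the regularizer $\lambda$ setting the balance:

```latex
\min_{x} \; \sum_{t=1}^{n} (y_t - x_t)^2
  \;+\; \lambda \sum_{t=2}^{n-1} \bigl[ (x_{t+1} - x_t) - (x_t - x_{t-1}) \bigr]^2
```

The first sum penalizes the cyclical component $c_t = y_t - x_t$; the second penalizes changes in the slope of the trend, so a larger $\lambda$ gives a straighter trend.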

Good to Know

  • Linear
  • Decomposes the signal into two distinct components (trend and cycle)
  • Cycle part => short term, seasonal
  • Trend part => medium to long term
  • Changing the regularizer adjusts how smooth the trend is
  • A bandpass filter is at its heart
  • Well suited to signals that show seasonality
  • Yields good results when noise is normally distributed

In Python

import statsmodels.api as sm
lamb = 10 # Regularizer, lambda
snp_cycle, snp_trend = sm.tsa.filters.hpfilter(df.Close, lamb=lamb)
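
For intuition about what the filter computes, here is a minimal dense NumPy sketch. It assumes the standard HP objective (squared error plus lambda times the squared second differences of the trend), whose minimizer solves the linear system (I + lambda * D^T D) x = y. The helper names are hypothetical, and this is not how statsmodels implements `hpfilter`.

```python
import numpy as np

def second_difference_matrix(n):
    """(n-2) x n matrix D with rows [1, -2, 1]: (D @ x)[i] = x[i] - 2*x[i+1] + x[i+2]."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def hp_filter_dense(y, lamb):
    """Return (cycle, trend), with the trend solving (I + lamb * D^T D) x = y."""
    y = np.asarray(y, dtype=float)
    D = second_difference_matrix(len(y))
    trend = np.linalg.solve(np.eye(len(y)) + lamb * (D.T @ D), y)
    return y - trend, trend

# A straight line has zero second differences, so it is all trend and no cycle.
t = np.arange(50, dtype=float)
ramp = 3.0 + 0.5 * t
cycle, trend = hp_filter_dense(ramp, lamb=1600)
```

The dense solve is only meant for short signals; the matrix is banded, so a practical implementation would use a sparse or banded solver.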

l1 Trend Filtering

Explanation: The HP minimization function penalizes the squared (l2) second differences of the trend. What if we penalize their l1 norm instead? We get a very robust way to estimate the trend in the signal.

  • Optimization function: $\frac{1}{2}\|x - y\|_2^2 + \lambda \|Dx\|_1$, where $x, y \in \mathbb{R}^n$ and $D$ is the second-order difference matrix

Good to Know

  • Nonlinear
  • Trend is piecewise linear, generally very smooth
  • The kinks, or changes in slope of the estimated trend show abrupt events
  • Changes in trend could be used for outlier detection
  • Computationally a little bit expensive.
  • Yields good results when noise is exponentially distributed
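
The role of the second-order difference matrix D is easiest to see on a piecewise-linear signal. The sketch below only evaluates the objective and locates kinks for a given trend. It does not solve the minimization, which needs a convex solver such as the one in the library introduced next. The helper names and the toy trend are assumptions for illustration.

```python
import numpy as np

def second_difference_matrix(n):
    """(n-2) x n matrix D with rows [1, -2, 1]: (D @ x)[i] = x[i] - 2*x[i+1] + x[i+2]."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def l1_trend_objective(x, y, lamb):
    """Value of 0.5 * ||x - y||_2^2 + lamb * ||D x||_1 for a candidate trend x."""
    D = second_difference_matrix(len(x))
    return 0.5 * np.sum((x - y) ** 2) + lamb * np.sum(np.abs(D @ x))

# Toy trend (hypothetical): slope +1 up to index 10, then slope -2.
x = np.concatenate([np.arange(0.0, 10.0), 10.0 - 2.0 * np.arange(0.0, 10.0)])
D = second_difference_matrix(len(x))

# D @ x is zero wherever the slope is constant, so its nonzero entries mark the kinks.
kinks = np.flatnonzero(np.abs(D @ x) > 1e-9) + 1   # +1 shifts to the index of the kink sample
penalty_only = l1_trend_objective(x, x, lamb=2.0)  # the error term vanishes when x == y
```

The l1 penalty charges each kink in proportion to its change in slope and charges nothing for straight segments, which is why the minimizer tends to be piecewise linear with few kinks.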

Get the library

# See the source code: https://github.com/bugra/l1
# PRs are more than welcome!
git clone https://github.com/bugra/l1
cd l1
python setup.py install

In Python

from l1 import l1 # Get the library from: https://github.com/bugra/l1
regularizer = 1
l1_trend = l1(df.Close.values, regularizer)

Questions?