Trend Estimation in Time Series Signals
Hi!
Bugra Akyildiz
Data Scientist at Axial
@bugraa
Machine Learning Newsletter | mln.io
bugra@nyu.edu
http://bit.ly/pydata-seattle-2015
Axial
A network that brings private companies and investors together
Gives business owners access to private capital markets
We are hiring! | axial.net
Trend Estimation
A family of methods for detecting and predicting tendencies and regularities in time series signals
- Depends on the problem and domain
- Medium- to long-term trend
- Removes seasonality (cycles) from the data
Why?
- Trends are very interpretable
- Trends are easy to work with when the original signal is not directly useful for processing
Trend Estimation Methods
- Moving average filtering
- Exponential Weighted Moving Average (EWMA)
- Median filtering
- Bandpass filtering
- Hodrick-Prescott Filter
- l1 trend filtering
Data
- The S&P 500, or the Standard & Poor's 500, is an American stock market index based on the market capitalizations of 500 large companies having common stock listed on the NYSE or NASDAQ.
- The S&P 500 index components and their weightings are determined by S&P Dow Jones Indices.
- The National Bureau of Economic Research has classified common stocks as a leading indicator of business cycles.
- We will come to these cycles later.
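import pandas as pd

# _SNP_500_PATH: path to a CSV of daily S&P 500 prices with 'Date' and 'Close' columns
df = pd.read_csv(_SNP_500_PATH, parse_dates=['Date'])
# DataFrame.sort was removed from pandas; sort_values is the replacement
df = df.sort_values('Date').reset_index(drop=True)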
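If the S&P 500 CSV behind `_SNP_500_PATH` is not at hand, a synthetic stand-in with the same two columns lets the later snippets run. The helper `make_synthetic_prices` below is hypothetical: it generates a geometric random walk on business days, not real market data, so any trend it shows is an artifact of the random seed.

```python
import numpy as np
import pandas as pd


def make_synthetic_prices(n_days=1000, seed=0):
    """Hypothetical stand-in for the S&P 500 CSV: a random walk with drift."""
    rng = np.random.default_rng(seed)
    dates = pd.bdate_range("2010-01-04", periods=n_days)  # business days only
    # Daily log-returns with a small positive drift
    log_returns = rng.normal(loc=0.0003, scale=0.01, size=n_days)
    close = 1000.0 * np.exp(np.cumsum(log_returns))
    return pd.DataFrame({"Date": dates, "Close": close})


df = make_synthetic_prices()
print(df.head())
```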
Moving Average Filtering
- Average the signal over a window
$y(t) = \frac{1}{w} \sum_{i=-w/2}^{w/2} x(t+i)$
In Python
import pandas as pd

window = 11
# center=True matches the centered formula; pd.rolling_mean was removed from pandas
averaged_signal = df.Close.rolling(window, center=True).mean()
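To make the averaging formula concrete, the sketch below computes a centered moving average as a convolution with a box kernel and checks it against pandas' centered rolling mean. It assumes an odd window, so the window is symmetric around t and averages exactly `window` points; the helper name `moving_average` and the test series are illustrative.

```python
import numpy as np
import pandas as pd


def moving_average(x, window):
    """Centered moving average via convolution with a box kernel of length `window`.

    Assumes an odd window. The first and last window // 2 points are set to NaN
    because the window is incomplete there (the same convention as pandas).
    """
    x = np.asarray(x, dtype=float)
    kernel = np.ones(window) / window
    smoothed = np.convolve(x, kernel, mode="same")
    half = window // 2
    if half > 0:
        smoothed[:half] = np.nan
        smoothed[-half:] = np.nan
    return smoothed


x = pd.Series(np.arange(20, dtype=float) ** 1.5)
ours = moving_average(x, 11)
pandas_version = x.rolling(11, center=True).mean().to_numpy()
print(np.allclose(ours, pandas_version, equal_nan=True))  # True
```

Because every point in the window gets the same weight 1/window, a single large value shifts the whole window's output, which is why the filter is not robust to outliers.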
Good to Know
- Linear
- Not really a trend estimation method, but provides a baseline
- If the window size is small, it removes the high-volatility (short-term) part of the signal
- If the window size is large, it exposes the long-term trend
- Not robust to outliers and abrupt changes for small and medium window sizes
Median Filtering
$y(t) = \operatorname{median}\{x(t - \tfrac{w}{2}), \ldots, x(t + \tfrac{w}{2})\}$
where w is the window size; the median of the values in the window replaces the original data point
In Python
from scipy import signal as sp_signal

window = 11  # medfilt requires an odd kernel size
median_filtered_signal = sp_signal.medfilt(df.Close.values, window)
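The running median can also be written directly from its definition: slide a window over the signal and take the median of each window. This sketch reproduces `sp_signal.medfilt`, which zero-pads both ends of the input, so edge values are pulled toward zero in both versions. The helper name `median_filter` and the random test signal are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy import signal as sp_signal


def median_filter(x, window):
    """Running median over an odd `window`, zero-padding both ends like medfilt."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    x = np.asarray(x, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="constant", constant_values=0.0)
    windows = sliding_window_view(padded, window)  # shape (len(x), window)
    return np.median(windows, axis=1)


rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=200)) + 50
print(np.allclose(median_filter(x, 11), sp_signal.medfilt(x, 11)))
```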
Good to Know
- Nonlinear
- Very robust to outliers and impulsive noise
- If the window size is very large, it can mask mid-term changes
- The trend signal may not be smooth (in practice it rarely is)
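The robustness difference between the moving average and the median filter shows up on a toy signal: a constant level of 100 with a single outlier of 1000. This is synthetic data chosen only to isolate the effect of one spike.

```python
import numpy as np
import pandas as pd
from scipy import signal as sp_signal

# Synthetic example: a constant level of 100 with one outlier at index 50
x = np.full(101, 100.0)
x[50] = 1000.0

window = 11
moving_avg = pd.Series(x).rolling(window, center=True).mean().to_numpy()
median = sp_signal.medfilt(x, window)

# The moving average spreads the spike over every window that contains it:
# (10 * 100 + 1000) / 11 is about 181.8 at indices 45..55
print(moving_avg[45:56].round(1))
# The running median ignores it: 10 of the 11 values in each window are 100
print(median[45:56])
```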
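Exponential Weighted Moving Average (EWMA)
In Python
import pandas as pd

span = 20
# pd.stats.moments.ewma was removed from pandas; Series.ewm is the replacement
ewma_signal = df.Close.ewm(span=span).mean()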
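The EWMA can be sketched as a one-line recursion, y[t] = alpha * x[t] + (1 - alpha) * y[t-1], with alpha = 2 / (span + 1), the span-to-alpha mapping pandas uses. Unrolling the recursion gives weights alpha * (1 - alpha)^k on the value k steps back, so older points fade geometrically. The sketch matches pandas with `adjust=False`; pandas' default `adjust=True` normalizes the weights differently, which mainly changes the first few outputs.

```python
import numpy as np
import pandas as pd


def ewma(x, span):
    """Recursive EWMA: y[0] = x[0], y[t] = alpha * x[t] + (1 - alpha) * y[t-1]."""
    x = np.asarray(x, dtype=float)
    alpha = 2.0 / (span + 1.0)
    y = np.empty_like(x)
    y[0] = x[0]
    for t in range(1, len(x)):
        y[t] = alpha * x[t] + (1.0 - alpha) * y[t - 1]
    return y


rng = np.random.default_rng(2)
x = 100 + np.cumsum(rng.normal(size=300))
ours = ewma(x, span=20)
reference = pd.Series(x).ewm(span=20, adjust=False).mean().to_numpy()
print(np.allclose(ours, reference))  # True
```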
Good to Know
- Linear
- Could provide a better estimate than a simple moving average because recent observations receive more weight than older ones
- Not robust to outliers and abrupt changes
- Flexible in how the weights are assigned, with more emphasis on the most recent part of the signal
Bandpass Filtering
It filters the signal based on frequency. It attenuates very low frequencies (long-term) and very high frequencies (short-term, volatility), exposing the mid-term trend in the signal.
In Python
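from scipy import signal as sp_signal

## Filter Construction
filter_order = 2
# Cutoffs are fractions of the Nyquist frequency (0 < Wn < 1)
low_cutoff_frequency = 0.001
high_cutoff_frequency = 0.15
b, a = sp_signal.butter(filter_order, [low_cutoff_frequency, high_cutoff_frequency],
                        btype='bandpass', output='ba')
# filtfilt runs the filter forward and backward, so the output has no phase lag
bandpass_filtered = sp_signal.filtfilt(b, a, df.Close.values)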
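It helps to translate the normalized cutoffs into periods. Without an `fs` argument, `butter` treats cutoffs as fractions of the Nyquist frequency (0.5 cycles per sample), so a cutoff Wn corresponds to a period of 2 / Wn samples. Assuming one sample per trading day, 0.15 is a period of about 13 trading days and 0.001 about 2000 trading days. The sketch also evaluates the filter's gain at a few frequencies with `freqz`; the test frequencies are illustrative.

```python
import numpy as np
from scipy import signal as sp_signal

filter_order = 2
low_cutoff_frequency = 0.001   # fraction of the Nyquist frequency
high_cutoff_frequency = 0.15

# A cutoff Wn (fraction of Nyquist) corresponds to a period of 2 / Wn samples
for wn in (low_cutoff_frequency, high_cutoff_frequency):
    print(f"Wn={wn}: period = {2 / wn:.1f} samples")

b, a = sp_signal.butter(filter_order, [low_cutoff_frequency, high_cutoff_frequency],
                        btype='bandpass')

# Gain below, inside and above the pass band (also as fractions of Nyquist);
# freqz expects radians per sample, and Nyquist is pi radians per sample
test_freqs = np.array([0.0001, 0.01, 0.9])
_, h = sp_signal.freqz(b, a, worN=test_freqs * np.pi)
print(np.abs(h).round(3))
```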
Good to Know
- Allows the frequencies between the low and high cutoff frequencies to pass and attenuates the others.
- This provides a flexible way to remove or attenuate the low-frequency (very long-term) and high-frequency (short-term) parts of the signal.
- A different filter can be designed to stop a particular band instead (a band-stop filter).
- Similar to the Hodrick-Prescott filter, it extracts the mid-term trend by removing the very slow drift (bias) and the short-term changes (cycle).
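A band-stop filter is built the same way with `btype='bandstop'`. In this sketch the signal is synthetic (an assumption for illustration): a slow sine with a 500-sample period plus a 20-sample cycle. A 20-sample period is 1/20 cycles per sample, which is 0.1 of Nyquist, so the filter stops a band around 0.1 and the slow component should pass almost unchanged.

```python
import numpy as np
from scipy import signal as sp_signal

# Synthetic signal: a slow component plus a 20-sample cycle we want to remove
n = 2000
t = np.arange(n)
slow = np.sin(2 * np.pi * t / 500)
cycle = 0.5 * np.sin(2 * np.pi * t / 20)
x = slow + cycle

# Stop the band 0.07..0.13 of Nyquist, which contains the cycle at 0.1
b, a = sp_signal.butter(2, [0.07, 0.13], btype='bandstop')
filtered = sp_signal.filtfilt(b, a, x)

# Away from the edges, the result should be close to the slow component alone
interior = slice(200, n - 200)
print(np.max(np.abs(filtered[interior] - slow[interior])))
```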
Hodrick-Prescott(HP) Filter
- Decomposes the time-series signal $y_t$ into a trend $x_t$ (mid-term growth) and a cyclical component $c_t$ (recurring and seasonal signal):
$y_t = x_t + c_t$
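For reference, the standard HP objective (the form statsmodels documents for `hpfilter`) chooses the trend by trading off fit to the signal against the squared second differences of the trend, with D the second-order difference matrix. Setting the gradient to zero gives a closed-form trend; the cycle is what is left over.

```latex
\min_{x \in \mathbb{R}^n} \; \sum_{t=1}^{n} (y_t - x_t)^2
  + \lambda \sum_{t=2}^{n-1} \big[(x_{t+1} - x_t) - (x_t - x_{t-1})\big]^2
  \;=\; \min_{x} \; \|y - x\|_2^2 + \lambda \|Dx\|_2^2

% Gradient: -2(y - x) + 2\lambda D^\top D x = 0
x^\star = (I + \lambda D^\top D)^{-1} y, \qquad c = y - x^\star
```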
Good to Know
- Linear
- Decomposes the signal into two distinct components(trend and cycle)
- Cycle part: short-term, seasonal
- Trend part: medium- to long-term
- Changing the regularizer λ adjusts how smooth the trend is (a larger λ gives a smoother trend)
- Closely related to frequency filtering: the trend acts as a low-pass and the cycle as a high-pass filtered version of the signal
- Well suited to signals that show seasonality
- Yields good results when the noise is normally (Gaussian) distributed
In Python
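import statsmodels.api as sm

lamb = 10  # Regularizer λ; larger values give a smoother trend (the default, 1600, is meant for quarterly data)
# hpfilter returns (cycle, trend), in that order
snp_cycle, snp_trend = sm.tsa.filters.hpfilter(df.Close, lamb=lamb)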
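To see what `hpfilter` computes, here is a sketch of the HP decomposition solved directly: the trend is the solution of the sparse linear system (I + λ DᵀD) x = y, where D is the second-order difference matrix, and the cycle is y minus the trend. The helper names `hp_filter` and `roughness` and the random-walk input are illustrative. statsmodels documents the same minimization, so its output should agree up to numerical precision, but this is not its source code.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def hp_filter(y, lamb):
    """HP decomposition via the closed form x = (I + lamb * D^T D)^{-1} y.

    D is the (n-2) x n second-order difference matrix. Returns (cycle, trend),
    the same order as statsmodels' hpfilter.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    A = sparse.identity(n) + lamb * (D.T @ D)
    trend = spsolve(A.tocsc(), y)
    return y - trend, trend


def roughness(x):
    """Sum of squared second differences: the quantity the HP penalty targets."""
    return np.sum(np.diff(x, n=2) ** 2)


# Synthetic random walk standing in for df.Close
rng = np.random.default_rng(3)
y = 100 + np.cumsum(rng.normal(size=500))

cycle, trend = hp_filter(y, lamb=10)
_, smoother_trend = hp_filter(y, lamb=10_000)
# Roughness drops as lambda grows: raw signal > lamb=10 trend > lamb=10000 trend
print(roughness(y), roughness(trend), roughness(smoother_trend))
```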
l1 Trend Filtering
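Explanation: The HP filter penalizes the sum of squared second differences of the trend. What if we penalize the sum of their absolute values (the l1 norm) instead? The result is a trend estimate that is robust to abrupt changes in the signal.
- Optimization function:
$\frac{1}{2}\|x - y\|_2^2 + \lambda\|Dx\|_1$
where $y \in \mathbb{R}^n$ is the signal, $x \in \mathbb{R}^n$ is the estimated trend, and $D \in \mathbb{R}^{(n-2) \times n}$ is the second-order difference matrix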
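The objective is convex, so a generic solver can minimize it. This sketch uses ADMM with the split z = Dx: a fixed linear solve for x, soft-thresholding for z, and a scaled dual update. ADMM is a swapped-in technique here, not necessarily what the l1 library does. Setting `rho` equal to λ by default is a heuristic, and the names `l1_trend_filter` and `objective` and the synthetic data are illustrative.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu


def l1_trend_filter(y, lamb, rho=None, n_iter=1000):
    """Approximately minimize 0.5 * ||x - y||_2^2 + lamb * ||D x||_1 with ADMM.

    Uses the splitting z = D x (scaled-form ADMM); rho is the ADMM penalty and
    defaults to lamb as a heuristic, not a tuned choice.
    """
    if lamb <= 0:
        raise ValueError("lamb must be positive")
    y = np.asarray(y, dtype=float)
    n = len(y)
    rho = lamb if rho is None else rho
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n), format="csc")
    # The x-update always uses the same matrix, so factor it once
    lhs = (sparse.identity(n, format="csc") + rho * (D.T @ D)).tocsc()
    solve = splu(lhs).solve
    z = D @ y
    u = np.zeros(n - 2)
    for _ in range(n_iter):
        x = solve(y + rho * (D.T @ (z - u)))                       # quadratic step
        Dx = D @ x
        v = Dx + u
        z = np.sign(v) * np.maximum(np.abs(v) - lamb / rho, 0.0)  # soft-threshold
        u = u + Dx - z                                              # scaled dual update
    return x


def objective(x, y, lamb):
    return 0.5 * np.sum((x - y) ** 2) + lamb * np.sum(np.abs(np.diff(x, n=2)))


# Synthetic piecewise-linear trend (kinks at t=100 and t=200) plus Gaussian noise
rng = np.random.default_rng(4)
t = np.arange(300)
true_trend = np.interp(t, [0, 100, 200, 299], [0.0, 50.0, 30.0, 69.6])
y = true_trend + rng.normal(scale=2.0, size=t.size)

x = l1_trend_filter(y, lamb=50.0)
print(objective(x, y, 50.0) < objective(y, y, 50.0))
print(np.mean(np.abs(x - true_trend)), np.mean(np.abs(y - true_trend)))
```

ADMM iterates satisfy Dx = z only at convergence, so the returned trend is approximately, not exactly, piecewise linear.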
Good to Know
- Nonlinear
- The trend is piecewise linear: straight segments joined at a small number of kinks
- The kinks (changes in slope) of the estimated trend indicate abrupt events
- Changes in trend could be used for outlier detection
- Computationally somewhat expensive, since it requires solving a convex optimization problem
- Yields good results when noise is exponentially distributed
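Since the kinks mark abrupt events, they can be read off the trend's second differences, which are zero along straight segments and nonzero where the slope changes. This sketch uses an exactly piecewise-linear, hypothetical trend; a solver's output is only approximately piecewise linear, so `tol` would need tuning on real estimates (an assumption, not a recommended value). The helper name `find_kinks` is illustrative.

```python
import numpy as np


def find_kinks(trend, tol=1e-6):
    """Indices where the slope of a piecewise-linear trend changes.

    np.diff(trend, n=2)[i] = trend[i+2] - 2*trend[i+1] + trend[i], which is
    centered on i + 1, hence the +1 below.
    """
    second_diff = np.diff(np.asarray(trend, dtype=float), n=2)
    return np.flatnonzero(np.abs(second_diff) > tol) + 1


# Hypothetical piecewise-linear trend with slope changes at t=100 and t=200
t = np.arange(300)
trend = np.interp(t, [0, 100, 200, 299], [0.0, 50.0, 30.0, 69.6])
print(find_kinks(trend))  # [100 200]
```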
Get the library
# See the source code: https://github.com/bugra/l1
# PRs are more than welcome!
git clone https://github.com/bugra/l1
cd l1
python setup.py install
In Python
from l1 import l1 # Get the library from: https://github.com/bugra/l1
regularizer = 1
l1_trend = l1(df.Close.values, regularizer)