
Ensemble methods

for data assimilation

27/11/2014 Sylvain Robert

  • thank you for coming
  • give you an idea of what my research is about
  • what kind of problems and solutions
  • first: motivating application

Convective storm forecasting

heavy rains, hail, flash flooding, etc.

Copyright MattysFlicks
  • where and when: short-term
  • dramatic consequences: heavy rainfalls, flash flooding, tornadoes (not here)
  • difficult problem!
  • Where do I come into play? A good forecast needs good initial conditions, which need good DA

Goal: predicting convective scale storms

Very important as these events have potentially huge negative impact, such as:

  • heavy rainfall
  • hail (vineyard damage)
  • tornadoes (not so much here)
  • strong winds
  • flash flooding

short-term prediction: between 1 and 4 hours

  • Good initial condition = Good forecast
  • Good data assimilation (DA) = Good initial condition

What is Data Assimilation (DA)?

X_t: state of the atmosphere at time t
Y_t: observation at time t
  • discretization of the 3D domain: the atmosphere
  • Horizontal grid and vertical (pressure)
  • each cell is itself multidimensional: T, P, etc.
  • Final state: all of that stacked into a giant vector
  • Dynamics: propagate X_t in time with the model
  • best estimate of X_t given the model, knowledge of how the Y_t are produced, and the history of Y_t

What is DA for statisticians?

State-space model: Filtering

P(X_t | Y_{1:t}) = ??

  • how do we model that as statisticians?
  • general state-space model
  • The true state X_t is assumed to be an unobserved Markov process and the measurements Y_t are a noisy function of the state
  • Y_t depends on the state only through X_t.
  • State of X_t given the history of observations: filtering

Bayes Filter

Recursive solution

There is an exact solution when everything is linear and Gaussian: the Kalman Filter (invented here at ETH Zürich)


Bayes Filter

More detail:

\begin{align*} P(X_t | Y_{1:t}) &\propto P(Y_t | X_t) \, P( X_t | Y_{1:t-1} ) \\ &= P(Y_t | X_t) \int P(X_t | X_{t-1}) \, P( X_{t-1} | Y_{1:t-1}) \, dX_{t-1} \end{align*}

Forecast

X^f_{t} \sim P(X_t | X^u_{t-1})

Update

X^u_{t} \sim P(X_t | X^f_{t}, Y_t)

Kalman Filter

Linear + Gaussian

Kalman (wikipedia)
Exact solution when: \begin{align*} X_{t} &= A X_{t-1} + \epsilon_t \\ Y_t &= H X_t + \eta_t \end{align*}

Particle Filter

Non-linear + non-Gaussian

Monte-Carlo approximation with: \begin{align*} X_{t} &= f ( X_{t-1} , \epsilon_t) \\ Y_t &= h(X_t , \eta_t ) \end{align*}
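To make the recursion concrete, here is a minimal sketch of one step of a bootstrap particle filter; the function names and signatures are illustrative assumptions, not code from this project:

```python
import numpy as np

def bootstrap_pf_step(particles, y, propagate, log_lik, rng):
    """One forecast + update step of a bootstrap particle filter.

    particles: (n, d) sample approximating P(X_{t-1} | Y_{1:t-1})
    y:         current observation
    propagate: stochastic model, propagate(particles, rng) -> (n, d)
    log_lik:   log P(y | x), vectorized over the n particles
    """
    xf = propagate(particles, rng)        # forecast: X^f_t ~ P(X_t | X^u_{t-1})
    logw = log_lik(y, xf)                 # update: weight by the likelihood
    w = np.exp(logw - logw.max())         # subtract max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(xf), size=len(xf), p=w)
    return xf[idx]                        # resampled ensemble ~ P(X_t | Y_{1:t})
```

In high dimensions the weights degenerate (a single particle gets almost all the weight), which is the curse of dimensionality discussed below.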

Large-scale applications?

  • global atmosphere dynamics
  • ocean dynamics
  • reservoir modelling

Here the KF and PF fail, so new ensemble-based methods have been developed.

Ensemble Kalman Filter

Kalman Filter + Monte-Carlo

  • developed by Evensen in 1994 for ocean dynamics DA
  • widely used for large-scale environmental applications
  • Ensemble (sample size of ~100)
  • low-rank representation of the covariance

Monte Carlo approximation of the Kalman Filter. Propagate particles using the physical model of the atmosphere (no linearization or any higher-order approximation needed). To do the update: use the sample covariance (low rank representation, as sample size << dimension of X). Draw a sample from the Normal distribution with updated mean and covariance. Stochastic sample or deterministic (with a transform ensuring that the data have exactly the given sample covariance): this is useful as the sample size is very small and Monte Carlo error can be detrimental.


Ensemble Kalman Filter

\begin{align*} \mu^p &= \text{sample mean} \\ P^p &= \text{sample covariance (tapered)} \\ K &= P^p H'(H P^p H' + R)^{-1} \\ \mu^u &= \mu^p + K (y - H\mu^p) \\ P^u &= (I - KH)P^p \end{align*}
And the update of particle j can be done as:
X^{u, (j)} = X^{p, (j)} + K (y + \epsilon^{(j)} - HX^{p, (j)}), \quad \text{where } \epsilon^{(j)} \sim \mathcal{N}(0, R)
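A minimal NumPy sketch of this stochastic (perturbed-observations) update; the tapering argument and all names are illustrative assumptions:

```python
import numpy as np

def enkf_update(X, H, R, y, rng, taper=None):
    """Stochastic EnKF update of an (n, d) ensemble X.

    H: (p, d) observation operator, R: (p, p) observation error covariance,
    y: (p,) observation. `taper` is an optional (d, d) localization matrix
    multiplied elementwise into the sample covariance.
    """
    n = X.shape[0]
    P = np.cov(X, rowvar=False)             # sample covariance P^p
    if taper is not None:
        P = P * taper                       # Schur-product tapering
    S = H @ P @ H.T + R
    K = np.linalg.solve(S, H @ P).T         # gain K = P^p H' S^{-1}
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=n)
    # perturbed-observation update of each member j:
    # X^{u,(j)} = X^{p,(j)} + K (y + eps^(j) - H X^{p,(j)})
    return X + (y + eps - X @ H.T) @ K.T
```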

What about the Particle Filter?

Curse of dimensionality

The PF is great, as it can handle non-linear systems with non-Gaussian noise. Unfortunately, the number of particles needed for a good approximation grows exponentially with the dimension of the state space. Even for medium-scale applications it is already infeasible...

Ensemble Kalman Particle Filter

(Frei & Künsch 2013)

  • EnKF update + PF correction
  • progressive correction: split likelihood
  • Idea: First draw the particles toward the observations, but not as much as in the EnKF case. Then resample particles based on how well they fit the data

  • avoid sample degeneracy, as particles are drawn in the region of higher density.

  • keep more non-Gaussian features, as the resampling step doesn't depend on normal assumptions

  • the first update can be done analytically, which allows for an efficient algorithm. Furthermore, it avoids ties (particles with exactly the same value)


Ensemble Kalman Particle Filter

  • The update distribution is decomposed as:
\pi^u(dx) \propto l(x | y)^{1-\gamma} \pi^{u,\gamma} (dx), \quad \pi^{u,\gamma} (dx) \propto l(x | y)^{\gamma} \pi^p(dx)
  • two-stage procedure:

    \pi^p \xrightarrow[]{\text{EnKF}} \pi^{u,\gamma} \xrightarrow[]{\text{PF}} \pi^u
  • First update: draw the particles toward the observations

  • Second update: resample according to likelihood

  • The update is done analytically and sampling is postponed until the very end (see the sketch below)
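The two stages can be sketched as follows. This is a didactic approximation only: it samples after the first stage, whereas the actual Frei & Künsch (2013) algorithm keeps the first update analytic, as noted above. Tempering a Gaussian likelihood by \gamma is the same as inflating R to R/\gamma; the sketch reuses enkf_update from the EnKF example above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def enkpf_update(X, H, R, y, gamma, rng):
    """Schematic two-stage EnKPF update (didactic approximation).

    Stage 1 (EnKF): assimilate l(x|y)^gamma, i.e. obs covariance R/gamma.
    Stage 2 (PF):   reweight by the remaining l(x|y)^(1-gamma), resample.
    """
    Xg = enkf_update(X, H, R / gamma, y, rng)   # pull particles toward y, partially
    # remaining likelihood^(1-gamma) is Gaussian with covariance R/(1-gamma)
    logw = multivariate_normal.logpdf(y - Xg @ H.T, cov=R / (1.0 - gamma))
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(len(Xg), size=len(Xg), p=w)  # resample by fit to the data
    return Xg[idx]
```

As \gamma \to 1 this reduces to a pure EnKF update, and as \gamma \to 0 to a pure PF update.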

Very large-scale applications?

dim of X \approx 10^8 - very short-term

  • convective scale
  • high-resolution
  • very short term = very fast computation (3-4 hrs forecast)

Localization

dim of X \approx 10^3 - parallel computation

  • by going small scale ;)
  • update every grid-point independently (in parallel)
  • use only local observations (dim much smaller)
  • number of particles/dimension is much higher!
  • efficient computation (see the sketch below)
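A sketch of the local-update idea on a 1D grid, for the simple case of direct point observations of the state with independent errors (a hypothetical setting chosen for illustration):

```python
import numpy as np

def localized_enkf(X, y, obs_idx, radius, r_var, rng):
    """Localized stochastic EnKF for point observations y = x[obs_idx] + noise.

    X: (n, d) ensemble, obs_idx: (p,) array of observed grid indices,
    r_var: observation error variance. Grid point i sees only observations
    within `radius` grid points, so the loop over i parallelizes trivially.
    """
    n, d = X.shape
    A = X - X.mean(axis=0)                       # ensemble anomalies
    Xu = X.copy()
    for i in range(d):                           # independent local updates
        sel = np.flatnonzero(np.abs(obs_idx - i) <= radius)
        if sel.size == 0:
            continue                             # no local obs: keep the prior
        j = obs_idx[sel]                         # locally observed columns
        Pxy = A[:, [i]].T @ A[:, j] / (n - 1)    # cov(x_i, local obs), (1, q)
        Pyy = A[:, j].T @ A[:, j] / (n - 1) + r_var * np.eye(len(j))
        K = np.linalg.solve(Pyy, Pxy.T).T        # local Kalman gain, (1, q)
        eps = rng.normal(0.0, np.sqrt(r_var), size=(n, len(j)))
        Xu[:, i] = X[:, i] + ((y[sel] + eps - X[:, j]) @ K.T).ravel()
    return Xu
```

Each grid point only ever inverts a small local matrix, which is what makes the approach feasible at very large dimensions.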

The algorithm to beat: LETKF

Localized Ensemble Transform Kalman Filter

Mike Tyson (source)
  • Super-efficient
  • Computation in parallel
  • High-resolution
  • Proven track record
  • State-of-the-art method
  • Transform because it uses a deterministic square-root scheme instead of stochastic perturbations (see the sketch below)
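For reference, a sketch of the deterministic square-root transform at the heart of the (L)ETKF, following the standard formulation of Hunt et al. (2007); in the LETKF this update is applied independently at each grid point using local observations only:

```python
import numpy as np

def etkf_update(X, HX, y, R_inv):
    """Deterministic (square-root) ETKF update for one local analysis.

    X:  (k, d) forecast ensemble, members as rows
    HX: (k, p) forecast ensemble mapped into observation space
    y:  (p,) observation, R_inv: (p, p) inverse observation covariance
    """
    k = X.shape[0]
    xm, ym = X.mean(axis=0), HX.mean(axis=0)
    A, Y = X - xm, HX - ym                        # state and obs-space anomalies
    C = Y @ R_inv                                 # (k, p)
    Pa = np.linalg.inv((k - 1) * np.eye(k) + C @ Y.T)
    wm = Pa @ C @ (y - ym)                        # weights for the mean update
    evals, evecs = np.linalg.eigh((k - 1) * Pa)   # symmetric square root:
    W = (evecs * np.sqrt(evals)) @ evecs.T        # the "transform", no random noise
    return xm + (wm + W) @ A                      # analysis ensemble, (k, d)
```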

How can we localize the EnKPF?

  • Need to localize the algorithm
  • not so straightforward because of the resampling step
  • We developed 2 new algorithms to do so

Problem with localization

LETKF vs. PF

  • very schematic (not literally correct)
  • local update (in the red circle)
  • black line is pulled toward the observations
  • With PF, a new particle is sampled locally => discontinuity

Localized EnKPF (1)

LEnKPF

  • Pros: straightforward + no discontinuities
  • Cons: not fully local

\pi^p \xrightarrow[]{\text{Local EnKF}} \pi^{u,\gamma} \xrightarrow[]{\text{Global PF}} \pi^u

  • Localized EnKF + Global PF
  • Locally adjust the particles
  • resample at a global level (see the sketch below)
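Schematically, combining the sketches above (localized EnKF, then one global resampling step); again a didactic approximation, not the exact algorithm:

```python
import numpy as np

def lenkpf_update(X, y, obs_idx, radius, r_var, gamma, rng):
    """Schematic LEnKPF: localized EnKF with tempered likelihood, then
    one GLOBAL PF resampling with the remaining likelihood. Reuses
    localized_enkf from the localization sketch above.
    """
    # stage 1: local EnKF assimilating l^gamma (obs variance r_var/gamma)
    Xg = localized_enkf(X, y, obs_idx, radius, r_var / gamma, rng)
    # stage 2: global weights from the remaining l^(1-gamma), then resample
    resid = y - Xg[:, obs_idx]                      # (n, p) innovations
    logw = -0.5 * (1.0 - gamma) * (resid ** 2).sum(axis=1) / r_var
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(len(Xg), size=len(Xg), p=w)
    return Xg[idx]   # whole particles are swapped globally: no local seams
```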

Localized EnKPF (2)

Kriging EnKPF (KEnKPF)

  • Pro: fully localized
  • Con: more difficult to implement
  • local EnKPF update
  • kriging simulation in a buffer zone around each updated block
  • no discontinuities
  • update blocks in parallel (coloring)

KF, PF, EnKF, EnKPF, LETKF, ETC?

Model hierarchy

  • zoo of methods
  • x: small/medium/large scale
  • y: how assumption-free the method is
  • (KF vs. PF) => EnKF and EnKPF
  • localization: order arbitrary
  • localized PF? some work is going on

Application

Modified SWEQ (Würsch & Craig 2014)

Simplified cloud convection in 1D (dimension: 1000)

  • medium scale toy model ~ convection
  • x: location (along a line): dx=500m
  • cloud => rain, + wind (bottom)
  • red dots: observations
  • dynamics from SWEQ
  • random plumes in wind

Artificial observations

Designed to mimic radar images

  • pressure: unobserved
  • rain + noise: everywhere with threshold
  • rain noise is skewed: (\sqrt{r} + \epsilon)^2
  • wind: Doppler effect on raindrops (with Gaussian noise); a sketch follows below
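A sketch of how such observations could be generated from the true fields; the noise levels and threshold are illustrative placeholders, not the values of Würsch & Craig (2014):

```python
import numpy as np

def make_observations(r, u, rng, sig_r=0.1, sig_u=0.5, thresh=0.005):
    """Radar-like artificial observations from true rain r and wind u
    (both (d,) fields). Fluid height (the "pressure") stays unobserved.
    """
    # rain: observed everywhere with skewed noise (sqrt(r) + eps)^2, thresholded
    y_rain = (np.sqrt(r) + rng.normal(0.0, sig_r, r.shape)) ** 2
    y_rain[y_rain < thresh] = 0.0
    # wind: Doppler returns need raindrops, so only where it rains; Gaussian noise
    y_wind = np.where(r > 0, u + rng.normal(0.0, sig_u, u.shape), np.nan)
    return y_rain, y_wind
```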

Application

Modified SWEQ (Würsch & Craig 2014)

Filtering with KEnKPF (ensemble size of 50)

Results: RMSE

root mean-square error of the ensemble median

  • assimilation for 1h + 1h forecast
  • RMSE of h,r,u
  • Ensemble size: 50, number of repetitions: 100

Results: CRPS

maximize sharpness subject to calibration

Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only.


CRPS

  • Calibration: consistency between forecasts and observations
  • Sharpness: concentration of the predictive distribution
  • Distance between the cdf of the forecast and the observation
  • Units as original variable
crps(F, x) = \int_{-\infty}^{+\infty} \{ F(y) - 1(y\geq x) \}^2 dy
taken from eumetcal.org
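For an ensemble forecast the integral has a convenient closed form via the identity crps(F, x) = E|X - x| - \frac{1}{2} E|X - X'|, where X and X' are independent draws from F; with the empirical cdf of the members this is a few lines:

```python
import numpy as np

def crps_ensemble(ens, x):
    """CRPS of an ensemble forecast against a scalar observation x,
    using crps = mean|X - x| - 0.5 * mean|X - X'| over the members.
    The result has the same units as the forecast variable.
    """
    ens = np.asarray(ens, dtype=float)
    term1 = np.abs(ens - x).mean()
    term2 = 0.5 * np.abs(ens[:, None] - ens[None, :]).mean()
    return term1 - term2
```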

Results: SAL scores

Structure-Amplitude-Location (Wernli et al. 2008)

Next stage?

COSMO case-study (Lange & Craig 2014)

Idealized setup of a convective situation

dimension of X: 10^8

taken from Lange & Craig 2014
  • almost real size
  • storm over the Jura in July 2007
  • no topography
  • 400 km × 400 km × 50 layers × 8 variables, at 2 km resolution
  • MeteoCH and DWD

Challenges

  • Efficient implementation: parallel

  • Scalability of the method

Albis and Lema (CSCS)
  • efficient+fast
  • super-computer: in parallel (see pics)
  • will I be able to implement them?
  • if yes, will they work at this scale?

Open questions

to explore

  • Localized resampling with dependencies (MRF, coupling)

  • Non-Gaussian observations

  • Parameter estimation

  • Model error

  • High-dimensional covariance estimation

  • ...

Thank you

This presentation can be viewed online at http://robertsy.github.com/ensemble-da