
Ensemble methods

for data assimilation

27/11/2014 Sylvain Robert

  • thank you for coming
  • give you an idea of what my research is about
  • what kind of problems and solutions
  • first: motivating application

Convective storm forecasting

heavy rains, hail, flash flooding, etc.

Copyright MattysFlicks
  • where and when: short-term
  • dramatic consequences: heavy rainfalls, flash flooding, tornadoes (not here)
  • difficult problem!
  • Where do I come into play? A good forecast needs good initial conditions, which need good DA

Goal: predicting convective scale storms

Very important as these events have potentially huge negative impact, such as:

  • heavy rainfall
  • hail (vineyard damage)
  • tornadoes (not so much here)
  • strong winds
  • flash flooding

short-term prediction: between 1 and 4 hours

  • Good initial condition = Good forecast
  • Good data assimilation (DA) = Good initial condition

What is Data Assimilation (DA)?

X_t: state of the atmosphere at time t
Y_t: observation at time t
  • discretization of the 3D domain: the atmosphere
  • Horizontal grid and vertical (pressure)
  • each cell is itself multidimensional: T, P, etc.
  • Final state: all of that stacked into a giant vector
  • Dynamics: propagate X_t in time with the model
  • best estimate of X_t given the model, knowledge of how the Y_t are produced, and the history of Y_t

What is DA for statisticians?

State-space model: Filtering

P(X_t | Y_{1:t}) = ??

  • how do we model that as statisticians?
  • general state-space model
  • The true state X_t is assumed to be an unobserved Markov process and the measurements Y_t are a noisy function of the state
  • Y_t depends on the state only through X_t.
  • State of X_t given the history of observations: filtering

Bayes Filter

Recursive solution

There is an exact solution when everything is linear and Gaussian: the Kalman Filter (invented here at ETH Zürich)


Bayes Filter

More detail:

\begin{align*} P(X_t | Y_{1:t}) &\propto P(Y_t | X_t) \, P( X_t | Y_{1:t-1} ) \\ &= P(Y_t | X_t) \int P(X_t | X_{t-1}) \, P( X_{t-1} | Y_{1:t-1}) \, dX_{t-1} \end{align*}

Forecast

X^f_{t} \sim P(X_t | X^u_{t-1})

Update

X^u_{t} \sim P(X_t | X^f_{t}, Y_t)

Kalman Filter

Linear + Gaussian

Kalman (wikipedia)
Exact solution when: \begin{align*} X_{t} &= A X_{t-1} + \epsilon_t \\ Y_t &= H X_t + \eta_t \end{align*}

Particle Filter

Non-linear + non-Gaussian

Monte-Carlo approximation with: \begin{align*} X_{t} &= f ( X_{t-1} , \epsilon_t) \\ Y_t &= h(X_t , \eta_t ) \end{align*}
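To make the recursion concrete, here is a minimal sketch of one step of a bootstrap particle filter; the function names and signatures are illustrative assumptions, not code from this project:

```python
import numpy as np

def bootstrap_pf_step(particles, y, propagate, log_lik, rng):
    """One forecast + update step of a bootstrap particle filter.

    particles: (n, d) sample approximating P(X_{t-1} | Y_{1:t-1})
    y:         current observation
    propagate: stochastic model, propagate(particles, rng) -> (n, d)
    log_lik:   log P(y | x), vectorized over the n particles
    """
    xf = propagate(particles, rng)        # forecast: X^f_t ~ P(X_t | X^u_{t-1})
    logw = log_lik(y, xf)                 # update: weight by the likelihood
    w = np.exp(logw - logw.max())         # subtract max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(xf), size=len(xf), p=w)
    return xf[idx]                        # resampled ensemble ~ P(X_t | Y_{1:t})
```

In high dimensions the weights degenerate (a single particle gets almost all the weight), which is the curse of dimensionality discussed below.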

Large-scale applications?

  • global atmosphere dynamics
  • ocean dynamics
  • reservoir modelling

Here the KF and PF fail, so new ensemble-based methods have been developed.

Ensemble Kalman Filter

Kalman Filter + Monte-Carlo

  • developed by Evensen in 1994 for ocean dynamics DA
  • widely used for large-scale environmental applications
  • Ensemble (sample size of ~100)
  • low-rank representation of the covariance

Monte Carlo approximation of the Kalman Filter. Propagate particles using the physical model of the atmosphere (no linearization or any higher-order approximation needed). To do the update: use the sample covariance (low rank representation, as sample size << dimension of X). Draw a sample from the Normal distribution with updated mean and covariance. Stochastic sample or deterministic (with a transform ensuring that the data have exactly the given sample covariance): this is useful as the sample size is very small and Monte Carlo error can be detrimental.


Ensemble Kalman Filter

\begin{align*} \mu^p &= \text{sample mean} \\ P^p &= \text{sample covariance (tapered)} \\ K &= P^p H'(H P^p H' + R)^{-1} \\ \mu^u &= \mu^p + K (y - H\mu^p) \\ P^u &= (I - KH)P^p \end{align*}
And the update of particle j can be done as:
X^{u, (j)} = X^{p, (j)} + K (y + \epsilon^{(j)} - HX^{p, (j)}), \quad \text{where } \epsilon^{(j)} \sim \mathcal{N}(0, R)
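A minimal NumPy sketch of this stochastic (perturbed-observations) update; the tapering argument and all names are illustrative assumptions:

```python
import numpy as np

def enkf_update(X, H, R, y, rng, taper=None):
    """Stochastic EnKF update of an (n, d) ensemble X.

    H: (p, d) observation operator, R: (p, p) observation error covariance,
    y: (p,) observation. `taper` is an optional (d, d) localization matrix
    multiplied elementwise into the sample covariance.
    """
    n = X.shape[0]
    P = np.cov(X, rowvar=False)             # sample covariance P^p
    if taper is not None:
        P = P * taper                       # Schur-product tapering
    S = H @ P @ H.T + R
    K = np.linalg.solve(S, H @ P).T         # gain K = P^p H' S^{-1}
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=n)
    # perturbed-observation update of each member j:
    # X^{u,(j)} = X^{p,(j)} + K (y + eps^(j) - H X^{p,(j)})
    return X + (y + eps - X @ H.T) @ K.T
```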

What about the Particle Filter?

Curse of dimensionality

The PF is great, as it can handle non-linear systems with non-Gaussian noise. Unfortunately, the number of particles needed for a good approximation grows exponentially with the dimension of the state space. Even for medium-scale applications it is already infeasible...

Ensemble Kalman Particle Filter

(Frei & Künsch 2013)

  • EnKF update + PF correction
  • progressive correction: split likelihood
  • Idea: First draw the particles toward the observations, but not as much as in the EnKF case. Then resample particles based on how well they fit the data

  • avoid sample degeneracy, as particles are drawn in the region of higher density.

  • keep more non-Gaussian features, as the resampling step doesn't depend on normal assumptions

  • the first update can be done analytically, which allows for an efficient algorithm. Furthermore, it avoids ties (particles with exactly the same value)


Ensemble Kalman Particle Filter

  • The update distribution is decomposed as:
\pi^u(dx) \propto l(x | y)^{1-\gamma} \pi^{u,\gamma} (dx), \quad \pi^{u,\gamma} (dx) \propto l(x | y)^{\gamma} \pi^p(dx)
  • two-stage procedure:

    \pi^p \xrightarrow[]{\text{EnKF}} \pi^{u,\gamma} \xrightarrow[]{\text{PF}} \pi^u
  • First update: draw the particles toward the observations

  • Second update: resample according to likelihood

  • The update is done analytically and sampling is postponed until the very end (see the sketch below)
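The two stages can be sketched as follows. This is a didactic approximation only: it samples after the first stage, whereas the actual Frei & Künsch (2013) algorithm keeps the first update analytic, as noted above. Tempering a Gaussian likelihood by \gamma is the same as inflating R to R/\gamma; the sketch reuses enkf_update from the EnKF example above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def enkpf_update(X, H, R, y, gamma, rng):
    """Schematic two-stage EnKPF update (didactic approximation).

    Stage 1 (EnKF): assimilate l(x|y)^gamma, i.e. obs covariance R/gamma.
    Stage 2 (PF):   reweight by the remaining l(x|y)^(1-gamma), resample.
    """
    Xg = enkf_update(X, H, R / gamma, y, rng)   # pull particles toward y, partially
    # remaining likelihood^(1-gamma) is Gaussian with covariance R/(1-gamma)
    logw = multivariate_normal.logpdf(y - Xg @ H.T, cov=R / (1.0 - gamma))
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(len(Xg), size=len(Xg), p=w)  # resample by fit to the data
    return Xg[idx]
```

As \gamma \to 1 this reduces to a pure EnKF update, and as \gamma \to 0 to a pure PF update.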

Very large-scale applications?

dim of X \approx 10^8 - very short-term

  • convective scale
  • high-resolution
  • very short term = very fast computation (3-4 hrs forecast)

Localization

dim of X \approx 10^3 - parallel computation

  • by going small scale ;)
  • update every grid-point independently (in parallel)
  • use only local observations (dim much smaller)
  • number of particles/dimension is much higher!
  • efficient computation (see the sketch below)
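A sketch of the local-update idea on a 1D grid, for the simple case of direct point observations of the state with independent errors (a hypothetical setting chosen for illustration):

```python
import numpy as np

def localized_enkf(X, y, obs_idx, radius, r_var, rng):
    """Localized stochastic EnKF for point observations y = x[obs_idx] + noise.

    X: (n, d) ensemble, obs_idx: (p,) array of observed grid indices,
    r_var: observation error variance. Grid point i sees only observations
    within `radius` grid points, so the loop over i parallelizes trivially.
    """
    n, d = X.shape
    A = X - X.mean(axis=0)                       # ensemble anomalies
    Xu = X.copy()
    for i in range(d):                           # independent local updates
        sel = np.flatnonzero(np.abs(obs_idx - i) <= radius)
        if sel.size == 0:
            continue                             # no local obs: keep the prior
        j = obs_idx[sel]                         # locally observed columns
        Pxy = A[:, [i]].T @ A[:, j] / (n - 1)    # cov(x_i, local obs), (1, q)
        Pyy = A[:, j].T @ A[:, j] / (n - 1) + r_var * np.eye(len(j))
        K = np.linalg.solve(Pyy, Pxy.T).T        # local Kalman gain, (1, q)
        eps = rng.normal(0.0, np.sqrt(r_var), size=(n, len(j)))
        Xu[:, i] = X[:, i] + ((y[sel] + eps - X[:, j]) @ K.T).ravel()
    return Xu
```

Each grid point only ever inverts a small local matrix, which is what makes the approach feasible at very large dimensions.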

The algorithm to beat: LETKF

Localized Ensemble Transform Kalman Filter

Mike Tyson (source)
  • Super-efficient
  • Computation in parallel
  • High-resolution
  • Proven track record
  • State-of-the-art method
  • Transform because it uses a deterministic square-root scheme instead of stochastic perturbations (see the sketch below)
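For reference, a sketch of the deterministic square-root transform at the heart of the (L)ETKF, following the standard formulation of Hunt et al. (2007); in the LETKF this update is applied independently at each grid point using local observations only:

```python
import numpy as np

def etkf_update(X, HX, y, R_inv):
    """Deterministic (square-root) ETKF update for one local analysis.

    X:  (k, d) forecast ensemble, members as rows
    HX: (k, p) forecast ensemble mapped into observation space
    y:  (p,) observation, R_inv: (p, p) inverse observation covariance
    """
    k = X.shape[0]
    xm, ym = X.mean(axis=0), HX.mean(axis=0)
    A, Y = X - xm, HX - ym                        # state and obs-space anomalies
    C = Y @ R_inv                                 # (k, p)
    Pa = np.linalg.inv((k - 1) * np.eye(k) + C @ Y.T)
    wm = Pa @ C @ (y - ym)                        # weights for the mean update
    evals, evecs = np.linalg.eigh((k - 1) * Pa)   # symmetric square root:
    W = (evecs * np.sqrt(evals)) @ evecs.T        # the "transform", no random noise
    return xm + (wm + W) @ A                      # analysis ensemble, (k, d)
```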

How can we localize the EnKPF?

  • Need to localize the algorithm
  • not so straightforward because of the resampling step
  • We developed 2 new algorithms to do so

Problem with localization

LETKF vs. PF

  • very schematic (not literally correct)
  • local update (in the red circle)
  • black line is pulled toward the observations
  • With PF, a new particle is sampled locally => discontinuity

Localized EnKPF (1)

LEnKPF

  • Pros: straightforward + no discontinuities
  • Cons: not fully local

\pi^p \xrightarrow[]{\text{Local EnKF}} \pi^{u,\gamma} \xrightarrow[]{\text{Global PF}} \pi^u

  • Localized EnKF + Global PF
  • Locally adjust the particles
  • resample at a global level (see the sketch below)
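Schematically, combining the sketches above (localized EnKF, then one global resampling step); again a didactic approximation, not the exact algorithm:

```python
import numpy as np

def lenkpf_update(X, y, obs_idx, radius, r_var, gamma, rng):
    """Schematic LEnKPF: localized EnKF with tempered likelihood, then
    one GLOBAL PF resampling with the remaining likelihood. Reuses
    localized_enkf from the localization sketch above.
    """
    # stage 1: local EnKF assimilating l^gamma (obs variance r_var/gamma)
    Xg = localized_enkf(X, y, obs_idx, radius, r_var / gamma, rng)
    # stage 2: global weights from the remaining l^(1-gamma), then resample
    resid = y - Xg[:, obs_idx]                      # (n, p) innovations
    logw = -0.5 * (1.0 - gamma) * (resid ** 2).sum(axis=1) / r_var
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(len(Xg), size=len(Xg), p=w)
    return Xg[idx]   # whole particles are swapped globally: no local seams
```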

Localized EnKPF (2)

Kriging EnKPF (KEnKPF)

  • Pro: fully localized
  • Con: more difficult to implement
  • local EnKPF update
  • kriging simulation in a buffer zone around each updated block
  • no discontinuities
  • update blocks in parallel (coloring)

KF, PF, EnKF, EnKPF, LETKF, ETC?

Model hierarchy

  • zoo of methods
  • x: small/medium/large scale
  • y: how assumption-free the method is
  • (KF vs. PF) => EnKF and EnKPF
  • localization: order arbitrary
  • localized PF? some work is going on

Application

Modified SWEQ (Würsch & Craig 2014)

Simplified cloud convection in 1D (dimension: 1000)

  • medium scale toy model ~ convection
  • x: location (along a line): dx=500m
  • cloud => rain, + wind (bottom)
  • red dots: observations
  • dynamics from SWEQ
  • random plumes in wind

Artificial observations

Designed to mimic radar images

  • pressure: unobserved
  • rain + noise: everywhere with threshold
  • rain noise is skewed: (\sqrt{r} + \epsilon)^2
  • wind: Doppler effect on raindrops (with Gaussian noise); a sketch follows below
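A sketch of how such observations could be generated from the true fields; the noise levels and threshold are illustrative placeholders, not the values of Würsch & Craig (2014):

```python
import numpy as np

def make_observations(r, u, rng, sig_r=0.1, sig_u=0.5, thresh=0.005):
    """Radar-like artificial observations from true rain r and wind u
    (both (d,) fields). Fluid height (the "pressure") stays unobserved.
    """
    # rain: observed everywhere with skewed noise (sqrt(r) + eps)^2, thresholded
    y_rain = (np.sqrt(r) + rng.normal(0.0, sig_r, r.shape)) ** 2
    y_rain[y_rain < thresh] = 0.0
    # wind: Doppler returns need raindrops, so only where it rains; Gaussian noise
    y_wind = np.where(r > 0, u + rng.normal(0.0, sig_u, u.shape), np.nan)
    return y_rain, y_wind
```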

Application

Modified SWEQ (Würsch & Craig 2014)

Filtering with KEnKPF (ensemble size of 50)

Results: RMSE

root mean-square error of the ensemble median

  • assimilation for 1h + 1h forecast
  • RMSE of h,r,u
  • Ensemble size: 50, number of repetitions: 100

Results: CRPS

maximize sharpness subject to calibration

Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only.


CRPS

  • Calibration: consistency between forecasts and observations
  • Sharpness: concentration of the predictive distribution
  • Distance between the cdf of the forecast and the observation
  • Units as original variable
crps(F, x) = \int_{-\infty}^{+\infty} \{ F(y) - 1(y\geq x) \}^2 dy
taken from eumetcal.org
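For an ensemble forecast the integral has a convenient closed form via the identity crps(F, x) = E|X - x| - \frac{1}{2} E|X - X'|, where X and X' are independent draws from F; with the empirical cdf of the members this is a few lines:

```python
import numpy as np

def crps_ensemble(ens, x):
    """CRPS of an ensemble forecast against a scalar observation x,
    using crps = mean|X - x| - 0.5 * mean|X - X'| over the members.
    The result has the same units as the forecast variable.
    """
    ens = np.asarray(ens, dtype=float)
    term1 = np.abs(ens - x).mean()
    term2 = 0.5 * np.abs(ens[:, None] - ens[None, :]).mean()
    return term1 - term2
```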

Results: SAL scores

Structure-Amplitude-Location (Wernli et al. 2008)

Next stage?

COSMO case-study (Lange & Craig 2014)

Idealized setup of a convective situation

dimension of X: 10^8

taken from Lange & Craig 2014
  • almost real size
  • storm over the Jura in July 2007
  • no topography
  • 400 km × 400 km × 50 layers × 8 variables, at 2 km resolution
  • MeteoCH and DWD

Challenges

  • Efficient implementation: parallel

  • Scalability of the method

Albis and Lema (CSCS)
  • efficient+fast
  • super-computer: in parallel (see pics)
  • will I be able to implement them?
  • if yes, will they work at this scale?

Open questions

to explore

  • Localized resampling with dependencies (MRF, coupling)

  • Non-Gaussian observations

  • Parameter estimation

  • Model error

  • High-dimensional covariance estimation

  • ...

Thank you

This presentation can be viewed online at http://robertsy.github.com/ensemble-da