Robust Bayesian Modeling
Yau Group meeting
February 2, 2016
What is a Bayesian Model?
- A joint distribution of parameters β and data x.
- We usually think of exchangeable models: p(β, x) = p(β|α) ∏_{i=1}^n p(x_i|β), where p(β|α) is the prior and ∏_{i=1}^n p(x_i|β) is the likelihood.
- Generalise this to accommodate most common models:
- conditional models: p(x_i|β) → p(x_i|y_i, β), e.g. N(x_i | wᵀy_i, σ)
- or latent variable models: p(x_i|β) = ∑_{z_i} p(x_i, z_i|β), e.g. ∑_k π_k N(x_i|μ_k, σ_k)
- data is assumed to be drawn from the parameter
- parameter is drawn from the prior.
- we condition on data to calculate the posterior distribution of parameters
- when the data set is small, the prior plays a big role
- for large data the posterior converges to a point mass (whether or not the model is true)
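The two points above can be checked in a conjugate example (not from the slides; a minimal sketch with an illustrative Beta-Bernoulli model and made-up prior strength):

```python
import numpy as np

rng = np.random.default_rng(0)

# Beta(a0, b0) prior on the coin bias beta; Bernoulli likelihood.
a0, b0 = 10.0, 10.0   # a fairly strong prior centred on 0.5 (illustrative values)
true_beta = 0.9       # the data-generating bias

def posterior(n):
    """Draw n coin flips and return posterior mean and variance of beta."""
    x = rng.binomial(1, true_beta, size=n)
    a, b = a0 + x.sum(), b0 + n - x.sum()   # conjugate Beta update
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

small_mean, small_var = posterior(5)     # prior dominates: mean stays near 0.5
large_mean, large_var = posterior(5000)  # posterior concentrates near 0.9
```

With n = 5 the posterior mean is pulled towards the prior centre; with n = 5000 it is a near point mass at the true bias.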
Interlude: What is the difference between parameters and latent variables?
- Joint distribution: p(β, z, x) = p(β|α) ∏_{i=1}^n p(z_i|γ) p(x_i|β, z_i), where β are the parameters and the z_i are the latent variables.
- Practical notion:
- Number of latent variables grows with number of data points.
- Number of parameters stays fixed.
- There is no real difference
A useful distinction: Local vs global variables
p(β, z, x) = p(β|α) ∏_{i=1}^n p(z_i|γ) p(x_i|β, z_i), where β is global and the z_i are local.
- The distinction is determined by conditional independence: p(x_i, z_i | x_{−i}, z_{−i}, β) = p(x_i, z_i | β)
The i-th observation and the i-th local variable are, given the global variables, conditionally independent of all other local variables and observations.
Example: Gaussian mixture model
- Which are the local and which are the global variables?
p(x|z) = ∏_k N(x|μ_k, σ_k)^{z_k};  p(z) = ∏_k π_k^{z_k}
- global: means μ_k, standard deviations σ_k, mixture proportions π_k;
- local: cluster assignments z_i.
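The global/local split is visible in the generative process (a minimal sketch; the component parameters are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# global variables: shared by all data points
pi = np.array([0.5, 0.3, 0.2])   # mixture proportions
mu = np.array([-4.0, 0.0, 4.0])  # component means
sigma = np.array([0.5, 1.0, 0.5])

n = 1000
# local variables: one cluster assignment z_i per observation
z = rng.choice(3, size=n, p=pi)
# x_i | z_i ~ N(mu_{z_i}, sigma_{z_i})
x = rng.normal(mu[z], sigma[z])
```

Note that pi, mu, sigma have fixed size while z grows with n, matching the "practical notion" above.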
Motivation
- Wang and Blei, 2015 – A General Method for Robust Bayesian Modeling
… all models are wrong …
Robustness: Inference should be insensitive to small deviations from the model assumptions.
- Wang and Blei introduce a general framework for robust Bayesian modeling.
- the quote is by George Box
- the most generic approach is to use distributions with heavier tails
- until now, robust models have been built on a case-by-case basis
- the authors' aim is to introduce a general framework
Key idea: Localisation of global parameters
- Classical model: p(β, x) = p(β|α) ∏_{i=1}^n p(x_i|β)
- All data points are drawn from the same parameter.
- The hyperparameter α is usually fixed.
- Robust model: p(β, x) = ∏_{i=1}^n p(β_i|α) p(x_i|β_i)
- Every data point is drawn from an individual realisation of the parameter, which is itself drawn from the prior.
- Outliers are explained by variation in the parameters.
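The two generative processes can be contrasted directly (a minimal sketch with a normal prior and likelihood; all values are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# classical model: one global parameter beta shared by all observations
beta = rng.normal(0.0, 1.0)               # beta ~ p(beta | alpha)
x_classical = rng.normal(beta, 1.0, n)    # x_i ~ p(x_i | beta)

# robust (localised) model: a fresh beta_i for every observation
beta_i = rng.normal(0.0, 1.0, n)          # beta_i ~ p(beta_i | alpha)
x_robust = rng.normal(beta_i, 1.0)        # x_i ~ p(x_i | beta_i)
```

Marginally, the localised data have larger spread (variance 1 + 1 = 2 here), because per-point parameter variation is added on top of the observation noise; this extra variation is what absorbs outliers.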
Graphical Model for Localisation
- We now need to fit the hyperparameter α.
- Fixing α would make the data points independent.
Example: Normal observation model
- Localise the variance parameter and use the conjugate inverse-gamma prior:
p(x_i|α) = ∫ p(x_i|β_i) p(β_i|α) dβ_i = ∫ N(x_i|μ, σ_i) Inv-Gam(σ_i|α) dσ_i
p(x_i|α) = Student-t(x_i | μ, (λ, ν) = f(α))
2nd key idea: Empirical Bayes
- Estimate hyperparameters via maximum marginal likelihood: α̂ = argmax_α ∑_{i=1}^n log ∫ p(x_i|β_i) p(β_i|α) dβ_i
- aka the evidence approximation: evidence = p(x_i|α) = ∫ p(x_i|β_i) p(β_i|α) dβ_i
- Here we use the data to determine the prior; is that legitimate?
- full Bayesian inference: "Bayes empirical Bayes"
- needs a hyperprior
- the evidence is the probability of the data after integrating out the parameters, aka the marginal likelihood.
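Empirical Bayes in the simplest localised model (a minimal sketch, not from the paper: β_i ~ N(0, α) with unit observation noise, so the evidence is N(x_i | 0, 1 + α) and the marginal-likelihood maximiser has a closed form):

```python
import numpy as np

rng = np.random.default_rng(4)
n, alpha_true = 50_000, 4.0   # alpha here is the (illustrative) prior variance

# generate data from the localised model
beta_i = rng.normal(0.0, np.sqrt(alpha_true), n)  # beta_i ~ N(0, alpha)
x = rng.normal(beta_i, 1.0)                       # x_i | beta_i ~ N(beta_i, 1)

# evidence after integrating out beta_i: x_i | alpha ~ N(0, 1 + alpha),
# so maximising sum_i log p(x_i | alpha) gives alpha_hat = mean(x^2) - 1
alpha_hat = max(0.0, np.mean(x**2) - 1.0)
```

The estimate recovers the data-generating hyperparameter, illustrating how the data "determine the prior".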
Linear Regression
- Training data: y_i|x_i ∼ N(ωᵀx_i + b, σ_i + 0.02), σ_i ∼ Gamma(k, 1)
- Test data: y_i|x_i ∼ N(ωᵀx_i + b, 0.02)
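A sketch of this experimental setup, reading σ_i + 0.02 as the noise scale (the dimension d, sample size, k = 1, and the weights are illustrative assumptions, not the paper's values):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, k = 500, 3, 1.0
omega = rng.normal(0.0, 1.0, d)  # illustrative regression weights
b = 0.5

def make_data(n, contaminated):
    """Generate (X, y); training data get heavy-tailed per-point noise."""
    X = rng.normal(0.0, 1.0, (n, d))
    if contaminated:  # training: per-point noise scale sigma_i ~ Gamma(k, 1)
        noise_scale = rng.gamma(k, 1.0, n) + 0.02
    else:             # test: clean, small fixed noise
        noise_scale = np.full(n, 0.02)
    y = rng.normal(X @ omega + b, noise_scale)
    return X, y

X_train, y_train = make_data(n, contaminated=True)
X_test, y_test = make_data(n, contaminated=False)
```

The training residuals are far more dispersed than the test residuals, which is exactly the train/test mismatch the robust model is meant to handle.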
Logistic Regression
yi|xi∼Bernoulli(σ(ωTxi))
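The corresponding generative process, as a minimal sketch (weights and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 1000, 2
omega = np.array([1.5, -2.0])  # illustrative weights

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

X = rng.normal(0.0, 1.0, (n, d))
# y_i ~ Bernoulli(sigma(omega^T x_i))
y = rng.binomial(1, sigmoid(X @ omega))
```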
The posterior predictive
- Classical Bayesian model: p(x_i|x, α) = ∫ p(x_i|β) p(β|x, α) dβ
- Gives the correct predictive distribution only if the data come from the model.
- Robust Bayesian model: p(x_i|α̂) = ∫ p(x_i|β_i) p(β_i|α̂) dβ_i
- Gives the correct predictive distribution independent of model mismatch.
- If we want to make predictions under the model, which one should we choose?
References
- Wang and Blei, 2015. "A General Method for Robust Bayesian Modeling"
- Gelman et al., 2014. "Bayesian Data Analysis", 3rd edition
- Murphy, 2012. "Machine Learning: A Probabilistic Perspective"
- Carlin and Louis, 2000. "Empirical Bayes: Past, Present and Future"
Tammo Rukat