Robust Bayesian Modeling – Yau Group meeting – Tammo Rukat




On Github TammoR / robust_bayes

Robust Bayesian Modeling

Yau Group meeting

Tammo Rukat

February 2, 2016

Bayesian Modeling

What is a Bayesian Model?

  • A joint distribution of parameters β and data x.
  • We usually think of exchangeable models: $p(\beta, x) = \underbrace{p(\beta \mid \alpha)}_{\text{prior}} \prod_{i=1}^{n} \underbrace{p(x_i \mid \beta)}_{\text{likelihood}}$
  • Generalise this to accommodate most common models:
    • conditional models: $p(x_i \mid \beta) \to p(x_i \mid y_i, \beta)$, e.g. $= N(x_i \mid w^T y_i, \sigma)$
    • or latent variable models: $p(x_i \mid \beta) = \sum_{z_i} p(x_i, z_i \mid \beta)$, e.g. $= \sum_k \pi_k N(x_i \mid \mu_k, \sigma_k)$
  • The data are assumed to be drawn from the likelihood, given the parameter.
  • The parameter is drawn from the prior.
  • We condition on the data to calculate the posterior distribution of the parameters.
  • When the data set is small, the prior plays a big role.
  • For large data sets, the posterior converges to a point mass (regardless of whether the model is true).
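
As a rough illustration of the last two notes, here is a minimal sketch using a conjugate normal model with known noise variance. The model choice, the function name `normal_posterior`, and all numeric values are assumptions made for this example and are not taken from the talk.

```python
import numpy as np

def normal_posterior(x, prior_mean, prior_var, noise_var):
    """Posterior N(mean, var) of beta for x_i ~ N(beta, noise_var), beta ~ N(prior_mean, prior_var)."""
    n = len(x)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + np.sum(x) / noise_var)
    return post_mean, post_var

rng = np.random.default_rng(0)
# Hypothetical data: true mean 2.0, while the prior is centred at 0.0.
data = rng.normal(loc=2.0, scale=1.0, size=10_000)

results = {}
for n in (1, 10, 100, 10_000):
    results[n] = normal_posterior(data[:n], prior_mean=0.0, prior_var=1.0, noise_var=1.0)
    mean, var = results[n]
    print(f"n={n:>6}: posterior mean={mean:.3f}, posterior std={np.sqrt(var):.4f}")
```

With a single observation the posterior mean sits halfway between the prior mean and the data point; as n grows, the posterior standard deviation shrinks towards zero and the prior stops mattering.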

Interlude: What is the difference between parameters and latent variables?

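  • Joint distribution: $p(\beta, z, x) = \underbrace{p(\beta \mid \alpha)}_{\text{parameters}} \prod_{i=1}^{n} \underbrace{p(z_i \mid \gamma)}_{\text{latent variables}} \, p(x_i \mid \beta, z_i)$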
  • Practical notion:
    • Number of latent variables grows with number of data points.
    • Number of parameters stays fixed.
  • There is no real difference

A useful distinction: Local vs global variables

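$p(\beta, z, x) = \underbrace{p(\beta \mid \alpha)}_{\text{global}} \prod_{i=1}^{n} \underbrace{p(z_i \mid \gamma)}_{\text{local}} \, p(x_i \mid \beta, z_i)$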

  • The distinction is determined by conditional dependencies: $p(x_i, z_i \mid x_{-i}, z_{-i}, \beta) = p(x_i, z_i \mid \beta)$

Given the global variable, the ith observation and the ith local variable are conditionally independent of all other local variables and observations.

Example: Gaussian mixture model

  • Which are the local and which are the global variables?

$p(x \mid z) = \prod_k N(x \mid \mu_k, \sigma_k)^{z_k}; \quad p(z) = \prod_k \pi_k^{z_k}$

  • global: means μk, standard deviations σk, mixture proportions πk;
  • local: cluster assignments zk.
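
The split can be made concrete by sampling from a small mixture. This is only a sketch: the number of components and all parameter values are illustrative assumptions, and the cluster assignment is stored as an integer label per data point instead of the one-hot vector used in the formula above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Global variables: shared by all data points (values are illustrative).
pi = np.array([0.5, 0.3, 0.2])      # mixture proportions pi_k
mu = np.array([-4.0, 0.0, 5.0])     # component means mu_k
sigma = np.array([0.5, 1.0, 0.8])   # component standard deviations sigma_k

n = 1000
# Local variables: one cluster assignment per data point.
z = rng.choice(len(pi), size=n, p=pi)
# Each observation depends on the globals only through its own local variable.
x = rng.normal(loc=mu[z], scale=sigma[z])

print("number of global variables:", pi.size + mu.size + sigma.size)
print("number of local variables: ", z.size)
```

Increasing `n` adds more local variables while the number of global variables stays at nine, which matches the practical notion from the interlude.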

Robust Bayesian Modeling

Motivation

  • Wang and Blei, 2015 – A General Method for Robust Bayesian Modeling

… all models are wrong …

  • Robustness: Inference should be insensitive to small deviations from the model assumptions.
  • Wang and Blei introduce a general framework for robust Bayesian modeling.
  • The quote is by George Box.
  • The most generic approach is to use distributions with heavier tails.
  • Until now, robust models have been built on a case-by-case basis.
  • The aim of the authors is to introduce a general framework.

Key idea: Localisation of global parameters

  • Classical model: $p(\beta, x) = p(\beta \mid \alpha) \prod_{i=1}^{n} p(x_i \mid \beta)$
    • All data is drawn from the parameter.
    • The hyperparameter α is usually fixed.
  • Robust model: $p(\beta, x) = \prod_{i=1}^{n} p(\beta_i \mid \alpha) \, p(x_i \mid \beta_i)$
    • Every data point is assumed drawn from an individual realisation of the parameter, which is drawn from the prior.
    • Outliers are explained by variation in the parameters.
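
A small simulation may help to see the effect of localisation. The sketch below assumes a normal observation model whose variance is the localised parameter, with an inverse-gamma prior; the hyperparameter values `a` and `b` and the use of excess kurtosis as a tail measure are choices made for this example only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 100_000
mu = 0.0
a, b = 5.0, 4.0   # hypothetical hyperparameters alpha = (a, b)

# Classical model: one global variance, drawn once from the prior.
var_global = stats.invgamma.rvs(a, scale=b, random_state=rng)
x_classic = rng.normal(mu, np.sqrt(var_global), size=n)

# Robust (localised) model: a separate variance beta_i for every data point.
var_local = stats.invgamma.rvs(a, scale=b, size=n, random_state=rng)
x_robust = rng.normal(mu, np.sqrt(var_local))

# Excess kurtosis is about 0 for Gaussian data and positive for heavier tails.
kurt_classic = stats.kurtosis(x_classic)
kurt_robust = stats.kurtosis(x_robust)
print(f"excess kurtosis, classical: {kurt_classic:.3f}")
print(f"excess kurtosis, robust:    {kurt_robust:.3f}")
```

The localised sample has visibly heavier tails, which is how the robust model accounts for outliers through variation in the per-point parameters.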

Graphical Model for Localisation

  • Classic model – global β

  • Robust model – local β

  • We now need to fit the hyperparameter α.
  • Fixing α would make the data points independent.

Example: Normal observation model

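  • Localise the precision parameter and use the conjugate prior (a Gamma prior on the precision, or equivalently an inverse-gamma prior on the variance $\sigma_i$)

$p(x_i \mid \alpha) = \int p(x_i \mid \beta_i) \, p(\beta_i \mid \alpha) \, d\beta_i = \int N(x_i \mid \mu, \sigma_i) \, \mathrm{Gam}^{-1}(\sigma_i \mid \alpha) \, d\sigma_i$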

  • Any guesses?

$p(x_i \mid \alpha) = \text{Student-}t\big(x_i \mid \mu, (\lambda, \nu) = f(\alpha)\big)$
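
The result can be checked numerically. The sketch below assumes the parametrisation α = (a, b) with an inverse-gamma prior (shape a, scale b) on the variance, written `v` in the code. Under that assumption the marginal is a Student-t with 2a degrees of freedom, location μ, and scale sqrt(b/a). The numeric values are illustrative.

```python
import numpy as np
from scipy import integrate, stats

mu, a, b = 0.0, 3.0, 2.0   # illustrative values, not from the talk

def marginal_density(x):
    """Integrate N(x | mu, v) * InvGamma(v | a, b) over the variance v."""
    def integrand(v):
        return stats.norm.pdf(x, loc=mu, scale=np.sqrt(v)) * stats.invgamma.pdf(v, a, scale=b)
    value, _ = integrate.quad(integrand, 0.0, np.inf)
    return value

# Closed form under the assumed parametrisation.
student_t = stats.t(df=2 * a, loc=mu, scale=np.sqrt(b / a))

xs = np.array([-3.0, -1.0, 0.0, 0.5, 2.0])
numeric = np.array([marginal_density(x) for x in xs])
closed_form = student_t.pdf(xs)
print(np.max(np.abs(numeric - closed_form)))
```

The printed maximum difference should be at the level of numerical integration error.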

2nd key idea: Empirical Bayes

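  • Estimate hyperparameters via maximum likelihood: $\hat{\alpha} = \arg\max_{\alpha} \sum_{i=1}^{n} \log \int p(x_i \mid \beta_i) \, p(\beta_i \mid \alpha) \, d\beta_i$
  • Also known as the evidence approximation: evidence $= p(x_i \mid \alpha) = \int p(x_i \mid \beta_i) \, p(\beta_i \mid \alpha) \, d\beta_i$
  • Here we use the data to determine the prior. Is that legitimate?
  • Fully Bayesian inference: "Bayes empirical Bayes"
  • This needs a hyperprior.
  • The evidence is the probability of the data after integrating out the parameters, also known as the marginal likelihood.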
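
A minimal sketch of the empirical Bayes step for the localised normal model, assuming the inverse-gamma parametrisation α = (a, b) so that the evidence per data point is a Student-t density. The synthetic data, the starting point, the choice of Nelder-Mead, and the decision to point-estimate μ alongside α are all assumptions made for this example.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(3)
# Synthetic data (illustrative): mostly N(0, 1) with 10% wide outliers.
x = np.concatenate([rng.normal(0.0, 1.0, 900), rng.normal(0.0, 8.0, 100)])

def neg_log_evidence(params, x):
    """Negative log marginal likelihood; params = (mu, log a, log b)."""
    mu, log_a, log_b = params
    a, b = np.exp(log_a), np.exp(log_b)
    # Marginal of N(x | mu, v) with v ~ InvGamma(a, b): Student-t(2a, mu, sqrt(b / a)).
    return -np.sum(stats.t.logpdf(x, df=2 * a, loc=mu, scale=np.sqrt(b / a)))

# Optimise on the log scale so that a and b stay positive.
start = np.array([np.mean(x), 0.0, 0.0])
result = optimize.minimize(neg_log_evidence, start, args=(x,), method="Nelder-Mead")
mu_hat, a_hat, b_hat = result.x[0], np.exp(result.x[1]), np.exp(result.x[2])
print(f"mu_hat={mu_hat:.3f}, a_hat={a_hat:.3f}, b_hat={b_hat:.3f}")
```

Because the integral over each local parameter is available in closed form here, maximising the evidence reduces to a maximum-likelihood fit of a Student-t; for non-conjugate models the integral would have to be approximated.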

Performance

Linear Regression

  • Training data: $y_i \mid x_i \sim N(\omega^T x_i + b, \sigma_i + 0.02)$, with $\sigma_i \sim \mathrm{Gamma}(k, 1)$
  • Test data: $y_i \mid x_i \sim N(\omega^T x_i + b, 0.02)$
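
The data-generating setup can be sketched as follows. Several details are assumptions for illustration: the second argument of N is treated as a standard deviation, the values of ω, b, and k are made up, the inputs are standard normal, and ordinary least squares is used only as a non-robust baseline, not as the method evaluated in the talk.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_train, n_test = 3, 500, 500
omega = rng.normal(size=d)   # hypothetical true weights
b = 0.5                      # hypothetical true intercept
k = 2.0                      # hypothetical Gamma shape parameter

def linear_mean(X):
    """Noise-free regression function omega^T x + b."""
    return X @ omega + b

# Training data: every point has its own noise level sigma_i + 0.02.
X_train = rng.normal(size=(n_train, d))
sigma_i = rng.gamma(shape=k, scale=1.0, size=n_train)
y_train = rng.normal(linear_mean(X_train), sigma_i + 0.02)

# Test data: clean, with a small fixed noise level.
X_test = rng.normal(size=(n_test, d))
y_test = rng.normal(linear_mean(X_test), 0.02)

# Ordinary least squares baseline, fitted on the noisy training data.
A_train = np.column_stack([X_train, np.ones(n_train)])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
A_test = np.column_stack([X_test, np.ones(n_test)])
test_rmse = np.sqrt(np.mean((A_test @ coef - y_test) ** 2))
print(f"OLS test RMSE: {test_rmse:.3f}")
```

The mismatch between noisy training data and clean test data is what makes this a test of robustness: a method that down-weights the high-noise training points should recover ω and b more accurately.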

Logistic Regression

$y_i \mid x_i \sim \mathrm{Bernoulli}(\sigma(\omega^T x_i))$

The posterior predictive

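  • Classical Bayesian model: $p(x_i \mid x, \alpha) = \int p(x_i \mid \beta) \, p(\beta \mid x, \alpha) \, d\beta$
    • Gives the correct predictive distribution only if the data comes from the model.
  • Robust Bayesian model: $p(x_i \mid \hat{\alpha}) = \int p(x_i \mid \beta_i) \, p(\beta_i \mid \hat{\alpha}) \, d\beta_i$
    • Gives the correct predictive distribution independent of model mismatch.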
  • If we want to make predictions under the model, which one should we choose?
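
One way to explore this question is to compare held-out log-likelihoods on contaminated data. The sketch below makes several simplifying assumptions: the data are a made-up normal mixture, the classical predictive is approximated by a plug-in Gaussian (reasonable here because the posterior is concentrated at this sample size), and the robust predictive is a Student-t fitted by maximum likelihood with `scipy.stats.t.fit`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def contaminated(n):
    """Illustrative data: 90% N(0, 1) plus 10% N(0, 10^2) outliers."""
    outlier = rng.random(n) < 0.1
    return rng.normal(0.0, np.where(outlier, 10.0, 1.0))

x_train, x_test = contaminated(2000), contaminated(2000)

# Non-robust plug-in predictive: a single Gaussian fitted by maximum likelihood.
gauss = stats.norm(loc=x_train.mean(), scale=x_train.std())

# Robust predictive p(x | alpha_hat): Student-t with fitted parameters.
df_hat, loc_hat, scale_hat = stats.t.fit(x_train)
robust = stats.t(df=df_hat, loc=loc_hat, scale=scale_hat)

# Average held-out log-likelihood per data point (higher is better).
ll_gauss = gauss.logpdf(x_test).mean()
ll_robust = robust.logpdf(x_test).mean()
print(f"Gaussian predictive:  {ll_gauss:.3f}")
print(f"Student-t predictive: {ll_robust:.3f}")
```

In this mismatched setting the heavier-tailed predictive assigns higher probability to held-out data; when the data really are Gaussian, the two predictives should be close.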

References

  • Wang and Blei 2015, "A General Method for Robust Bayesian Modeling"
  • Gelman et al. 2014, "Bayesian Data Analysis", 3rd edition
  • Murphy 2012, "Machine Learning: A Probabilistic Perspective"
  • Carlin and Louis 2000, "Empirical Bayes: Past, Present and Future"