Something old and something new – Randy Lai – Fiducial inference



GFItalk

http://rtalks.net/GFItalk

On Github randy3k / GFItalk

Something old and something new

Randy Lai

10, Nov, 2014

Table of Contents

What is this talk about?

  • Something old (before 1950)
    • Ronald Fisher (1890 - 1962)
    • Jerzy Neyman (1894 - 1981)

  • Something new (after 2000)

Frequentist rationale

  • Decision theory
  • A procedure \(\delta(x)\)
  • Define a loss function \(l(\theta, \delta)\) to measure performance
  • Report \((\delta(x), R_\delta(\theta))\) where \[ R_\delta(\theta) = E_\theta L[\theta, \delta(X)] \]
  • Example, \(X \sim N(\theta,1)\)
  • \(\delta(x) = (x-1.96 ,x + 1.96)\)
  • \(l(\theta, \delta) = I(\theta \not \in \delta(x)) = I[ \theta \not \in (x-1.96 , x+1.96)]\),
  • \(R_\delta(\theta) = P( \theta < X-1.96 \text{ or } \theta > X+1.96) = 0.05\)
  • If \(\theta=0\) and \(x=1\), a frequentist will report the interval \((0.96, 2.96)\) and the risk \(0.05\)

Justification

Under some mild conditions, \[ \lim_{m\to \infty} \frac{1}{m} \sum_{i=1}^{m} L(\theta_0, \delta(X_i)) = R_\delta(\theta_0). \]

\(R_\delta(\theta_0)\) measures the long-run performance of \(\delta\) at \(\theta_0\).
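A quick simulation (a sketch, not from the talk; the true \(\theta_0\) and the repetition count are arbitrary choices) confirms the long-run identity for the interval procedure above:

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 0.0      # a fixed "true" parameter, unknown in practice
m = 100_000       # number of repetitions

# delta(x) = (x - 1.96, x + 1.96); loss = I(theta0 not in delta(X_i))
x = rng.normal(theta0, 1.0, size=m)
loss = (theta0 < x - 1.96) | (theta0 > x + 1.96)

# the long-run average loss approximates R_delta(theta0) = 0.05
print(loss.mean())
```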

However, in practice, it is not quite useful.

  • \(\theta_0\) is unknown
  • \(\delta\) could be applied to different problems, different \(\theta_i\)

Formal definition

  • Consider an infinite sequence of problems: \(X_i \sim P_{\theta_i}\)

  • Neyman (1967) defined the following frequentist measure \(\bar R_\delta\) of the performance of \(\delta\) \[ \limsup_{m\to \infty} \frac{1}{m} \sum_{i=1}^{m} L(\theta_i, \delta(X_i)) \le \bar R_\delta \]
    • it is free of \(\theta\)
    • Compare to the Bayes risk \[ r_\delta = \int R_\delta(\theta) \pi(\theta) d\theta \]
  • (Berger, 1985) “practicing” frequentists behave quite differently from the “formal” frequentist.

Potential problems of frequentist approach

Example 1

           \(x=1\)   \(x=2\)   \(x=3\)
\(P_0\)    0.009     0.001     0.99
\(P_1\)    0.001     0.989     0.01

if \(x=1\), how much evidence/confidence to support \(P_0\) or \(P_1\)?

  • Frequentist approach

    Consider the test which rejects \(P_0\) when \(X=1,2\).

    • Type I error prob = 0.01, Type II error prob = 0.01
    • Upon observing \(x=1\), a frequentist will say “\(P_0\) is rejected with confidence 0.99”.

    • Misleading? \(P_0(X=1) > P_1(X=1)\).
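The numbers in this example are easy to verify directly; a minimal sketch using the table above:

```python
# the two distributions over outcomes {1, 2, 3} from the table
P0 = {1: 0.009, 2: 0.001, 3: 0.99}
P1 = {1: 0.001, 2: 0.989, 3: 0.01}

# test: reject P0 when X is 1 or 2
type1 = P0[1] + P0[2]   # P0(reject), ~ 0.01
type2 = P1[3]           # P1(accept),   0.01

# yet at x = 1 the observed data actually favors P0:
lr = P0[1] / P1[1]      # likelihood ratio, ~ 9 in favor of P0
print(type1, type2, lr)
```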

Example 2

Suppose \(X\sim N(\mu, 1)\) and it is desired to test \[ H_0: \mu \le -2 \text{ vs } H_1: \mu > 2. \]

  • Consider the test: reject \(H_0\) if \(x \ge 0\).
  • \(\alpha = \sup_{\mu\le -2} P_\mu(X\ge 0) = 0.0228\).

  • If \(x=0\), a frequentist will say “\(H_0\) is rejected with error 0.0228”.

  • misleading?

  • even worse, this frequentist statement will be valid as long as \(x\ge0\).

  • what is the problem here?

    • conditioning
  • alternative statistical methods
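Both numbers in this example, and the reason \(x=0\) feels problematic, can be reproduced with the standard normal distribution; a sketch:

```python
from math import erf, exp, pi, sqrt

def norm_sf(z):
    """P(Z >= z) for Z ~ N(0, 1)."""
    return 0.5 * (1 - erf(z / sqrt(2)))

def norm_pdf(z):
    return exp(-z * z / 2) / sqrt(2 * pi)

# alpha = sup_{mu <= -2} P_mu(X >= 0) = P(Z >= 2)
alpha = norm_sf(2.0)
print(round(alpha, 4))   # 0.0228

# at x = 0 the likelihoods at the boundary values mu = -2 and mu = 2
# are identical, yet the frequentist statement claims error 0.0228
print(norm_pdf(0 - (-2)) == norm_pdf(0 - 2))   # True
```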

Frequentism is not the only game in town

  • Classical Subjective Bayesian
    • Bernstein–von Mises theorem
  • Objective prior
    • Jeffreys prior \[ \pi(\theta) = \sqrt{|I(\theta)|} \]
    • reference prior (Berger et al., 2009)
      • maximizing certain kind of information entropy
      • data dependent
  • Confidence distributions (Xie and Singh, 2013)
    • generalization of confidence interval of Neyman
  • Dempster–Shafer theory (Dempster, 2008) and inferential models (Liu and Martin, 2014)

Fiducial inference

  • Originated from Fisher (Fisher, 1935, 1930, 1933)
  • Was considered “one great failure” of Fisher (Zabell, 1992)
  • L. J. Savage once said of it

    I don’t understand yet what fiducial probability does. We shall have to live with it a long time before we know what it’s doing for us. But it should not be ignored just because we don’t yet have a clear interpretation.

  • So, what is it?
  • to construct a posterior in the absence of a subjective prior
  • Fisher called it “inverse probability”

  • similar to the role of probability and likelihood: \(L(\theta|x) = f(x|\theta)\)

  • what is fiducial argument?

Fiducial in action

Suppose \(F(x|\theta) = P(X\le x | \theta)\) is decreasing in \(\theta\)

  • Fisher argued that \(R(\theta) = 1- F(x|\theta)\) behaves like a distribution function for \(\theta\), with density

\[ r(\theta|x) \propto - \frac{\partial F(x|\theta)}{\partial\theta} \]

Example

  • Suppose \(X \sim N(\theta,1)\), then \(F(x | \theta) = \Phi(x-\theta)\)

  • The fiducial density is \[ r(\theta|x) \propto - \frac{\partial F(x|\theta)}{\partial\theta} = \phi(x-\theta) \]

  • \(\theta|x \sim N(x,1)\)
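A finite-difference check of this example (a sketch; the observed \(x\) and test points are arbitrary) confirms that \(-\partial F/\partial\theta\) is exactly the \(N(x,1)\) density:

```python
from math import erf, exp, pi, sqrt

def F(x, theta):
    """P(X <= x | theta) for X ~ N(theta, 1)."""
    return 0.5 * (1 + erf((x - theta) / sqrt(2)))

def phi(z):
    return exp(-z * z / 2) / sqrt(2 * pi)

x, h = 1.0, 1e-6
for theta in (-1.0, 0.0, 0.5, 2.0):
    # fiducial density r(theta|x) = -dF(x|theta)/dtheta
    fid = -(F(x, theta + h) - F(x, theta - h)) / (2 * h)
    assert abs(fid - phi(x - theta)) < 1e-6   # matches N(x, 1) density
```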

Reactions from the community

  • fiducial inference was first proposed by Fisher in 1930

  • has never gained widespread acceptance

  • the original fiducial argument was radically different from its later versions

  • fiducial inference never actually developed during Fisher’s lifetime

  • dozens of examples of non-uniqueness and non-existence

Break with Neyman

  • Neyman introduced confidence interval in 1934, claiming to have generalized fiducial probability

  • Interestingly, Fisher was one of the discussants

  • Fisher disputed the idea of “confidence”

  • “confidence” is a purely frequentist concept

  • “confidence” is known to omit, or suppress, part of the information supplied by the sample

  • In a 1935 JRSS paper, Fisher sharply criticized Neyman:

“Dr. Neyman had been somewhat unwise in his choice of topics”

  • In 1941, Neyman published a paper “Fiducial argument and the theory of confidence interval”

  • During the first two decades of the dispute, Fisher almost never referred directly to Neyman in print

After many years

  • Tsui and Weerahandi (1989) and Weerahandi (1993) proposed generalized inference and generalized \(p\)-value.

  • Hannig et al. (2006) noted the relationship between generalized inference and fiducial inference.

  • Hannig (2009) termed the new approach generalized fiducial inference (GFI)

  • what is GFI?

  • switching principle

Generalized fiducial inference

  • Consider the following data generating / structural / auxiliary equation \[ Y = G(U, \theta) \]

  • \(U\) is a random variable whose distribution is free of \(\theta\).
  • \(\theta\) is a fixed parameter

  • Example: if \(Y\sim N(\theta, 1)\)
    • \(Y = \theta + Z\), \(Z \sim N(0,1)\)
    • \(Y = \theta + \Phi^{-1}(U)\), \(U \sim U(0,1)\)

  • from a probability assertion about a statistic to a probability assertion about a parameter
\[\theta = Q(Y, U^*)\]

Definition

  • Suppose \(Y=y\)

  • the generalized fiducial distribution of \(\theta\) is defined as the conditional distribution of \(Q(Y, U^*)\) given \(y = G(U^*, \theta)\), i.e.,

\[ Q(y, U^*) | \{\exists \theta, y = G(U^*, \theta)\} \]

where \(U^*\) has the same distribution as \(U\).

  • naive generation method:
    1. generate \(U^*\)
    2. if there exists \(\theta\) such that \(y = G(U^*, \theta)\), proceed; otherwise, go to 1
    3. compute \(\theta = Q(y, U^*)\)
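For a single observation from the normal location model the naive method never rejects, since \(\theta = y - Z^*\) always solves the equation; a sketch (the observed \(y\) is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
y = 1.3                 # observed data
n_draws = 100_000

# structural equation Y = theta + Z with Z ~ N(0, 1):
# step 1: generate Z*; step 2: a solution always exists here;
# step 3: theta = Q(y, Z*) = y - Z*
z_star = rng.normal(0.0, 1.0, size=n_draws)
theta_fid = y - z_star

# the fiducial distribution is N(y, 1)
print(theta_fid.mean(), theta_fid.std())
```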

Examples

  • My favorite example: \[ Y = \theta + Z, \ Z \sim N(0,1) \]

  • The generalized fiducial distribution of \(\theta\) is

\[ Q(y, Z^*) | \{\exists \theta, y = \theta + Z^*\} \]

which is \(y-Z^* \sim N(y, 1)\)

  • Slightly more difficult example

\[ \begin{align*} Y_1 = \theta + Z_1\\ Y_2 = \theta + Z_2 \end{align*} \]

Given \(Y=y\) the fiducial distribution is

\[ Q({\boldsymbol{y}}, {\boldsymbol{Z}}^*) | \{y_1 - Z_1^* = y_2 - Z_2^*\} \]

which is \(N(\bar y, 1/2)\).

Closer look and Borel paradox

  • \(\{y_1 - Z_1^* = y_2 - Z_2^*\}\) is of probability 0

  • leads to the Borel paradox: conditioning on sets of probability 0 may not be well defined

  • example: if \(X\) and \(Y\) are i.i.d. standard normal, what is the distribution of \(Y\) given \(Y=X\)?

  • it is actually an ill-posed problem; the answer depends on how the conditioning event is expressed
    • conditioning on \(W = X-Y = 0\) gives one answer
    • conditioning on \(W = X/Y = 1\) gives another
  • it can be obtained as a weak limit of \[ \begin{equation*} { \lim_{\epsilon\to 0} \left[\arg\min_{\theta} \|y-G(U^\star,\theta)\| \Big | \min_\theta\|y-G(U^\star,\theta)\|\leq \epsilon\right]}. \end{equation*} \]

Closed form of the weak limit

Under some regularity conditions,

\[ \begin{equation*} r(\theta|y) \propto J(y,\theta) f(y,\theta) , \end{equation*} \] where \(f(y,\theta)\) is the likelihood and the function \(J(y,\theta)\) is

\[ \begin{equation*} J(y,\theta)= \sum_{\substack{{\boldsymbol{i}}=(i_1,\ldots,i_p) \\ 1\leq i_1<\cdots<i_p\leq n}}\left|\det\left(\left.\frac{\partial}{\partial\theta} G(u,\theta)\right|_{u=Q(y,\theta)}\right)_{\boldsymbol{i}}\right|. \end{equation*} \]

  • data-dependent prior
  • depends on \(G\)
  • for the linear regression model, this coincides with the Jeffreys prior.
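As a worked check of \(J\), consider the location model \(Y_i = \theta + U_i\), \(i = 1, \ldots, n\), with \(p = 1\): every index set \({\boldsymbol{i}}\) is a singleton \(\{i\}\) and \(\frac{\partial}{\partial\theta} G(u,\theta) = 1\), so

\[ J(y,\theta) = \sum_{i=1}^{n} \left|\det(1)\right| = n. \]

A constant \(J\) means \(r(\theta|y) \propto f(y,\theta)\): the implied data-dependent prior is flat, which is the Jeffreys prior for a location parameter.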

Two theorems

Theorem 1 (Bernstein–von Mises theorem of GFI): let

\[r^{*}\left(s|y\right)=n^{-1/2} r(n^{-1/2}s+\hat{\theta} | y),\]

we have \[ \int_{\mathbb{R}^{p}}\left|r^{*}\left(s|y\right)-\frac{\sqrt{\det I\left({\theta}_{0}\right)}}{(2\pi)^{p/2}}e^{-s^{T}I\left({\theta}_{0}\right)s/2}\right|\, ds\stackrel{P_{\boldsymbol{\theta}_{0}}}{\rightarrow}0. \]

Theorem 2 (Consistency of confidence sets):

  • Suppose \(C_n(y_n)\) is an open set of fiducial probability \(1-\alpha\), i.e., \(R_n(C_n(y_n)) = 1-\alpha\).

Under some regularity conditions,

\[ P(\theta_0 \in C_n(Y_n)) \to 1 - \alpha \]
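Theorem 2 can be illustrated numerically for the normal mean, where (generalizing the two-observation example) the fiducial distribution given \(n\) observations is \(N(\bar y, 1/n)\); a sketch with arbitrary \(\theta_0\), \(n\), and repetition count:

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n, reps = 0.7, 20, 50_000

# fiducial distribution of theta is N(ybar, 1/n); the central
# 95% fiducial set is the interval ybar +/- 1.96 / sqrt(n)
ybar = rng.normal(theta0, 1.0, size=(reps, n)).mean(axis=1)
covered = np.abs(ybar - theta0) < 1.96 / np.sqrt(n)

print(covered.mean())   # approaches 1 - alpha = 0.95
```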

Some of my works

  • Computational issues of generalized fiducial inference (Hannig et al., 2014)
  • High dimensional regression (Lai et al., 2014)

End quote

  • Maybe Fisher’s biggest blunder will become a big hit in the 21st century! (Efron, 1998)

References

Berger, J. (1985). “The frequentist viewpoint and conditioning,” in Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. 1, pp. 15–44.

Berger, J. O., Bernardo, J. M., and Sun, D. (2009). “The formal definition of reference priors,” The Annals of Statistics.

Dempster, A. P. (2008). “The Dempster–Shafer calculus for statisticians,” International Journal of Approximate Reasoning 48, 365–377.

Efron, B. (1998). “R. A. Fisher in the 21st century,” Statistical Science.

Fisher, R. (1935). “The fiducial argument in statistical inference,” Annals of Human Genetics 6, 391–398.

Fisher, R. A. (1930). “Inverse probability,” in Mathematical Proceedings of the Cambridge Philosophical Society (Cambridge Univ. Press), Vol. 26, pp. 528–535.

Fisher, R. A. (1933). “The concepts of inverse probability and fiducial probability referring to unknown parameters,” Proceedings of the Royal Society of London. Series A 139, 343–348.

Hannig, J. (2009). “On generalized fiducial inference,” Statistica Sinica 19, 491–544.

Hannig, J., Iyer, H., and Patterson, P. (2006). “Fiducial generalized confidence intervals,” Journal of the American Statistical Association 101, 254–269.

Hannig, J., Lai, R., and Lee, T. (2014). “Computational issues of generalized fiducial inference,” Computational Statistics & Data Analysis 71, Special Issue on Imprecision in Statistical Data Analysis, 849–858 (invited by special issue co–editors).

Lai, R. C. S., Hannig, J., and Lee, T. C. M. (2014). Generalized fiducial inference for ultrahigh dimensional regression.

Liu, C., and Martin, R. (2014). “Frameworks for prior-free posterior probabilistic inference,” Wiley Interdisciplinary Reviews: Computational Statistics.

Neyman, J. (1967). A Selection of Early Statistical Papers of J. Neyman (University of California Press, Berkeley, CA).

Tsui, K.-W., and Weerahandi, S. (1989). “Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters,” Journal of the American Statistical Association 84, 602–607.

Weerahandi, S. (1993). “Generalized confidence intervals,” Journal of the American Statistical Association 88, 899–905.

Xie, M.-g., and Singh, K. (2013). “Confidence distribution, the frequentist distribution estimator of a parameter: A review,” International Statistical Review.

Zabell, S. (1992). “R. A. Fisher and the fiducial argument,” Statistical Science.