Something old and something new – Randy Lai – Fiducial inference



GFItalk

http://rtalks.net/GFItalk

On Github randy3k / GFItalk

Something old and something new

Randy Lai

10, Nov, 2014

Table of Contents

What is this talk about?

  • Something old (before 1950)
    • Ronald Fisher (1890 - 1962)
    • Jerzy Neyman (1894 - 1981)

  • Something new (after 2000)

Frequentist rationale

  • Decision theory
  • A procedure \(\delta(x)\)
  • Define a loss function \(l(\theta, \delta)\) to measure performance
  • Report \((\delta(x), R_\delta(\theta))\) where \[ R_\delta(\theta) = E_\theta L[\theta, \delta(X)] \]
  • Example, \(X \sim N(\theta,1)\)
  • \(\delta(x) = (x-1.96 ,x + 1.96)\)
  • \(l(\theta, \delta) = I(\theta \not \in \delta(x)) = I[ \theta \not \in (x-1.96 , x+1.96)]\),
  • \(R_\delta(\theta) = P( \theta < X-1.96 \text{ or } \theta > X+1.96) = 0.05\)
  • If \(\theta=0\) and \(x=1\), a frequentist will report the interval \((0.96, 2.96)\) and the risk \(0.05\)

Justification

Under some mild conditions, \[ \lim_{m\to \infty} \frac{1}{m} \sum_{i=1}^{m} L(\theta_0, \delta(X_i)) = R_\delta(\theta_0). \]

\(R_\delta(\theta_0)\) measures the long-run performance of \(\delta\) at \(\theta_0\).
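A quick simulation (a sketch, not from the talk; the true \(\theta_0\) and the repetition count are arbitrary choices) confirms the long-run identity for the interval procedure above:

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 0.0      # a fixed "true" parameter, unknown in practice
m = 100_000       # number of repetitions

# delta(x) = (x - 1.96, x + 1.96); loss = I(theta0 not in delta(X_i))
x = rng.normal(theta0, 1.0, size=m)
loss = (theta0 < x - 1.96) | (theta0 > x + 1.96)

# the long-run average loss approximates R_delta(theta0) = 0.05
print(loss.mean())
```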

However, in practice, it is not quite useful.

  • \(\theta_0\) is unknown
  • \(\delta\) could be applied to different problems, different \(\theta_i\)

Formal definition

  • Consider an infinite sequence of problems: \(X_i \sim P_{\theta_i}\)

  • Neyman (1967) defined the following frequentist measure \(\bar R_\delta\) of the performance of \(\delta\) \[ \limsup_{m\to \infty} \frac{1}{m} \sum_{i=1}^{m} L(\theta_i, \delta(X_i)) \le \bar R_\delta \]
    • it is free of \(\theta\)
    • Compare to the Bayes risk \[ r_\delta = \int R_\delta(\theta) \pi(\theta) d\theta \]
  • (Berger, 1985) “practicing” frequentists behave quite differently from the “formal” frequentist.

Potential problems of frequentist approach

Example 1

           \(x=1\)   \(x=2\)   \(x=3\)
\(P_0\)    0.009     0.001     0.99
\(P_1\)    0.001     0.989     0.01

if \(x=1\), how much evidence/confidence to support \(P_0\) or \(P_1\)?

  • Frequentist approach

    Consider the test which rejects \(P_0\) when \(X=1,2\).

    • Type I error prob = 0.01, Type II error prob = 0.01
    • Upon observing \(x=1\), a frequentist will say “\(P_0\) is rejected with confidence 0.99”.

    • Misleading? \(P_0(X=1) > P_1(X=1)\).
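The numbers in this example are easy to verify directly; a minimal sketch using the table above:

```python
# the two distributions over outcomes {1, 2, 3} from the table
P0 = {1: 0.009, 2: 0.001, 3: 0.99}
P1 = {1: 0.001, 2: 0.989, 3: 0.01}

# test: reject P0 when X is 1 or 2
type1 = P0[1] + P0[2]   # P0(reject), ~ 0.01
type2 = P1[3]           # P1(accept),   0.01

# yet at x = 1 the observed data actually favors P0:
lr = P0[1] / P1[1]      # likelihood ratio, ~ 9 in favor of P0
print(type1, type2, lr)
```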

Example 2

Suppose \(X\sim N(\mu, 1)\) and it is desired to test \[ H_0: \mu \le -2 \text{ vs } H_1: \mu > 2. \]

  • Consider the test: reject \(H_0\) if \(x \ge 0\).
  • \(\alpha = \sup_{\mu\le -2} P_\mu(X\ge 0) = 0.0228\).

  • If \(x=0\), a frequentist will say “\(H_0\) is rejected with error 0.0228”.

  • misleading?

  • even worse, this frequentist statement will be valid as long as \(x\ge0\).

  • what is the problem here?

    • conditioning
  • alternative statistical methods
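Both numbers in this example, and the reason \(x=0\) feels problematic, can be reproduced with the standard normal distribution; a sketch:

```python
from math import erf, exp, pi, sqrt

def norm_sf(z):
    """P(Z >= z) for Z ~ N(0, 1)."""
    return 0.5 * (1 - erf(z / sqrt(2)))

def norm_pdf(z):
    return exp(-z * z / 2) / sqrt(2 * pi)

# alpha = sup_{mu <= -2} P_mu(X >= 0) = P(Z >= 2)
alpha = norm_sf(2.0)
print(round(alpha, 4))   # 0.0228

# at x = 0 the likelihoods at the boundary values mu = -2 and mu = 2
# are identical, yet the frequentist statement claims error 0.0228
print(norm_pdf(0 - (-2)) == norm_pdf(0 - 2))   # True
```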

Frequentism is not the only game in town

  • Classical Subjective Bayesian
    • Bernstein–von Mises theorem
  • Objective prior
    • Jeffreys prior \[ \pi(\theta) = \sqrt{|I(\theta)|} \]
    • reference prior (Berger et al., 2009)
      • maximizing certain kind of information entropy
      • data dependent
  • Confidence distributions (Xie and Singh, 2013)
    • generalization of confidence interval of Neyman
  • Dempster–Shafer theory (Dempster, 2008) and inferential models (Liu and Martin, 2014)

Fiducial inference

  • Originated from Fisher (Fisher, 1935, 1930, 1933)
  • Was considered “one great failure” of Fisher (Zabell, 1992)
  • L. J. Savage once said of it

    I don’t understand yet what fiducial probability does. We shall have to live with it a long time before we know what it’s doing for us. But it should not be ignored just because we don’t yet have a clear interpretation.

  • So, what is it?
  • to construct a posterior in the absence of a subjective prior
  • Fisher called it “inverse probability”

  • similar to the role of probability and likelihood: \(L(\theta|x) = f(x|\theta)\)

  • what is fiducial argument?

Fiducial in action

Suppose \(F(x|\theta) = P(X\le x | \theta)\) is decreasing in \(\theta\)

  • Fisher argued that \(R(\theta) = 1- F(x|\theta)\) behaves like a distribution function for \(\theta\), with density

\[ r(\theta|x) \propto - \frac{\partial F(x|\theta)}{\partial\theta} \]

Example

  • Suppose \(X \sim N(\theta,1)\), then \(F(x | \theta) = \Phi(x-\theta)\)

  • The fiducial density is \[ r(\theta|x) \propto - \frac{\partial F(x|\theta)}{\partial\theta} = \phi(x-\theta) \]

  • \(\theta|x \sim N(x,1)\)
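A finite-difference check of this example (a sketch; the observed \(x\) and test points are arbitrary) confirms that \(-\partial F/\partial\theta\) is exactly the \(N(x,1)\) density:

```python
from math import erf, exp, pi, sqrt

def F(x, theta):
    """P(X <= x | theta) for X ~ N(theta, 1)."""
    return 0.5 * (1 + erf((x - theta) / sqrt(2)))

def phi(z):
    return exp(-z * z / 2) / sqrt(2 * pi)

x, h = 1.0, 1e-6
for theta in (-1.0, 0.0, 0.5, 2.0):
    # fiducial density r(theta|x) = -dF(x|theta)/dtheta
    fid = -(F(x, theta + h) - F(x, theta - h)) / (2 * h)
    assert abs(fid - phi(x - theta)) < 1e-6   # matches N(x, 1) density
```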

Reactions from the community

  • fiducial inference was first proposed by Fisher in 1930

  • has never gained widespread acceptance

  • the original fiducial argument was radically different from its later versions

  • fiducial inference never actually developed during Fisher’s lifetime

  • dozens of examples of non-uniqueness and non-existence

Break with Neyman

  • Neyman introduced confidence interval in 1934, claiming to have generalized fiducial probability

  • Interestingly, Fisher was one of the discussants

  • Fisher disputed the idea of “confidence”

  • “confidence” is a purely frequentist concept

  • “confidence” is known to omit, or suppress, part of the information supplied by the sample

  • In a 1935 JRSS paper, Fisher sharply criticized Neyman:

“Dr. Neyman had been somewhat unwise in his choice of topics”

  • In 1941, Neyman published a paper “Fiducial argument and the theory of confidence interval”

  • During the first two decades of the dispute, Fisher almost never referred directly to Neyman in print

After many years

  • Tsui and Weerahandi (1989) and Weerahandi (1993) proposed generalized inference and generalized \(p\)-value.

  • Hannig et al. (2006) noted the relationship between generalized inference and fiducial inference.

  • Hannig (2009) termed the new approach generalized fiducial inference (GFI)

  • what is GFI?

  • switching principle

Generalized fiducial inference

  • Consider the following data generating / structural / auxiliary equation \[ Y = G(U, \theta) \]

  • \(U\) is a random variable whose distribution is free of \(\theta\).
  • \(\theta\) is a fixed parameter

  • Example: if \(Y\sim N(\theta, 1)\)
    • \(Y = \theta + Z\), \(Z \sim N(0,1)\)
    • \(Y = \theta + \Phi^{-1}(U)\), \(U \sim U(0,1)\)

  • from a probability assertion about a statistic to a probability assertion about a parameter
\[\theta = Q(Y, U^*)\]

Definition

  • Suppose \(Y=y\)

  • the generalized fiducial distribution of \(\theta\) is defined as the conditional distribution of \(Q(Y, U^*)\) given \(y = G(U^*, \theta)\), i.e.,

\[ Q(y, U^*) | \{\exists \theta, y = G(U^*, \theta)\} \]

where \(U^*\) has the same distribution as \(U\).

  • naive generation method:
    1. generate \(U^*\)
    2. if there exists \(\theta\) such that \(y = G(U^*, \theta)\), proceed; otherwise, go to 1
    3. compute \(\theta = Q(y, U^*)\)
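For a single observation from the normal location model the naive method never rejects, since \(\theta = y - Z^*\) always solves the equation; a sketch (the observed \(y\) is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
y = 1.3                 # observed data
n_draws = 100_000

# structural equation Y = theta + Z with Z ~ N(0, 1):
# step 1: generate Z*; step 2: a solution always exists here;
# step 3: theta = Q(y, Z*) = y - Z*
z_star = rng.normal(0.0, 1.0, size=n_draws)
theta_fid = y - z_star

# the fiducial distribution is N(y, 1)
print(theta_fid.mean(), theta_fid.std())
```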

Examples

  • My favorite example: \[ Y = \theta + Z, \ Z \sim N(0,1) \]

  • The generalized fiducial distribution of \(\theta\) is

\[ Q(y, Z^*) | \{\exists \theta, y = \theta + Z^*\} \]

which is \(y-Z^* \sim N(y, 1)\)

  • Slightly more difficult example

\[ \begin{align*} Y_1 = \theta + Z_1\\ Y_2 = \theta + Z_2 \end{align*} \]

Given \(Y=y\) the fiducial distribution is

\[ Q({\boldsymbol{y}}, {\boldsymbol{Z}}^*) | \{y_1 - Z_1^* = y_2 - Z_2^*\} \]

which is \(N(\bar y, 1/2)\).

Closer look and Borel paradox

  • \(\{y_1 - Z_1^* = y_2 - Z_2^*\}\) is of probability 0

  • leads to the Borel paradox: conditioning on sets of probability 0 may not be well defined

  • example: if \(X\) and \(Y\) are i.i.d. standard normal, what is the distribution of \(Y\) given \(Y=X\)?

  • it is actually an ill-posed problem; the answer depends on how the conditioning event is expressed
    • conditioning on \(W = X-Y = 0\) gives one answer
    • conditioning on \(W = X/Y = 1\) gives another
  • it can be obtained as a weak limit of \[ \begin{equation*} { \lim_{\epsilon\to 0} \left[\arg\min_{\theta} \|y-G(U^\star,\theta)\| \Big | \min_\theta\|y-G(U^\star,\theta)\|\leq \epsilon\right]}. \end{equation*} \]

Closed form of the weak limit

Under some regularity conditions,

\[ \begin{equation*} r(\theta|y) \propto J(y,\theta) f(y,\theta) , \end{equation*} \] where \(f(y,\theta)\) is the likelihood and the function \(J(y,\theta)\) is

\[ \begin{equation*} J(y,\theta)= \sum_{\substack{{\boldsymbol{i}}=(i_1,\ldots,i_p) \\ 1\leq i_1<\cdots<i_p\leq n}}\left|\det\left(\left.\frac{\partial}{\partial\theta} G(u,\theta)\right|_{u=Q(y,\theta)}\right)_{\boldsymbol{i}}\right|. \end{equation*} \]

  • data-dependent prior
  • depends on \(G\)
  • for the linear regression model, this coincides with the Jeffreys prior.
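As a worked check of \(J\), consider the location model \(Y_i = \theta + U_i\), \(i = 1, \ldots, n\), with \(p = 1\): every index set \({\boldsymbol{i}}\) is a singleton \(\{i\}\) and \(\frac{\partial}{\partial\theta} G(u,\theta) = 1\), so

\[ J(y,\theta) = \sum_{i=1}^{n} \left|\det(1)\right| = n. \]

A constant \(J\) means \(r(\theta|y) \propto f(y,\theta)\): the implied data-dependent prior is flat, which is the Jeffreys prior for a location parameter.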

Two theorems

Theorem 1 (Bernstein–von Mises theorem of GFI): let

\[r^{*}\left(s|y\right)=n^{-1/2} r(n^{-1/2}s+\hat{\theta} | y),\]

we have \[ \int_{\mathbb{R}^{p}}\left|r^{*}\left(s|y\right)-\frac{\sqrt{\det I\left({\theta}_{0}\right)}}{(2\pi)^{p/2}}e^{-s^{T}I\left({\theta}_{0}\right)s/2}\right|\, ds\stackrel{P_{\boldsymbol{\theta}_{0}}}{\rightarrow}0. \]

Theorem 2 (Consistency of confidence sets):

  • Suppose \(C_n(y_n)\) is an open set of fiducial probability \(1-\alpha\), i.e., \(R_n(C_n(y_n)) = 1-\alpha\).

Under some regularity conditions,

\[ P(\theta_0 \in C_n(Y_n)) \to 1 - \alpha \]
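Theorem 2 can be illustrated numerically for the normal mean, where (generalizing the two-observation example) the fiducial distribution given \(n\) observations is \(N(\bar y, 1/n)\); a sketch with arbitrary \(\theta_0\), \(n\), and repetition count:

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n, reps = 0.7, 20, 50_000

# fiducial distribution of theta is N(ybar, 1/n); the central
# 95% fiducial set is the interval ybar +/- 1.96 / sqrt(n)
ybar = rng.normal(theta0, 1.0, size=(reps, n)).mean(axis=1)
covered = np.abs(ybar - theta0) < 1.96 / np.sqrt(n)

print(covered.mean())   # approaches 1 - alpha = 0.95
```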

Some of my works

  • Computational issues of generalized fiducial inference (Hannig et al., 2014)
  • High dimensional regression (Lai et al., 2014)

End quote

  • Maybe Fisher’s biggest blunder will become a big hit in the 21st century! (Efron, 1998)

References

Berger, J. (1985). “The frequentist viewpoint and conditioning,” in Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. 1, pp. 15–44.

Berger, J. O., Bernardo, J. M., and Sun, D. (2009). “The formal definition of reference priors,” The Annals of Statistics.

Dempster, A. P. (2008). “The Dempster–Shafer calculus for statisticians,” International Journal of Approximate Reasoning 48, 365–377.

Efron, B. (1998). “R. A. Fisher in the 21st century,” Statistical Science.

Fisher, R. (1935). “The fiducial argument in statistical inference,” Annals of Human Genetics 6, 391–398.

Fisher, R. A. (1930). “Inverse probability,” in Mathematical Proceedings of the Cambridge Philosophical Society (Cambridge Univ. Press), Vol. 26, pp. 528–535.

Fisher, R. A. (1933). “The concepts of inverse probability and fiducial probability referring to unknown parameters,” Proceedings of the Royal Society of London. Series A 139, 343–348.

Hannig, J. (2009). “On generalized fiducial inference,” Statistica Sinica 19, 491–544.

Hannig, J., Iyer, H., and Patterson, P. (2006). “Fiducial generalized confidence intervals,” Journal of the American Statistical Association 101, 254–269.

Hannig, J., Lai, R., and Lee, T. (2014). “Computational issues of generalized fiducial inference,” Computational Statistics & Data Analysis 71, Special Issue on Imprecision in Statistical Data Analysis, 849–858 (invited by special issue co–editors).

Lai, R. C. S., Hannig, J., and Lee, T. C. M. (2014). Generalized fiducial inference for ultrahigh dimensional regression.

Liu, C., and Martin, R. (2014). “Frameworks for prior-free posterior probabilistic inference,” Wiley Interdisciplinary Reviews: Computational Statistics.

Neyman, J. (1967). A Selection of Early Statistical Papers of J. Neyman (University of California Press, Berkeley, CA).

Tsui, K.-W., and Weerahandi, S. (1989). “Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters,” Journal of the American Statistical Association 84, 602–607.

Weerahandi, S. (1993). “Generalized confidence intervals,” Journal of the American Statistical Association 88, 899–905.

Xie, M.-g., and Singh, K. (2013). “Confidence distribution, the frequentist distribution estimator of a parameter: A review,” International Statistical Review.

Zabell, S. (1992). “R. A. Fisher and the fiducial argument,” Statistical Science.