Stein's paradox

In 1961 James and Stein published the paper "Estimation with Quadratic Loss". Consider normally distributed data with unknown mean \(\mu\) and variance \(1\). If you draw a single value \(x\) from this distribution and have to estimate the mean \(\mu\) from it, then intuitively \(x\) is a reasonable estimate for \(\mu\): since the data are normally distributed, the randomly drawn \(x\) is probably close to \(\mu\).


Now the experiment is repeated, this time with three independent, again normally distributed data sets, each with variance \(1\) and mean values \(\mu_1\), \(\mu_2\), \(\mu_3\). After drawing three random values \(x_1\), \(x_2\) and \(x_3\), one estimates (by the same procedure) \(\mu_1\) by \(x_1\), \(\mu_2\) by \(x_2\) and \(\mu_3\) by \(x_3\).

The surprising result of James and Stein is that there is a better estimate for the vector \( \left( \mu_1, \mu_2, \mu_3 \right) \) (i.e. for the three independent data sets taken jointly) than \( \left( x_1, x_2, x_3 \right) \). The "James–Stein estimator" is:

$$ \begin{pmatrix}\hat\mu_1\\\hat\mu_2\\\hat\mu_3\end{pmatrix} = \left( 1-\frac{1}{x_1^2+x_2^2+x_3^2} \right) \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} \neq \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} $$
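As a minimal sketch in Python (the observed values for \(x_1\), \(x_2\), \(x_3\) are made up purely for illustration), the shrinkage factor and the resulting estimate can be computed directly from the formula above:

```python
import numpy as np

# Hypothetical observations x_1, x_2, x_3: one draw from each of the
# three normal distributions (variance 1, unknown means).
x = np.array([1.2, -0.7, 2.5])

# Shrinkage factor 1 - 1 / (x_1^2 + x_2^2 + x_3^2) from the formula above.
shrinkage = 1.0 - 1.0 / np.sum(x**2)

# James-Stein estimate: the usual estimate x, pulled towards the origin.
js_estimate = shrinkage * x

print("usual estimate:      ", x)
print("James-Stein estimate:", js_estimate)
```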

The mean squared error of this estimator is strictly smaller than the mean squared error \( E\left[ \| X - \mu \|^2 \right] \) of the usual estimator, for every value of \(\mu\).

It is surprising and perhaps paradoxical that the James–Stein estimator shifts the usual estimator towards the origin (by a shrinkage factor) and thereby achieves a smaller expected total squared error, no matter what the true means are. This holds in dimensions \( \geq 3 \), but not in the one- or two-dimensional case.
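The risk comparison can be checked empirically. The following Monte Carlo sketch (assuming NumPy; the true mean vector is chosen arbitrarily for the simulation) averages the combined squared error over many repetitions and shows the James–Stein estimator coming out below the usual estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrarily chosen true means (an assumption for this simulation); variance 1.
mu = np.array([0.5, -1.0, 1.5])

n_trials = 100_000
# One draw per mean, per trial: shape (n_trials, 3).
x = rng.normal(loc=mu, scale=1.0, size=(n_trials, 3))

# Usual estimator: estimate each mean by the corresponding observation.
risk_usual = np.mean(np.sum((x - mu) ** 2, axis=1))

# James-Stein estimator: shrink each observation vector towards the origin.
shrinkage = 1.0 - 1.0 / np.sum(x**2, axis=1, keepdims=True)
js = shrinkage * x
risk_js = np.mean(np.sum((js - mu) ** 2, axis=1))

print(f"estimated risk, usual estimator:       {risk_usual:.3f}")  # close to 3.0
print(f"estimated risk, James-Stein estimator: {risk_js:.3f}")     # noticeably smaller
```

The expected combined squared error of the usual estimator is exactly \(3\) here (dimension times variance), so any average clearly below that illustrates the dominance claimed above.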

A nice geometric explanation of why this works is provided by Brown & Zhao. Note that this does not mean that the estimate of each individual mean is better; you only obtain a joint estimate with a smaller combined risk.
