In 1961 James and Stein published the paper "Estimation with Quadratic Loss". Consider normally distributed data with an unknown mean $$\mu$$ and variance $$1$$. If you now draw a random value $$x$$ from this distribution and have to estimate the mean $$\mu$$ from it, then $$x$$ is intuitively a reasonable estimate for $$\mu$$ (since the data are normally distributed, the randomly drawn $$x$$ is probably near $$\mu$$).

Now the experiment is repeated - this time with three independent, again normally distributed data sets, each with variance $$1$$ and the means $$\mu_1$$, $$\mu_2$$, $$\mu_3$$. After obtaining three random values $$x_1$$, $$x_2$$ and $$x_3$$, one estimates (using the same procedure) $$\mu_1=x_1$$, $$\mu_2=x_2$$ and $$\mu_3=x_3$$.

The surprising result of James and Stein is that there is a better estimate for $$\left( \mu_1, \mu_2, \mu_3 \right)$$ (i.e. for the combination of the three independent data sets) than $$\left( x_1, x_2, x_3 \right)$$. The "James-Stein estimator" is:

$$\begin{pmatrix}\hat\mu_1\\\hat\mu_2\\\hat\mu_3\end{pmatrix} = \left( 1-\frac{1}{x_1^2+x_2^2+x_3^2} \right) \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} \neq \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}$$
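The formula above is easy to compute directly. The sketch below (function name and example values are my own, not from the paper) applies the shrinkage factor to a 3-dimensional observation, using the general form $$1-\frac{p-2}{\|x\|^2}$$, which reduces to the formula above for $$p=3$$:

```python
import numpy as np

def james_stein(x):
    """James-Stein estimate for a p-dimensional observation x
    (unit variance assumed, shrinkage toward the origin)."""
    x = np.asarray(x, dtype=float)
    p = x.size
    # Shrinkage factor 1 - (p-2)/||x||^2; for p = 3 this is 1 - 1/||x||^2.
    shrink = 1.0 - (p - 2) / np.dot(x, x)
    return shrink * x

# Example: ||x||^2 = 4 + 1 + 4 = 9, so the factor is 1 - 1/9 = 8/9.
print(james_stein([2.0, 1.0, 2.0]))  # → [16/9, 8/9, 16/9]
```

Note that the estimate is simply the observation rescaled toward the origin; no component is moved independently of the others.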

The mean squared error of this estimator is always smaller than the mean squared error $$E \left[ \left\| X - \mu \right\|^2 \right]$$ of the usual estimator.

It is surprising and perhaps paradoxical that the James-Stein estimator shifts the usual estimator toward the origin by a shrinkage factor and nevertheless achieves a strictly smaller expected squared error for every true mean. This holds in dimensions $$\geq 3$$, but not in the one- or two-dimensional case.

A nice geometric explanation of why this works is provided by Brown & Zhao. Note that this does not mean that you get a better estimate for every single data set - you only get a better estimate with a smaller combined risk.
