Central Limit Theorem in Action

Let’s say we want to guess a secret number. We have a magic button that returns the sum of the secret integer and a random quantity, called noise, drawn uniformly from the continuous range [-10, 10]. Notice that the noise has a mean of zero.
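Here is a minimal Python sketch of such a button (an illustration, not part of the original exercise; the concrete secret passed in below is just the value the plots later in this section happen to use):

```python
import random

def magic_button(secret):
    """Return the secret number plus uniform noise drawn from [-10, 10]."""
    return secret + random.uniform(-10, 10)

# One push of the button: a noisy observation of the secret number.
print(magic_button(4))
```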

One strategy is to push the magic button once and guess the resulting number. Another strategy is to push the button many times and guess the mean of the numbers given to us. Which strategy do you think is better?

As you may have decided, we are always better off considering more, rather than fewer, values from magic button pushes. While no single push of the magic button gives us the secret number precisely, each push still reveals some information about what the secret number is likely to be.

A mean calculated from more samples is more accurate than a mean calculated from fewer samples. Let’s look at the distributions of means calculated from different numbers of samples to see this effect in action.

Consider sampling 10,000 batches of 1 sample each. We then have 10,000 "means", each of which is really just a single sample. If we plot a histogram of these 10,000 "means", we get the following:

Notice that there are no samples less than -6 or greater than 14, which is a range of 14 - (-6) = 20. In fact, this roughly looks like the uniform distribution over [-6, 14].

  1. Why does the distribution of samples have the qualities described above?

  2. Can we deduce anything about what our secret number might be?

  3. If we only looked at one of these 10,000 "averages" (just a single sample), would it help us guess our secret number accurately?
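Here is one way a histogram like this could be generated (a sketch assuming the standard `numpy` and `matplotlib` libraries, not the original code behind the plot):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()
secret_number = 4          # the value revealed later in this section
n_batches = 10_000

# Batch size 1: each "mean" is just a single button push.
means = secret_number + rng.uniform(-10, 10, size=n_batches)

plt.hist(means, bins=40)
plt.xlabel("sample mean (batch size = 1)")
plt.ylabel("count")
plt.show()
```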

Now, consider sampling 10,000 batches of 2 samples each. We then have 10,000 means, where each mean was taken over 2 samples. If we plot a histogram of these 10,000 means, we get the following:

This is a huge improvement over drawing just one sample! Notice that the distribution of means constructed from two samples clusters around a particular value (the secret number). This means that a mean over 2 samples is more likely than a single sample to give us a guess close to the secret number.

Here's the result of sampling 10,000 batches of 10 samples each:

Wow! Notice how means over 10 samples only range from about -3 to 11, a spread of 11 - (-3) = 14, instead of 20 like before! Taking a mean over 10 samples is even more likely to get us close to the secret number. Also, this distribution now looks like the normal distribution.

Lastly, here's the result of sampling 10,000 batches of 100 samples each:

The trend continues; the more samples used to construct the sample mean, the more narrowly the distribution of the sample means centers on the secret number. In other words, the variance of the normal distribution of sample means decreases as the number of samples per mean increases.
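The whole progression above can be reproduced with a short loop (again a sketch rather than the original code behind the plots): for each batch size, draw 10,000 batches, take each batch’s mean, and measure how spread out those means are.

```python
import numpy as np

rng = np.random.default_rng()
secret_number = 4
n_batches = 10_000

for batch_size in (1, 2, 10, 100):
    # Each row is one batch; each row's mean is one estimate of the secret number.
    batches = secret_number + rng.uniform(-10, 10, size=(n_batches, batch_size))
    means = batches.mean(axis=1)
    print(f"batch size {batch_size:>3}: "
          f"mean of means = {means.mean():6.3f}, "
          f"std of means = {means.std():.3f}")
```

The printed standard deviation shrinks roughly like $\sigma/\sqrt{n}$ (about 5.77, 4.08, 1.83, and 0.58 for batch sizes 1, 2, 10, and 100), which is exactly the narrowing visible in the histograms.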

We have now seen very clearly that the more samples we include in our sample mean, the more accurate our guess of the secret number becomes.

We also observed that, as the number of samples per mean grows, the plots more and more closely resemble the PDF of a normal distribution centered at the population mean. This is no coincidence. Formally written, the Central Limit Theorem states:

$$\lim_{n \to \infty} \bar X_n = N\left(\mu, \frac{1}{n}\sigma^2\right)$$

  • $\bar X_n$ is the sample mean over $n$ samples drawn from the random variable $X$

  • $\mu$ is the population mean of $X$

  • $\sigma^2$ is the variance of $X$

  • $N(\mu, \frac{1}{n}\sigma^2)$ is the normal distribution with mean $\mu$ and variance $\frac{1}{n}\sigma^2$

This means that as the number of samples included in the sample mean goes to infinity, the distribution of the sample mean converges to a normal distribution. This normal distribution has the same mean as $X$ and a variance equal to the variance of $X$ scaled by a factor of $\frac{1}{n}$.

Incredibly enough, this holds true even if $X$ itself is not normally distributed. In the example that produced the graphs above, recall that the noise was distributed uniformly, not normally!

In the examples graphed above, the sample means were always centered around the secret number of 4. This is because the mean of $N(\mu, \frac{1}{n}\sigma^2)$ is $\mu$. Recall that the sample means became more tightly centered around 4 as we increased $n$. This is because the variance of $N(\mu, \frac{1}{n}\sigma^2)$ decreases as $n$ increases!
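As a quick numerical check (using the standard fact that a $\mathrm{Uniform}(a, b)$ distribution has variance $(b-a)^2/12$):

$$\sigma^2 = \frac{(10 - (-10))^2}{12} = \frac{400}{12} \approx 33.3, \qquad \mathrm{Var}(\bar X_{100}) = \frac{\sigma^2}{100} \approx 0.33, \qquad \sqrt{0.33} \approx 0.58$$

So a single button push can miss the secret number by as much as 10, while a mean over 100 pushes typically lands within about half a unit of it.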

To finish up, here's a fun central limit theorem tidbit about an ox and a crowd.

The effect of increasing the accuracy of a measurement by increasing the number of measurements taken was exemplified by statistician Francis Galton in 1906. Galton compared the average of 787 random people's guesses of an ox's weight to the guess of a single highly qualified cattle expert. Not only was the crowd's average much more accurate than the expert's guess, it was also within one pound of the ox's actual weight. This phenomenon, known as the 'wisdom of the crowd', only holds when each person in the crowd makes their guess independently, and it is an example of the central limit theorem in action.
