Opportunity Through Data Textbook
Central Limit Theorem in Action


Let’s say we want to guess a secret number. We have a magic button that gives us the sum of the secret integer and a random quantity, called noise, drawn uniformly from the continuous range [-10, 10]. Notice that the noise has a mean of zero.
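In code, the magic button might look like the following minimal sketch (NumPy and the name `magic_button` are our choices for illustration; 4 stands in for the secret number, which this section reveals later):

```python
import numpy as np

SECRET_NUMBER = 4  # stand-in; in the real game we would not know this

def magic_button():
    """Return the secret number plus mean-zero noise drawn uniformly from [-10, 10]."""
    noise = np.random.uniform(-10, 10)
    return SECRET_NUMBER + noise
```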

One strategy is to push the magic button once and guess the resulting number. Another strategy is to push the button many times and guess the mean of the numbers given to us. Which strategy do you think is better?

As you may have decided, we will always be better off considering more, rather than fewer, values from magic button pushes. While each push of the magic button does not give the secret number precisely, each push still reveals more information about what the secret number is likely to be.

A mean calculated from more samples is more accurate than a mean calculated from fewer samples. Let’s look at the distributions of means calculated from different numbers of samples to see this effect in action.

Consider sampling 10,000 batches of 1 sample each. We then have 10,000 "means", each of which is really just a single sample. If we plot a histogram of these 10,000 "means", we get the following:

Notice that there are no samples less than -6 or greater than 14, a range of 14 - (-6) = 20. In fact, this roughly looks like the uniform distribution over [-6, 14].
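The histograms in this section come from exactly this kind of simulation. Here is a minimal sketch of how they could be generated, assuming NumPy and Matplotlib (the helper name `batch_means` is ours; 4 stands in for the secret number, which this section reveals below):

```python
import numpy as np
import matplotlib.pyplot as plt

SECRET_NUMBER = 4      # revealed later in this section
NUM_BATCHES = 10_000

def batch_means(batch_size):
    """Return NUM_BATCHES sample means, each taken over `batch_size` button pushes."""
    noise = np.random.uniform(-10, 10, size=(NUM_BATCHES, batch_size))
    return (SECRET_NUMBER + noise).mean(axis=1)

# batch_size=1 reproduces the histogram above; 2, 10, and 100 reproduce
# the histograms that follow.
plt.hist(batch_means(1), bins=50)
plt.xlabel("sample mean")
plt.ylabel("count")
plt.show()
```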

  1. Why does the distribution of samples have the qualities described above?

  2. Can we deduce anything about what our secret number might be?

  3. If we only looked at one of these 10,000 "averages" (just a single sample), would it help us guess our secret number accurately?

Here are the answers:

  1. Recall that each sample is a random number drawn from a uniform distribution over [-10, 10] added to a constant integer (the secret number). The only random part of a sample is therefore the part drawn from the uniform distribution, which is why the samples themselves are uniformly distributed over [secret - 10, secret + 10].

  2. Yes! We know that the noise is centered around 0. This means that the mean over all 10,000 samples should have very little noise and should be close to the secret number. The strongest guess for the secret number is 4, since 4 is approximately the center of the distribution in the graph above.

  3. Not really. The "means" are distributed uniformly over a range of width 20. If we pick just one at random, it is no more likely to be an accurate guess than an inaccurate one.

Now, consider sampling 10,000 batches of 2 samples each. We then have 10,000 means, where each mean was taken over 2 samples. If we plot a histogram of these 10,000 means, we get the following:

This is a huge improvement over drawing just one sample! Notice that the distribution of means constructed from two samples now peaks at a particular value (the secret number); in fact, the mean of two uniform draws follows a triangular distribution centered on it. This means that a mean over 2 samples is more likely than a single sample to land close to the secret number.

Here's the result of sampling 10,000 batches of 10 samples each:

Wow! Notice how means over 10 samples only range over about 11 - (-3) = 14 units, instead of 20 as before! Taking a mean over 10 samples is even more likely to get us close to the secret number. Also, this distribution now looks like the normal distribution.

Lastly, here's the result of sampling 10,000 batches of 100 samples each:

The trend continues: the more samples used to construct each sample mean, the more narrowly the distribution of sample means concentrates around the secret number. In other words, the variance of the approximately normal distribution of sample means decreases as the number of samples per mean increases.
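We can check this narrowing numerically. The following minimal sketch (the helper name `batch_means` and the use of NumPy are our choices, not the textbook's) estimates the variance of the sample means for each batch size used above:

```python
import numpy as np

SECRET_NUMBER = 4      # the secret number used in this section's example
NUM_BATCHES = 10_000

def batch_means(batch_size):
    """Return NUM_BATCHES sample means, each taken over `batch_size` samples."""
    noise = np.random.uniform(-10, 10, size=(NUM_BATCHES, batch_size))
    return (SECRET_NUMBER + noise).mean(axis=1)

# Uniform(-10, 10) noise has variance 20**2 / 12 ≈ 33.3, so the variance
# of a mean over n samples should come out near 33.3 / n.
for n in [1, 2, 10, 100]:
    print(f"n = {n:>3}: empirical variance of the means ≈ {np.var(batch_means(n)):.2f}")
```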

We have now seen very clearly that the more samples we include in our sample mean, the more accurate our guess of the secret number becomes.

We also observed that, as the number of samples per mean grows, the plots more and more closely resemble the PDF of a normal distribution centered at the population mean. This is no coincidence. Formally written, the Central Limit Theorem states:

$$\lim_{n \to \infty} \bar{X}_n = N\left(\mu, \frac{1}{n}\sigma^2\right)$$
  • $\bar{X}_n$ is the sample mean over $n$ samples drawn from a random variable $X$

  • $\mu$ is the population mean of $X$

  • $\sigma^2$ is the variance of $X$

  • $N(\mu, \frac{1}{n}\sigma^2)$ is the normal distribution with mean $\mu$ and variance $\frac{1}{n}\sigma^2$

This means that as the number of samples included in the sample mean goes to infinity, the distribution of the sample mean converges to a normal distribution. This normal distribution has the same mean as $X$ and a variance equal to the variance of $X$ scaled by a factor of $\frac{1}{n}$.

Incredibly enough, this holds true even if $X$ itself is not normally distributed. In the example that produced the graphs above, recall that the noise was distributed uniformly, not normally!

In the examples graphed above, the distributions were always centered around the secret number, 4. This is because the mean of $N(\mu, \frac{1}{n}\sigma^2)$ is $\mu$. Recall that the sample means also became more tightly centered around 4 as we increased $n$. This is because the variance of $N(\mu, \frac{1}{n}\sigma^2)$ decreases as $n$ increases!
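As a quick sanity check of the numbers in this example: the noise is uniform on $[-10, 10]$, and a uniform distribution on $[a, b]$ has variance $\frac{(b-a)^2}{12}$, so

$$\sigma^2 = \frac{(10 - (-10))^2}{12} = \frac{400}{12} \approx 33.3, \qquad \sqrt{\frac{\sigma^2}{10}} \approx 1.83$$

For $n = 10$, the sample mean therefore has a standard deviation of about 1.83, and essentially all 10,000 means should land within roughly four standard deviations of $\mu$, i.e., within about $4 \pm 7.3$. This matches the observed range of roughly $[-3, 11]$ in the histogram above.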

To finish up, here's a fun central limit theorem tidbit about an ox and a crowd.

The effect of increasing the accuracy of a measurement by increasing the number of measurements taken was demonstrated by the statistician Francis Galton in 1906. Galton compared the average of 787 random people's guesses of the weight of an ox to the guess of a single highly qualified cattle expert. Not only was the crowd's average far more accurate than the expert's guess, it was only one pound off from the ox's actual weight. This phenomenon is known as the 'wisdom of the crowd', and it only holds when each person in the crowd guesses independently. It is the central limit theorem in action: the average of many independent, noisy estimates concentrates tightly around the true value.