Mean & Variance

Mean

The mean, also called the "average" or the "expected value", is a measure of the central value of a set of numbers.

Formula

There are two formulas that can be used to calculate the mean. You may be familiar with the first formula:

\textrm{mean} = \frac{\textrm{sum(elements in the set you are calculating the mean of)}}{\textrm{number of elements in the set}}

The second formula, which we will later show gives the same result as the first, is:

\textrm{mean} = \textrm{sum}(element\cdot P(element) \textrm{ for each element in set})

where $element$ is a number in the set you are calculating the mean of and $P(element)$ is the probability that a randomly chosen element of the set equals $element$.

For example, I asked a number of people how many hours of sleep they get per night. I have put my results in the following table:

| Person | Number of Hours of Sleep per Night |
| --- | --- |
| 1 | 8 |
| 2 | 6 |
| 3 | 5.5 |
| 4 | 11 |
| 5 | 7 |
| 6 | 8 |
| 7 | 6.5 |
| 8 | 8 |

I want to find the mean number of hours of sleep per night the people in my sample get.

Using the first definition, the average number of hours of sleep a person in my sample gets per night is:

\frac{8 + 6 + 5.5 + 11 + 7 + 8 + 6.5 + 8}{8} = \frac{60}{8} = 7.5
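As a quick check, the same calculation can be sketched in Python. The list name `hours_of_sleep` and the helper `mean` are names made up for this illustration, not part of any library.

```python
# Hours of sleep per night reported by the eight people in the sample.
hours_of_sleep = [8, 6, 5.5, 11, 7, 8, 6.5, 8]


def mean(values):
    """Return the sum of the values divided by how many values there are."""
    return sum(values) / len(values)


sample_mean = mean(hours_of_sleep)
print(sample_mean)  # 7.5
```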

Notice that the mean, 7.5, is not a number that was in my set of numbers. The mean does not have to be one of the values in the set, or even a value that is possible. For example, I could have surveyed some people and asked them how many pairs of shoes they owned. I may find the mean to be 2.3. It is not possible to own 2.3 pairs of shoes, but it is possible that this is the average number of pairs of shoes a set of people owns.

Also notice that the mean in my example is in between the smallest and greatest values in my set, 5.5 and 11 respectively, but it is not halfway in between them, which would be 8.25. This is true in general. The mean is in between the minimum and maximum values in a set, but it is not necessarily halfway in between them.

Let's now calculate the mean using the second definition. To do this we need to know the probability that when you choose a random element in the set, it is a particular value in the set. To find this probability I will sort my data:

| Number of Hours of Sleep | Number of People |
| --- | --- |
| 5.5 | 1 |
| 6 | 1 |
| 6.5 | 1 |
| 7 | 1 |
| 8 | 3 |
| 11 | 1 |

Now, recall that probability is defined as $\frac{\textrm{total number of ways something occurs}}{\textrm{total number of possible outcomes}}$. In this case, the number of ways something occurs is the number of people who get a particular amount of sleep, and the total number of possible outcomes is the total number of people I surveyed.

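| Number of Hours of Sleep ($x$) | Number of People | Probability a Person Got $x$ Hours of Sleep |
| --- | --- | --- |
| 5.5 | 1 | $\frac{1}{8}$ |
| 6 | 1 | $\frac{1}{8}$ |
| 6.5 | 1 | $\frac{1}{8}$ |
| 7 | 1 | $\frac{1}{8}$ |
| 8 | 3 | $\frac{3}{8}$ |
| 11 | 1 | $\frac{1}{8}$ |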

Note also that my probabilities add up to 1, as they should.

Now using the second formula:

\textrm{mean} = \sum n\cdot P(n) = 5.5 \cdot \frac{1}{8} + 6\cdot \frac{1}{8} + 6.5\cdot \frac{1}{8} + 7 \cdot \frac{1}{8} + 8\cdot \frac{3}{8} + 11\cdot \frac{1}{8} \\[.3in] = \frac{5.5 + 6 + 6.5 + 7 + 8 + 8 + 8 + 11}{8} \\[.3in] = 7.5

This is the same result as with the first formula.
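The second formula can also be sketched in Python: count how often each value appears, turn the counts into probabilities, and sum each value times its probability. The helper names `probability_table` and `weighted_mean` are hypothetical, chosen only for this illustration, and `Fraction` is used so that the probabilities stay exact rather than rounded.

```python
from collections import Counter
from fractions import Fraction

# Hours of sleep per night reported by the eight people in the sample.
hours_of_sleep = [8, 6, 5.5, 11, 7, 8, 6.5, 8]


def probability_table(values):
    """Map each distinct value to the fraction of the sample equal to it."""
    counts = Counter(values)
    total = len(values)
    return {value: Fraction(count, total) for value, count in counts.items()}


def weighted_mean(values):
    """Sum value * P(value) over the distinct values in the sample."""
    table = probability_table(values)
    return sum(Fraction(value) * p for value, p in table.items())


table = probability_table(hours_of_sleep)
print(table[8])                              # 3/8
print(sum(table.values()))                   # 1
print(float(weighted_mean(hours_of_sleep)))  # 7.5
```

Each value that appears $k$ times contributes $k$ copies of itself to the sum in the first formula, which is the same as contributing the value times $\frac{k}{8}$ in the second. That is why the two formulas always agree.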

Basic Properties of the Mean

As we noticed in our calculations above, there are some basic properties of the mean. Below are three important facts about the mean that can help us better understand it.

The mean doesn't have to be an element of the collection. For example, the mean of {1, 2, 3, 4} is 2.5, and 2.5 is not an element of the collection {1, 2, 3, 4}.
The mean is always between the smallest and the largest elements in the collection. For instance, the mean of {1, 2, 2, 3, 7} is 3, which is greater than 1 and smaller than 7.
If the collection consists of elements measured in specified units, the mean has the same units.

Variance

Suppose I have two sets of numbers:

Set 1 = {4, 5, 5, 5, 6}

Set 2 = {-10, 1, 3, 10, 21}

The mean of each of these sets is the same, 5, but set 2 is more spread out than set 1.

The variance is a value that quantifies how spread out a set is.

Variance is the mean of the squared deviations from the average.

What does this mean? Let's take Set 1 = {4, 5, 5, 5, 6} as an example. We are told above (and hopefully you have confirmed by calculation) that the mean of this set is 5. We can define an element's deviation as the difference between it and the average of the set. So, the deviations for Set 1 would be {-1, 0, 0, 0, 1} since 4 - 5 = -1, 5 - 5 = 0, and 6 - 5 = 1. But if we were to take the average of these deviations, we would get 0.

Now let's take Set 2 = {-10, 1, 3, 10, 21} and find the deviations. We would get {-15, -4, -2, 5, 16}. Again, if we take the average of these deviations we will get 0.

This is not a coincidence, but will always be true. The sum of the deviations of a set from the average of a set will always be 0.
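A short sketch of why this holds, writing $x_i$ for the elements of the set, $\mu$ for their mean, and $N$ for the number of elements (notation assumed here for the derivation). Since $\mu = \frac{\mathrm{sum}(x_i)}{N}$, the sum of the elements is $N\cdot\mu$, so:

\mathrm{sum}(x_i - \mu) = \mathrm{sum}(x_i) - N\cdot\mu = N\cdot\mu - N\cdot\mu = 0

Because the sum of the deviations is 0, their average is 0 as well.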

Since we want to measure how spread out a set is, just taking the average of the deviations is not helpful to us: we will always get 0 no matter how spread out our set is. Instead, we can square the deviations, which makes every one of them zero or positive, and then take the average! This is a number we call the variance of a set of numbers, representing how varied (or different!) they are.

In summary, to find the variance of a set of numbers:

1. Find the mean of the set of numbers
2. For each number in the set, subtract the mean from it and square the result (find the deviation squared)
3. Find the mean of the squared numbers from step 2
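The three steps above can be sketched in Python as follows. The function name `variance` and the variable names are assumptions made for this example. Note that it divides by the number of elements, as this section does.

```python
def variance(values):
    """Mean of the squared deviations from the mean (divides by N)."""
    # Step 1: find the mean of the set.
    mean = sum(values) / len(values)
    # Step 2: square each number's deviation from the mean.
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Step 3: take the mean of the squared deviations.
    return sum(squared_deviations) / len(squared_deviations)


set_1 = [4, 5, 5, 5, 6]
set_2 = [-10, 1, 3, 10, 21]
print(variance(set_1))  # 0.4
print(variance(set_2))  # 105.2
```

Both sets have the same mean, but the variance of Set 2 is far larger, which matches the intuition that it is more spread out.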

However, variance gives us weird units. It will give us our original unit squared. If we were measuring the shoe size of a population, the variance would be in units of shoe size squared. This is unnatural to work with. Instead, we can define the standard deviation of a set of numbers as the square root of the variance; using the standard deviation instead of the variance gives us a value in the original units.

To find the standard deviation of a set of numbers:

1. Find the mean of the set of numbers
2. For each number in the set, subtract the mean from it and square the result
3. Find the mean of the squared numbers from step 2
4. Take the square root of the result of step 3

The standard deviation expressed as a mathematical formula is:

\mathrm{stdev} = \sqrt{\frac{\mathrm{sum}((x_i - \mu)^2)}{N}}

where $\mathrm{stdev}$ is the standard deviation, $x_i$ represents the $i\textrm{th}$ element in the set, $\mu$ is the average of the set of numbers, and $N$ is the number of elements in the set.
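The formula translates almost term by term into Python. This is a minimal sketch: the function name `population_stdev` is hypothetical, and the variables `mu` and `n` simply mirror $\mu$ and $N$ in the formula. The example set {1, 2, 2, 3, 7} is reused from the properties of the mean above.

```python
import math


def population_stdev(values):
    """Square root of the mean squared deviation from the mean (divides by N)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


# Mean is 3, squared deviations are 4, 1, 1, 0, 16, so the variance is 22/5 = 4.4.
print(round(population_stdev([1, 2, 2, 3, 7]), 3))  # 2.098
```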

Let's try it with Set 1 and Set 2:

For Set 1:

1. We have already found the mean is 5

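2. Find each number's difference from the mean and square it:

| Number | Difference from Mean | Difference Squared |
| --- | --- | --- |
| 4 | -1 | 1 |
| 5 | 0 | 0 |
| 5 | 0 | 0 |
| 5 | 0 | 0 |
| 6 | 1 | 1 |

3. The mean of {1, 0, 0, 0, 1} is $\frac{2}{5}$ or 0.4.

4. $\sqrt{0.4} \approx 0.632$

The standard deviation for Set 1 is about 0.632.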

For Set 2:

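1. The mean of Set 2 is 5

2. Find each number's difference from the mean and square it:

| Number | Difference from Mean | Difference Squared |
| --- | --- | --- |
| -10 | -15 | 225 |
| 1 | -4 | 16 |
| 3 | -2 | 4 |
| 10 | 5 | 25 |
| 21 | 16 | 256 |

3. The mean of {225, 16, 4, 25, 256} is $\frac{526}{5}$ or 105.2.

4. $\sqrt{105.2} \approx 10.26$

The standard deviation for Set 2 is about 10.26.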

Thus, the much larger spread of numbers in Set 2 than in Set 1 translates to a much larger standard deviation for Set 2 than for Set 1!
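These results can be cross-checked with Python's standard `statistics` module. One caveat worth knowing: `statistics.pstdev` divides by $N$, as the formula in this section does, while `statistics.stdev` divides by $N - 1$ and so gives a slightly larger number. The `results` dictionary is just a name chosen for this sketch.

```python
import statistics

set_1 = [4, 5, 5, 5, 6]
set_2 = [-10, 1, 3, 10, 21]

results = {}
for name, values in (("Set 1", set_1), ("Set 2", set_2)):
    # pstdev divides by N, matching the formula used in this section.
    population = statistics.pstdev(values)
    # stdev divides by N - 1 instead, so it comes out slightly larger.
    sample = statistics.stdev(values)
    results[name] = (population, sample)
    print(name, round(population, 3), round(sample, 3))

# Set 1 0.632 0.707
# Set 2 10.257 11.467
```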

A measurement is often expressed in the form mean $\pm$ a few standard deviations. When we make a statement like this, we are expressing that a set of measurements has an average value and how large the spread of the data is from that average value.
