Opportunity Through Data Textbook

Random Variables, Expectation, Variance

Review of Module 3, Introduction to Probability

Now that we know a bit about probability distributions, we can discuss the concepts of random variables, expectation, and variance. While these concepts apply to all probability distributions, we will discuss each one in the context of the Bernoulli coin flip example, where we win 1 point for flipping heads and 0 points for flipping tails on a fair coin.

Random variable

A random variable represents the outcome of drawing once from the probability distribution upon which it is defined.

In our example, let the number of points we win from a fair coin flip be represented by the random variable X. X is equal to 1 point with probability 0.5 and 0 points with probability 0.5. Random variables are denoted by capital letters.
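To make the idea of "drawing once" concrete, here is a minimal sketch in Python using the standard `random` module. The function name `flip_points` and the seed value are illustrative choices, not part of the text.

```python
import random

def flip_points(p, rng):
    """Draw once from a Bernoulli distribution: 1 point with probability p, else 0."""
    # rng.random() returns a float in [0, 1), so this is True with probability p.
    return 1 if rng.random() < p else 0

# A seeded generator (assumed seed 0) makes the draws repeatable.
rng = random.Random(0)

# Each call is one realization of the random variable X for a fair coin.
draws = [flip_points(0.5, rng) for _ in range(10)]
print(draws)
```

Each entry in `draws` is one observed value of X; the random variable itself is the rule that produces them.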

Expectation

The expectation of a random variable is the mean (or average) value the random variable takes, according to its probability distribution. In other words, it is a weighted average of the possible values of a random variable, where each value is weighted by its probability. Here is the expectation of a Bernoulli random variable that takes the value 1 with probability p and the value 0 with probability 1 - p:

$$E(X) = 0 \cdot (1-p) + 1 \cdot p = p$$

In our example, we win 1 point if we flip heads and 0 points if we flip tails on our fair coin. What is the expected number of points won in our coin flip example?

The expectation is 0.5*1 + 0.5*0 = 0.5 points because half of the time we win 1 point and the other half of the time we win 0 points.
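The weighted average above can be sketched in Python. This assumes we represent a distribution as a dictionary mapping each possible value to its probability; the names `expectation` and `coin` are illustrative and not from the text.

```python
def expectation(distribution):
    """Weighted average of the values, each weighted by its probability.

    `distribution` maps each possible value to its probability.
    """
    return sum(value * prob for value, prob in distribution.items())

# Fair coin: 0 points (tails) and 1 point (heads), each with probability 0.5.
coin = {0: 0.5, 1: 0.5}
print(expectation(coin))  # 0.5
```

The same function works for any discrete distribution written this way, for example a biased coin `{0: 0.7, 1: 0.3}`.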

Variance

The variance of a random variable is a measurement of how much the outcomes of the random variable differ from its expectation. It is defined as the expectation of the squared difference between each outcome and the mean. Written mathematically, the definition of variance for a Bernoulli random variable is:

$$\mathrm{var}(X) = (1-p) \cdot (0 - E(X))^2 + p \cdot (1 - E(X))^2$$

What is the variance of points won in our coin flip example?

0.5*(0-0.5)^2 + 0.5*(1-0.5)^2 = 0.25
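The same calculation can be sketched in Python, again assuming a distribution is stored as a dictionary from value to probability. The helper names `expectation` and `variance` are illustrative, not from the text.

```python
def expectation(distribution):
    """Weighted average of the values, each weighted by its probability."""
    return sum(value * prob for value, prob in distribution.items())

def variance(distribution):
    """Expectation of the squared difference between each value and the mean."""
    mean = expectation(distribution)
    return sum(prob * (value - mean) ** 2 for value, prob in distribution.items())

# Fair coin: 0 points (tails) and 1 point (heads), each with probability 0.5.
coin = {0: 0.5, 1: 0.5}
print(variance(coin))  # 0.25
```

For a Bernoulli random variable, substituting E(X) = p into the definition simplifies the variance to p*(1-p), which gives 0.5*0.5 = 0.25 for the fair coin.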

Standard Deviation

The standard deviation is the square root of the variance. Note that calculating the variance involves squaring the differences between the values of the random variable and its mean; this means that the variance is in the squared units of the random variable. We use the standard deviation to discuss the spread of a random variable in the same units as the random variable itself.
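As a small sketch, the standard deviation for the coin flip example can be computed from the variance of 0.25 found earlier. The variable names here are illustrative.

```python
import math

# Variance of points won on one fair coin flip, in squared points.
coin_variance = 0.5 * (0 - 0.5) ** 2 + 0.5 * (1 - 0.5) ** 2

# The square root brings us back to the original units (points).
coin_std = math.sqrt(coin_variance)
print(coin_std)  # 0.5
```

So a single flip typically lands 0.5 points away from the expected 0.5 points, which matches the fact that every outcome is either 0 or 1.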


Last updated 4 years ago
