Opportunity Through Data Textbook

Probability Distributions

What is a probability distribution?

In Module 3, we learned about probabilities, events, and outcome spaces. A probability distribution puts these three concepts together. Each probability distribution is defined on an outcome space and assigns a probability to each outcome in that space. Furthermore, every probability distribution must satisfy two conditions:

1) The probabilities of all outcomes in the outcome space must sum to 1

2) The probability of each outcome must be greater than or equal to 0
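
The two conditions above can be checked in a few lines of Python. This is only a minimal sketch: the function name `is_valid_distribution` and the choice to store a distribution as a dictionary are assumptions made for illustration, not something defined elsewhere in this textbook.

```python
import math

def is_valid_distribution(probabilities):
    """Check the two conditions for a dict of the form {outcome: probability}."""
    values = list(probabilities.values())
    # Condition 2: every probability is greater than or equal to 0
    all_nonnegative = all(p >= 0 for p in values)
    # Condition 1: the probabilities sum to 1
    # (math.isclose tolerates tiny floating-point rounding errors)
    sums_to_one = math.isclose(sum(values), 1.0)
    return all_nonnegative and sums_to_one

# Illustrative examples (hypothetical, chosen only to exercise each condition)
fair_coin = {"heads": 0.5, "tails": 0.5}
too_large = {"heads": 0.7, "tails": 0.5}        # sums to 1.2
has_negative = {"heads": 1.2, "tails": -0.2}    # sums to 1 but has a negative value

print(is_valid_distribution(fair_coin))
print(is_valid_distribution(too_large))
print(is_valid_distribution(has_negative))
```

Only the first example satisfies both conditions; the second fails because its values sum to more than 1, and the third fails because one value is negative.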

Probability distributions are used to model systems that have some element of randomness to them. Such models are designed either using observations of the system or physical laws. These models can help you better understand the system and make predictions about what will happen next in the system.

Define a probability distribution for whether tomorrow will be rainy or sunny. For this model to be useful, we would need to base it on real-world data, but for this question, just make a guess!

Bonus: How many different probability distributions are there for this outcome space?

The outcome space is {rainy, sunny}.

The probabilities of the outcomes in the outcome space must sum to 1.

One possible probability distribution is a 0.75 chance of sun and a 0.25 chance of rain. These probabilities sum to 1 and are each nonnegative.

There are infinitely many probability distributions you can define for this outcome space: any two nonnegative numbers that sum to 1 can serve as the probabilities of sun and rain.
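
To see what such a distribution means in practice, we can simulate many "tomorrows" and count how often each outcome occurs. The sketch below uses `random.choices` from Python's standard library with the guessed probabilities 0.75 and 0.25 from the answer above. The number of simulated days and the seed are arbitrary choices made for illustration.

```python
import random
from collections import Counter

# Outcome space and the guessed probabilities from the answer above
outcomes = ["sunny", "rainy"]
weights = [0.75, 0.25]

random.seed(0)      # arbitrary fixed seed so reruns give the same draws
num_days = 10_000   # arbitrary number of simulated days

# Draw one outcome per simulated day according to the distribution
draws = random.choices(outcomes, weights=weights, k=num_days)

# Compare the simulated frequencies with the probabilities we chose
counts = Counter(draws)
for outcome in outcomes:
    print(outcome, counts[outcome] / num_days)
```

With this many draws, the printed frequencies should land close to 0.75 and 0.25, though they will rarely match exactly.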

When modeling a real-world system, we want to use the probability distribution that most accurately captures the nature of the system. Taking a guess, as we did in the above question, isn't going to cut it. There are many techniques for designing a probability distribution that are more accurate than guessing. You can learn more about these techniques by studying statistics and machine learning!
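
One simple technique of this kind is to estimate each probability by how often the outcome appeared in past observations. The sketch below is only an illustration: the list `observed_days` is made up rather than real weather data, and the function name `estimate_distribution` is hypothetical.

```python
from collections import Counter

def estimate_distribution(observations):
    """Estimate a distribution as the relative frequency of each observed outcome."""
    counts = Counter(observations)
    total = len(observations)
    return {outcome: count / total for outcome, count in counts.items()}

# Made-up observations for illustration only (NOT real weather data)
observed_days = [
    "sunny", "sunny", "rainy", "sunny", "sunny",
    "rainy", "sunny", "sunny", "sunny", "rainy",
]

estimate = estimate_distribution(observed_days)
print(estimate)
```

Because each estimate is a count divided by the total, the estimates are nonnegative and sum to 1, so the result is itself a valid probability distribution. With only a handful of observations the estimate can still be far off, and it generally becomes more trustworthy as more data is collected.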

While it can be very difficult to accurately define the probability distribution for some real-world systems, such as the weather, other real-world systems can be modeled using one of several common probability distributions.

This section covers four of these common distributions: the Bernoulli, discrete uniform, continuous uniform, and normal distributions. Learning about them is really exciting because it can change the way you think about randomness in the world!


Last updated 4 years ago
