Opportunity Through Data Textbook
Discrete and Continuous Distributions

Last updated 4 years ago


Both the Bernoulli and Uniform distributions covered previously are classified as discrete probability distributions because their outcome spaces contain a countable number of events. Discrete probability distributions are commonly defined on the integers, meaning they do not take on fractional values.
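As a minimal sketch in Python, a discrete distribution can be written out as a table that maps each outcome to its probability. The success probability p = 3/10 below is chosen only for illustration, and `fractions.Fraction` is used so the arithmetic stays exact. Every outcome receives a nonzero probability, and the probabilities add up to exactly 1.

```python
from fractions import Fraction

# Bernoulli distribution: outcome 1 ("success") with probability p, outcome 0 otherwise.
# p = 3/10 is an arbitrary value picked for this example.
p = Fraction(3, 10)
bernoulli = {0: 1 - p, 1: p}

# Discrete uniform distribution: each face of a fair six-sided die is equally likely.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# A discrete distribution is fully described by listing each outcome's probability.
for name, dist in [("Bernoulli", bernoulli), ("Uniform", die)]:
    total = sum(dist.values())
    print(name, "outcomes:", sorted(dist), "total probability:", total)
```

Because the outcome space is countable, we can list every outcome, and checking that the listed probabilities sum to 1 is a one-line computation.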

In contrast to discrete probability distributions, there are continuous probability distributions, which are defined on outcome spaces containing uncountably many values. You can think of an uncountable outcome space as a continuous range of values along the number line, including both rational and irrational numbers. A continuous random variable can take on any value between any two values in its outcome space.

When defining a continuous probability distribution, it is not possible to assign a nonzero probability to every possible value so that the probabilities all sum to 1. Roughly speaking, this is because there are infinitely many values: no matter how small you make the probabilities, adding up that many of them gives a total far larger than 1. Because of this, the probability that a continuous random variable takes on any one exact value is always zero.
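Both halves of this argument can be checked with a little arithmetic. The sketch below assumes a continuous uniform distribution on [0, 1], for which the probability of landing in an interval is simply the length of that interval; the helper name `uniform_interval_probability` is hypothetical and used only for this illustration. The first loop gives n points the same tiny probability and shows the total growing past 1 as n increases; the second shrinks an interval around 0.5 and shows its probability heading toward zero.

```python
def uniform_interval_probability(a, b):
    """P(a <= X <= b) for X uniform on [0, 1]: the length of [a, b] that lies inside [0, 1]."""
    low, high = max(a, 0.0), min(b, 1.0)
    return max(high - low, 0.0)

# Part 1: give each of n points the same tiny probability p.
# The total n * p keeps growing with n, so it cannot stay at 1 for infinitely many points.
p = 1e-6
for n in [10**3, 10**6, 10**9]:
    print(f"n = {n:>13,}  total probability = {n * p:g}")

# Part 2: narrow an interval around the single value 0.5.
# Its probability equals its width, so it shrinks toward 0 as the width does.
for width in [0.1, 0.01, 0.001, 0.0001]:
    prob = uniform_interval_probability(0.5 - width / 2, 0.5 + width / 2)
    print(f"width = {width:<7} P = {prob:.6f}")

# An interval of width zero is a single exact value: probability 0.
print(uniform_interval_probability(0.5, 0.5))
```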

Recall that we characterize discrete probability distributions by specifying the probability of each possible event. Because continuous distributions assign a probability of zero to every individual value, we need another way to characterize continuous probability distributions.

This is where probability density functions (PDFs) come into the picture. A PDF gives the relative likelihoods of exact values rather than their probabilities, which are all zero. This means the PDF lets you compare how likely specific values are relative to each other, but it does not give the actual probability of any of those values. Importantly, the total area under the PDF curve is always equal to 1, and the probability that the random variable falls within a given interval is the area under the curve over that interval.
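To make this concrete, here is a small Python sketch using a hypothetical density f(x) = 2x on [0, 1], chosen only because its area is easy to check by hand (a triangle with base 1 and height 2 has area 1). It compares density values as relative likelihoods, shows that a density value can exceed 1 (so it is not a probability), and approximates areas with a simple midpoint Riemann sum standing in for the integral.

```python
def density(x):
    """Hypothetical example PDF: f(x) = 2x on [0, 1], and 0 everywhere else."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area_under(pdf, a, b, steps=100_000):
    """Approximate the area under `pdf` from a to b with a midpoint Riemann sum."""
    dx = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * dx) for i in range(steps)) * dx

# Ratio of densities: values near 0.8 are about 4 times as likely as values near 0.2
print(density(0.8) / density(0.2))

# A density value can be larger than 1, so it cannot be a probability
print(density(1.0))

# Total area under the PDF (should be close to 1)
print(area_under(density, 0, 1))

# Probability that X lands in [0, 0.5] is the area over that interval
print(area_under(density, 0, 0.5))
```

The midpoint rule is used here only because it is short and needs no libraries; any numerical integration method would illustrate the same point.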

Here is a plot of the PDF of a continuous random variable that follows what is called a "normal distribution":

Figure: PDF for the normal distribution with a mean of 0 and a standard deviation of 1.

Note that the values of the PDF do not sum to 1, but they still show which values of X are more likely than others. For instance, values near X = 0 are more likely than values near X = 2.

Mathematically defining the PDF requires the cumulative distribution function (CDF), which will not be covered in this section. Fully defining the PDF and CDF requires high-school-level calculus; as you may have noticed, the requirement that the area under the PDF curve equal 1 can be checked by confirming that the integral of the PDF over all values equals 1. You are highly encouraged to pursue these concepts beyond this section.
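The same ideas can be checked numerically for the standard normal distribution in the plot (mean 0, standard deviation 1). This sketch writes out the normal PDF formula directly rather than calling a library, and it approximates the integral with a midpoint Riemann sum over [-10, 10], on the assumption that the area outside that range is negligible.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of the normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def area_under(pdf, a, b, steps=100_000):
    """Approximate the area under `pdf` from a to b with a midpoint Riemann sum."""
    dx = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * dx) for i in range(steps)) * dx

# Relative likelihood: compare the density at X = 0 with the density at X = 2
print(normal_pdf(0), normal_pdf(2), normal_pdf(0) / normal_pdf(2))

# Adding up PDF values at points spaced 0.1 apart does not give 1...
print(sum(normal_pdf(k / 10) for k in range(-50, 51)))

# ...but the area under the curve does (up to numerical error)
print(area_under(normal_pdf, -10, 10))
```

The ratio f(0)/f(2) works out to e^2, about 7.39, so values near 0 are roughly seven times as likely as values near 2. The sum of density values at points 0.1 apart comes out close to 10 rather than 1, which is why PDF values are read as relative likelihoods and only areas are read as probabilities.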