Opportunity Through Data Textbook

Slope and Distance

How do we find connections between different values? This subsection serves as a refresher on concepts you might or might not remember from algebra, covering slope and the distance formula.

Last updated 5 years ago

Key Terms:

Coordinate Plane: a grid with an x-axis and a y-axis on which points can be plotted

Slopes

Slope is a number that indicates the direction and steepness of something. We usually use slope to describe the steepness of a line seen on a coordinate plane.

Why is the idea of slope important? We will see that prediction and estimation make up a large portion of Data Science applications. For example, if we use $x$ versus $y$ grams of some medicine called A, how many people will be healed? We can use this idea of slope to define a relationship between two variables ($x$ and $y$).

We use slope to estimate regressions (which we will learn more about in a future chapter) and predict results based on certain settings. For example, we might want to predict people’s weights given their heights. As we discussed earlier, we can use slope to define a relationship between two variables. In this case, their heights (which are given/known) are the $x$ values. We can set up a model using regressions to then predict the dependent variable (or $y$), which represents the weight. It is called dependent because the value that you put in for $x$ will affect the value of $y$. In this way, there is a relationship between the two variables, and we will soon see that this relationship is defined by the slope.

Slope Formula

With larger positive slopes, the line rises much more quickly, whereas with smaller positive slopes it rises much more slowly. Mirroring that, with more negative slopes (remember: these are further from 0) the line falls much more quickly, whereas with negative slopes closer to 0 it falls much more slowly.
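As a rough illustration of this idea, the sketch below lists a few $y$ values for lines of the form $y = \text{slope} \cdot x$. The helper name `points_on_line` and the sample slopes are our own choices for this example, not something defined elsewhere in the textbook.

```python
def points_on_line(slope, xs=(1, 2, 3)):
    # y values of the line y = slope * x at each x in xs
    # (hypothetical helper, used only for this illustration)
    return [slope * x for x in xs]

# Compare a steep and a gentle positive slope with their negative mirrors
for slope in [4, 0.5, -0.5, -4]:
    print(slope, points_on_line(slope))
```

The lists for slopes 4 and -4 change by 4 at every step, while the lists for 0.5 and -0.5 change by only 0.5, which is what "steeper" means here.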

Review of Functions and Equations

Functions, as we introduced in the programming section, are very similar to equations. Think back to our definitions of functions. When you input a value into a function, you get one output. This is the same for an equation! Equations are simply mathematical functions that deal with numbers.

def add_two(x):
    return x + 2

Don't worry too much about how to write the function. Rather, focus on what the function is doing. Now, if we run this program, we can see what our function outputs.

>>> add_two(4)
6
>>> add_two(3)
5

Slope-Intercept Formula

Values of $y = 3x + 2$:

| $x$ value | $y$ value |
| --- | --- |
| -2 | -4 |
| -1 | -1 |
| 0 | 2 |
| 1 | 5 |
| 2 | 8 |

Distance

Distance is a mathematical calculation of the space between two (coordinate) points. Let's say we drew a line connecting two points. With slope, we were calculating the steepness of the line. But with distance, we're calculating how long the line is.

Although there are many different ways to calculate distance, we will mostly use the “Euclidean Distance,” which is just a fancy term for the type of distance you are probably most familiar with: the length of the straight line between two points.

Why does this matter? Distance shows up in countless applications throughout data science. Most often, we will use the distance formula to find the closest points in K-nearest neighbors.
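To give a feel for how distance is used to find "closest" points, here is a minimal sketch that picks the single nearest point to a target. It is not the full K-nearest neighbors method; the function name `closest_point` and the sample points are made up for this example, and `math.dist` assumes Python 3.8 or newer.

```python
import math

def closest_point(target, points):
    # Return the point with the smallest Euclidean distance to target
    # (hypothetical helper for illustration only)
    return min(points, key=lambda p: math.dist(target, p))

points = [(5, 5), (1, 1), (-3, 0)]
print(closest_point((0, 0), points))  # (1, 1)
```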

Formulas and Explanations

Sanity check: Distance should never be negative! (Why?)

Here's how we can calculate the distance between two points using a function in Python!

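import math

def distance(x1, x2, y1, y2):
    # Squared horizontal and vertical differences between the two points
    x_squared = (x2 - x1) ** 2
    y_squared = (y2 - y1) ** 2
    # The square root of their sum is the Euclidean distance
    return math.sqrt(x_squared + y_squared)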

Practice Problems

  1. Try to explain distance and slope to a friend or peer.

  2. Table to complete:

     | $x$ value | $y$ value |
     | --- | --- |
     | -2 | |
     | -1 | |
     | 0 | |
     | 1 | |
     | 2 | |

Now that we know what slope means, how can we calculate it? This is the formula for how to get the slope between two points, $(x_1, y_1)$ and $(x_2, y_2)$, which you may have heard of as "rise over run." Think of it as calculating the steepness of a line if you had two points and drew a line connecting them.

$$\text{slope} = \frac{\text{rise}}{\text{run}} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}$$

The equation above gives us a fraction, but we should always try to simplify our fraction. For example, instead of saying that we have a slope of $\frac{6}{2}$, we should further simplify that to $3$.
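The slope formula can also be written as a short Python function. This is only a sketch under a few assumptions: the function name `slope` and its argument order (x1, y1, x2, y2) are our own choices, the coordinates are whole numbers, and we use Python's `fractions.Fraction` so that the result is simplified automatically.

```python
from fractions import Fraction

def slope(x1, y1, x2, y2):
    # Rise over run between (x1, y1) and (x2, y2), assuming integer coordinates
    if x2 == x1:
        # A vertical line has a run of 0, so its slope is undefined
        raise ValueError("slope is undefined for a vertical line")
    return Fraction(y2 - y1, x2 - x1)

print(slope(0, 0, 2, 6))  # 3
print(slope(1, 1, 5, 3))  # 1/2
```

The first call has a rise of 6 and a run of 2, and `Fraction` reduces 6/2 to 3 for us.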

We can also represent mathematical equations through code! Let's take the equation $y = x + 2$. How can we write a function that takes in an input ($x$), and returns an output ($y$), that is equal to the input value $+ 2$?

Notice that when we plug in $3$ as the $x$-value to the equation $y = x + 2$, we will always get the answer $y = 5$ as the output. In this equation, different inputs always give different outputs. But equations can also produce the same output for completely different input values. Try to think of an example of an equation that does this.

One possible example is $y = x^2$. Both $x = -2$ and $x = 2$ have $y = 4$ as the output. What about $y = 0 * x$? Notice that for this function, no matter what value of $x$ you plug in, you will get $y = 0$ as the output.
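These two equations can be checked in code the same way as `add_two`. The function names `square` and `zero_times` below are our own labels for $y = x^2$ and $y = 0 * x$; treat this as a small illustration rather than part of the original exercise.

```python
def square(x):
    # y = x^2: two different inputs can share one output
    return x ** 2

def zero_times(x):
    # y = 0 * x: every input gives the same output
    return 0 * x

print(square(-2), square(2))           # 4 4
print(zero_times(7), zero_times(-3))   # 0 0
```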

The slope-intercept formula might sound fancy, but it's just another way for us to represent the same equations we've seen so far. In the following equation, $y$ is the output number that comes from plugging in an $x$ value, multiplying that by our slope $m$ and shifting that by $b$ units on the y-axis. $b$ is also known as the y-intercept, where the line hits the y-axis.

$$y = mx + b$$

By plugging values into this formula, we can see that for $y = 3x + 2$, if we plug in $0$ as our $x$ value, the corresponding $y$ value is $2$. Likewise, $x = 1$ results in $y = 5$. The table under the Slope-Intercept Formula heading shows the corresponding values for this equation. Notice that this line goes on infinitely, but we've only shown 5 inputs of $x$ in the table. This means that if you plugged a really big or really small number into the equation (say 1000000000000), you would still get an output value. There are no restrictions on where the equation stops! However, some equations do have certain restrictions, called bounds. If any of this is confusing, try calculating one by hand or drawing out what the line should look like.
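The values for $y = 3x + 2$ can be reproduced with a few lines of Python. The function name `line` and its default arguments `m=3` and `b=2` are assumptions made for this sketch; any slope and intercept could be passed in instead.

```python
def line(x, m=3, b=2):
    # Slope-intercept form: multiply x by the slope m, then shift by the intercept b
    return m * x + b

# The five inputs from the table for y = 3x + 2
for x in [-2, -1, 0, 1, 2]:
    print(x, line(x))

# A very large input still produces an output
print(line(1000000000000))  # 3000000000002
```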

To calculate the (shortest) distance between two coordinate points $(x_1, y_1)$ and $(x_2, y_2)$, we use the following formula:

$$\text{distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

While this formula may look complicated, it comes from a well-known mathematical formula known as the Pythagorean Theorem ($a^2 + b^2 = c^2$). This theorem is used to find the length of one side of a right triangle, given the other two. Finding the length of a side of a triangle is the same as finding the distance between two points, since the side of a triangle is just a line segment. The figure "Translating Pythagorean Theorem to Distance Formula" gives a graphical interpretation.
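As a worked example (the points $(1, 2)$ and $(4, 6)$ are chosen here only for illustration), the horizontal and vertical differences play the roles of the two legs $a$ and $b$, and the distance is the hypotenuse $c$:

```latex
\begin{aligned}
a &= x_2 - x_1 = 4 - 1 = 3 \\
b &= y_2 - y_1 = 6 - 2 = 4 \\
c &= \sqrt{a^2 + b^2} = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5
\end{aligned}
```

So the distance between $(1, 2)$ and $(4, 6)$ is $5$.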

This might look complicated, but let's break it down. The variable x_squared represents $(x_2 - x_1)^2$, and y_squared represents $(y_2 - y_1)^2$. The two star symbols (**) represent the power operator in Python, so 3**2 just means 3 squared. Then, we take the square root of the two variables added together. math.sqrt() is a function from Python's standard math module (similar to the built-in functions we saw earlier, but it needs import math first) that computes the square root of something for us, so that we don't have to do all of that math manually.
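Here is a short usage sketch for the distance function. It repeats the definition so that it runs on its own, keeps the argument order (x1, x2, y1, y2) used in this section, and compares the result with `math.dist`, which assumes Python 3.8 or newer. The sample points $(0, 0)$ and $(6, 8)$ are our own choice.

```python
import math

def distance(x1, x2, y1, y2):
    # Same calculation as in this section: note the order x1, x2, y1, y2
    x_squared = (x2 - x1) ** 2
    y_squared = (y2 - y1) ** 2
    return math.sqrt(x_squared + y_squared)

print(3 ** 2)                        # 9
print(distance(0, 6, 0, 8))          # 10.0
print(math.dist((0, 0), (6, 8)))     # 10.0
```

Because both x values come before both y values in this function, it is easy to mix up the arguments, so it helps to check a result against a case you can work out by hand.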

Given $y = -2x + 3$, complete the table in problem 2 above by filling in the $y$ value for each $x$ value.

  3. Find the slope between $(3, -9)$ and $(-5, 5)$, then see where the line intersects the $y$-axis, then use all of these results to construct a slope-intercept equation. Repeat for $(1, 2)$ and $(3, 4)$.

  4. True or False: The line through $(2, 1)$ and $(-4, 13)$ and the line through $(-11, 1)$ and $(1, -19)$ have the same slope.

  5. True or False: The line through $(-1, -2)$ and $(1, 2)$ and the line through $(0, 4)$ and $(-2, 0)$ have the same slope and intercept. (Draw out these two lines. Do you see anything interesting?)

  6. What is the distance between $(3, -9)$ and $(-5, 5)$?

  7. Which two points are closest in: $(-1, -2), (2, -3), (1, 2), (-2, 3)$?

[Figure: Visual Representation of Different Slopes]
[Figure: Visualization of Slope and Intercept]
[Figure: Translating Pythagorean Theorem to Distance Formula]