Opportunity Through Data Textbook

Classification

An introduction to classification.

Last updated 4 years ago

Introduction

How do you decide when to turn on the lights in your room?

You take in the world around you: can you see your bed? Your desk? Are you able to read? All of this information is data, and you are making a decision based on this data. In general, the more relevant data you have, the better the decisions you can make.

In the same way, you can teach computers how to use data to make decisions. Given enough information, a computer might be able to tell you when to turn on the lights, or even how to drive a car!

One way a computer can use information to make decisions is through classification. In classification, you are trying to sort objects into groups based on the characteristics they have. For example, given a bunch of fruits, how do you tell which one is which? You might recognize oranges by the color of their skin or by its rough texture. Apples have smooth, shiny skin and are very firm. Peaches, on the other hand, are soft and fuzzy. Using these characteristics, you can classify any new fruit you receive by figuring out what kind of fruit it is.

You can teach a computer to make decisions like this, too. If you provide a computer with a table containing characteristics of fruits, it might look like this:

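| fruit_type | skin_texture | skin_color | shape  |
|------------|--------------|------------|--------|
| peach      | fuzzy        | pink       | round  |
| apple      | smooth       | red        | oblong |
| orange     | rough        | orange     | round  |
| apple      | smooth       | yellow     | round  |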
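As a rough sketch, the fruit table could be written in Python as a list of dictionaries, with one dictionary per row. The variable names here (`fruits`, `column_names`, `fruit_types`) are made up for illustration, and this is only one of several reasonable ways to store a table.

```python
# Each dictionary is one row of the fruit table.
# The keys are the column names; the values are that fruit's characteristics.
fruits = [
    {"fruit_type": "peach", "skin_texture": "fuzzy", "skin_color": "pink", "shape": "round"},
    {"fruit_type": "apple", "skin_texture": "smooth", "skin_color": "red", "shape": "oblong"},
    {"fruit_type": "orange", "skin_texture": "rough", "skin_color": "orange", "shape": "round"},
    {"fruit_type": "apple", "skin_texture": "smooth", "skin_color": "yellow", "shape": "round"},
]

# The column names are the keys of any row.
column_names = list(fruits[0].keys())

# The categories we want to sort fruits into are the distinct fruit types.
fruit_types = sorted({row["fruit_type"] for row in fruits})

print(column_names)
print(fruit_types)
```

The `fruit_type` column holds the category, and the other three columns hold the characteristics used to decide on a category.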

Now, given a mystery fruit for which you know every characteristic except its fruit type, can you tell which category it belongs in?

A computer program that can determine which category something belongs in is called a classifier.
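To make the idea concrete, here is a minimal sketch of one very simple classifier. It assumes the fruit table is stored as a list of dictionaries, and it guesses the type of a mystery fruit by finding the known fruit that shares the most characteristics with it. The function names `count_matches` and `classify` are hypothetical, and this matching rule is just one possible approach, not the only way to build a classifier.

```python
# Known fruits, copied from the fruit table (one dictionary per row).
known_fruits = [
    {"fruit_type": "peach", "skin_texture": "fuzzy", "skin_color": "pink", "shape": "round"},
    {"fruit_type": "apple", "skin_texture": "smooth", "skin_color": "red", "shape": "oblong"},
    {"fruit_type": "orange", "skin_texture": "rough", "skin_color": "orange", "shape": "round"},
    {"fruit_type": "apple", "skin_texture": "smooth", "skin_color": "yellow", "shape": "round"},
]


def count_matches(known, mystery):
    """Count how many characteristics of the mystery fruit match a known fruit."""
    return sum(1 for key, value in mystery.items() if known.get(key) == value)


def classify(mystery, examples):
    """Return the fruit_type of the example that shares the most characteristics.

    If several examples tie, the one that appears first in the list wins.
    """
    best = max(examples, key=lambda example: count_matches(example, mystery))
    return best["fruit_type"]


# A mystery fruit: we know everything about it except its fruit type.
mystery_fruit = {"skin_texture": "fuzzy", "skin_color": "pink", "shape": "round"}

print(classify(mystery_fruit, known_fruits))  # prints "peach"
```

With only four known fruits, this classifier can easily be fooled by a fruit it has never seen, such as a green apple or a banana. Giving it more examples and more characteristics would generally help it make better guesses.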