Opportunity Through Data Textbook

Tables

How do we store large amounts of data?

KEY TERMS

  • Table: a way to organize data using rows and columns

  • Row: each horizontal line in a table is a row (also known as an entry)

  • Column: each vertical line in a table is a column

  • Attribute: a characteristic of an entry that describes that particular entry. Each attribute corresponds to a column

Now that we know more about data types in Python and representing sequences of items, let’s see what they can be used for!

In data science, we are concerned with data as well as its organization. We want to have well-organized data that’s simple to read and understand. A common way to organize data is by using a table.

What is a table? Essentially, a table is a way to organize data in rows and columns. Rows run horizontally and columns run vertically. Across the top of a table, you’ll see the column labels, or the column names. Column labels are usually attributes that describe something about every entry. For example, if we decided to put the data from our phone_numbers dictionary into a table, it would look something like this:

Name       Phone Number
"Sam"      3431234098
"Daisy"    5672349876
"John"     8907654321

Looking at the rows of our table, we find that each row represents one of our friends. In tables, each row is an entry. Moreover, we learn two things about each friend: their name and their phone number, which are the friend's attributes. In tables, each entry is described by multiple attributes.

Looking at the columns of our table, we notice that every column has a label and values. Each column label, like 'Name' or 'Phone Number', is associated with the list of values in that column. In fact, every column label could be considered a key mapped to a list of values. This means we can represent a table using a dictionary with key: value pairs!

# this is the phone_numbers dictionary we've seen before
>>> phone_numbers
{'Sam': 3431234098, 'Daisy': 5672349876, 'John': 8907654321}

# this is how we might organize the same information in a table
>>> phone_numbers_table = {'Name': ['Sam', 'Daisy', 'John'],
...                        'Phone Number': [3431234098, 5672349876, 8907654321]}

Notice that phone_numbers and phone_numbers_table look very different. Both are dictionaries and both contain exactly the same information, but we've changed how the data is organized. Instead of each key: value pair in the dictionary representing one friend (for example, 'Sam': 3431234098), each key: value pair now represents one column of the table.
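To make the reorganization concrete, here is a minimal sketch that builds the table form directly from the original dictionary. It assumes the dictionary methods .keys() and .values(), which may not have been introduced yet in this text, and relies on Python dictionaries keeping their insertion order (Python 3.7 and later).

```python
# The dictionary from earlier: one key: value pair per friend.
phone_numbers = {'Sam': 3431234098, 'Daisy': 5672349876, 'John': 8907654321}

# The table form: one key: value pair per column.
# .keys() gives the names and .values() gives the numbers, in the same order.
phone_numbers_table = {
    'Name': list(phone_numbers.keys()),
    'Phone Number': list(phone_numbers.values()),
}

print(phone_numbers_table)
```

Both dictionaries hold the same names and numbers; only the layout differs.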

Suppose we want a list of all the friends we plan to call tonight. How would we get it from the table? (Hint: a list of friends is a list of names.) What about a list of all the phone numbers?

To access one specific column in a table, we can run <table name>[<column name>]

>>> friends = phone_numbers_table["Name"]
>>> friends
['Sam', 'Daisy', 'John']
>>> numbers = phone_numbers_table["Phone Number"]
>>> numbers
[3431234098, 5672349876, 8907654321]

Now, if we wanted to find Daisy's phone number, how would we read the phone_numbers_table dictionary? To read a row in a table, we look across the columns. For example, reading the first row of a table means reading the first item in every column. We see that 'Daisy' is the second element in the 'Name' column. Therefore, the corresponding phone number is the second element of the 'Phone Number' column.

# the values in the 'Phone Number' column of the table are associated with the key 'Phone Number'
# we've assigned numbers to the list of values in the 'Phone Number' column
>>> numbers = phone_numbers_table['Phone Number']
>>> numbers
[3431234098, 5672349876, 8907654321]

# Daisy is the second element of the 'Name' column
# so her corresponding phone number is also the second element of the 
# 'Phone Number' column.
# !! Remember that lists are zero indexed, 
# !! so the second element is at index 1
>>> numbers[1]
5672349876

# the following statement returns the same thing
>>> phone_numbers_table['Phone Number'][1]
5672349876
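The lookup above relies on us knowing that Daisy sits at index 1. As a hedged sketch, the same idea can be wrapped in a function that finds the position for us. The name find_number is a hypothetical helper, not something defined elsewhere in this text, and .index() is a built-in list method that returns the position of the first matching item.

```python
phone_numbers_table = {'Name': ['Sam', 'Daisy', 'John'],
                       'Phone Number': [3431234098, 5672349876, 8907654321]}

def find_number(table, name):
    """Return the phone number in the same row as the given name (hypothetical helper)."""
    # Find which row the name is in ...
    position = table['Name'].index(name)
    # ... then read the same position in the 'Phone Number' column.
    return table['Phone Number'][position]

print(find_number(phone_numbers_table, 'Daisy'))  # 5672349876
```

If the name is not in the 'Name' column, .index() raises a ValueError, so this sketch only works for friends who are already in the table.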

You meet a new friend, Mike, while waiting in line at Peet's Coffee, and you want to add him to your table. How would you go about adding him to phone_numbers_table?

To add Mike to our contact list, we need to add a new entry to phone_numbers_table. As we discovered before, an entry in phone_numbers_table is described by two attributes ('Name' and 'Phone Number'). Therefore, adding a new entry to a table means adding each of its attributes to the corresponding column.

# to access the columns, take them out of the table
>>> names = phone_numbers_table['Name']
>>> names
['Sam', 'Daisy', 'John']
>>> numbers = phone_numbers_table['Phone Number']
>>> numbers
[3431234098, 5672349876, 8907654321]

# add the attributes of Mike to the columns
>>> names = ['Sam', 'Daisy', 'John', 'Mike']
>>> numbers = [3431234098, 5672349876, 8907654321, 5558801916]

# update the columns by putting them back in the table
>>> phone_numbers_table['Name'] = names
>>> phone_numbers_table['Phone Number'] = numbers

Another way to add a new entry to the table is to use the list method .append(), which adds one item to the end of a list. Starting again from the table with three entries:

# here's Mike
>>> new_name = 'Mike'
>>> new_number = 5558801916

# let's add the new name to the list of names
>>> phone_numbers_table['Name'].append(new_name)

# now let's add the new number to the list of numbers
>>> phone_numbers_table['Phone Number'].append(new_number)

# now phone_numbers_table has a new entry
>>> phone_numbers_table
{'Name': ['Sam', 'Daisy', 'John', 'Mike'], 
'Phone Number': [3431234098, 5672349876, 8907654321, 5558801916]}
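Because a new entry has to go into every column, it is easy to update one list and forget the other. The sketch below bundles both .append() calls into one function so the columns always stay the same length. The name add_entry is a hypothetical helper introduced only for illustration.

```python
phone_numbers_table = {'Name': ['Sam', 'Daisy', 'John'],
                       'Phone Number': [3431234098, 5672349876, 8907654321]}

def add_entry(table, name, number):
    """Add one row to the table by appending to every column (hypothetical helper)."""
    table['Name'].append(name)
    table['Phone Number'].append(number)

add_entry(phone_numbers_table, 'Mike', 5558801916)
print(phone_numbers_table)
```

After the call, both columns have four items, and Mike's name and number sit at the same index.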

Now suppose you realize that your once-friend John actually hates Peet's Coffee, and you no longer want to call him ever again. To remove his entry, we can use the del statement (short for delete), which removes the item at a given index from a list.

# now we want to remove the entry for John
# John is described by the third item in each column

# let's remove John from the list of names
>>> del phone_numbers_table['Name'][2]

# now let's remove John's number from the list of numbers
>>> del phone_numbers_table['Phone Number'][2]

# now a row has been removed from phone_numbers_table
>>> phone_numbers_table
{'Name': ['Sam', 'Daisy', 'Mike'], 
'Phone Number': [3431234098, 5672349876, 5558801916]}
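Deleting by a hard-coded index works only if we already know which row John is in. As a sketch under that caveat, the function below looks up the row by name first and then deletes the same position from every column. The name remove_entry is a hypothetical helper, and .index() is the built-in list method that returns the position of the first match.

```python
phone_numbers_table = {'Name': ['Sam', 'Daisy', 'John', 'Mike'],
                       'Phone Number': [3431234098, 5672349876, 8907654321, 5558801916]}

def remove_entry(table, name):
    """Remove the row whose 'Name' matches the given name (hypothetical helper)."""
    # Find the row once, before anything is deleted.
    position = table['Name'].index(name)
    # Delete the same position from each column so the rows stay aligned.
    del table['Name'][position]
    del table['Phone Number'][position]

remove_entry(phone_numbers_table, 'John')
print(phone_numbers_table)
```

The position is looked up before either del runs. Once the name has been deleted from the 'Name' column, it can no longer be found there.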

Summary

  • Tables are a way to organize data in rows and columns. Each row represents a new entry. Each column represents an attribute of those entries. Every column has a column label.

  • A table can be represented by a dictionary. Each key: value pair represents one column where the key is the column label and the value is the list of values in that column.
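To see rows and columns side by side, the following sketch rebuilds the rows from the column lists. It assumes the built-in function zip, which is not introduced in this section; zip pairs up the items that share a position in each list.

```python
phone_numbers_table = {'Name': ['Sam', 'Daisy', 'Mike'],
                       'Phone Number': [3431234098, 5672349876, 5558801916]}

# Each column is one list stored under its column label.
names = phone_numbers_table['Name']
numbers = phone_numbers_table['Phone Number']

# Pairing items by position gives back the rows (entries) of the table.
rows = list(zip(names, numbers))
print(rows)
# [('Sam', 3431234098), ('Daisy', 5672349876), ('Mike', 5558801916)]
```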

Last updated 4 years ago