Opportunity Through Data Textbook

Introduction to Data Visualization

What constitutes an effective visualization?



Earlier, we talked about how data science helps us find the answers to questions using sets of information gathered from the world. As data scientists, part of our job is to communicate these answers effectively. This is where data visualizations come in handy: they can help us present a lot of information quickly, effectively, and in an engaging manner.

Here is a simple example of the usefulness of data visualizations. The following table shows ice cream preferences in a class of 10 people.

| Ice Cream Flavor | Number of People |
| --- | --- |
| Chocolate | 4 |
| Strawberry | 3 |
| Vanilla | 3 |

Below is a bar graph of the above data.
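A bar graph like this one can be drawn with Matplotlib. The following is a minimal sketch rather than the code behind the original figure; the title and axis labels are our own choices, and the counts come straight from the table above.

```python
import matplotlib.pyplot as plt

# Ice cream preferences from the table above (class of 10 people).
flavors = ["Chocolate", "Strawberry", "Vanilla"]
counts = [4, 3, 3]

fig, ax = plt.subplots()
bars = ax.bar(flavors, counts)

# Label the axes and add a title so the graph can be read on its own.
ax.set_xlabel("Ice Cream Flavor")
ax.set_ylabel("Number of People")
ax.set_title("Ice Cream Preferences in a Class of 10")

# The counts are whole numbers, so use whole-number ticks on the y-axis.
ax.set_yticks(range(0, max(counts) + 1))
```

In a Jupyter Notebook the figure appears below the cell; in a script, call `plt.show()` to display it.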

Do you prefer the table or the graph?

Let's take a look at another example. Here is a table of drivers who were stopped by the police and subsequently searched.

As we can see, all this information is hard to digest at once. A bar graph lets us show just the parts of the table we want to convey.

This bar graph reveals that Black and Hispanic drivers are more likely to be searched than White drivers, a disparity that points to bias.

Your choice of data visualization depends on what you want to represent and the information you want to convey. For example, if you want to compare values within a dataset, you might want to use a bar graph (as above) or a line graph. However, if you want to show the composition of something, you might want to use a pie chart. Below is a pie chart showing how many men and women are in Congress. As you can see, we have quite a way to go before women are properly represented!
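To see how a composition chart is built, here is a minimal Matplotlib sketch of a pie chart. The Congress figures are not reproduced on this page, so the sketch reuses the ice cream counts from the earlier table; the title is our own choice.

```python
import matplotlib.pyplot as plt

# Ice cream counts from the earlier table, reused here because the
# Congress numbers behind the original pie chart are not given on this page.
flavors = ["Chocolate", "Strawberry", "Vanilla"]
counts = [4, 3, 3]

fig, ax = plt.subplots()

# autopct prints each slice's share of the whole as a percentage.
ax.pie(counts, labels=flavors, autopct="%1.0f%%")
ax.set_title("Share of Ice Cream Preferences")
```

Each slice's angle is proportional to its count, so the chart shows how the whole class divides among the three flavors.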

If we want to look at trends or possible correlations, we can use scatter plots. The one shown below illustrates a correlation between years of education and income.

You can also compare trends between groups by overlaying graphs. The example below compares the total number of cases of COVID-19 over time among different age groups.

Overlaying Graphs

If we want to compare quantities in two or more cases, we can use an overlaid bar graph. The following graph shows the comparison of median weekly earnings between men and women by race.

The bar graph is better suited to our analysis than a scatter plot in this case for two reasons:

  • We are more interested in comparing quantities than in exploring or determining trends or correlations.

  • Since the bar graph compares categories (here, race), our x-axis has labels rather than the numerical values that a scatter plot requires.
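One common way to draw this kind of comparison in Matplotlib is to place the bars for each group side by side within every category. The sketch below assumes that layout. The category names, group names, and values are hypothetical placeholders, not the earnings figures from the graph described above.

```python
import matplotlib.pyplot as plt

# Hypothetical placeholder values for illustration only; they are NOT the
# earnings figures from the graph described above.
categories = ["Category A", "Category B", "Category C"]
group_1 = [5, 7, 6]
group_2 = [4, 6, 8]

width = 0.4  # width of one bar
positions = list(range(len(categories)))

fig, ax = plt.subplots()

# Shift each group half a bar-width left or right so the pairs sit side by side.
left = ax.bar([p - width / 2 for p in positions], group_1, width, label="Group 1")
right = ax.bar([p + width / 2 for p in positions], group_2, width, label="Group 2")

# Put one category label under each pair of bars.
ax.set_xticks(positions)
ax.set_xticklabels(categories)
ax.set_ylabel("Quantity (made-up units)")
ax.legend()
```

Because the x-axis carries category labels rather than numbers, the bars are positioned by their index (0, 1, 2) and the labels are attached afterwards.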

As you can see, different types of data call for different visualizations, depending on the aim of the exploration, the types of data available, and what we want the visualization to communicate.

Area Principle

The area principle states that the area occupied by each part of a graph should be proportional to the value it represents.

Although this graph looks cool in 3D, it actually violates the area principle because the visible area of the bars does not reflect the data they represent. The 3D perspective distorts how large each bar appears.
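By contrast, a flat bar graph with equal-width bars satisfies the area principle, because each bar's area is its width times its height. The following sketch checks this for the ice cream counts from earlier on this page. It is an illustration of the idea, not a general test for any chart.

```python
import math
import matplotlib.pyplot as plt

# Ice cream counts from the earlier table.
flavors = ["Chocolate", "Strawberry", "Vanilla"]
counts = [4, 3, 3]

fig, ax = plt.subplots()
bars = ax.bar(flavors, counts)  # every bar gets the same default width

# Area of each rectangle = width * height.
areas = [bar.get_width() * bar.get_height() for bar in bars]

# With equal widths, the area per person is the same for every bar,
# so each bar's area is proportional to the value it represents.
area_per_person = [area / count for area, count in zip(areas, counts)]
same_scale = all(math.isclose(a, area_per_person[0]) for a in area_per_person)
```

Here `same_scale` is `True`. A 3D effect breaks this relationship because the drawn shapes are no longer simple rectangles of equal width.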

Style Guide

In terms of style, it is recommended to adhere to the following guidelines:

  • Use consistent colors on the graphs, especially when trying to illustrate changes over time.

  • Use horizontal labels on the X-axis to improve readability.

  • Start the Y-axis at 0 and ensure a uniform scale to prevent your graph from being misleading.
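The guidelines above can be applied in Matplotlib roughly as follows. This is a sketch using the ice cream counts from earlier on this page; the specific color is an arbitrary choice.

```python
import matplotlib.pyplot as plt

# Ice cream counts from the earlier table.
flavors = ["Chocolate", "Strawberry", "Vanilla"]
counts = [4, 3, 3]

fig, ax = plt.subplots()

# One consistent color for all bars keeps the focus on their heights.
ax.bar(flavors, counts, color="tab:blue")

# Horizontal x-axis labels (rotation of 0 degrees) are easier to read.
ax.tick_params(axis="x", labelrotation=0)

# Start the y-axis at 0 and use evenly spaced ticks for a uniform scale.
ax.set_ylim(bottom=0)
ax.set_yticks(range(0, max(counts) + 2))
```

Matplotlib's bar graphs already start at 0 by default; setting the limit explicitly guards against later changes that would truncate the axis.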

Source: https://www.bjs.gov/content/pub/pdf/pbtss11.pdf
Source: https://www.pewresearch.org/fact-tank/2021/01/15/a-record-number-of-women-are-serving-in-the-117th-congress/
Source: http://www.texample.net/tikz/examples/scatterplot/
Source: https://www.cdc.gov/mmwr/volumes/69/wr/mm695152a8.htm
Source: https://www.bls.gov/opub/reports/womens-earnings/2018/pdf/home.pdf