Opportunity Through Data Textbook

Introduction to Data Science: Exploratory Musical Analysis


Last updated 4 years ago

We can find data about almost any subject and analyze it to find interesting patterns. We can then correlate these patterns with observations about society to discover new insights about the world we live in.

In this example, we will look at how popular music has changed over the past 35 years. We use a dataset with information about the Billboard Top 100 songs since 1980. These songs were chosen for their popularity, measured in sales, radio airplay, and streaming. For each song that made the Top 100 list in a given year, we have data about characteristics such as its genre (hip hop, pop, rock, rap) and the kind of words it used (positive, negative). We can use this data to perform some exploratory analysis and look at patterns in popular music over time.

In the first graph below, we have plotted the number of hip hop songs that made it to the Top 100 each year. The graph shows that the largest number of hip hop songs appeared in the Top 100 in 2009. From this information, we could infer that hip hop was at its most popular in 2009, at least as measured by chart appearances.
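The counting behind a graph like this can be sketched in a few lines of Python. This is a minimal illustration only: the `songs` list is a tiny made-up sample, not the real Billboard data, and the function name `count_genre_by_year` is a hypothetical helper chosen for this example.

```python
from collections import Counter

# Hypothetical toy records of (year, genre); NOT the real Billboard dataset.
songs = [
    (2007, "hip hop"),
    (2008, "pop"),
    (2008, "hip hop"),
    (2009, "hip hop"),
    (2009, "hip hop"),
    (2009, "rock"),
]

def count_genre_by_year(records, genre):
    """Return a Counter mapping each year to the number of songs of one genre."""
    return Counter(year for year, g in records if g == genre)

hip_hop_counts = count_genre_by_year(songs, "hip hop")

# most_common(1) gives the (year, count) pair with the highest count.
peak_year, peak_count = hip_hop_counts.most_common(1)[0]
print(peak_year, peak_count)
```

Plotting the years against their counts would give a line like the one in the graph, and the peak year is the highest point on that line.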

The next three graphs plot the number of pop, rock, and rap songs that made it to the Top 100 each year. Look at the graphs and think of a few patterns or interesting points that you notice.

These next two graphs show how the percentage of positive or negative words in songs has changed over time. The general trend shows that as time has gone on, songs have become less positive and more negative. Does this surprise you? Why do you think this might be?
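One way to compute a "percent of positive words" figure is to compare each word in a song against a list of words labeled positive or negative. The sketch below assumes this approach; the source does not say how the dataset's labels were produced, and the word lists and lyrics here are invented for illustration.

```python
# Hypothetical word lists, made up for this sketch.
POSITIVE_WORDS = {"love", "happy", "good"}
NEGATIVE_WORDS = {"hate", "sad", "bad"}

def percent_in(lyrics, word_set):
    """Return the percentage of words in the lyrics that belong to word_set."""
    words = lyrics.lower().split()
    if not words:
        return 0.0  # avoid dividing by zero for empty lyrics
    matches = sum(1 for w in words if w in word_set)
    return 100 * matches / len(words)

# Invented example lyrics with 10 words in total.
lyrics = "love love this good day but hate the cold rain"

percent_positive = percent_in(lyrics, POSITIVE_WORDS)
percent_negative = percent_in(lyrics, NEGATIVE_WORDS)
print(percent_positive, percent_negative)
```

Averaging these percentages over all the songs in a year would give one point per year on a graph like the ones described above.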

How has the usage of different words changed over time? We can get some of this information by counting how often a word appears in songs each year. The word "you," for example, fluctuates between 1000 and 2000 uses over the 35-year time period. Because the number keeps fluctuating (as seen in the spiky shape of the graph), there is no clear trend, and we can't learn much from it.

On the other hand, use of the word "like" has increased significantly over the past 35 years, perhaps as it has emerged as a filler word. This suggests that we've been saying "like" a lot more recently!
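Counting how often a word such as "you" or "like" is used each year could look something like the sketch below. The `lyrics_by_song` list is a made-up sample rather than the real dataset, and `word_uses_by_year` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical (year, lyrics) pairs; NOT the real song lyrics.
lyrics_by_song = [
    (1980, "you and me"),
    (1980, "I like you"),
    (2015, "like like you know"),
]

def word_uses_by_year(records, word):
    """Return a dict mapping each year to how many times the word was used."""
    totals = defaultdict(int)
    for year, lyrics in records:
        # Lowercase the lyrics so "Like" and "like" are counted together.
        totals[year] += lyrics.lower().split().count(word)
    return dict(totals)

like_counts = word_uses_by_year(lyrics_by_song, "like")
you_counts = word_uses_by_year(lyrics_by_song, "you")
print(like_counts)
print(you_counts)
```

With real data, a count that rises steadily from year to year points to a trend, while a count that jumps up and down is harder to interpret.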

Data is powerful, and we can get a lot of information from it. What other questions do you have about this dataset? How might you answer them? Using the tools you'll learn in this course, you'll soon be able to answer them!