Opportunity Through Data Textbook

What is Data Science?

Last updated 4 years ago

Data science is an interdisciplinary field that uses mathematical tools, algorithms, and machine learning principles to recognize patterns in data and make predictions about the future. In other words, data science can be considered the intersection of mathematics, computer science, and business. The primary goal of a data scientist is to uncover the hidden truths and patterns that lie within raw, unfiltered data.

What Even is Data?

Data is any piece of information collected -- both qualitative and quantitative -- that represents a particular attribute or trait. It can be as simple as your eye color or as complex as the Gross Domestic Product of the United States during a given time interval.
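As a rough illustration of the qualitative/quantitative distinction, the sketch below stores a few records as Python dictionaries and labels each attribute by the kind of values it holds. The records, the attribute names, and the helper functions `is_quantitative` and `classify_attributes` are all hypothetical, invented here for illustration; treating "numeric" as "quantitative" is a simplifying assumption.

```python
# Hypothetical records, invented for illustration only.
people = [
    {"name": "Ana", "eye_color": "brown", "height_cm": 162.0},
    {"name": "Ben", "eye_color": "blue", "height_cm": 180.5},
    {"name": "Chloe", "eye_color": "brown", "height_cm": 171.0},
]


def is_quantitative(value):
    """Treat numbers (but not booleans) as quantitative values."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def classify_attributes(records):
    """Label each attribute 'quantitative' only if all of its values are numeric."""
    kinds = {}
    for record in records:
        for key, value in record.items():
            if not is_quantitative(value):
                # A single non-numeric value makes the attribute qualitative.
                kinds[key] = "qualitative"
            else:
                # Numeric value: mark as quantitative unless already labeled.
                kinds.setdefault(key, "quantitative")
    return kinds


print(classify_attributes(people))
```

Here eye color is a qualitative attribute (a category), while height is quantitative (a measurement you can do arithmetic on).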

Data collection -- the act of gathering data into one source -- is just one of the tasks that a data scientist has to carry out. Others include data cleaning, building predictive models, and creating powerful data visualizations.
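To make the first two of these tasks a little more concrete, here is a minimal sketch in plain Python. The two "survey" sources, their contents, and the `collect` and `clean` helpers are hypothetical, and the cleaning rules (drop records with a missing age, tidy up names) are just one possible choice rather than a standard recipe.

```python
# Two hypothetical sources of the same kind of record (invented for illustration).
survey_a = [{"name": "Ana", "age": 34}, {"name": "Ben", "age": None}]
survey_b = [{"name": "Chloe", "age": 29}, {"name": "  dev ", "age": 41}]


def collect(*sources):
    """Data collection: gather records from several sources into one list."""
    combined = []
    for source in sources:
        combined.extend(source)
    return combined


def clean(records):
    """Data cleaning: drop records with a missing age and tidy up names."""
    cleaned = []
    for record in records:
        if record["age"] is None:
            continue  # skip incomplete records
        cleaned.append(
            {"name": record["name"].strip().title(), "age": record["age"]}
        )
    return cleaned


dataset = clean(collect(survey_a, survey_b))
print(dataset)
```

In practice, libraries covered later in this textbook (such as Pandas) handle these steps on much larger datasets, but the underlying idea is the same.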

Oftentimes, data scientists will follow the Data Science Lifecycle, a process composed of a series of steps:

[Figure: the Data Science Lifecycle]

The Power of Data Science

Over the past ten years, data science has skyrocketed in relevance thanks to advancements in machine learning and artificial intelligence. Today, the effects of data science can be felt across the globe -- and even above it (SpaceX, NASA)! More than that, the skills of data science can be applied to virtually every discipline, including but not limited to:

  • Transportation and Travel (Uber, Tesla)

  • Banking and E-Commerce (Fraud Protection, Predictive Analytics)

  • Healthcare and Medicine (Disease Detection and Mapping)

  • Professional Sports (Sabermetrics in MLB)

  • Entertainment (YouTube, Netflix, Spotify Recommendations)

Whether or not we think about it, we all interact with applications of data science on a virtually daily basis. This textbook will provide you with the necessary tools to start making an impact yourself!

Source: Towards Data Science