Joins

What if we want to select data from multiple tables?

KEY TERMS

  • join: the process of combining two tables

  • dot notation: the syntax we use to find specific characteristics of an object, like a table

One of the largest benefits of SQL is using short and simple queries to narrow down information from a large aggregate of data. For better organizational management, data is often stored in multiple tables. For example, in our previous section our data was reflected in a table specific to individual dogs. However, say we also have a more general table on dog breed characteristics. Below is our breeds table.

Breed

Origin Country

Size

French Bulldog

France

Medium

Yellow Lab

Canada

Large

Beagle

England

Medium

Border Collie

England

Medium

German Shepherd

Germany

Large

Chihuahua

Mexico

Small

We can combine the breeds table with the dogs table from the previous section using a join.

To accomplish joining at a base level, all we have to do is select from both of our tables, and separating them with a comma. However, we have to change up our notation a little bit. Within our statement, we must use what is called dot notation. Dot notation in this case is essentially using a "dot" to specify where our column is coming from. We can't simply say where breed = breed. We must note that we are saying that the breed column from dogs must equal the breed column from breeds table. Below is an example of how we specify that.

select * from dogs, breeds where dogs.breed = breed.breed;

While the join operation above is a very basic example, sometimes we want even more specificity in the way we join tables. We may be trying to combine dozens of tables at the same time, and therefore, a more directed operation may be required

Left Join

An left-join is a table join operation where all of the rows of the left side table are returned, as well as the matching rows from the rightwards table(s). For example, say we only wanted the columns from the breeds table in our final table, but has another table more dogs with a list of every dog. The resulting table from a left join would be much longer (as it may contain thousands of dog breeds) but the only relevant information to us would be the Breed, Origin Country, and Size columns from more dogs. The syntax for a left join is shown below.

SELECT table1.column1,table1.column2,table2.column1,....
FROM table1 
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;

While this formula is a little bit more complicated than the examples we've seen before, we can walk through it line by line.

  1. We use dot notation to access specific columns

  2. table1 is the table we select columns from

  3. table2's columns are the columns that make up the final table

  4. ON is very similar to the WHERE syntax we've seen before. In specified joins, we typically use ON as a way to define our join condition. We only will copy the matching columns that are contained in table2's original columns. The rows for which there is no matching row on the right side, the set of results will contain null

Similarly, we can do something very similar with a right join. A right join follows almost the exact same pattern, but we join to the right table instead. There are a few specific cases where the difference between a left join and a right join are important to decipher, but for the purposes of this course, we can think of them very similarly. While which type of join you choose is up to you, it is fairly common for data scientists to mainly use left join.

Full Join

Full Joins are useful when we want to have a resulting table with every column of every table we've selected from. Essentially, a full join is the combination of a left join and a right join. You might be asking what happens when we have rows that do not match amongst the tables. Say one of the tables we full joined with breeds had a couple cats in it, while containing a country of origin and size value. The resulting table would simply include a null value for that specific cat's breed in the Breed column of our final table. The syntax for a full join is shown below.

SELECT table1.column1,table1.column2,table2.column1,....
FROM table1 
FULL JOIN table2
ON table1.matching_column = table2.matching_column;

Stepping through line by line again, we can interpret the following:

  • We use dot notation to select specific columns we want in our full join

  • We specify the table we are trying to get data from

  • We call a full join operation. This means we want every column from our select statement in our final table

  • We specify our join condition

These three table operations are the main types of joins that we use in SQL. These operations let us both append data to a specified table, as well as aggregate all of our information in a final table.

Last updated