Joins
What if we want to select data from multiple tables?
KEY TERMS
join: the process of combining two tables
dot notation: the syntax we use to find specific characteristics of an object, like a table
One of the largest benefits of SQL is using short and simple queries to narrow down information from a large aggregate of data. For better organizational management, data is often stored in multiple tables. For example, in our previous section our data was reflected in a table specific to individual dogs. However, say we also have a more general table on dog breed characteristics. Below is our breeds table.
Breed
Origin Country
Size
French Bulldog
France
Medium
Yellow Lab
Canada
Large
Beagle
England
Medium
Border Collie
England
Medium
German Shepherd
Germany
Large
Chihuahua
Mexico
Small
We can combine the breeds table with the dogs table from the previous section using a join.
To accomplish joining at a base level, all we have to do is select from both of our tables, and separating them with a comma. However, we have to change up our notation a little bit. Within our statement, we must use what is called dot notation. Dot notation in this case is essentially using a "dot" to specify where our column is coming from. We can't simply say where breed = breed. We must note that we are saying that the breed column from dogs must equal the breed column from breeds table. Below is an example of how we specify that.
While the join operation above is a very basic example, sometimes we want even more specificity in the way we join tables. We may be trying to combine dozens of tables at the same time, and therefore, a more directed operation may be required
Left Join
An left-join is a table join operation where all of the rows of the left side table are returned, as well as the matching rows from the rightwards table(s). For example, say we only wanted the columns from the breeds table in our final table, but has another table more dogs with a list of every dog. The resulting table from a left join would be much longer (as it may contain thousands of dog breeds) but the only relevant information to us would be the Breed, Origin Country, and Size columns from more dogs. The syntax for a left join is shown below.
While this formula is a little bit more complicated than the examples we've seen before, we can walk through it line by line.
We use dot notation to access specific columns
table1 is the table we select columns from
table2's columns are the columns that make up the final table
ON
is very similar to theWHERE
syntax we've seen before. In specified joins, we typically use ON as a way to define our join condition. We only will copy the matching columns that are contained in table2's original columns. The rows for which there is no matching row on the right side, the set of results will contain null
Similarly, we can do something very similar with a right join. A right join follows almost the exact same pattern, but we join to the right table instead. There are a few specific cases where the difference between a left join and a right join are important to decipher, but for the purposes of this course, we can think of them very similarly. While which type of join you choose is up to you, it is fairly common for data scientists to mainly use left join.
Full Join
Full Joins are useful when we want to have a resulting table with every column of every table we've selected from. Essentially, a full join is the combination of a left join and a right join. You might be asking what happens when we have rows that do not match amongst the tables. Say one of the tables we full joined with breeds had a couple cats in it, while containing a country of origin and size value. The resulting table would simply include a null value for that specific cat's breed in the Breed column of our final table. The syntax for a full join is shown below.
Stepping through line by line again, we can interpret the following:
We use dot notation to select specific columns we want in our full join
We specify the table we are trying to get data from
We call a full join operation. This means we want every column from our select statement in our final table
We specify our join condition
These three table operations are the main types of joins that we use in SQL. These operations let us both append data to a specified table, as well as aggregate all of our information in a final table.
Last updated
Was this helpful?