SQL Resources/SQL/Understanding JOIN

Understanding JOIN

The optional JOIN clause can be placed in the FROM part of a SQL SELECT statement (read more about the FROM clause). In this article we'll explain how it works.

-- Generic SQL SELECT statement
select
  -- (some columns)
from
  -- (some tables)
  -- (zero or more JOIN clauses)
where
  -- (A 'predicate' expression to be used as a filter)
group by
  -- (some columns)

You need a JOIN clause whenever you want your SELECT statement to retrieve rows from multiple tables at the same time - it tells the database exactly how it should match those tables together.

There are several different types of JOIN clause, common ones are

  • LEFT JOIN
  • RIGHT JOIN
  • INNER JOIN
  • OUTER JOIN
  • CROSS JOIN

In this article, we'll look at some examples to see how they work.

LEFT JOIN

A basic LEFT JOIN statement looks like

select
  *
from
  table_1
left join
  table_2 on table_1.column_1 = table_2.column_2

This query will return

  • all columns from table_1 and table_2
  • all rows from table_1
  • all rows from table_2 where column_2 matches the column column_1 from table_1

This is why it's called a LEFT join - the table on the left of the join clause keeps all of its rows, even if there isn't a matching row in the table on the right.

If there is no matching row in the table on the right, the columns from the right will be filled with NULLs.

with left_table as (select * from unnest([
  struct('a' as strings, 1 as numbers),
  struct('b' as strings, 2 as numbers),
  struct('c' as strings, 3 as numbers)
])),
right_table as (select * from unnest([
  struct('a' as strings, 4 as numbers),
  struct('b' as strings, 5 as numbers)
]))

select
  *
from
  left_table
left join
  right_table on left_table.strings = right_table.strings
strings
a
b
c
numbers
1
2
3
strings_1
a
b
NULL
numbers_1
4
5
NULL

RIGHT JOIN

A basic RIGHT JOIN statement looks like

select
  *
from
  table_1
right join
  table_2 on table_1.column_1 = table_2.column_2

This query will return

  • all columns from table_1 and table_2
  • all rows from table_2
  • all rows from table_1 where column_1 matches the column column_2 from table_2

This is why it's called a RIGHT join - the table on the right of the join clause keeps all of its rows, even if there isn't a matching row in the table on the left.

If there is no matching row in the table on the left, the columns from the left will be filled with NULLs.

with left_table as (select * from unnest([
  struct('a' as strings, 1 as numbers),
  struct('b' as strings, 2 as numbers)
])),
right_table as (select * from unnest([
  struct('a' as strings, 4 as numbers),
  struct('b' as strings, 5 as numbers),
  struct('c' as strings, 6 as numbers)
]))

select
  *
from
  left_table
right join
  right_table on left_table.strings = right_table.strings
strings
a
b
NULL
numbers
1
2
NULL
strings_1
a
b
c
numbers_1
4
5
6

INNER JOIN

A basic INNER JOIN statement looks like

select
  *
from
  table_1
inner join
  table_2 on table_1.column_1 = table_2.column_2

This query will return

  • all columns from table_1 and table_2
  • all rows from both tables where the values in column_1 and column_2 both match and exist

This is why it's called an INNER join - only the rows which match from both tables will be included in the result.

Note in the following that the values 'c' and 'd' don't appear in the output, as those values don't exist in both tables.

with left_table as (select * from unnest([
  struct('a' as strings, 1 as numbers),
  struct('b' as strings, 2 as numbers),
  struct('c' as strings, 3 as numbers)
])),
right_table as (select * from unnest([
  struct('a' as strings, 4 as numbers),
  struct('b' as strings, 5 as numbers),
  struct('d' as strings, 6 as numbers)
]))

select
  *
from
  left_table
inner join
  right_table on left_table.strings = right_table.strings
strings
a
b
numbers
1
2
strings_1
a
b
numbers_1
4
5

FULL OUTER JOIN

A basic FULL OUTER JOIN statement looks like

select
  *
from
  table_1
full outer join
  table_2 on table_1.column_1 = table_2.column_2

This query will return

  • all columns from table_1 and table_2
  • all rows from both tables

This is why it's called an OUTER join - all rows will be included in the result, even if they don't exist in one of the tables.

If there is no matching row in one of the tables, the missing columns will be filled with NULLs.

Note in the following that the values 'c' and 'd' appear in the output, even though they only exist in one table each.

with left_table as (select * from unnest([
  struct('a' as strings, 1 as numbers),
  struct('b' as strings, 2 as numbers),
  struct('c' as strings, 3 as numbers)
])),
right_table as (select * from unnest([
  struct('a' as strings, 4 as numbers),
  struct('b' as strings, 5 as numbers),
  struct('d' as strings, 6 as numbers)
]))

select
  *
from
  left_table
full outer join
  right_table on left_table.strings = right_table.strings
strings
a
b
c
NULL
numbers
1
2
3
NULL
strings_1
a
b
NULL
d
numbers_1
4
5
NULL
6

CROSS JOIN

A basic CROSS JOIN statement looks like

select
  *
from
  table_1
cross join
  table_2 -- Note - no 'on' expression required

This query will return

  • all columns from table_1 and table_2
  • all possible combinations of rows from both tables

This is why it's called a CROSS join - the number of rows returned is the product of the number of rows of the two tables. You should be careful when using CROSS JOINs as they can return really massive tables, and are often not what you require.

Note in the following that the number of rows in the output is 2 x 3 = 6.

with left_table as (select * from unnest([
  struct('a' as strings, 1 as numbers),
  struct('b' as strings, 2 as numbers)
])),
right_table as (select * from unnest([
  struct('a' as strings, 4 as numbers),
  struct('b' as strings, 5 as numbers),
  struct('c' as strings, 6 as numbers)
]))

select
  *
from
  left_table
cross join
  right_table
strings
a
a
a
b
b
b
numbers
1
1
1
2
2
2
strings_1
a
b
c
a
b
c
numbers_1
4
5
6
4
5
6

Share SQL code
Write SQL with your team in real-time.
Share SQL code
Write SQL with your team in real-time.