Understanding JOIN
The optional JOIN
clause can be placed in the FROM
part of a SQL SELECT
statement (read more about the FROM
clause). In this article we'll explain how it works.
-- Generic SQL SELECT statement
select
-- (some columns)
from
-- (some tables)
-- (zero or more JOIN clauses)
where
-- (A 'predicate' expression to be used as a filter)
group by
-- (some columns)
You need a JOIN
clause whenever you want your SELECT
statement to retrieve rows from multiple tables at the same time - it tells the database exactly how it should match those tables together.
There are several different types of JOIN
clause, common ones are
LEFT JOIN
RIGHT JOIN
INNER JOIN
OUTER JOIN
CROSS JOIN
In this article, we'll look at some examples to see how they work.
In this article the code snippets are written in the Google BigQuery Standard SQL syntax.
LEFT JOIN
A basic LEFT JOIN
statement looks like
select
*
from
table_1
left join
table_2 on table_1.column_1 = table_2.column_2
This query will return
table_1
and table_2
table_1
table_2
where column_2
matches the column column_1
from table_1
This is why it's called a LEFT
join - the table on the left of the join clause keeps all of its rows, even if there isn't a matching row in the table on the right.
If there is no matching row in the table on the right, the columns from the right will be filled with NULL
s.
with left_table as (select * from unnest([
struct('a' as strings, 1 as numbers),
struct('b' as strings, 2 as numbers),
struct('c' as strings, 3 as numbers)
])),
right_table as (select * from unnest([
struct('a' as strings, 4 as numbers),
struct('b' as strings, 5 as numbers)
]))
select
*
from
left_table
left join
right_table on left_table.strings = right_table.strings
RIGHT JOIN
A basic RIGHT JOIN
statement looks like
select
*
from
table_1
right join
table_2 on table_1.column_1 = table_2.column_2
This query will return
table_1
and table_2
table_2
table_1
where column_1
matches the column column_2
from table_2
This is why it's called a RIGHT
join - the table on the right of the join clause keeps all of its rows, even if there isn't a matching row in the table on the left.
If there is no matching row in the table on the left, the columns from the left will be filled with NULL
s.
with left_table as (select * from unnest([
struct('a' as strings, 1 as numbers),
struct('b' as strings, 2 as numbers)
])),
right_table as (select * from unnest([
struct('a' as strings, 4 as numbers),
struct('b' as strings, 5 as numbers),
struct('c' as strings, 6 as numbers)
]))
select
*
from
left_table
right join
right_table on left_table.strings = right_table.strings
INNER JOIN
A basic INNER JOIN
statement looks like
select
*
from
table_1
inner join
table_2 on table_1.column_1 = table_2.column_2
This query will return
table_1
and table_2
column_1
and column_2
both match and existThis is why it's called an INNER
join - only the rows which match from both tables will be included in the result.
Note in the following that the values 'c' and 'd' don't appear in the output, as those values don't exist in both tables.
with left_table as (select * from unnest([
struct('a' as strings, 1 as numbers),
struct('b' as strings, 2 as numbers),
struct('c' as strings, 3 as numbers)
])),
right_table as (select * from unnest([
struct('a' as strings, 4 as numbers),
struct('b' as strings, 5 as numbers),
struct('d' as strings, 6 as numbers)
]))
select
*
from
left_table
inner join
right_table on left_table.strings = right_table.strings
FULL OUTER JOIN
A basic FULL OUTER JOIN
statement looks like
select
*
from
table_1
full outer join
table_2 on table_1.column_1 = table_2.column_2
This query will return
table_1
and table_2
This is why it's called an OUTER
join - all rows will be included in the result, even if they don't exist in one of the tables.
If there is no matching row in one of the tables, the missing columns will be filled with NULL
s.
Note in the following that the values 'c' and 'd' appear in the output, even though they only exist in one table each.
with left_table as (select * from unnest([
struct('a' as strings, 1 as numbers),
struct('b' as strings, 2 as numbers),
struct('c' as strings, 3 as numbers)
])),
right_table as (select * from unnest([
struct('a' as strings, 4 as numbers),
struct('b' as strings, 5 as numbers),
struct('d' as strings, 6 as numbers)
]))
select
*
from
left_table
full outer join
right_table on left_table.strings = right_table.strings
CROSS JOIN
A basic CROSS JOIN
statement looks like
select
*
from
table_1
cross join
table_2 -- Note - no 'on' expression required
This query will return
table_1
and table_2
This is why it's called a CROSS
join - the number of rows returned is the product of the number of rows of the two tables. You should be careful when using CROSS JOIN
s as they can return really massive tables, and are often not what you require.
Note in the following that the number of rows in the output is 2 x 3 = 6.
with left_table as (select * from unnest([
struct('a' as strings, 1 as numbers),
struct('b' as strings, 2 as numbers)
])),
right_table as (select * from unnest([
struct('a' as strings, 4 as numbers),
struct('b' as strings, 5 as numbers),
struct('c' as strings, 6 as numbers)
]))
select
*
from
left_table
cross join
right_table