SQL Resources
SQL
Understanding JOIN

Understanding JOIN

The optional JOIN clause can be placed in the FROM part of a SQL SELECT statement (read more about the FROM clause). In this article we'll explain how it works.

-- Generic SQL SELECT statement
select
  -- (some columns)
from
  -- (some tables)
  -- (zero or more JOIN clauses)
where
  -- (A 'predicate' expression to be used as a filter)
group by
  -- (some columns)

You need a JOIN clause whenever you want your SELECT statement to retrieve rows from multiple tables at the same time - it tells the database exactly how it should match those tables together.

There are several different types of JOIN clause, common ones are

  • LEFT JOIN
  • RIGHT JOIN
  • INNER JOIN
  • OUTER JOIN
  • CROSS JOIN
  • In this article, we'll look at some examples to see how they work.

    In this article the code snippets are written in the Google BigQuery Standard SQL syntax.

    LEFT JOIN

    A basic LEFT JOIN statement looks like

    select
      *
    from
      table_1
    left join
      table_2 on table_1.column_1 = table_2.column_2

    This query will return

  • all columns from table_1 and table_2
  • all rows from table_1
  • all rows from table_2 where column_2 matches the column column_1 from table_1
  • This is why it's called a LEFT join - the table on the left of the join clause keeps all of its rows, even if there isn't a matching row in the table on the right.

    If there is no matching row in the table on the right, the columns from the right will be filled with NULLs.

    with left_table as (select * from unnest([
      struct('a' as strings, 1 as numbers),
      struct('b' as strings, 2 as numbers),
      struct('c' as strings, 3 as numbers)
    ])),
    right_table as (select * from unnest([
      struct('a' as strings, 4 as numbers),
      struct('b' as strings, 5 as numbers)
    ]))
    
    select
      *
    from
      left_table
    left join
      right_table on left_table.strings = right_table.strings
    stringsnumbersstrings_1numbers_1
    a1a4
    b2b5
    c3nullnull

    RIGHT JOIN

    A basic RIGHT JOIN statement looks like

    select
      *
    from
      table_1
    right join
      table_2 on table_1.column_1 = table_2.column_2

    This query will return

  • all columns from table_1 and table_2
  • all rows from table_2
  • all rows from table_1 where column_1 matches the column column_2 from table_2
  • This is why it's called a RIGHT join - the table on the right of the join clause keeps all of its rows, even if there isn't a matching row in the table on the left.

    If there is no matching row in the table on the left, the columns from the left will be filled with NULLs.

    with left_table as (select * from unnest([
      struct('a' as strings, 1 as numbers),
      struct('b' as strings, 2 as numbers)
    ])),
    right_table as (select * from unnest([
      struct('a' as strings, 4 as numbers),
      struct('b' as strings, 5 as numbers),
      struct('c' as strings, 6 as numbers)
    ]))
    
    select
      *
    from
      left_table
    right join
      right_table on left_table.strings = right_table.strings
    stringsnumbersstrings_1numbers_1
    a1a4
    b2b5
    nullnullc6

    INNER JOIN

    A basic INNER JOIN statement looks like

    select
      *
    from
      table_1
    inner join
      table_2 on table_1.column_1 = table_2.column_2

    This query will return

  • all columns from table_1 and table_2
  • all rows from both tables where the values in column_1 and column_2 both match and exist
  • This is why it's called an INNER join - only the rows which match from both tables will be included in the result.

    Note in the following that the values 'c' and 'd' don't appear in the output, as those values don't exist in both tables.

    with left_table as (select * from unnest([
      struct('a' as strings, 1 as numbers),
      struct('b' as strings, 2 as numbers),
      struct('c' as strings, 3 as numbers)
    ])),
    right_table as (select * from unnest([
      struct('a' as strings, 4 as numbers),
      struct('b' as strings, 5 as numbers),
      struct('d' as strings, 6 as numbers)
    ]))
    
    select
      *
    from
      left_table
    inner join
      right_table on left_table.strings = right_table.strings
    stringsnumbersstrings_1numbers_1
    a1a4
    b2b5

    FULL OUTER JOIN

    A basic FULL OUTER JOIN statement looks like

    select
      *
    from
      table_1
    full outer join
      table_2 on table_1.column_1 = table_2.column_2

    This query will return

  • all columns from table_1 and table_2
  • all rows from both tables
  • This is why it's called an OUTER join - all rows will be included in the result, even if they don't exist in one of the tables.

    If there is no matching row in one of the tables, the missing columns will be filled with NULLs.

    Note in the following that the values 'c' and 'd' appear in the output,  even though they only exist in one table each.

    with left_table as (select * from unnest([
      struct('a' as strings, 1 as numbers),
      struct('b' as strings, 2 as numbers),
      struct('c' as strings, 3 as numbers)
    ])),
    right_table as (select * from unnest([
      struct('a' as strings, 4 as numbers),
      struct('b' as strings, 5 as numbers),
      struct('d' as strings, 6 as numbers)
    ]))
    
    select
      *
    from
      left_table
    full outer join
      right_table on left_table.strings = right_table.strings
    stringsnumbersstrings_1numbers_1
    a1a4
    b2b5
    c3nullnull
    nullnulld6

    CROSS JOIN

    A basic CROSS JOIN statement looks like

    select
      *
    from
      table_1
    cross join
      table_2 -- Note - no 'on' expression required

    This query will return

  • all columns from table_1 and table_2
  • all possible combinations of rows from both tables
  • This is why it's called a CROSS join - the number of rows returned is the product of the number of rows of the two tables. You should be careful when using CROSS JOINs as they can return really massive tables, and are often not what you require.

    Note in the following that the number of rows in the output is 2 x 3 = 6.

    with left_table as (select * from unnest([
      struct('a' as strings, 1 as numbers),
      struct('b' as strings, 2 as numbers)
    ])),
    right_table as (select * from unnest([
      struct('a' as strings, 4 as numbers),
      struct('b' as strings, 5 as numbers),
      struct('c' as strings, 6 as numbers)
    ]))
    
    select
      *
    from
      left_table
    cross join
      right_table
    stringsnumbersstrings_1numbers_1
    a1a4
    a1b5
    a1c6
    b2a4
    b2b5
    b2c6
    Subscribe to newsletter

    Subscribe to receive the latest blog posts to your inbox every week.

    By subscribing you agree to our Privacy Policy.
    Thank you! Your submission has been received!
    Oops! Something went wrong while submitting the form.

    Start solving your organization's biggest problems with Count today.

    Stay up to date with all things count, data and problem solving.