SQL Resources/Snowflake/Pivot Tables

Pivot Tables

We're all familiar with the power of Pivot tables, but building them in SQL tends to be tricky.

Snowflake has some bespoke features that make this a little easier though. In this context:

In the example below I've looked at the total hours I've watched some of my favorite programs on Netflix by year, then I've pivoted the data to show each year as a column:

SELECT
  SUM(DURATION) / (60 * 60) HOURS_WATCHED, TITLE, YEAR(START_TIME) YEAR
FROM PUBLIC.NETFLIX 
where TITLE IN ('Friends', 'GLOW', 'Arrested Development', 'The Good Place')
GROUP BY TITLE, YEAR
HOURS_WATCHED
15.726388889
78.595555556
19.2725
192.049166667
58.134166667
2.197777778
4.356388889
9.827777778
9.428888889
2.958888889
4.449444444
0.3686111111
23.560277778
1.806388889
8.503888889
0.4994444444
0.9927777778
1.868611111
2.521111111
14.368055556
2.156388889
TITLE
GLOW
Friends
Friends
Friends
Friends
Friends
Friends
The Good Place
The Good Place
The Good Place
The Good Place
The Good Place
Arrested Development
Arrested Development
Arrested Development
Arrested Development
Arrested Development
Arrested Development
Arrested Development
Arrested Development
Arrested Development
YEAR
2020
2019
2015
2018
2020
2016
2017
2020
2018
2019
2021
2017
2012
2014
2017
2016
2020
2018
2019
2013
2015

We'll take our above summary table and create a new column for each year:

SELECT * FROM SUMMARIZED_DATA 
PIVOT(sum(hours_watched) for year in (2018, 2019, 2020)) 
ORDER BY "2019" DESC
TITLE
GLOW
Friends
The Good Place
Arrested Development
2018
NULL
192.049166667
9.428888889
1.868611111
2019
NULL
78.595555556
2.958888889
2.521111111
2020
15.726388889
58.134166667
9.827777778
0.9927777778

Displaying the data in this way helps emphasise changes across the years for each show.

Step-by-Step

Option 1: CASE

In most SQL dialects, to construct a pivot table, you'd be forced to use a string of CASE statements:

SELECT 
  <INDEX COLS>, 
  AGGREGATE_FUNCTION(CASE WHEN <CONDITION> THEN <VALUE> END) <COL_1>, 
  AGGREGATE_FUNCTION(CASE WHEN <CONDITION> THEN <VALUE> END) <COL_2>
FROM
  <TABLE>
SELECT TITLE, 
  SUM(CASE WHEN YEAR(START_TIME) = 2018 THEN DURATION END ) / (60 * 60) "2018",
  SUM(CASE WHEN YEAR(START_TIME) = 2019 THEN DURATION END ) / (60 * 60) "2019",
  SUM(CASE WHEN YEAR(START_TIME) = 2020 THEN DURATION END ) / (60 * 60) "2020"
FROM PUBLIC.NETFLIX
WHERE TITLE IN ('Friends', 'GLOW', 'Arrested Development', 'The Good Place')
GROUP BY TITLE
TITLE
GLOW
Friends
The Good Place
Arrested Development
2018
NULL
192.049166667
9.428888889
1.868611111
2019
NULL
78.595555556
2.958888889
2.521111111
2020
15.726388889
58.134166667
9.827777778
0.9927777778

Option 2: PIVOT

In Snowflake there is a bespoke function, PIVOT, which simplifies this behavior.

SELECT ...
FROM ...
   PIVOT ( <aggregate_function> ( <pivot_column> )
            FOR <value_column> IN ( <pivot_value_1> [ , <pivot_value_2> ... ] ) )

[ ... ]
SELECT 
  YEAR , 
  "'Friends'" as "Friends", 
  "'GLOW'" AS GLOW, 
  "'The Good Place'" as "The Good Place" 
FROM 
  SUMMARIZED_DATA
  PIVOT(SUM(HOURS_WATCHED) FOR TITLE IN ('Friends', 'GLOW','The Good Place'))
ORDER BY YEAR desc
YEAR
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
Friends
NULL
58.134166667
78.595555556
192.049166667
4.356388889
2.197777778
19.2725
NULL
NULL
NULL
GLOW
NULL
15.726388889
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
The Good Place
4.449444444
9.827777778
2.958888889
9.428888889
0.3686111111
NULL
NULL
NULL
NULL
NULL

The downside of both of these approaches is that you have to specify the resulting pivoted columns before you create the table, meaning you can't have a dynamic pivot table.

There is a post here that details a way to make dynamic pivot tables with stored procedures.

Unpivot

Naturally, you may want to turn columns into rows, which involves unpivoting. Snowflake, handily, has an UNPIVOT function that will do just that.

SELECT ...
FROM ...
   UNPIVOT ( <value_column>
             FOR <name_column> IN ( <column_list> ) )

[ ... ]

To unpivot our pivot table of YEAR | Friends | GLOW | The Good Place into a table of: YEAR | TITLE | HOURS WATCHED, we can do:

SELECT * FROM PIVOT_WAY UNPIVOT(HOURS_WATCHED FOR TITLE IN ("Friends","GLOW","The Good Place")) 
YEAR
2021
2020
2020
2020
2019
2019
2018
2018
2017
2017
2016
2015
TITLE
The Good Place
Friends
GLOW
The Good Place
Friends
The Good Place
Friends
The Good Place
Friends
The Good Place
Friends
Friends
HOURS_WATCHED
4.449444444
58.134166667
15.726388889
9.827777778
78.595555556
2.958888889
192.049166667
9.428888889
4.356388889
0.3686111111
2.197777778
19.2725

How We Built This

This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.

This makes not only for far more readable reports (like this one) but also a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.

Share SQL code
Write SQL with your team in real-time.
Share SQL code
Write SQL with your team in real-time.