SQL Resources
SQL
Pivot Tables

Pivot Tables

We're all familiar with the power of Pivot tables, but building them in SQL tends to be tricky.

Snowflake has some bespoke features that make this a little easier though. In this context:

Pivoting a table means turning rows into columns

In the example below I've looked at the total hours I've watched some of my favorite programs on Netflix by year, then I've pivoted the data to show each year as a column:

SELECT
  SUM(DURATION) / (60 * 60) HOURS_WATCHED, TITLE, YEAR(START_TIME) YEAR
FROM PUBLIC.NETFLIX 
where TITLE IN ('Friends', 'GLOW', 'Arrested Development', 'The Good Place')
GROUP BY TITLE, YEAR
HOURS_WATCHEDTITLEYEAR
15.726388889GLOW2020
78.595555556Friends2019
19.2725Friends2015
192.049166667Friends2018
58.134166667Friends2020
2.197777778Friends2016
4.356388889Friends2017
9.827777778The Good Place2020
9.428888889The Good Place2018
2.958888889The Good Place2019
4.449444444The Good Place2021
0.3686111111The Good Place2017
23.560277778Arrested Development2012
1.806388889Arrested Development2014
8.503888889Arrested Development2017
0.4994444444Arrested Development2016
0.9927777778Arrested Development2020
1.868611111Arrested Development2018
2.521111111Arrested Development2019
14.368055556Arrested Development2013
2.156388889Arrested Development2015

We'll take our above summary table and create a new column for each year:

SELECT * FROM SUMMARIZED_DATA 
PIVOT(sum(hours_watched) for year in (2018, 2019, 2020)) 
ORDER BY "2019" DESC
TITLE201820192020
GLOWnullnull15.726388889
Friends192.04916666778.59555555658.134166667
The Good Place9.4288888892.9588888899.827777778
Arrested Development1.8686111112.5211111110.9927777778

Displaying the data in this way helps emphasise changes across the years for each show.

Step-by-Step

Option 1: CASE

In most SQL dialects, to construct a pivot table, you'd be forced to use a string of CASE statements:

SELECT 
  <INDEX COLS>, 
  AGGREGATE_FUNCTION(CASE WHEN <CONDITION> THEN <VALUE> END) <COL_1>, 
  AGGREGATE_FUNCTION(CASE WHEN <CONDITION> THEN <VALUE> END) <COL_2>
FROM
  <TABLE>
SELECT TITLE, 
  SUM(CASE WHEN YEAR(START_TIME) = 2018 THEN DURATION END ) / (60 * 60) "2018",
  SUM(CASE WHEN YEAR(START_TIME) = 2019 THEN DURATION END ) / (60 * 60) "2019",
  SUM(CASE WHEN YEAR(START_TIME) = 2020 THEN DURATION END ) / (60 * 60) "2020"
FROM PUBLIC.NETFLIX
WHERE TITLE IN ('Friends', 'GLOW', 'Arrested Development', 'The Good Place')
GROUP BY TITLE
TITLE201820192020
GLOWnullnull15.726388889
Friends192.04916666778.59555555658.134166667
The Good Place9.4288888892.9588888899.827777778
Arrested Development1.8686111112.5211111110.9927777778

Option 2: PIVOT

In Snowflake there is a bespoke function, PIVOT, which simplifies this behavior.

SELECT ...
FROM ...
   PIVOT ( <aggregate_function> ( <pivot_column> )
            FOR <value_column> IN ( <pivot_value_1> [ , <pivot_value_2> ... ] ) )

[ ... ]
SELECT 
  YEAR , 
  "'Friends'" as "Friends", 
  "'GLOW'" AS GLOW, 
  "'The Good Place'" as "The Good Place" 
FROM 
  SUMMARIZED_DATA
  PIVOT(SUM(HOURS_WATCHED) FOR TITLE IN ('Friends', 'GLOW','The Good Place'))
ORDER BY YEAR desc
YEARFriendsGLOWThe Good Place
2021nullnull4.449444444
202058.13416666715.7263888899.827777778
201978.595555556null2.958888889
2018192.049166667null9.428888889
20174.356388889null0.3686111111
20162.197777778nullnull
201519.2725nullnull
2014nullnullnull
2013nullnullnull
2012nullnullnull

The downside of both of these approaches is that you have to specify the resulting pivoted columns before you create the table, meaning you can't have a dynamic pivot table.

There is a post here that details a way to make dynamic pivot tables with stored procedures.

Unpivot

Naturally, you may want to turn columns into rows, which involves unpivoting. Snowflake, handily, has an UNPIVOT function that will do just that.

SELECT ...
FROM ...
   UNPIVOT ( <value_column>
             FOR <name_column> IN ( <column_list> ) )

[ ... ]

To unpivot our pivot table of YEAR | Friends | GLOW | The Good Place into a table of: YEAR | TITLE | HOURS WATCHED, we can do:

SELECT * FROM PIVOT_WAY UNPIVOT(HOURS_WATCHED FOR TITLE IN ("Friends","GLOW","The Good Place"))
YEARTITLEHOURS_WATCHED
2021The Good Place4.449444444
2020Friends58.134166667
2020GLOW15.726388889
2020The Good Place9.827777778
2019Friends78.595555556
2019The Good Place2.958888889
2018Friends192.049166667
2018The Good Place9.428888889
2017Friends4.356388889
2017The Good Place0.3686111111
2016Friends2.197777778
2015Friends19.2725

How We Built This

This page was built using Count. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.

This makes not only for far more readable reports (like this one) but also a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.

Subscribe to newsletter

Subscribe to receive the latest blog posts to your inbox every week.

By subscribing you agree to our Privacy Policy.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Start solving your organization's biggest problems with Count today.

Stay up to date with all things count, data and problem solving.