We're all familiar with the power of Pivot tables, but building them in SQL tends to be tricky.
Snowflake has some bespoke features that make this a little easier though. In this context:
In the example below I've looked at the total hours I've watched some of my favorite programs on Netflix by year, then I've pivoted the data to show each year as a column:
SELECT
SUM(DURATION) / (60 * 60) HOURS_WATCHED, TITLE, YEAR(START_TIME) YEAR
FROM PUBLIC.NETFLIX
where TITLE IN ('Friends', 'GLOW', 'Arrested Development', 'The Good Place')
GROUP BY TITLE, YEAR
We'll take our above summary table and create a new column for each year:
SELECT * FROM SUMMARIZED_DATA
PIVOT(sum(hours_watched) for year in (2018, 2019, 2020))
ORDER BY "2019" DESC
Displaying the data in this way helps emphasise changes across the years for each show.
In most SQL dialects, to construct a pivot table, you'd be forced to use a string of CASE statements:
SELECT
<INDEX COLS>,
AGGREGATE_FUNCTION(CASE WHEN <CONDITION> THEN <VALUE> END) <COL_1>,
AGGREGATE_FUNCTION(CASE WHEN <CONDITION> THEN <VALUE> END) <COL_2>
FROM
<TABLE>
SELECT TITLE,
SUM(CASE WHEN YEAR(START_TIME) = 2018 THEN DURATION END ) / (60 * 60) "2018",
SUM(CASE WHEN YEAR(START_TIME) = 2019 THEN DURATION END ) / (60 * 60) "2019",
SUM(CASE WHEN YEAR(START_TIME) = 2020 THEN DURATION END ) / (60 * 60) "2020"
FROM PUBLIC.NETFLIX
WHERE TITLE IN ('Friends', 'GLOW', 'Arrested Development', 'The Good Place')
GROUP BY TITLE
In Snowflake there is a bespoke function, PIVOT, which simplifies this behavior.
SELECT ...
FROM ...
PIVOT ( <aggregate_function> ( <pivot_column> )
FOR <value_column> IN ( <pivot_value_1> [ , <pivot_value_2> ... ] ) )
[ ... ]
SELECT
YEAR ,
"'Friends'" as "Friends",
"'GLOW'" AS GLOW,
"'The Good Place'" as "The Good Place"
FROM
SUMMARIZED_DATA
PIVOT(SUM(HOURS_WATCHED) FOR TITLE IN ('Friends', 'GLOW','The Good Place'))
ORDER BY YEAR desc
The downside of both of these approaches is that you have to specify the resulting pivoted columns before you create the table, meaning you can't have a dynamic pivot table.
There is a post here that details a way to make dynamic pivot tables with stored procedures.
Naturally, you may want to turn columns into rows, which involves unpivoting. Snowflake, handily, has an UNPIVOT function that will do just that.
SELECT ...
FROM ...
UNPIVOT ( <value_column>
FOR <name_column> IN ( <column_list> ) )
[ ... ]
To unpivot our pivot table of YEAR | Friends | GLOW | The Good Place into a table of: YEAR | TITLE | HOURS WATCHED, we can do:
SELECT * FROM PIVOT_WAY UNPIVOT(HOURS_WATCHED FOR TITLE IN ("Friends","GLOW","The Good Place"))
This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.
This makes not only for far more readable reports (like this one) but also a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.