Window functions in Snowflake are a way to compute values over a group of rows. They return a single value for each row, in contrast to aggregate functions which return a single value for a group of rows.
SELECT
letter,
SUM(number) AS aggregate
FROM
(
SELECT
'A' AS letter,
2 AS number
UNION ALL
( SELECT
'A' AS letter,
3 AS number)
UNION ALL
( SELECT
'C' AS letter,
6 AS number)
) AS table_3
GROUP BY
letter
SELECT
letter,
SUM(number) OVER (PARTITION BY letter) AS "WINDOW"
FROM
(
SELECT
'A' AS letter,
2 AS number
UNION ALL
( SELECT
'A' AS letter,
3 AS number)
UNION ALL
( SELECT
'C' AS letter,
6 AS number)
) AS table_3
In the first aggregate example, the resulting data were grouped by letter, but in the second window example, we preserved our rows.
Window functions are very powerful, and once you've gotten your head around how to use them, you'll be surprised at what they allow you to do. Some common use cases are:
<function> ( <arguments> ) OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ cumulativeFrame | slidingFrame ] )
where the PARTITION BY denotes how to GROUP rows into partitions,ORDER BY how to order the rows in those partitions, and FRAME which determines which rows to consider in those ordered partitions.
In general, window functions can be grouped into 3 types:
FIRST_VALUE( <expr> ) [ { IGNORE | RESPECT } NULLS ]
OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] [ <window_frame> ] )
LAST_VALUE( <expr> ) [ { IGNORE | RESPECT } NULLS ]
OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] [ <window_frame> ] )
NTH_VALUE( <expr> , n ) [ FROM { FIRST | LAST } ] [ { IGNORE | RESPECT } NULLS ]
OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] [ <window_frame> ] )
LEAD ( <expr> [ , <offset> , <default> ] ) [ { IGNORE | RESPECT } NULLS ] OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] )
LAG ( <expr> [ , <offset> , <default> ] ) [ { IGNORE | RESPECT } NULLS ]
OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] )
MODE( <expr1> ) OVER ( [ PARTITION BY <expr2> ] )
For these functions we'll use the following demo data:
SELECT 'James Harden' AS player, 2335 AS points, 2020 AS season UNION ALL (SELECT 'Damian Lillard' AS player, 1978 AS points, 2020 AS season) UNION ALL (SELECT 'Devin Booker' AS player, 1863 AS points, 2020 AS season) UNION ALL (SELECT 'James Harden' AS player, 2818 AS points, 2019 AS season) UNION ALL (SELECT 'Paul George' AS player, 1978 AS points, 2019 AS season) UNION ALL (SELECT 'Kemba Walker' AS player, 2102 AS points, 2019 AS season) UNION ALL (SELECT 'Damian Lillard' AS player, 2067 AS points, 2019 AS season) UNION ALL (SELECT 'Devin Booker' AS player, 1700 AS points, 2019 AS season) UNION ALL (SELECT 'Paul George' AS player, 1033 AS points, 2020 AS season) UNION ALL (SELECT 'Kemba Walker' AS player, 1145 AS points, 2020 AS season)
How to find a year-over-year change:
SELECT DISTINCT
player,
FIRST_VALUE(POINTS) OVER (PARTITION BY PLAYER ORDER BY SEASON ASC) AS first_season,
LAST_VALUE(POINTS) OVER (PARTITION BY PLAYER ORDER BY SEASON ASC) AS last_season,
(100 * ((LAST_VALUE(points) OVER (PARTITION BY player ORDER BY season ASC) - FIRST_VALUE(points) OVER (PARTITION BY player ORDER BY season ASC)) / FIRST_VALUE(points) OVER (PARTITION BY player ORDER BY season ASC))) AS per_change
FROM
TOP_SCORERS
ORDER BY PLAYER
We used FIRST_VALUE and LAST_VALUE to find the scores for each player in the earliest and most recent seasons of data. Then we computed the percent difference using:
100 * ((new value - old value) / old value) per_difference
RANK() OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] [ <window_frame> ] )
DENSE_RANK() OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ ASC | DESC ] [ <window_frame> ] )
PERCENT_RANK() OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] [ <cumulativeRangeFrame> ] )
CUME_DIST() OVER ( [ PARTITION BY <partition_expr> ] ORDER BY <order_expr> [ ASC | DESC ] )
NTILE( <constant_value> ) OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] )
ROW_NUMBER() OVER (
[ PARTITION BY <expr1> [, <expr2> ... ] ]
ORDER BY <expr3> [ , <expr4> ... ] [ { ASC | DESC } ]
)
WIDTH_BUCKET( <expr> , <min_value> , <max_value> , <num_buckets> )
CONDITIONAL_CHANGE_EVENT( <expr1> ) OVER ( [ PARTITION BY <expr2> ] ORDER BY <expr3> )
CONDITIONAL_TRUE_EVENT( <expr1> ) OVER ( [ PARTITION BY <expr2> ] ORDER BY <expr3> )
How to get top 3 results for each group?
SELECT
*
FROM
(
SELECT
season,
RANK() OVER (PARTITION BY season ORDER BY points DESC) AS points_rank,
player,
points
FROM
TOP_SCORERS
) AS table_1
WHERE
(points_rank <= 3)
In this example, we used RANK to rank each player by points over each season. Then we used a subquery to then return only the top 3 ranked players for each season.
Aggregate functions are available outside of window functions, but can be additionally applied over a specified window.
ANY_VALUE( [ DISTINCT ] <expr1> ) OVER ( [ PARTITION BY <expr2> ] )
AVG( <expr1> ) OVER ( [ PARTITION BY <expr2> ] [ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ] )
CORR( y , x ) OVER ( [ PARTITION BY <expr3> ] )
COUNT( <expr1> [ , <expr2> ... ] )
OVER ( [ PARTITION BY <expr3> ] [ ORDER BY <expr4> [ ASC | DESC ] [ <window_frame> ] ] )
COUNT_IF( <condition> )
OVER ( [ PARTITION BY <expr1> ] [ ORDER BY <expr2> [ ASC | DESC ] [ <window_frame> ] ] )
COVAR_POP( y , x ) OVER ( [ PARTITION BY <expr1> ] )
COVAR_SAMP( y , x ) OVER ( [ PARTITION BY <expr1> ] )
LISTAGG( [ DISTINCT ] <expr1> [, <delimiter> ] )
[ WITHIN GROUP ( <orderby_clause> ) ]
OVER ( [ PARTITION BY <expr2> ] )
MAX( <expr> ) [ OVER ( [ PARTITION BY <expr1> ] [ ORDER BY <expr2> [ <window_frame> ] ] ) ]
MEDIAN( <expr> ) OVER ( [ PARTITION BY <expr2> ] )
MIN( <expr> ) [ OVER ( [ PARTITION BY <expr1> ] [ ORDER BY <expr2> [ <window_frame> ] ] ) ]
PERCENTILE_CONT( <percentile> ) WITHIN GROUP (ORDER BY <order_by_expr>) OVER ( [ PARTITION BY <expr3> ] )
PERCENTILE_DISC( <percentile> ) WITHIN GROUP (ORDER BY <order_by_expr> ) OVER ( [ PARTITION BY <expr3> ] )
RATIO_TO_REPORT( <expr1> ) [ OVER ( [ PARTITION BY <expr2> ] [ ORDER BY <expr3> ] ) ]
STDDEV( <expr1> ) OVER (
[ PARTITION BY <expr2> ]
[ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ]
)
STDDEV_POP( <expr1> ) OVER (
[ PARTITION BY <expr2> ]
[ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ]
)
STDDEV_SAMP( <expr1> ) OVER (
[ PARTITION BY <expr2> ]
[ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ]
)
SUM( <expr1> ) OVER ( [ PARTITION BY <expr2> ] [ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ] )
VARIANCE_POP( <expr1> ) OVER (
[ PARTITION BY <expr2> ]
[ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ]
)
VARIANCE( <expr1> ) OVER (
[ PARTITION BY <expr2> ]
[ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ]
)
select
season,
player,
points,
SUM(top_scorers.points) OVER (PARTITION BY player ORDER BY season ASC) AS running_total_points
FROM
TOP_SCORERS
ORDER BY PLAYER ASC, SEASON ASC
To find the running total simply use SUM with an OVER clause where you specify your groupings (PARTITION BY), and the order in which to add them (ORDER BY).
This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.
This makes not only for far more readable reports (like this one) but also a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.