SELECT knowledge FROM sql_resources WHERE category='snowflake' AND slug='window-functions'

Window Functions

Window functions in Snowflake are a way to compute values over a group of rows. They return a single value for each row, in contrast to aggregate functions which return a single value for a group of rows.

Aggregate vs window/analytic functions

LETTERAGGREGATE
A5
C6
LETTERWINDOW
A5
A5
C6

In the first aggregate example, the resulting data were grouped by letter, but in the second window example, we preserved our rows.

Loading code...

Why use window functions?

Window functions are very powerful, and once you've gotten your head around how to use them, you'll be surprised at what they allow you to do. Some common use cases are:

Loading code...
  • Running/Cumulative Total
  • Moving Average
  • Rank rows by custom criteria and groupings
  • Finding the year-over-year % Change

where the PARTITION BY denotes how to GROUP rows into partitions,ORDER BY  how to order the rows in those partitions, and FRAME which determines which rows to consider in those ordered partitions.

In general, window functions can be grouped into 3 types:

Loading code...
  • Navigation functions: Return the value given a specific location criteria (e.g. first_value)
  • Numbering functions: Assign a number (e.g. rank) to each row based on their position in the specified window
  • Analytic functions: Perform a calculation on a set of values (e.g. sum)

FIRST_VALUE: Returns the value_expression for the first row in the current window frame.

LAST_VALUE: Returns the value_expression for the last row in the current window frame.

NTH_VALUE: Returns the value_expression for the Nth row of the current window frame.

  • LEAD: Returns the value_expression on the subsequent row.
  • LAG: Returns the value_expression on the preceding row.
  • MODE: Returns the most frequent value for the values within the expression

For these functions we'll use the following demo data:

Loading code...
PLAYERPOINTSSEASON
James Harden23352020
Damian Lillard19782020
Devin Booker18632020
James Harden28182019
Paul George19782019
Kemba Walker21022019
Damian Lillard20672019
Devin Booker17002019
Paul George10332020
Kemba Walker11452020

How to find a year-over-year change:

Loading code...
PLAYERFIRST_SEASONLAST_SEASONPER_CHANGE
Damian Lillard20671978-4.3058
Devin Booker170018639.5882
James Harden28182335-17.1398
Kemba Walker21021145-45.5281
Paul George19781033-47.7755

We used FIRST_VALUE and LAST_VALUE to find the scores for each player in the earliest and most recent seasons of data. Then we computed the percent difference using:

Loading code...

Numbering functions

  • RANK: Returns the rank of each row in the ordered partition (starts at 1).
  • DENSE_RANK : Returns the rank, but values of the same value get the same rank (starts at 1).
  • PERCENT_RANK :  Returns the percentile rank of a row.
  • CUME_DIST : Returns the relative rank of a row.
  • NTILE:  Returns the bucket number after dividing each partition into constant_integer_expression buckets.
  • ROW_NUMBER : Returns the sequential row number for each ordered partition.
  • WIDTH_BUCKET : Constructs equi-width histograms, in which the histogram range is divided into intervals of identical size, and returns the bucket number into which the value of an expression falls after it has been evaluated.
  • CONDITIONAL_CHANGE_EVENT: Returns a window event number for each row when the value of the expression is different from the value in the previous row.
  • CONDITIONAL_TRUE_EVENT: Returns a window event number for each row based on the result of the boolean expression.

How to get top 3 results for each group?

SEASONPOINTS_RANKPLAYERPOINTS
20191James Harden2818
20192Kemba Walker2102
20193Damian Lillard2067
20201James Harden2335
20202Damian Lillard1978
20203Devin Booker1863

In this example, we used RANK to rank each player by points over each season. Then we used a subquery to then return only the top 3 ranked players for each season.

Loading code...

Aggregate functions

Aggregate functions are available outside of window functions, but can be additionally applied over a specified window.

  • ANY_VALUE: Returns expression for some row chosen from the group. Basically a random selection from an expression.
  • AVG: Returns the average of the non-NULL input values.
  • CORR: Returns the coefficient (COVAR_POP(y, x) / (STDDEV_POP(x) * STDDEV_POP(y))) of correlation for a set of number pairs.
  • COUNT: Returns the number of [distinct] elements in expression
  • COUNT_IF: Returns the count of True values for expression
  • COVAR_POP: Returns the population covariance of a set of numbers
  • COVAR_SAMP:Returns the sample covariance of a set of numbers
  • LISTAGG: Returns the concatenated input values, separated by the delimiter string.
  • MAX: Returns the maximum non-NULL value of the expression.
  • MEDIAN: Returns the median of a set of values.
  • MIN: Returns the minimum non-NULL value of the expression.
  • PERCENTILE_CONT: Returns the percentile value based on a continuous distribution of the input column
  • PERCENTILE_DISC: Returns the percentile value based on a discrete distribution of the input column
  • RATIO_TO_REPORT: Returns the ratio of a value within a group to the sum of the values within a group.
  • STDEV: Returns the sample standard deviation (square root of sample variance) of non-NULL values.
  • STDEV_POP: Returns population standard deviation of values
  • STDEV_SAMP:  Returns sample standard deviation of values
  • SUM: Returns the sum of all non-null values
  • VAR_POP: Returns the population variance of results
  • VARIANCE | VAR_SAMP: Returns the sample variance of results

There are more, less commonly used window functions available here.

How to find a running total?

SEASONPLAYERPOINTSRUNNING_TOTAL_POINTS
2019Damian Lillard20672067
2020Damian Lillard19784045
2019Devin Booker17001700
2020Devin Booker18633563
2019James Harden28182818
2020James Harden23355153
2019Kemba Walker21022102
2020Kemba Walker11453247
2019Paul George19781978
2020Paul George10333011

To find the running total simply use SUM with an OVER clause where you specify your groupings (PARTITION BY), and the order in which to add them (ORDER BY).

How We Built This

This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.

This makes not only for far more readable reports (like this one) but also a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.

Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...