SQL Resources
SQL
Window Functions

Window Functions

Window functions in Snowflake are a way to compute values over a group of rows. They return a single value for each row, in contrast to aggregate functions which return a single value for a group of rows.

Aggregate vs window/analytic functions

SELECT
  letter,
  SUM(number) AS aggregate
FROM
  (
    SELECT
      'A' AS letter,
      2 AS number
    UNION ALL
(    SELECT
      'A' AS letter,
      3 AS number)
    UNION ALL
(    SELECT
      'C' AS letter,
      6 AS number)
  ) AS table_3
GROUP BY
  letter
LETTERAGGREGATE
A5
C6
SELECT
  letter,
  SUM(number) OVER (PARTITION BY letter) AS "WINDOW"
FROM
  (
    SELECT
      'A' AS letter,
      2 AS number
    UNION ALL
(    SELECT
      'A' AS letter,
      3 AS number)
    UNION ALL
(    SELECT
      'C' AS letter,
      6 AS number)
  ) AS table_3
LETTERWINDOW
A5
A5
C6

In the first aggregate example, the resulting data were grouped by letter, but in the second window example, we preserved our rows.

Why use window functions?

Window functions are very powerful, and once you've gotten your head around how to use them, you'll be surprised at what they allow you to do. Some common use cases are:

  • Running/Cumulative Total
  • Moving Average
  • Rank rows by custom criteria and groupings
  • Finding the year-over-year % Change
  • Syntax

    <function> ( <arguments> ) OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ cumulativeFrame | slidingFrame ] )

    where the PARTITION BY denotes how to GROUP rows into partitions,ORDER BY  how to order the rows in those partitions, and FRAME which determines which rows to consider in those ordered partitions.

    In general, window functions can be grouped into 3 types:

  • Navigation functions: Return the value given a specific location criteria (e.g. first_value)
  • Numbering functions: Assign a number (e.g. rank) to each row based on their position in the specified window
  • Analytic functions: Perform a calculation on a set of values (e.g. sum)
  • Navigation functions

    FIRST_VALUE: Returns the value_expression for the first row in the current window frame.

    FIRST_VALUE( <expr> ) [ { IGNORE | RESPECT } NULLS ]
                          OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2>  [ { ASC | DESC } ] [ <window_frame> ] )

    LAST_VALUE: Returns the value_expression for the last row in the current window frame.

    LAST_VALUE( <expr> ) [ { IGNORE | RESPECT } NULLS ]
                         OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] [ <window_frame> ] )

    NTH_VALUE: Returns the value_expression for the Nth row of the current window frame.

    NTH_VALUE( <expr> , n ) [ FROM { FIRST | LAST } ] [ { IGNORE | RESPECT } NULLS ]
                            OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] [ <window_frame> ] )
  • LEAD: Returns the value_expression on the subsequent row.
  • LEAD ( <expr> [ , <offset> , <default> ] ) [ { IGNORE | RESPECT } NULLS ] OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] )
    
  • LAG: Returns the value_expression on the preceding row.
  • MODE: Returns the most frequent value for the values within the expression
  • MODE( <expr1> ) OVER ( [ PARTITION BY <expr2> ] )
    

    For these functions we'll use the following demo data:

    SELECT
      'James Harden' AS player,
      2335 AS points,
      2020 AS season
    UNION ALL
    (SELECT
      'Damian Lillard' AS player,
      1978 AS points,
      2020 AS season)
    UNION ALL
    (SELECT
      'Devin Booker' AS player,
      1863 AS points,
      2020 AS season)
    UNION ALL
    (SELECT
      'James Harden' AS player,
      2818 AS points,
      2019 AS season)
    UNION ALL
    (SELECT
      'Paul George' AS player,
      1978 AS points,
      2019 AS season)
    UNION ALL
    (SELECT
      'Kemba Walker' AS player,
      2102 AS points,
      2019 AS season)
    UNION ALL
    (SELECT
      'Damian Lillard' AS player,
      2067 AS points,
      2019 AS season)
    UNION ALL
    (SELECT
      'Devin Booker' AS player,
      1700 AS points,
      2019 AS season)
    UNION ALL
    (SELECT
      'Paul George' AS player,
      1033 AS points,
      2020 AS season)
    UNION ALL
    (SELECT
      'Kemba Walker' AS player,
      1145 AS points,
      2020 AS season)
    PLAYERPOINTSSEASON
    James Harden23352020
    Damian Lillard19782020
    Devin Booker18632020
    James Harden28182019
    Paul George19782019
    Kemba Walker21022019
    Damian Lillard20672019
    Devin Booker17002019
    Paul George10332020
    Kemba Walker11452020

    How to find a year-over-year change:

    SELECT DISTINCT
      player,
      FIRST_VALUE(POINTS) OVER (PARTITION BY PLAYER ORDER BY SEASON ASC) AS first_season,
      LAST_VALUE(POINTS) OVER (PARTITION BY PLAYER ORDER BY SEASON ASC) AS last_season,
      (100 * ((LAST_VALUE(points) OVER (PARTITION BY player ORDER BY season ASC) - FIRST_VALUE(points) OVER (PARTITION BY player ORDER BY season ASC)) / FIRST_VALUE(points) OVER (PARTITION BY player ORDER BY season ASC))) AS per_change  
    FROM
      TOP_SCORERS
    ORDER BY PLAYER
    PLAYERFIRST_SEASONLAST_SEASONPER_CHANGE
    Damian Lillard20671978-4.3058
    Devin Booker170018639.5882
    James Harden28182335-17.1398
    Kemba Walker21021145-45.5281
    Paul George19781033-47.7755

    We used FIRST_VALUE and LAST_VALUE to find the scores for each player in the earliest and most recent seasons of data. Then we computed the percent difference using:

    100 * ((new value - old value) / old value) per_difference
    

    Numbering functions

  • RANK: Returns the rank of each row in the ordered partition (starts at 1).
  • RANK() OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] [ <window_frame> ] )
    
  • DENSE_RANK : Returns the rank, but values of the same value get the same rank (starts at 1).
  • PERCENT_RANK() OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] [ <cumulativeRangeFrame> ] )
    
  • PERCENT_RANK :  Returns the percentile rank of a row.
  • PERCENT_RANK() OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] [ <cumulativeRangeFrame> ] )
    
  • CUME_DIST : Returns the relative rank of a row.
  • CUME_DIST() OVER ( [ PARTITION BY <partition_expr> ] ORDER BY <order_expr>  [ ASC | DESC ] )
    
  • NTILE:  Returns the bucket number after dividing each partition into constant_integer_expression buckets.
  • NTILE( <constant_value> ) OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ { ASC | DESC } ] )
    
  • ROW_NUMBER : Returns the sequential row number for each ordered partition.
  • ROW_NUMBER() OVER (
      [ PARTITION BY <expr1> [, <expr2> ... ] ]
      ORDER BY <expr3> [ , <expr4> ... ] [ { ASC | DESC } ]
      )
  • WIDTH_BUCKET : Constructs equi-width histograms, in which the histogram range is divided into intervals of identical size, and returns the bucket number into which the value of an expression falls after it has been evaluated.
  • WIDTH_BUCKET( <expr> , <min_value> , <max_value> , <num_buckets> )
    
  • CONDITIONAL_CHANGE_EVENT: Returns a window event number for each row when the value of the expression is different from the value in the previous row.
  • CONDITIONAL_CHANGE_EVENT( <expr1> ) OVER ( [ PARTITION BY <expr2> ] ORDER BY <expr3> )
    
  • CONDITIONAL_TRUE_EVENT: Returns a window event number for each row based on the result of the boolean expression.
  • CONDITIONAL_TRUE_EVENT( <expr1> ) OVER ( [ PARTITION BY <expr2> ] ORDER BY <expr3> )
    

    How to get top 3 results for each group?

    SELECT
     *
    FROM
     (
       SELECT
         season,
         RANK() OVER (PARTITION BY season ORDER BY points DESC) AS points_rank,
         player,
         points
       FROM
         TOP_SCORERS
     ) AS table_1
    WHERE
     (points_rank <= 3)
    SEASONPOINTS_RANKPLAYERPOINTS
    20191James Harden2818
    20192Kemba Walker2102
    20193Damian Lillard2067
    20201James Harden2335
    20202Damian Lillard1978
    20203Devin Booker1863

    In this example, we used RANK to rank each player by points over each season. Then we used a subquery to then return only the top 3 ranked players for each season.

    Aggregate functions

    Aggregate functions are available outside of window functions, but can be additionally applied over a specified window.

  • ANY_VALUE: Returns expression for some row chosen from the group. Basically a random selection from an expression.
  • ANY_VALUE( [ DISTINCT ] <expr1> ) OVER ( [ PARTITION BY <expr2> ] )
    
  • AVG: Returns the average of the non-NULL input values.
  • AVG( <expr1> ) OVER ( [ PARTITION BY <expr2> ] [ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ] )
    
  • CORR: Returns the coefficient (COVAR_POP(y, x) / (STDDEV_POP(x) * STDDEV_POP(y))) of correlation for a set of number pairs.
  • CORR( y , x ) OVER ( [ PARTITION BY <expr3> ] )
    
  • COUNT: Returns the number of [distinct] elements in expression
  • COUNT( <expr1> [ , <expr2> ... ] )
        OVER ( [ PARTITION BY <expr3> ] [ ORDER BY <expr4> [ ASC | DESC ] [ <window_frame> ] ] )
  • COUNT_IF: Returns the count of True values for expression
  • COUNT_IF( <condition> )
        OVER ( [ PARTITION BY <expr1> ] [ ORDER BY <expr2> [ ASC | DESC ] [ <window_frame> ] ] )
  • COVAR_POP: Returns the population covariance of a set of numbers
  • COVAR_POP( y , x ) OVER ( [ PARTITION BY <expr1> ] )
    
  • COVAR_SAMP:Returns the sample covariance of a set of numbers
  • COVAR_SAMP( y , x ) OVER ( [ PARTITION BY <expr1> ] )
    
  • LISTAGG: Returns the concatenated input values, separated by the delimiter string.
  • LISTAGG( [ DISTINCT ] <expr1> [, <delimiter> ] )
        [ WITHIN GROUP ( <orderby_clause> ) ]
        OVER ( [ PARTITION BY <expr2> ] )
  • MAX: Returns the maximum non-NULL value of the expression.
  • MAX( <expr> ) [ OVER ( [ PARTITION BY <expr1> ] [ ORDER BY <expr2> [ <window_frame> ] ] ) ]
    
  • MEDIAN: Returns the median of a set of values.
  • MEDIAN( <expr> ) OVER ( [ PARTITION BY <expr2> ] )
    
  • MIN: Returns the minimum non-NULL value of the expression.
  • MIN( <expr> ) [ OVER ( [ PARTITION BY <expr1> ] [ ORDER BY <expr2> [ <window_frame> ] ] ) ]
    
  • PERCENTILE_CONT: Returns the percentile value based on a continuous distribution of the input column
  • PERCENTILE_CONT( <percentile> ) WITHIN GROUP (ORDER BY <order_by_expr>) OVER ( [ PARTITION BY <expr3> ] )
    
  • PERCENTILE_DISC: Returns the percentile value based on a discrete distribution of the input column
  • PERCENTILE_DISC( <percentile> ) WITHIN GROUP (ORDER BY <order_by_expr> ) OVER ( [ PARTITION BY <expr3> ] )
    
  • RATIO_TO_REPORT: Returns the ratio of a value within a group to the sum of the values within a group.
  • RATIO_TO_REPORT( <expr1> ) [ OVER ( [ PARTITION BY <expr2> ] [ ORDER BY <expr3> ] ) ]
    
  • STDEV: Returns the sample standard deviation (square root of sample variance) of non-NULL values.
  • STDDEV( <expr1> ) OVER (
                           [ PARTITION BY <expr2> ]
                           [ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ]
                           )
  • STDEV_POP: Returns population standard deviation of values
  • STDDEV_POP( <expr1> ) OVER (
                               [ PARTITION BY <expr2> ]
                               [ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ]
                               )
  • STDEV_SAMP:  Returns sample standard deviation of values
  • STDDEV_SAMP( <expr1> ) OVER (
                                [ PARTITION BY <expr2> ]
                                [ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ]
                                )
  • SUM: Returns the sum of all non-null values
  • SUM( <expr1> ) OVER ( [ PARTITION BY <expr2> ] [ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ] )
    
  • VAR_POP: Returns the population variance of results
  • VARIANCE_POP( <expr1> ) OVER (
                                 [ PARTITION BY <expr2> ]
                                 [ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ]
                                 )
  • VARIANCE | VAR_SAMP: Returns the sample variance of results
  • VARIANCE( <expr1> ) OVER (
                             [ PARTITION BY <expr2> ]
                             [ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ]
                             )


    There are more, less commonly used window functions available here.

    How to find a running total?

    SELECT
      season,
      player,
      points,
      SUM(top_scorers.points) OVER (PARTITION BY player ORDER BY season ASC) AS running_total_points
    FROM
      TOP_SCORERS
    ORDER BY PLAYER ASC, SEASON ASC

    SEASONPLAYERPOINTSRUNNING_TOTAL_POINTS
    2019Damian Lillard20672067
    2020Damian Lillard19784045
    2019Devin Booker17001700
    2020Devin Booker18633563
    2019James Harden28182818
    2020James Harden23355153
    2019Kemba Walker21022102
    2020Kemba Walker11453247
    2019Paul George19781978
    2020Paul George10333011

    To find the running total simply use SUM with an OVER clause where you specify your groupings (PARTITION BY), and the order in which to add them (ORDER BY).

    How We Built This

    This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.

    This makes not only for far more readable reports (like this one) but also a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.

    Subscribe to newsletter

    Subscribe to receive the latest blog posts to your inbox every week.

    By subscribing you agree to our Privacy Policy.
    Thank you! Your submission has been received!
    Oops! Something went wrong while submitting the form.

    Start solving your organization's biggest problems with Count today.

    Stay up to date with all things count, data and problem solving.