Moving averages

Moving averages in Snowflake are an excellent way to discover patterns in your data.

For example, the chart below shows the daily temperature in London as points, and a 30-day moving average as a line. Plotting both lets us see the day-by-day variation in temperature without obscuring longer-term patterns.

SELECT
  DATE, TEMP,
  AVG(TEMP) OVER (
    ORDER BY DATE ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
  ) "30_DAY_AVG_TEMP"
FROM PUBLIC.LONDON_WEATHER
ORDER BY DATE
DATE        TEMP  30_DAY_AVG_TEMP
2016-01-01  41.1  41.1
2016-01-02  49.8  45.45
2016-01-03  46    45.633333333
2016-01-04  45.4  45.575
2016-01-05  45.7  45.6
2016-01-06  43.7  45.283333333
2016-01-07  45.1  45.257142857
2016-01-08  42.1  44.8625
2016-01-09  47.5  45.155555556
2016-01-10  42.9  44.93
2016-01-11  42.9  44.745454545
2016-01-12  41.9  44.508333333
2016-01-13  40.4  44.192307692
2016-01-14  39    43.821428571
2016-01-15  37.2  43.38
2016-01-16  36.1  42.925
2016-01-17  36.3  42.535294118
2016-01-18  37.7  42.266666667
2016-01-19  33.3  41.794736842
2016-01-20  32.5  41.33
2016-01-21  37.6  41.152380952
2016-01-22  45.6  41.354545455
2016-01-23  45    41.513043478
2016-01-24  53.6  42.016666667
2016-01-25  52.1  42.42
2016-01-26  48.4  42.65
2016-01-27  52.7  43.022222222
2016-01-28  43.4  43.035714286
2016-01-29  52.1  43.348275862
2016-01-30  45.9  43.433333333
2016-01-31  48.3  43.673333333
2016-02-01  54.1  43.816666667
2016-02-02  46.8  43.843333333
2016-02-03  41.8  43.723333333
2016-02-04  49.5  43.85
2016-02-05  50.6  44.08
2016-02-06  50.6  44.263333333
2016-02-07  45.2  44.366666667
2016-02-08  46.3  44.326666667
2016-02-09  41.6  44.283333333
2016-02-10  42.2  44.26
2016-02-11  37.8  44.123333333
2016-02-12  38.7  44.066666667
2016-02-13  38.9  44.063333333
2016-02-14  38.7  44.113333333
2016-02-15  36.6  44.13
2016-02-16  36.5  44.136666667
2016-02-17  39.6  44.2
2016-02-18  39.3  44.4
2016-02-19  41.1  44.686666667
2016-02-20  51.4  45.146666667
2016-02-21  55    45.46
2016-02-22  47.7  45.55
2016-02-23  42.3  45.173333333
2016-02-24  38.3  44.713333333
2016-02-25  35.8  44.293333333
2016-02-26  39.1  43.84
2016-02-27  40.2  43.733333333
2016-02-28  40.6  43.35
2016-02-29  40.6  43.173333333
2016-03-01  47.8  43.156666667
2016-03-02  42    42.753333333
2016-03-03  43.3  42.636666667
2016-03-04  41.2  42.616666667
2016-03-05  37.4  42.213333333
2016-03-06  38.9  41.823333333
2016-03-07  39.6  41.456666667
2016-03-08  38.8  41.243333333
2016-03-09  46.2  41.24
2016-03-10  43.7  41.31
2016-03-11  40.8  41.263333333
2016-03-12  43.2  41.443333333
2016-03-13  42.7  41.576666667
2016-03-14  44    41.746666667
2016-03-15  42.9  41.886666667
2016-03-16  45.2  42.173333333
2016-03-17  43.8  42.416666667
2016-03-18  41.1  42.466666667
2016-03-19  44.3  42.633333333
2016-03-20  45.3  42.773333333
2016-03-21  47.7  42.65
2016-03-22  45.7  42.34
2016-03-23  47.4  42.33
2016-03-24  46.5  42.47
2016-03-25  50.8  42.886666667
2016-03-26  49    43.326666667
2016-03-27  46.3  43.566666667
2016-03-28  46.8  43.786666667
2016-03-29  45.8  43.96
2016-03-30  47.1  44.176666667
2016-03-31  47.1  44.153333333
2016-04-01  45.9  44.283333333
2016-04-02  50.3  44.516666667
2016-04-03  51.8  44.87
2016-04-04  50.8  45.316666667
2016-04-05  51.1  45.723333333
2016-04-06  47.7  45.993333333
2016-04-07  44.6  46.186666667
2016-04-08  47.7  46.236666667
2016-04-09  46.3  46.323333333
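The frame clause ROWS BETWEEN 29 PRECEDING AND CURRENT ROW averages, for each row, that row together with up to 29 rows before it (fewer at the start of the data, where a full window doesn't exist yet). Here's a minimal Python sketch of that logic, using the first few temperatures from the table above; the function name is illustrative, not a Snowflake API:

```python
# Trailing-window moving average, mirroring Snowflake's
# ROWS BETWEEN n-1 PRECEDING AND CURRENT ROW frame.
def trailing_average(values, n):
    result = []
    for i in range(len(values)):
        # The window is the current row plus up to n-1 rows before it
        window = values[max(0, i - n + 1) : i + 1]
        result.append(sum(window) / len(window))
    return result

# First five daily temperatures from LONDON_WEATHER above
temps = [41.1, 49.8, 46.0, 45.4, 45.7]
print(trailing_average(temps, 30))
```

Note that the early rows average over fewer than 30 values, which is exactly what the table shows: the first row's 30_DAY_AVG_TEMP equals its own temperature.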

Step-by-Step

Moving averages rely on window functions.

To compute a moving average we'll use the AVG window function:

AVG( <expr1> ) OVER ( [ PARTITION BY <expr2> ] [ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ] )

Where:

  • expr1 is an expression that evaluates to a numeric data type (INTEGER, FLOAT, DECIMAL, etc.).
  • expr2 is the optional expression to partition by.
  • expr3 is the optional expression to order by within each partition. (This does not control the order of the entire query output.)

Or, written for a human:

AVG(column) OVER (PARTITION BY col_to_group_by ORDER BY column_to_order_by)
    

In the following example we'll compute the total hours of Netflix I watched on each day of a given week:

SELECT
  DATE_TRUNC('WEEK', START_TIME) WEEK,
  DATE_PART('WEEKDAY', START_TIME) DAY,
  SUM(DURATION) / (60 * 60) DAILY_HOURS_WATCHED
FROM
  PUBLIC.NETFLIX
WHERE
  WEEK = '2018-11-26'
GROUP BY WEEK, DAY
ORDER BY WEEK, DAY
WEEK                      DAY  DAILY_HOURS_WATCHED
2018-11-26T00:00:00.000Z  0    4.412777778
2018-11-26T00:00:00.000Z  1    1.467222222
2018-11-26T00:00:00.000Z  2    0.6561111111
2018-11-26T00:00:00.000Z  4    0.5063888889
2018-11-26T00:00:00.000Z  5    0.5261111111
2018-11-26T00:00:00.000Z  6    0.8688888889

Now if I want to find the 3-day moving average of hours watched that week, I can query the result above (referenced here as ONE_WEEK):

SELECT
  DAY, DAILY_HOURS_WATCHED,
  AVG(DAILY_HOURS_WATCHED) OVER (
    ORDER BY DAY ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) "3_DAY_AVG"
FROM ONE_WEEK
DAY  DAILY_HOURS_WATCHED  3_DAY_AVG
0    4.412777778          4.412777778
1    1.467222222          2.94
2    0.6561111111         2.178703704
4    0.5063888889         0.8765740741
5    0.5261111111         0.5628703704
6    0.8688888889         0.6337962963

We can see our new 3_DAY_AVG column update as the average of the current row and the two previous rows.

To check, we can make sure that day 2 is the average of the hours watched on days 0, 1, and 2:

SELECT (4.413 + 1.467 + 0.656) / 3 AS day_2_result

DAY_2_RESULT
2.178666667
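One thing worth noting: the ROWS frame counts rows, not calendar days, so the missing day 3 leaves no gap in the window. A quick Python sketch over the DAILY_HOURS_WATCHED values above reproduces the 3_DAY_AVG column (values copied from the table; this is an illustration, not how Snowflake runs the query):

```python
# Daily totals from the ONE_WEEK result above (day 3 had no viewing)
daily_hours = [4.412777778, 1.467222222, 0.6561111111,
               0.5063888889, 0.5261111111, 0.8688888889]

# ROWS BETWEEN 2 PRECEDING AND CURRENT ROW is a row-based trailing window
three_day_avg = []
for i in range(len(daily_hours)):
    window = daily_hours[max(0, i - 2) : i + 1]
    three_day_avg.append(sum(window) / len(window))

for day, avg in zip([0, 1, 2, 4, 5, 6], three_day_avg):
    print(day, round(avg, 9))
```

Because the frame is row-based, day 4's "3-day" average actually spans days 1, 2, and 4; to average over calendar days instead, you would need to fill in the missing dates first.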

This example had no partition, but if we wanted to compare my viewing habits across two different weeks, we would need to PARTITION BY week:

SELECT
  DATE_TRUNC('WEEK', START_TIME) WEEK,
  DATE_PART('WEEKDAY', START_TIME) DAY,
  SUM(DURATION) / (60 * 60) DAILY_HOURS_WATCHED
FROM
  PUBLIC.NETFLIX
WHERE
  WEEK IN ('2018-11-19', '2018-11-26')
GROUP BY WEEK, DAY
ORDER BY WEEK, DAY
WEEK                      DAY  DAILY_HOURS_WATCHED
2018-11-19T00:00:00.000Z  0    1.326666667
2018-11-19T00:00:00.000Z  1    0.4775
2018-11-19T00:00:00.000Z  2    0.8708333333
2018-11-19T00:00:00.000Z  6    0.3
2018-11-26T00:00:00.000Z  0    4.412777778
2018-11-26T00:00:00.000Z  1    1.467222222
2018-11-26T00:00:00.000Z  2    0.6561111111
2018-11-26T00:00:00.000Z  4    0.5063888889
2018-11-26T00:00:00.000Z  5    0.5261111111
2018-11-26T00:00:00.000Z  6    0.8688888889

And now to find the 7-day moving average across the two weeks (with the query above referenced as TWO_WEEKS):

SELECT
  WEEK, DAY, DAILY_HOURS_WATCHED,
  AVG(DAILY_HOURS_WATCHED) OVER (
    PARTITION BY WEEK ORDER BY DAY ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) "7_DAY_AVG"
FROM TWO_WEEKS
ORDER BY WEEK, DAY
WEEK                      DAY  DAILY_HOURS_WATCHED  7_DAY_AVG
2018-11-19T00:00:00.000Z  0    1.326666667          1.326666667
2018-11-19T00:00:00.000Z  1    0.4775               0.9020833333
2018-11-19T00:00:00.000Z  2    0.8708333333         0.8916666667
2018-11-19T00:00:00.000Z  6    0.3                  0.74375
2018-11-26T00:00:00.000Z  0    4.412777778          4.412777778
2018-11-26T00:00:00.000Z  1    1.467222222          2.94
2018-11-26T00:00:00.000Z  2    0.6561111111         2.178703704
2018-11-26T00:00:00.000Z  4    0.5063888889         1.760625
2018-11-26T00:00:00.000Z  5    0.5261111111         1.513722222
2018-11-26T00:00:00.000Z  6    0.8688888889         1.40625
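The PARTITION BY behavior can be sketched the same way: group the rows by week, then restart the trailing window inside each group. A minimal Python illustration using the TWO_WEEKS values above (row data copied from the table):

```python
from itertools import groupby

# (week, day, daily_hours_watched) rows from the TWO_WEEKS result above,
# already sorted by week then day
rows = [
    ("2018-11-19", 0, 1.326666667), ("2018-11-19", 1, 0.4775),
    ("2018-11-19", 2, 0.8708333333), ("2018-11-19", 6, 0.3),
    ("2018-11-26", 0, 4.412777778), ("2018-11-26", 1, 1.467222222),
    ("2018-11-26", 2, 0.6561111111), ("2018-11-26", 4, 0.5063888889),
    ("2018-11-26", 5, 0.5261111111), ("2018-11-26", 6, 0.8688888889),
]

# PARTITION BY WEEK: the trailing window never crosses a week boundary
seven_day_avg = []
for week, group in groupby(rows, key=lambda r: r[0]):
    hours = [r[2] for r in group]
    for i in range(len(hours)):
        window = hours[max(0, i - 6) : i + 1]
        seven_day_avg.append((week, sum(window) / len(window)))
```

The first row of the second week averages only itself (4.412777778), confirming the window restarts at the partition boundary.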

Now we see the 7_DAY_AVG restart for the second week.

How We Built This

This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, a data visualization tool, and a computational notebook. In a Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries (which is how ONE_WEEK and TWO_WEEKS were referenced above).

This makes not only for far more readable reports (like this one), but also for a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.
