SELECT knowledge FROM sql_resources WHERE category='snowflake' AND slug='moving-average'

Moving averages

Moving averages in Snowflake are an excellent way to discover patterns in your data.

For example, the chart below shows the daily temperature in London as points, and a 30 day moving average as a line. Plotting both lets us see the variation of temperature day-by-day, without obscuring longer-term patterns.

Loading code...
DATETEMP30_DAY_AVG_TEMP
2016-01-0141.141.1
2016-01-0249.845.45
2016-01-034645.633333333
2016-01-0445.445.575
2016-01-0545.745.6
2016-01-0643.745.283333333
2016-01-0745.145.257142857
2016-01-0842.144.8625
2016-01-0947.545.155555556
2016-01-1042.944.93
2016-01-1142.944.745454545
2016-01-1241.944.508333333
2016-01-1340.444.192307692
2016-01-143943.821428571
2016-01-1537.243.38
2016-01-1636.142.925
2016-01-1736.342.535294118
2016-01-1837.742.266666667
2016-01-1933.341.794736842
2016-01-2032.541.33
2016-01-2137.641.152380952
2016-01-2245.641.354545455
2016-01-234541.513043478
2016-01-2453.642.016666667
2016-01-2552.142.42
2016-01-2648.442.65
2016-01-2752.743.022222222
2016-01-2843.443.035714286
2016-01-2952.143.348275862
2016-01-3045.943.433333333
2016-01-3148.343.673333333
2016-02-0154.143.816666667
2016-02-0246.843.843333333
2016-02-0341.843.723333333
2016-02-0449.543.85
2016-02-0550.644.08
2016-02-0650.644.263333333
2016-02-0745.244.366666667
2016-02-0846.344.326666667
2016-02-0941.644.283333333
2016-02-1042.244.26
2016-02-1137.844.123333333
2016-02-1238.744.066666667
2016-02-1338.944.063333333
2016-02-1438.744.113333333
2016-02-1536.644.13
2016-02-1636.544.136666667
2016-02-1739.644.2
2016-02-1839.344.4
2016-02-1941.144.686666667
2016-02-2051.445.146666667
2016-02-215545.46
2016-02-2247.745.55
2016-02-2342.345.173333333
2016-02-2438.344.713333333
2016-02-2535.844.293333333
2016-02-2639.143.84
2016-02-2740.243.733333333
2016-02-2840.643.35
2016-02-2940.643.173333333
2016-03-0147.843.156666667
2016-03-024242.753333333
2016-03-0343.342.636666667
2016-03-0441.242.616666667
2016-03-0537.442.213333333
2016-03-0638.941.823333333
2016-03-0739.641.456666667
2016-03-0838.841.243333333
2016-03-0946.241.24
2016-03-1043.741.31
2016-03-1140.841.263333333
2016-03-1243.241.443333333
2016-03-1342.741.576666667
2016-03-144441.746666667
2016-03-1542.941.886666667
2016-03-1645.242.173333333
2016-03-1743.842.416666667
2016-03-1841.142.466666667
2016-03-1944.342.633333333
2016-03-2045.342.773333333
2016-03-2147.742.65
2016-03-2245.742.34
2016-03-2347.442.33
2016-03-2446.542.47
2016-03-2550.842.886666667
2016-03-264943.326666667
2016-03-2746.343.566666667
2016-03-2846.843.786666667
2016-03-2945.843.96
2016-03-3047.144.176666667
2016-03-3147.144.153333333
2016-04-0145.944.283333333
2016-04-0250.344.516666667
2016-04-0351.844.87
2016-04-0450.845.316666667
2016-04-0551.145.723333333
2016-04-0647.745.993333333
2016-04-0744.646.186666667
2016-04-0847.746.236666667
2016-04-0946.346.323333333

Step-by-Step

Moving averages rely on the use of Window Functions, which you can read more about here.  

To find a running total we'll use the AVG window function:

Loading code...
  • expr1 is an expression that evaluates to a numeric data type (INTEGER, FLOAT, DECIMAL, etc.).
  • expr2 is the optional expression to partition by.
  • expr3 is the optional expression to order by within each partition. (This does not control the order of the entire query output.)

Or, written for a human:

Loading code...

In the following example we'll take the running total of hours of Netflix I watched in a given week:

Loading code...
WEEKDAYDAILY_HOURS_WATCHED
2018-11-26T00:00:00.000Z04.412777778
2018-11-26T00:00:00.000Z11.467222222
2018-11-26T00:00:00.000Z20.6561111111
2018-11-26T00:00:00.000Z40.5063888889
2018-11-26T00:00:00.000Z50.5261111111
2018-11-26T00:00:00.000Z60.8688888889

Now if I want to find the moving 3-day average of hours watched that week, I can do the following:

Loading code...
DAYDAILY_HOURS_WATCHED3_DAY_AVG
04.4127777784.412777778
11.4672222222.94
20.65611111112.178703704
40.50638888890.8765740741
50.52611111110.5628703704
60.86888888890.6337962963

We can see our new 3_DAY_AVG update as the average between the current row and the 2 previous rows.

To check we can make sure that day 2 is the average of Day 0, Day 1, and Day 2 hours watched:

Loading code...
DAY_2_RESULT
2.178666667

This example had no partition, but if we wanted to compare my viewing habits across two different weeks, we would need to PARTITION BY week:

Loading code...
WEEKDAYDAILY_HOURS_WATCHED
2018-11-19T00:00:00.000Z01.326666667
2018-11-19T00:00:00.000Z10.4775
2018-11-19T00:00:00.000Z20.8708333333
2018-11-19T00:00:00.000Z60.3
2018-11-26T00:00:00.000Z04.412777778
2018-11-26T00:00:00.000Z11.467222222
2018-11-26T00:00:00.000Z20.6561111111
2018-11-26T00:00:00.000Z40.5063888889
2018-11-26T00:00:00.000Z50.5261111111
2018-11-26T00:00:00.000Z60.8688888889

And now to find the running total across the two weeks

WEEKDAYDAILY_HOURS_WATCHED7_DAY_AVG
2018-11-19T00:00:00.000Z01.3266666671.326666667
2018-11-19T00:00:00.000Z10.47750.9020833333
2018-11-19T00:00:00.000Z20.87083333330.8916666667
2018-11-19T00:00:00.000Z60.30.74375
2018-11-26T00:00:00.000Z04.4127777784.412777778
2018-11-26T00:00:00.000Z11.4672222222.94
2018-11-26T00:00:00.000Z20.65611111112.178703704
2018-11-26T00:00:00.000Z40.50638888891.760625
2018-11-26T00:00:00.000Z50.52611111111.513722222
2018-11-26T00:00:00.000Z60.86888888891.40625

Now we see the RUNNING_TOTAL restart for the 2nd week.  

How We Built This

This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.

This makes not only for far more readable reports (like this one) but also a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.

Loading code...