SQL Resources/Snowflake/Moving averages

Moving averages

Moving averages in Snowflake are an excellent way to discover patterns in your data.

For example, the chart below shows the daily temperature in London as points, and a 30 day moving average as a line. Plotting both lets us see the variation of temperature day-by-day, without obscuring longer-term patterns.

SELECT
  DATE, TEMP,
  AVG(TEMP) OVER (
    ORDER BY DATE ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
  ) "30_DAY_AVG_TEMP"
FROM PUBLIC.LONDON_WEATHER 
DATE
2016-01-01
2016-01-02
2016-01-03
2016-01-04
2016-01-05
2016-01-06
2016-01-07
2016-01-08
2016-01-09
2016-01-10
2016-01-11
2016-01-12
2016-01-13
2016-01-14
2016-01-15
2016-01-16
2016-01-17
2016-01-18
2016-01-19
2016-01-20
2016-01-21
2016-01-22
2016-01-23
2016-01-24
2016-01-25
2016-01-26
2016-01-27
2016-01-28
2016-01-29
2016-01-30
2016-01-31
2016-02-01
2016-02-02
2016-02-03
2016-02-04
2016-02-05
2016-02-06
2016-02-07
2016-02-08
2016-02-09
2016-02-10
2016-02-11
2016-02-12
2016-02-13
2016-02-14
2016-02-15
2016-02-16
2016-02-17
2016-02-18
2016-02-19
2016-02-20
2016-02-21
2016-02-22
2016-02-23
2016-02-24
2016-02-25
2016-02-26
2016-02-27
2016-02-28
2016-02-29
2016-03-01
2016-03-02
2016-03-03
2016-03-04
2016-03-05
2016-03-06
2016-03-07
2016-03-08
2016-03-09
2016-03-10
2016-03-11
2016-03-12
2016-03-13
2016-03-14
2016-03-15
2016-03-16
2016-03-17
2016-03-18
2016-03-19
2016-03-20
2016-03-21
2016-03-22
2016-03-23
2016-03-24
2016-03-25
2016-03-26
2016-03-27
2016-03-28
2016-03-29
2016-03-30
2016-03-31
2016-04-01
2016-04-02
2016-04-03
2016-04-04
2016-04-05
2016-04-06
2016-04-07
2016-04-08
2016-04-09
TEMP
41.1
49.8
46
45.4
45.7
43.7
45.1
42.1
47.5
42.9
42.9
41.9
40.4
39
37.2
36.1
36.3
37.7
33.3
32.5
37.6
45.6
45
53.6
52.1
48.4
52.7
43.4
52.1
45.9
48.3
54.1
46.8
41.8
49.5
50.6
50.6
45.2
46.3
41.6
42.2
37.8
38.7
38.9
38.7
36.6
36.5
39.6
39.3
41.1
51.4
55
47.7
42.3
38.3
35.8
39.1
40.2
40.6
40.6
47.8
42
43.3
41.2
37.4
38.9
39.6
38.8
46.2
43.7
40.8
43.2
42.7
44
42.9
45.2
43.8
41.1
44.3
45.3
47.7
45.7
47.4
46.5
50.8
49
46.3
46.8
45.8
47.1
47.1
45.9
50.3
51.8
50.8
51.1
47.7
44.6
47.7
46.3
30_DAY_AVG_TEMP
41.1
45.45
45.633333333
45.575
45.6
45.283333333
45.257142857
44.8625
45.155555556
44.93
44.745454545
44.508333333
44.192307692
43.821428571
43.38
42.925
42.535294118
42.266666667
41.794736842
41.33
41.152380952
41.354545455
41.513043478
42.016666667
42.42
42.65
43.022222222
43.035714286
43.348275862
43.433333333
43.673333333
43.816666667
43.843333333
43.723333333
43.85
44.08
44.263333333
44.366666667
44.326666667
44.283333333
44.26
44.123333333
44.066666667
44.063333333
44.113333333
44.13
44.136666667
44.2
44.4
44.686666667
45.146666667
45.46
45.55
45.173333333
44.713333333
44.293333333
43.84
43.733333333
43.35
43.173333333
43.156666667
42.753333333
42.636666667
42.616666667
42.213333333
41.823333333
41.456666667
41.243333333
41.24
41.31
41.263333333
41.443333333
41.576666667
41.746666667
41.886666667
42.173333333
42.416666667
42.466666667
42.633333333
42.773333333
42.65
42.34
42.33
42.47
42.886666667
43.326666667
43.566666667
43.786666667
43.96
44.176666667
44.153333333
44.283333333
44.516666667
44.87
45.316666667
45.723333333
45.993333333
46.186666667
46.236666667
46.323333333

Step-by-Step

Moving averages rely on the use of Window Functions, which you can read more about here.

To find a running total we'll use the AVG window function:

AVG( <expr1> ) OVER ( [ PARTITION BY <expr2> ] [ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ] )

Where:

  • expr1 is an expression that evaluates to a numeric data type (INTEGER, FLOAT, DECIMAL, etc.).
  • expr2 is the optional expression to partition by.
  • expr3 is the optional expression to order by within each partition. (This does not control the order of the entire query output.)

Or, written for a human:

AVG(column) OVER (PARTITION BY col_to_group_by, ORDER BY column_to_order_by)

In the following example we'll take the running total of hours of Netflix I watched in a given week:

SELECT 
  DATE_TRUNC('WEEK', START_TIME) WEEK,
  DATE_PART('WEEKDAY', START_TIME) DAY, 
  SUM(DURATION) / (60 * 60) DAILY_HOURS_WATCHED 
FROM 
  PUBLIC.NETFLIX 
WHERE 
  WEEK = '2018-11-26'
GROUP BY WEEK, DAY
ORDER BY WEEK, DAY
WEEK
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
DAY
0
1
2
4
5
6
DAILY_HOURS_WATCHED
4.412777778
1.467222222
0.6561111111
0.5063888889
0.5261111111
0.8688888889

Now if I want to find the moving 3-day average of hours watched that week, I can do the following:

SELECT
  DAY, DAILY_HOURS_WATCHED,
  AVG(DAILY_HOURS_WATCHED) OVER (
    ORDER BY DAY ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) "3_DAY_AVG"
FROM ONE_WEEK
DAY
0
1
2
4
5
6
DAILY_HOURS_WATCHED
4.412777778
1.467222222
0.6561111111
0.5063888889
0.5261111111
0.8688888889
3_DAY_AVG
4.412777778
2.94
2.178703704
0.8765740741
0.5628703704
0.6337962963

We can see our new 3_DAY_AVG update as the average between the current row and the 2 previous rows.

To check we can make sure that day 2 is the average of Day 0, Day 1, and Day 2 hours watched:

SELECT (4.413 + 1.467 + 0.656) / 3 AS day_2_result
DAY_2_RESULT
2.178666667

This example had no partition, but if we wanted to compare my viewing habits across two different weeks, we would need to PARTITION BY week:

SELECT 
  DATE_TRUNC('WEEK', START_TIME) WEEK,
  DATE_PART('WEEKDAY', START_TIME) DAY, 
  SUM(DURATION) / (60 * 60) DAILY_HOURS_WATCHED 
FROM 
  PUBLIC.NETFLIX 
WHERE 
  WEEK IN ('2018-11-19', '2018-11-26')
GROUP BY WEEK, DAY
ORDER BY WEEK, DAY
WEEK
2018-11-19T00:00:00.000Z
2018-11-19T00:00:00.000Z
2018-11-19T00:00:00.000Z
2018-11-19T00:00:00.000Z
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
DAY
0
1
2
6
0
1
2
4
5
6
DAILY_HOURS_WATCHED
1.326666667
0.4775
0.8708333333
0.3
4.412777778
1.467222222
0.6561111111
0.5063888889
0.5261111111
0.8688888889

And now to find the running total across the two weeks

SELECT 
  WEEK, DAY, DAILY_HOURS_WATCHED,
  AVG(DAILY_HOURS_WATCHED) OVER(
    PARTITION BY WEEK ORDER BY DAY ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) "7_DAY_AVG"
FROM TWO_WEEKS
ORDER BY WEEK, DAY
WEEK
2018-11-19T00:00:00.000Z
2018-11-19T00:00:00.000Z
2018-11-19T00:00:00.000Z
2018-11-19T00:00:00.000Z
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
2018-11-26T00:00:00.000Z
DAY
0
1
2
6
0
1
2
4
5
6
DAILY_HOURS_WATCHED
1.326666667
0.4775
0.8708333333
0.3
4.412777778
1.467222222
0.6561111111
0.5063888889
0.5261111111
0.8688888889
7_DAY_AVG
1.326666667
0.9020833333
0.8916666667
0.74375
4.412777778
2.94
2.178703704
1.760625
1.513722222
1.40625

Now we see the RUNNING_TOTAL restart for the 2nd week.

How We Built This

This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.

This makes not only for far more readable reports (like this one) but also a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.

Share SQL code
Write SQL with your team in real-time.
Share SQL code
Write SQL with your team in real-time.