Moving averages in Snowflake are an excellent way to discover patterns in your data.
For example, the chart below shows the daily temperature in London as points, and a 30-day moving average as a line. Plotting both lets us see the day-by-day variation in temperature without obscuring longer-term patterns.
```sql
SELECT DATE, TEMP,
       AVG(TEMP) OVER (
           ORDER BY DATE
           ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) "30_DAY_AVG_TEMP"
FROM PUBLIC.LONDON_WEATHER
```
| DATE | TEMP | 30_DAY_AVG_TEMP |
| --- | --- | --- |
| 2016-01-01 | 41.1 | 41.1 |
| 2016-01-02 | 49.8 | 45.45 |
| 2016-01-03 | 46 | 45.633333333 |
| 2016-01-04 | 45.4 | 45.575 |
| 2016-01-05 | 45.7 | 45.6 |
| 2016-01-06 | 43.7 | 45.283333333 |
| 2016-01-07 | 45.1 | 45.257142857 |
| 2016-01-08 | 42.1 | 44.8625 |
| 2016-01-09 | 47.5 | 45.155555556 |
| 2016-01-10 | 42.9 | 44.93 |
| 2016-01-11 | 42.9 | 44.745454545 |
| 2016-01-12 | 41.9 | 44.508333333 |
| 2016-01-13 | 40.4 | 44.192307692 |
| 2016-01-14 | 39 | 43.821428571 |
| 2016-01-15 | 37.2 | 43.38 |
| 2016-01-16 | 36.1 | 42.925 |
| 2016-01-17 | 36.3 | 42.535294118 |
| 2016-01-18 | 37.7 | 42.266666667 |
| 2016-01-19 | 33.3 | 41.794736842 |
| 2016-01-20 | 32.5 | 41.33 |
| 2016-01-21 | 37.6 | 41.152380952 |
| 2016-01-22 | 45.6 | 41.354545455 |
| 2016-01-23 | 45 | 41.513043478 |
| 2016-01-24 | 53.6 | 42.016666667 |
| 2016-01-25 | 52.1 | 42.42 |
| 2016-01-26 | 48.4 | 42.65 |
| 2016-01-27 | 52.7 | 43.022222222 |
| 2016-01-28 | 43.4 | 43.035714286 |
| 2016-01-29 | 52.1 | 43.348275862 |
| 2016-01-30 | 45.9 | 43.433333333 |
| 2016-01-31 | 48.3 | 43.673333333 |
| 2016-02-01 | 54.1 | 43.816666667 |
| 2016-02-02 | 46.8 | 43.843333333 |
| 2016-02-03 | 41.8 | 43.723333333 |
| 2016-02-04 | 49.5 | 43.85 |
| 2016-02-05 | 50.6 | 44.08 |
| 2016-02-06 | 50.6 | 44.263333333 |
| 2016-02-07 | 45.2 | 44.366666667 |
| 2016-02-08 | 46.3 | 44.326666667 |
| 2016-02-09 | 41.6 | 44.283333333 |
| 2016-02-10 | 42.2 | 44.26 |
| 2016-02-11 | 37.8 | 44.123333333 |
| 2016-02-12 | 38.7 | 44.066666667 |
| 2016-02-13 | 38.9 | 44.063333333 |
| 2016-02-14 | 38.7 | 44.113333333 |
| 2016-02-15 | 36.6 | 44.13 |
| 2016-02-16 | 36.5 | 44.136666667 |
| 2016-02-17 | 39.6 | 44.2 |
| 2016-02-18 | 39.3 | 44.4 |
| 2016-02-19 | 41.1 | 44.686666667 |
| 2016-02-20 | 51.4 | 45.146666667 |
| 2016-02-21 | 55 | 45.46 |
| 2016-02-22 | 47.7 | 45.55 |
| 2016-02-23 | 42.3 | 45.173333333 |
| 2016-02-24 | 38.3 | 44.713333333 |
| 2016-02-25 | 35.8 | 44.293333333 |
| 2016-02-26 | 39.1 | 43.84 |
| 2016-02-27 | 40.2 | 43.733333333 |
| 2016-02-28 | 40.6 | 43.35 |
| 2016-02-29 | 40.6 | 43.173333333 |
| 2016-03-01 | 47.8 | 43.156666667 |
| 2016-03-02 | 42 | 42.753333333 |
| 2016-03-03 | 43.3 | 42.636666667 |
| 2016-03-04 | 41.2 | 42.616666667 |
| 2016-03-05 | 37.4 | 42.213333333 |
| 2016-03-06 | 38.9 | 41.823333333 |
| 2016-03-07 | 39.6 | 41.456666667 |
| 2016-03-08 | 38.8 | 41.243333333 |
| 2016-03-09 | 46.2 | 41.24 |
| 2016-03-10 | 43.7 | 41.31 |
| 2016-03-11 | 40.8 | 41.263333333 |
| 2016-03-12 | 43.2 | 41.443333333 |
| 2016-03-13 | 42.7 | 41.576666667 |
| 2016-03-14 | 44 | 41.746666667 |
| 2016-03-15 | 42.9 | 41.886666667 |
| 2016-03-16 | 45.2 | 42.173333333 |
| 2016-03-17 | 43.8 | 42.416666667 |
| 2016-03-18 | 41.1 | 42.466666667 |
| 2016-03-19 | 44.3 | 42.633333333 |
| 2016-03-20 | 45.3 | 42.773333333 |
| 2016-03-21 | 47.7 | 42.65 |
| 2016-03-22 | 45.7 | 42.34 |
| 2016-03-23 | 47.4 | 42.33 |
| 2016-03-24 | 46.5 | 42.47 |
| 2016-03-25 | 50.8 | 42.886666667 |
| 2016-03-26 | 49 | 43.326666667 |
| 2016-03-27 | 46.3 | 43.566666667 |
| 2016-03-28 | 46.8 | 43.786666667 |
| 2016-03-29 | 45.8 | 43.96 |
| 2016-03-30 | 47.1 | 44.176666667 |
| 2016-03-31 | 47.1 | 44.153333333 |
| 2016-04-01 | 45.9 | 44.283333333 |
| 2016-04-02 | 50.3 | 44.516666667 |
| 2016-04-03 | 51.8 | 44.87 |
| 2016-04-04 | 50.8 | 45.316666667 |
| 2016-04-05 | 51.1 | 45.723333333 |
| 2016-04-06 | 47.7 | 45.993333333 |
| 2016-04-07 | 44.6 | 46.186666667 |
| 2016-04-08 | 47.7 | 46.236666667 |
| 2016-04-09 | 46.3 | 46.323333333 |
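The frame clause is easy to sanity-check outside Snowflake. Here is a minimal pure-Python sketch of `ROWS BETWEEN 29 PRECEDING AND CURRENT ROW`, applied to the first five temperatures from the table above. The frame simply shrinks near the start of the data, which is why the early rows average over fewer than 30 values:

```python
# A pure-Python sketch of ROWS BETWEEN 29 PRECEDING AND CURRENT ROW,
# using the first five temperatures from the table above.
temps = [41.1, 49.8, 46, 45.4, 45.7]

def moving_avg(values, window):
    """Average each value with up to (window - 1) preceding values."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - window + 1): i + 1]  # shrinks near the start
        out.append(sum(frame) / len(frame))
    return out

print([round(a, 4) for a in moving_avg(temps, 30)])
# [41.1, 45.45, 45.6333, 45.575, 45.6]
```

The results match the first five values of 30_DAY_AVG_TEMP above.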
Step-by-Step
Moving averages rely on window functions, which you can read more about in the Snowflake documentation.
To compute a moving average we'll use the AVG window function, whose general form is AVG( expr1 ) OVER ( [ PARTITION BY expr2 ] [ ORDER BY expr3 ] ), where:
expr1 is an expression that evaluates to a numeric data type (INTEGER, FLOAT, DECIMAL, etc.).
expr2 is the optional expression to partition by.
expr3 is the optional expression to order by within each partition. (This does not control the order of the entire query output.)
Or, written for a human:
```sql
AVG(column) OVER (PARTITION BY col_to_group_by ORDER BY column_to_order_by)
```
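SQLite, which ships with Python, accepts the same OVER (...) syntax, so a toy table is enough to see the behavior; the table name and values below are invented for illustration. With an ORDER BY but no explicit frame, AVG returns a running average within each partition:

```python
import sqlite3

# Toy demonstration of AVG as a window function. SQLite accepts the
# same OVER (...) syntax as Snowflake; the data here is made up.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE watch (week TEXT, day INTEGER, hours REAL)")
con.executemany("INSERT INTO watch VALUES (?, ?, ?)",
                [("w1", 0, 2.0), ("w1", 1, 4.0),
                 ("w2", 0, 1.0), ("w2", 1, 3.0)])
rows = con.execute("""
    SELECT week, day, hours,
           AVG(hours) OVER (PARTITION BY week ORDER BY day) AS running_avg
    FROM watch
    ORDER BY week, day
""").fetchall()
print(rows)
# [('w1', 0, 2.0, 2.0), ('w1', 1, 4.0, 3.0),
#  ('w2', 0, 1.0, 1.0), ('w2', 1, 3.0, 2.0)]
```

The frame clauses in the examples that follow narrow this default cumulative window to a fixed number of rows.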
In the following example we'll total up the hours of Netflix I watched each day in a given week:

```sql
SELECT DATE_TRUNC('WEEK', START_TIME) WEEK,
       DATE_PART('WEEKDAY', START_TIME) DAY,
       SUM(DURATION) / (60*60) DAILY_HOURS_WATCHED
FROM PUBLIC.NETFLIX
WHERE WEEK = '2018-11-26'
GROUP BY WEEK, DAY
ORDER BY WEEK, DAY
```
| WEEK | DAY | DAILY_HOURS_WATCHED |
| --- | --- | --- |
| 2018-11-26T00:00:00.000Z | 0 | 4.412777778 |
| 2018-11-26T00:00:00.000Z | 1 | 1.467222222 |
| 2018-11-26T00:00:00.000Z | 2 | 0.6561111111 |
| 2018-11-26T00:00:00.000Z | 4 | 0.5063888889 |
| 2018-11-26T00:00:00.000Z | 5 | 0.5261111111 |
| 2018-11-26T00:00:00.000Z | 6 | 0.8688888889 |
Now if I want to find the 3-day moving average of hours watched that week, I can run the following against the previous result (referenced here as ONE_WEEK):

```sql
SELECT DAY, DAILY_HOURS_WATCHED,
       AVG(DAILY_HOURS_WATCHED) OVER (
           ORDER BY DAY
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) "3_DAY_AVG"
FROM ONE_WEEK
```
| DAY | DAILY_HOURS_WATCHED | 3_DAY_AVG |
| --- | --- | --- |
| 0 | 4.412777778 | 4.412777778 |
| 1 | 1.467222222 | 2.94 |
| 2 | 0.6561111111 | 2.178703704 |
| 4 | 0.5063888889 | 0.8765740741 |
| 5 | 0.5261111111 | 0.5628703704 |
| 6 | 0.8688888889 | 0.6337962963 |
We can see our new 3_DAY_AVG column update as the average of the current row and the two previous rows.
To check, we can verify that day 2's value is the average of the hours watched on Days 0, 1, and 2:
```sql
SELECT (4.413 + 1.467 + 0.656) / 3 AS day_2_result
```
| DAY_2_RESULT |
| --- |
| 2.178666667 |
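The small discrepancy between this check (2.178666667) and the query's 2.178703704 comes from rounding the inputs to three decimal places; repeating the arithmetic with the unrounded values matches the query exactly:

```python
# Same check using the unrounded DAILY_HOURS_WATCHED values from the
# table above -- this matches the query's 3_DAY_AVG for day 2 exactly.
day_2_avg = (4.412777778 + 1.467222222 + 0.6561111111) / 3
print(round(day_2_avg, 9))  # 2.178703704
```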
This example had no partition, but if we wanted to compare my viewing habits across two different weeks, we would need to PARTITION BY week:
```sql
SELECT DATE_TRUNC('WEEK', START_TIME) WEEK,
       DATE_PART('WEEKDAY', START_TIME) DAY,
       SUM(DURATION) / (60*60) DAILY_HOURS_WATCHED
FROM PUBLIC.NETFLIX
WHERE WEEK IN ('2018-11-19', '2018-11-26')
GROUP BY WEEK, DAY
ORDER BY WEEK, DAY
```
| WEEK | DAY | DAILY_HOURS_WATCHED |
| --- | --- | --- |
| 2018-11-19T00:00:00.000Z | 0 | 1.326666667 |
| 2018-11-19T00:00:00.000Z | 1 | 0.4775 |
| 2018-11-19T00:00:00.000Z | 2 | 0.8708333333 |
| 2018-11-19T00:00:00.000Z | 6 | 0.3 |
| 2018-11-26T00:00:00.000Z | 0 | 4.412777778 |
| 2018-11-26T00:00:00.000Z | 1 | 1.467222222 |
| 2018-11-26T00:00:00.000Z | 2 | 0.6561111111 |
| 2018-11-26T00:00:00.000Z | 4 | 0.5063888889 |
| 2018-11-26T00:00:00.000Z | 5 | 0.5261111111 |
| 2018-11-26T00:00:00.000Z | 6 | 0.8688888889 |
And now to find the 7-day moving average within each week:
```sql
SELECT WEEK, DAY, DAILY_HOURS_WATCHED,
       AVG(DAILY_HOURS_WATCHED) OVER (
           PARTITION BY WEEK
           ORDER BY DAY
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) "7_DAY_AVG"
FROM TWO_WEEKS
ORDER BY WEEK, DAY
```
| WEEK | DAY | DAILY_HOURS_WATCHED | 7_DAY_AVG |
| --- | --- | --- | --- |
| 2018-11-19T00:00:00.000Z | 0 | 1.326666667 | 1.326666667 |
| 2018-11-19T00:00:00.000Z | 1 | 0.4775 | 0.9020833333 |
| 2018-11-19T00:00:00.000Z | 2 | 0.8708333333 | 0.8916666667 |
| 2018-11-19T00:00:00.000Z | 6 | 0.3 | 0.74375 |
| 2018-11-26T00:00:00.000Z | 0 | 4.412777778 | 4.412777778 |
| 2018-11-26T00:00:00.000Z | 1 | 1.467222222 | 2.94 |
| 2018-11-26T00:00:00.000Z | 2 | 0.6561111111 | 2.178703704 |
| 2018-11-26T00:00:00.000Z | 4 | 0.5063888889 | 1.760625 |
| 2018-11-26T00:00:00.000Z | 5 | 0.5261111111 | 1.513722222 |
| 2018-11-26T00:00:00.000Z | 6 | 0.8688888889 | 1.40625 |
Now we see the 7_DAY_AVG restart for the second week.
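The partitioned frame can be reproduced outside Snowflake too. Here is a sketch in SQLite (which accepts the same window syntax) using a handful of values copied from the tables above:

```python
import sqlite3

# Reproducing PARTITION BY plus an explicit ROWS frame in SQLite,
# with a subset of the values from the tables above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE two_weeks (week TEXT, day INTEGER, hours REAL)")
con.executemany("INSERT INTO two_weeks VALUES (?, ?, ?)", [
    ("2018-11-19", 0, 1.326666667), ("2018-11-19", 1, 0.4775),
    ("2018-11-19", 2, 0.8708333333), ("2018-11-19", 6, 0.3),
    ("2018-11-26", 0, 4.412777778), ("2018-11-26", 1, 1.467222222),
])
rows = con.execute("""
    SELECT week, day,
           AVG(hours) OVER (
               PARTITION BY week
               ORDER BY day
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS seven_day_avg
    FROM two_weeks
    ORDER BY week, day
""").fetchall()
for week, day, avg in rows:
    print(week, day, round(avg, 6))
```

The averages restart at each new week, matching the 7_DAY_AVG column above.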
How We Built This
This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.
This not only makes for far more readable reports (like this one) but also offers a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.