Moving averages
Moving averages in Snowflake are an excellent way to discover patterns in your data.
For example, the table below shows the daily temperature in London, alongside a 30-day moving average. Plotting the raw values as points and the average as a line lets us see the day-by-day variation in temperature without obscuring longer-term patterns.
DATE | TEMP | 30_DAY_AVG_TEMP |
---|---|---|
2016-01-01 | 41.1 | 41.1 |
2016-01-02 | 49.8 | 45.45 |
2016-01-03 | 46 | 45.633333333 |
2016-01-04 | 45.4 | 45.575 |
2016-01-05 | 45.7 | 45.6 |
2016-01-06 | 43.7 | 45.283333333 |
2016-01-07 | 45.1 | 45.257142857 |
2016-01-08 | 42.1 | 44.8625 |
2016-01-09 | 47.5 | 45.155555556 |
2016-01-10 | 42.9 | 44.93 |
2016-01-11 | 42.9 | 44.745454545 |
2016-01-12 | 41.9 | 44.508333333 |
2016-01-13 | 40.4 | 44.192307692 |
2016-01-14 | 39 | 43.821428571 |
2016-01-15 | 37.2 | 43.38 |
2016-01-16 | 36.1 | 42.925 |
2016-01-17 | 36.3 | 42.535294118 |
2016-01-18 | 37.7 | 42.266666667 |
2016-01-19 | 33.3 | 41.794736842 |
2016-01-20 | 32.5 | 41.33 |
2016-01-21 | 37.6 | 41.152380952 |
2016-01-22 | 45.6 | 41.354545455 |
2016-01-23 | 45 | 41.513043478 |
2016-01-24 | 53.6 | 42.016666667 |
2016-01-25 | 52.1 | 42.42 |
2016-01-26 | 48.4 | 42.65 |
2016-01-27 | 52.7 | 43.022222222 |
2016-01-28 | 43.4 | 43.035714286 |
2016-01-29 | 52.1 | 43.348275862 |
2016-01-30 | 45.9 | 43.433333333 |
2016-01-31 | 48.3 | 43.673333333 |
2016-02-01 | 54.1 | 43.816666667 |
2016-02-02 | 46.8 | 43.843333333 |
2016-02-03 | 41.8 | 43.723333333 |
2016-02-04 | 49.5 | 43.85 |
2016-02-05 | 50.6 | 44.08 |
2016-02-06 | 50.6 | 44.263333333 |
2016-02-07 | 45.2 | 44.366666667 |
2016-02-08 | 46.3 | 44.326666667 |
2016-02-09 | 41.6 | 44.283333333 |
2016-02-10 | 42.2 | 44.26 |
2016-02-11 | 37.8 | 44.123333333 |
2016-02-12 | 38.7 | 44.066666667 |
2016-02-13 | 38.9 | 44.063333333 |
2016-02-14 | 38.7 | 44.113333333 |
2016-02-15 | 36.6 | 44.13 |
2016-02-16 | 36.5 | 44.136666667 |
2016-02-17 | 39.6 | 44.2 |
2016-02-18 | 39.3 | 44.4 |
2016-02-19 | 41.1 | 44.686666667 |
2016-02-20 | 51.4 | 45.146666667 |
2016-02-21 | 55 | 45.46 |
2016-02-22 | 47.7 | 45.55 |
2016-02-23 | 42.3 | 45.173333333 |
2016-02-24 | 38.3 | 44.713333333 |
2016-02-25 | 35.8 | 44.293333333 |
2016-02-26 | 39.1 | 43.84 |
2016-02-27 | 40.2 | 43.733333333 |
2016-02-28 | 40.6 | 43.35 |
2016-02-29 | 40.6 | 43.173333333 |
2016-03-01 | 47.8 | 43.156666667 |
2016-03-02 | 42 | 42.753333333 |
2016-03-03 | 43.3 | 42.636666667 |
2016-03-04 | 41.2 | 42.616666667 |
2016-03-05 | 37.4 | 42.213333333 |
2016-03-06 | 38.9 | 41.823333333 |
2016-03-07 | 39.6 | 41.456666667 |
2016-03-08 | 38.8 | 41.243333333 |
2016-03-09 | 46.2 | 41.24 |
2016-03-10 | 43.7 | 41.31 |
2016-03-11 | 40.8 | 41.263333333 |
2016-03-12 | 43.2 | 41.443333333 |
2016-03-13 | 42.7 | 41.576666667 |
2016-03-14 | 44 | 41.746666667 |
2016-03-15 | 42.9 | 41.886666667 |
2016-03-16 | 45.2 | 42.173333333 |
2016-03-17 | 43.8 | 42.416666667 |
2016-03-18 | 41.1 | 42.466666667 |
2016-03-19 | 44.3 | 42.633333333 |
2016-03-20 | 45.3 | 42.773333333 |
2016-03-21 | 47.7 | 42.65 |
2016-03-22 | 45.7 | 42.34 |
2016-03-23 | 47.4 | 42.33 |
2016-03-24 | 46.5 | 42.47 |
2016-03-25 | 50.8 | 42.886666667 |
2016-03-26 | 49 | 43.326666667 |
2016-03-27 | 46.3 | 43.566666667 |
2016-03-28 | 46.8 | 43.786666667 |
2016-03-29 | 45.8 | 43.96 |
2016-03-30 | 47.1 | 44.176666667 |
2016-03-31 | 47.1 | 44.153333333 |
2016-04-01 | 45.9 | 44.283333333 |
2016-04-02 | 50.3 | 44.516666667 |
2016-04-03 | 51.8 | 44.87 |
2016-04-04 | 50.8 | 45.316666667 |
2016-04-05 | 51.1 | 45.723333333 |
2016-04-06 | 47.7 | 45.993333333 |
2016-04-07 | 44.6 | 46.186666667 |
2016-04-08 | 47.7 | 46.236666667 |
2016-04-09 | 46.3 | 46.323333333 |
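A query along the following lines produces that table. This is a sketch: london_weather is a hypothetical table name with DATE and TEMP columns. The frame of 29 preceding rows plus the current row gives a trailing 30-day window (for the first 29 days, the average is taken over however many rows are available, which matches the values above):

```sql
SELECT
    date,
    temp,
    AVG(temp) OVER (
        ORDER BY date
        ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
    ) AS "30_DAY_AVG_TEMP"
FROM london_weather       -- hypothetical table of daily readings
ORDER BY date;
```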
Step-by-Step
Moving averages rely on window functions, which you can read more about in the Snowflake documentation.
To find a moving average we'll use the AVG window function:
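Per the Snowflake documentation, the window-function form of AVG looks roughly like this (the window frame after ORDER BY is optional):

```sql
AVG( <expr1> ) OVER ( [ PARTITION BY <expr2> ] [ ORDER BY <expr3> [ <window_frame> ] ] )
```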
- expr1 is an expression that evaluates to a numeric data type (INTEGER, FLOAT, DECIMAL, etc.).
- expr2 is the optional expression to partition by.
- expr3 is the optional expression to order by within each partition. (This does not control the order of the entire query output.)
Or, written for a human:
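Paraphrased, it might read something like this:

```sql
AVG( <the numeric column to average> ) OVER (
    PARTITION BY <optional: column that splits the rows into groups>
    ORDER BY <optional: column that orders the rows within each group>
    <optional window frame, e.g. ROWS BETWEEN 2 PRECEDING AND CURRENT ROW>
)
```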
In the following example we'll look at the hours of Netflix I watched each day in a given week:
WEEK | DAY | DAILY_HOURS_WATCHED |
---|---|---|
2018-11-26T00:00:00.000Z | 0 | 4.412777778 |
2018-11-26T00:00:00.000Z | 1 | 1.467222222 |
2018-11-26T00:00:00.000Z | 2 | 0.6561111111 |
2018-11-26T00:00:00.000Z | 4 | 0.5063888889 |
2018-11-26T00:00:00.000Z | 5 | 0.5261111111 |
2018-11-26T00:00:00.000Z | 6 | 0.8688888889 |
Now if I want to find the 3-day moving average of hours watched that week, I can do the following:
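A query along these lines computes it; netflix_viewing is a hypothetical table name holding the data above, and the frame of 2 preceding rows plus the current row gives the 3-day window:

```sql
SELECT
    day,
    daily_hours_watched,
    AVG(daily_hours_watched) OVER (
        ORDER BY day
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS "3_DAY_AVG"
FROM netflix_viewing      -- hypothetical table of viewing hours
ORDER BY day;
```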
DAY | DAILY_HOURS_WATCHED | 3_DAY_AVG |
---|---|---|
0 | 4.412777778 | 4.412777778 |
1 | 1.467222222 | 2.94 |
2 | 0.6561111111 | 2.178703704 |
4 | 0.5063888889 | 0.8765740741 |
5 | 0.5261111111 | 0.5628703704 |
6 | 0.8688888889 | 0.6337962963 |
We can see our new 3_DAY_AVG column update as the average of the current row and the two previous rows.
To check, we can verify that day 2's value is the average of the hours watched on days 0, 1, and 2:
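The check is just an average of three literals, with the values taken from the table above:

```sql
SELECT (4.412777778 + 1.467222222 + 0.6561111111) / 3 AS day_2_result;
```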
DAY_2_RESULT |
---|
2.178703704 |
This example had no partition, but if we wanted to compare my viewing habits across two different weeks, we would need to PARTITION BY week:
WEEK | DAY | DAILY_HOURS_WATCHED |
---|---|---|
2018-11-19T00:00:00.000Z | 0 | 1.326666667 |
2018-11-19T00:00:00.000Z | 1 | 0.4775 |
2018-11-19T00:00:00.000Z | 2 | 0.8708333333 |
2018-11-19T00:00:00.000Z | 6 | 0.3 |
2018-11-26T00:00:00.000Z | 0 | 4.412777778 |
2018-11-26T00:00:00.000Z | 1 | 1.467222222 |
2018-11-26T00:00:00.000Z | 2 | 0.6561111111 |
2018-11-26T00:00:00.000Z | 4 | 0.5063888889 |
2018-11-26T00:00:00.000Z | 5 | 0.5261111111 |
2018-11-26T00:00:00.000Z | 6 | 0.8688888889 |
And now we find the 7-day moving average across the two weeks, partitioned by week:
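Adding PARTITION BY week makes the window restart at the beginning of each week; again, netflix_viewing is a hypothetical table name:

```sql
SELECT
    week,
    day,
    daily_hours_watched,
    AVG(daily_hours_watched) OVER (
        PARTITION BY week     -- the average resets for each week
        ORDER BY day
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS "7_DAY_AVG"
FROM netflix_viewing      -- hypothetical table of viewing hours
ORDER BY week, day;
```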
WEEK | DAY | DAILY_HOURS_WATCHED | 7_DAY_AVG |
---|---|---|---|
2018-11-19T00:00:00.000Z | 0 | 1.326666667 | 1.326666667 |
2018-11-19T00:00:00.000Z | 1 | 0.4775 | 0.9020833333 |
2018-11-19T00:00:00.000Z | 2 | 0.8708333333 | 0.8916666667 |
2018-11-19T00:00:00.000Z | 6 | 0.3 | 0.74375 |
2018-11-26T00:00:00.000Z | 0 | 4.412777778 | 4.412777778 |
2018-11-26T00:00:00.000Z | 1 | 1.467222222 | 2.94 |
2018-11-26T00:00:00.000Z | 2 | 0.6561111111 | 2.178703704 |
2018-11-26T00:00:00.000Z | 4 | 0.5063888889 | 1.760625 |
2018-11-26T00:00:00.000Z | 5 | 0.5261111111 | 1.513722222 |
2018-11-26T00:00:00.000Z | 6 | 0.8688888889 | 1.40625 |
Now we see the 7_DAY_AVG restart for the second week.
How We Built This
This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.
This not only makes for far more readable reports (like this one), but is also a much faster and more powerful way to do your analysis, essentially turning it into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.