Moving averages in Snowflake are an excellent way to discover patterns in your data.
For example, the chart below shows the daily temperature in London as points, and a 30-day moving average as a line. Plotting both lets us see the day-by-day variation in temperature without obscuring longer-term patterns.
SELECT
DATE, TEMP,
AVG(TEMP) OVER (
ORDER BY DATE ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) "30_DAY_AVG_TEMP"
FROM PUBLIC.LONDON_WEATHER

| DATE | TEMP | 30_DAY_AVG_TEMP |
|---|---|---|
| 2016-01-01 | 41.1 | 41.1 |
| 2016-01-02 | 49.8 | 45.45 |
| 2016-01-03 | 46 | 45.633333333 |
| 2016-01-04 | 45.4 | 45.575 |
| 2016-01-05 | 45.7 | 45.6 |
| 2016-01-06 | 43.7 | 45.283333333 |
| 2016-01-07 | 45.1 | 45.257142857 |
| 2016-01-08 | 42.1 | 44.8625 |
| 2016-01-09 | 47.5 | 45.155555556 |
| 2016-01-10 | 42.9 | 44.93 |
| 2016-01-11 | 42.9 | 44.745454545 |
| 2016-01-12 | 41.9 | 44.508333333 |
| 2016-01-13 | 40.4 | 44.192307692 |
| 2016-01-14 | 39 | 43.821428571 |
| 2016-01-15 | 37.2 | 43.38 |
| 2016-01-16 | 36.1 | 42.925 |
| 2016-01-17 | 36.3 | 42.535294118 |
| 2016-01-18 | 37.7 | 42.266666667 |
| 2016-01-19 | 33.3 | 41.794736842 |
| 2016-01-20 | 32.5 | 41.33 |
| 2016-01-21 | 37.6 | 41.152380952 |
| 2016-01-22 | 45.6 | 41.354545455 |
| 2016-01-23 | 45 | 41.513043478 |
| 2016-01-24 | 53.6 | 42.016666667 |
| 2016-01-25 | 52.1 | 42.42 |
| 2016-01-26 | 48.4 | 42.65 |
| 2016-01-27 | 52.7 | 43.022222222 |
| 2016-01-28 | 43.4 | 43.035714286 |
| 2016-01-29 | 52.1 | 43.348275862 |
| 2016-01-30 | 45.9 | 43.433333333 |
| 2016-01-31 | 48.3 | 43.673333333 |
| 2016-02-01 | 54.1 | 43.816666667 |
| 2016-02-02 | 46.8 | 43.843333333 |
| 2016-02-03 | 41.8 | 43.723333333 |
| 2016-02-04 | 49.5 | 43.85 |
| 2016-02-05 | 50.6 | 44.08 |
| 2016-02-06 | 50.6 | 44.263333333 |
| 2016-02-07 | 45.2 | 44.366666667 |
| 2016-02-08 | 46.3 | 44.326666667 |
| 2016-02-09 | 41.6 | 44.283333333 |
| 2016-02-10 | 42.2 | 44.26 |
| 2016-02-11 | 37.8 | 44.123333333 |
| 2016-02-12 | 38.7 | 44.066666667 |
| 2016-02-13 | 38.9 | 44.063333333 |
| 2016-02-14 | 38.7 | 44.113333333 |
| 2016-02-15 | 36.6 | 44.13 |
| 2016-02-16 | 36.5 | 44.136666667 |
| 2016-02-17 | 39.6 | 44.2 |
| 2016-02-18 | 39.3 | 44.4 |
| 2016-02-19 | 41.1 | 44.686666667 |
| 2016-02-20 | 51.4 | 45.146666667 |
| 2016-02-21 | 55 | 45.46 |
| 2016-02-22 | 47.7 | 45.55 |
| 2016-02-23 | 42.3 | 45.173333333 |
| 2016-02-24 | 38.3 | 44.713333333 |
| 2016-02-25 | 35.8 | 44.293333333 |
| 2016-02-26 | 39.1 | 43.84 |
| 2016-02-27 | 40.2 | 43.733333333 |
| 2016-02-28 | 40.6 | 43.35 |
| 2016-02-29 | 40.6 | 43.173333333 |
| 2016-03-01 | 47.8 | 43.156666667 |
| 2016-03-02 | 42 | 42.753333333 |
| 2016-03-03 | 43.3 | 42.636666667 |
| 2016-03-04 | 41.2 | 42.616666667 |
| 2016-03-05 | 37.4 | 42.213333333 |
| 2016-03-06 | 38.9 | 41.823333333 |
| 2016-03-07 | 39.6 | 41.456666667 |
| 2016-03-08 | 38.8 | 41.243333333 |
| 2016-03-09 | 46.2 | 41.24 |
| 2016-03-10 | 43.7 | 41.31 |
| 2016-03-11 | 40.8 | 41.263333333 |
| 2016-03-12 | 43.2 | 41.443333333 |
| 2016-03-13 | 42.7 | 41.576666667 |
| 2016-03-14 | 44 | 41.746666667 |
| 2016-03-15 | 42.9 | 41.886666667 |
| 2016-03-16 | 45.2 | 42.173333333 |
| 2016-03-17 | 43.8 | 42.416666667 |
| 2016-03-18 | 41.1 | 42.466666667 |
| 2016-03-19 | 44.3 | 42.633333333 |
| 2016-03-20 | 45.3 | 42.773333333 |
| 2016-03-21 | 47.7 | 42.65 |
| 2016-03-22 | 45.7 | 42.34 |
| 2016-03-23 | 47.4 | 42.33 |
| 2016-03-24 | 46.5 | 42.47 |
| 2016-03-25 | 50.8 | 42.886666667 |
| 2016-03-26 | 49 | 43.326666667 |
| 2016-03-27 | 46.3 | 43.566666667 |
| 2016-03-28 | 46.8 | 43.786666667 |
| 2016-03-29 | 45.8 | 43.96 |
| 2016-03-30 | 47.1 | 44.176666667 |
| 2016-03-31 | 47.1 | 44.153333333 |
| 2016-04-01 | 45.9 | 44.283333333 |
| 2016-04-02 | 50.3 | 44.516666667 |
| 2016-04-03 | 51.8 | 44.87 |
| 2016-04-04 | 50.8 | 45.316666667 |
| 2016-04-05 | 51.1 | 45.723333333 |
| 2016-04-06 | 47.7 | 45.993333333 |
| 2016-04-07 | 44.6 | 46.186666667 |
| 2016-04-08 | 47.7 | 46.236666667 |
| 2016-04-09 | 46.3 | 46.323333333 |
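The window frame `ROWS BETWEEN 29 PRECEDING AND CURRENT ROW` averages the current row together with up to 29 rows before it. Early rows have fewer than 30 days of history and average only what exists, which is why the first row's 30_DAY_AVG_TEMP equals its own temperature. As a minimal sketch of this behavior, we can reproduce it with SQLite (which shares this frame syntax) on the first few rows above, using a shorter 3-day frame:

```python
import sqlite3

# In-memory SQLite database; SQLite supports the same
# ROWS BETWEEN ... PRECEDING AND CURRENT ROW frame syntax as Snowflake.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE weather (date TEXT, temp REAL)")
con.executemany("INSERT INTO weather VALUES (?, ?)", [
    ("2016-01-01", 41.1), ("2016-01-02", 49.8), ("2016-01-03", 46.0),
    ("2016-01-04", 45.4), ("2016-01-05", 45.7),
])

# A 3-day frame: the current row plus up to 2 rows before it. Early rows
# have fewer than 3 rows available, so they average only what exists.
result = con.execute("""
    SELECT date, temp,
           AVG(temp) OVER (
               ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS avg_3_day
    FROM weather
""").fetchall()
for date, temp, avg in result:
    print(date, temp, round(avg, 2))
```

The first row averages only itself (41.1), the second averages two rows (45.45), and from the third row on the frame holds a full three days, just as the 30-day frame above fills up over the first 30 rows.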
Moving averages rely on the use of window functions, which are covered in more depth in Snowflake's documentation.
To compute a moving average we'll use the AVG window function:
AVG( <expr1> ) OVER ( [ PARTITION BY <expr2> ] [ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ] )
- expr1 is an expression that evaluates to a numeric data type (INTEGER, FLOAT, DECIMAL, etc.).
- expr2 is the optional expression to partition by.
- expr3 is the optional expression to order by within each partition. (This does not control the order of the entire query output.)
Or, written for a human:
AVG(column) OVER (PARTITION BY col_to_group_by ORDER BY column_to_order_by)
In the following example we'll find the total hours of Netflix I watched on each day of a given week:
SELECT
DATE_TRUNC('WEEK', START_TIME) WEEK,
DATE_PART('WEEKDAY', START_TIME) DAY,
SUM(DURATION) / (60 * 60) DAILY_HOURS_WATCHED
FROM
PUBLIC.NETFLIX
WHERE
WEEK = '2018-11-26'
GROUP BY WEEK, DAY
ORDER BY WEEK, DAY

| WEEK | DAY | DAILY_HOURS_WATCHED |
|---|---|---|
| 2018-11-26T00:00:00.000Z | 0 | 4.412777778 |
| 2018-11-26T00:00:00.000Z | 1 | 1.467222222 |
| 2018-11-26T00:00:00.000Z | 2 | 0.6561111111 |
| 2018-11-26T00:00:00.000Z | 4 | 0.5063888889 |
| 2018-11-26T00:00:00.000Z | 5 | 0.5261111111 |
| 2018-11-26T00:00:00.000Z | 6 | 0.8688888889 |
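The DAILY_HOURS_WATCHED values are per-day sums of session durations (stored in seconds) divided by 60 * 60. A quick sketch of that aggregation in Python, with hypothetical session durations chosen so the Day 0 and Day 1 totals match the table above:

```python
from collections import defaultdict

# Hypothetical viewing sessions as (day_of_week, duration_seconds) pairs;
# the Day 0 durations sum to 15886 s, i.e. the 4.412777778 h in the table.
sessions = [(0, 7200), (0, 8686), (1, 5282), (2, 2362)]

# Sum each day's seconds and convert to hours, mirroring
# SUM(DURATION) / (60 * 60) ... GROUP BY DAY in the query above.
daily_hours = defaultdict(float)
for day, seconds in sessions:
    daily_hours[day] += seconds / (60 * 60)

print(round(daily_hours[0], 9))  # 4.412777778
```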
Now if I want to find the moving 3-day average of hours watched that week, I can do the following:
SELECT
DAY, DAILY_HOURS_WATCHED,
AVG(DAILY_HOURS_WATCHED) OVER (
ORDER BY DAY ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) "3_DAY_AVG"
FROM ONE_WEEK

| DAY | DAILY_HOURS_WATCHED | 3_DAY_AVG |
|---|---|---|
| 0 | 4.412777778 | 4.412777778 |
| 1 | 1.467222222 | 2.94 |
| 2 | 0.6561111111 | 2.178703704 |
| 4 | 0.5063888889 | 0.8765740741 |
| 5 | 0.5261111111 | 0.5628703704 |
| 6 | 0.8688888889 | 0.6337962963 |
We can see our new 3_DAY_AVG column update as the average of the current row and the two previous rows.
As a check, Day 2's value should be the average of the hours watched on Days 0, 1, and 2:
SELECT (4.413 + 1.467 + 0.656) / 3 AS day_2_result

| DAY_2_RESULT |
|---|
| 2.178666667 |
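The small difference from the 2.178703704 in the table comes from rounding the inputs to three decimal places. Redoing the check with the full-precision daily totals recovers the window function's output exactly:

```python
# Re-run the Day 2 check with the full-precision daily totals from the table.
daily_hours = [4.412777778, 1.467222222, 0.6561111111]

avg_day_2 = sum(daily_hours) / 3
print(round(avg_day_2, 9))  # 2.178703704
```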
This example had no partition, but if we wanted to compare my viewing habits across two different weeks, we would need to PARTITION BY week:
SELECT
DATE_TRUNC('WEEK', START_TIME) WEEK,
DATE_PART('WEEKDAY', START_TIME) DAY,
SUM(DURATION) / (60 * 60) DAILY_HOURS_WATCHED
FROM
PUBLIC.NETFLIX
WHERE
WEEK IN ('2018-11-19', '2018-11-26')
GROUP BY WEEK, DAY
ORDER BY WEEK, DAY

| WEEK | DAY | DAILY_HOURS_WATCHED |
|---|---|---|
| 2018-11-19T00:00:00.000Z | 0 | 1.326666667 |
| 2018-11-19T00:00:00.000Z | 1 | 0.4775 |
| 2018-11-19T00:00:00.000Z | 2 | 0.8708333333 |
| 2018-11-19T00:00:00.000Z | 6 | 0.3 |
| 2018-11-26T00:00:00.000Z | 0 | 4.412777778 |
| 2018-11-26T00:00:00.000Z | 1 | 1.467222222 |
| 2018-11-26T00:00:00.000Z | 2 | 0.6561111111 |
| 2018-11-26T00:00:00.000Z | 4 | 0.5063888889 |
| 2018-11-26T00:00:00.000Z | 5 | 0.5261111111 |
| 2018-11-26T00:00:00.000Z | 6 | 0.8688888889 |
And now we can find the 7-day moving average across the two weeks:
| WEEK | DAY | DAILY_HOURS_WATCHED | 7_DAY_AVG |
|---|---|---|---|
| 2018-11-19T00:00:00.000Z | 0 | 1.326666667 | 1.326666667 |
| 2018-11-19T00:00:00.000Z | 1 | 0.4775 | 0.9020833333 |
| 2018-11-19T00:00:00.000Z | 2 | 0.8708333333 | 0.8916666667 |
| 2018-11-19T00:00:00.000Z | 6 | 0.3 | 0.74375 |
| 2018-11-26T00:00:00.000Z | 0 | 4.412777778 | 4.412777778 |
| 2018-11-26T00:00:00.000Z | 1 | 1.467222222 | 2.94 |
| 2018-11-26T00:00:00.000Z | 2 | 0.6561111111 | 2.178703704 |
| 2018-11-26T00:00:00.000Z | 4 | 0.5063888889 | 1.760625 |
| 2018-11-26T00:00:00.000Z | 5 | 0.5261111111 | 1.513722222 |
| 2018-11-26T00:00:00.000Z | 6 | 0.8688888889 | 1.40625 |
Now we see the 7_DAY_AVG restart for the second week.
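The restart happens because PARTITION BY WEEK confines each window frame to a single week, so day 0 of each week averages only itself. A minimal sketch of the same behavior in SQLite (which shares the PARTITION BY and frame syntax), using the first two days of each week from the table above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE viewing (week TEXT, day INTEGER, hours REAL)")
con.executemany("INSERT INTO viewing VALUES (?, ?, ?)", [
    ("2018-11-19", 0, 1.326666667), ("2018-11-19", 1, 0.4775),
    ("2018-11-26", 0, 4.412777778), ("2018-11-26", 1, 1.467222222),
])

# PARTITION BY week restarts the frame, so each week's day 0 averages itself.
result = con.execute("""
    SELECT week, day,
           AVG(hours) OVER (
               PARTITION BY week
               ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS avg_7_day
    FROM viewing
    ORDER BY week, day
""").fetchall()
for week, day, avg in result:
    print(week, day, round(avg, 4))
```

Day 0 of the second week comes out as 4.412777778, its own value, rather than continuing the first week's average, matching the restart visible in the 7_DAY_AVG column.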
This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.
This makes not only for far more readable reports (like this one) but also a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.
The query that produced the 7_DAY_AVG table above:
SELECT
WEEK, DAY, DAILY_HOURS_WATCHED,
AVG(DAILY_HOURS_WATCHED) OVER(
PARTITION BY WEEK ORDER BY DAY ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) "7_DAY_AVG"
FROM TWO_WEEKS
ORDER BY WEEK, DAY