Moving averages

Moving averages in Snowflake are an excellent way to discover patterns in your data.

For example, the chart below shows the daily temperature in London as points, and a 30 day moving average as a line. Plotting both lets us see the variation of temperature day-by-day, without obscuring longer-term patterns.

SELECT
  DATE, TEMP,
  AVG(TEMP) OVER (
    ORDER BY DATE ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
  ) "30_DAY_AVG_TEMP"
FROM PUBLIC.LONDON_WEATHER

DATE	TEMP	30_DAY_AVG_TEMP
2016-01-01	41.1	41.1
2016-01-02	49.8	45.45
2016-01-03	46	45.633333333
2016-01-04	45.4	45.575
2016-01-05	45.7	45.6
2016-01-06	43.7	45.283333333
2016-01-07	45.1	45.257142857
2016-01-08	42.1	44.8625
2016-01-09	47.5	45.155555556
2016-01-10	42.9	44.93
2016-01-11	42.9	44.745454545
2016-01-12	41.9	44.508333333
2016-01-13	40.4	44.192307692
2016-01-14	39	43.821428571
2016-01-15	37.2	43.38
2016-01-16	36.1	42.925
2016-01-17	36.3	42.535294118
2016-01-18	37.7	42.266666667
2016-01-19	33.3	41.794736842
2016-01-20	32.5	41.33
2016-01-21	37.6	41.152380952
2016-01-22	45.6	41.354545455
2016-01-23	45	41.513043478
2016-01-24	53.6	42.016666667
2016-01-25	52.1	42.42
2016-01-26	48.4	42.65
2016-01-27	52.7	43.022222222
2016-01-28	43.4	43.035714286
2016-01-29	52.1	43.348275862
2016-01-30	45.9	43.433333333
2016-01-31	48.3	43.673333333
2016-02-01	54.1	43.816666667
2016-02-02	46.8	43.843333333
2016-02-03	41.8	43.723333333
2016-02-04	49.5	43.85
2016-02-05	50.6	44.08
2016-02-06	50.6	44.263333333
2016-02-07	45.2	44.366666667
2016-02-08	46.3	44.326666667
2016-02-09	41.6	44.283333333
2016-02-10	42.2	44.26
2016-02-11	37.8	44.123333333
2016-02-12	38.7	44.066666667
2016-02-13	38.9	44.063333333
2016-02-14	38.7	44.113333333
2016-02-15	36.6	44.13
2016-02-16	36.5	44.136666667
2016-02-17	39.6	44.2
2016-02-18	39.3	44.4
2016-02-19	41.1	44.686666667
2016-02-20	51.4	45.146666667
2016-02-21	55	45.46
2016-02-22	47.7	45.55
2016-02-23	42.3	45.173333333
2016-02-24	38.3	44.713333333
2016-02-25	35.8	44.293333333
2016-02-26	39.1	43.84
2016-02-27	40.2	43.733333333
2016-02-28	40.6	43.35
2016-02-29	40.6	43.173333333
2016-03-01	47.8	43.156666667
2016-03-02	42	42.753333333
2016-03-03	43.3	42.636666667
2016-03-04	41.2	42.616666667
2016-03-05	37.4	42.213333333
2016-03-06	38.9	41.823333333
2016-03-07	39.6	41.456666667
2016-03-08	38.8	41.243333333
2016-03-09	46.2	41.24
2016-03-10	43.7	41.31
2016-03-11	40.8	41.263333333
2016-03-12	43.2	41.443333333
2016-03-13	42.7	41.576666667
2016-03-14	44	41.746666667
2016-03-15	42.9	41.886666667
2016-03-16	45.2	42.173333333
2016-03-17	43.8	42.416666667
2016-03-18	41.1	42.466666667
2016-03-19	44.3	42.633333333
2016-03-20	45.3	42.773333333
2016-03-21	47.7	42.65
2016-03-22	45.7	42.34
2016-03-23	47.4	42.33
2016-03-24	46.5	42.47
2016-03-25	50.8	42.886666667
2016-03-26	49	43.326666667
2016-03-27	46.3	43.566666667
2016-03-28	46.8	43.786666667
2016-03-29	45.8	43.96
2016-03-30	47.1	44.176666667
2016-03-31	47.1	44.153333333
2016-04-01	45.9	44.283333333
2016-04-02	50.3	44.516666667
2016-04-03	51.8	44.87
2016-04-04	50.8	45.316666667
2016-04-05	51.1	45.723333333
2016-04-06	47.7	45.993333333
2016-04-07	44.6	46.186666667
2016-04-08	47.7	46.236666667
2016-04-09	46.3	46.323333333

Step-by-Step

Moving averages rely on the use of Window Functions, which you can read more about here.

To find a running total we'll use the AVG window function:

AVG( <expr1> ) OVER ( [ PARTITION BY <expr2> ] [ ORDER BY <expr3> [ ASC | DESC ] [ <window_frame> ] ] )

Where:

expr1 is an expression that evaluates to a numeric data type (INTEGER, FLOAT, DECIMAL, etc.).

expr2 is the optional expression to partition by.

expr3 is the optional expression to order by within each partition. (This does not control the order of the entire query output.)

Or, written for a human:

AVG(column) OVER (PARTITION BY col_to_group_by, ORDER BY column_to_order_by)

In the following example we'll take the running total of hours of Netflix I watched in a given week:

SELECT 
  DATE_TRUNC('WEEK', START_TIME) WEEK,
  DATE_PART('WEEKDAY', START_TIME) DAY, 
  SUM(DURATION) / (60 * 60) DAILY_HOURS_WATCHED 
FROM 
  PUBLIC.NETFLIX 
WHERE 
  WEEK = '2018-11-26'
GROUP BY WEEK, DAY
ORDER BY WEEK, DAY

WEEK	DAY	DAILY_HOURS_WATCHED
2018-11-26T00:00:00.000Z	0	4.412777778
2018-11-26T00:00:00.000Z	1	1.467222222
2018-11-26T00:00:00.000Z	2	0.6561111111
2018-11-26T00:00:00.000Z	4	0.5063888889
2018-11-26T00:00:00.000Z	5	0.5261111111
2018-11-26T00:00:00.000Z	6	0.8688888889

Now if I want to find the moving 3-day average of hours watched that week, I can do the following:

SELECT
  DAY, DAILY_HOURS_WATCHED,
  AVG(DAILY_HOURS_WATCHED) OVER (
    ORDER BY DAY ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) "3_DAY_AVG"
FROM ONE_WEEK

DAY	DAILY_HOURS_WATCHED	3_DAY_AVG
0	4.412777778	4.412777778
1	1.467222222	2.94
2	0.6561111111	2.178703704
4	0.5063888889	0.8765740741
5	0.5261111111	0.5628703704
6	0.8688888889	0.6337962963

We can see our new 3_DAY_AVG update as the average between the current row and the 2 previous rows.

To check we can make sure that day 2 is the average of Day 0, Day 1, and Day 2 hours watched:

SELECT (4.413 + 1.467 + 0.656) / 3 AS day_2_result

DAY_2_RESULT
2.178666667

This example had no partition, but if we wanted to compare my viewing habits across two different weeks, we would need to PARTITION BY week:

SELECT 
  DATE_TRUNC('WEEK', START_TIME) WEEK,
  DATE_PART('WEEKDAY', START_TIME) DAY, 
  SUM(DURATION) / (60 * 60) DAILY_HOURS_WATCHED 
FROM 
  PUBLIC.NETFLIX 
WHERE 
  WEEK IN ('2018-11-19', '2018-11-26')
GROUP BY WEEK, DAY
ORDER BY WEEK, DAY

WEEK	DAY	DAILY_HOURS_WATCHED
2018-11-19T00:00:00.000Z	0	1.326666667
2018-11-19T00:00:00.000Z	1	0.4775
2018-11-19T00:00:00.000Z	2	0.8708333333
2018-11-19T00:00:00.000Z	6	0.3
2018-11-26T00:00:00.000Z	0	4.412777778
2018-11-26T00:00:00.000Z	1	1.467222222
2018-11-26T00:00:00.000Z	2	0.6561111111
2018-11-26T00:00:00.000Z	4	0.5063888889
2018-11-26T00:00:00.000Z	5	0.5261111111
2018-11-26T00:00:00.000Z	6	0.8688888889

And now to find the running total across the two weeks

SELECT 
  WEEK, DAY, DAILY_HOURS_WATCHED,
  AVG(DAILY_HOURS_WATCHED) OVER(
    PARTITION BY WEEK ORDER BY DAY ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) "7_DAY_AVG"
FROM TWO_WEEKS
ORDER BY WEEK, DAY

WEEK	DAY	DAILY_HOURS_WATCHED	7_DAY_AVG
2018-11-19T00:00:00.000Z	0	1.326666667	1.326666667
2018-11-19T00:00:00.000Z	1	0.4775	0.9020833333
2018-11-19T00:00:00.000Z	2	0.8708333333	0.8916666667
2018-11-19T00:00:00.000Z	6	0.3	0.74375
2018-11-26T00:00:00.000Z	0	4.412777778	4.412777778
2018-11-26T00:00:00.000Z	1	1.467222222	2.94
2018-11-26T00:00:00.000Z	2	0.6561111111	2.178703704
2018-11-26T00:00:00.000Z	4	0.5063888889	1.760625
2018-11-26T00:00:00.000Z	5	0.5261111111	1.513722222
2018-11-26T00:00:00.000Z	6	0.8688888889	1.40625

Now we see the RUNNING_TOTAL restart for the 2nd week.

How We Built This

This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.

This makes not only for far more readable reports (like this one) but also a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.

Business Improvement

Product

Use cases

Pricing

Professional Services

Customers

Resources

Step-by-Step

How We Built This

Start solving your organization's biggest problems with Count today.