Histograms
Histograms are essential tools in understanding how our data is shaped or distributed. Mastering the art of building and customizing histograms in SQL can be tricky at first, but once you've got it down it's a recipe you'll be coming back to again and again.
Step-by-step
We'll use some historical weather data for London for this example. We can see the temperature (in Fahrenheit) for each day below:
Select DATE, TEMP from PUBLIC.LONDON_WEATHER| DATE | TEMP |
|---|---|
| 2018-04-11 | 48.5 |
| 2016-10-31 | 54.7 |
| 2018-11-23 | 44.2 |
| 2018-01-09 | 41.6 |
| 2017-01-26 | 31 |
| 2017-02-13 | 42.8 |
| 2016-12-31 | 42.8 |
| 2018-05-31 | 65.7 |
| 2017-01-27 | 37.8 |
| 2018-03-02 | 30.6 |
| 2019-02-16 | 47.4 |
| 2018-02-19 | 47.7 |
| 2017-01-31 | 45 |
| 2019-12-04 | 38.9 |
| 2017-11-02 | 47.5 |
| 2017-03-11 | 54.2 |
| 2018-05-29 | 59.5 |
| 2017-03-16 | 48.7 |
| 2018-03-24 | 46.2 |
| 2016-12-28 | 36 |
| 2019-11-09 | 40.4 |
| 2018-05-30 | 60 |
| 2018-04-14 | 55.2 |
| 2019-12-28 | 46.4 |
| 2017-02-06 | 39.2 |
| 2019-04-02 | 42.6 |
| 2019-02-28 | 48.8 |
| 2018-03-04 | 43 |
| 2017-10-18 | 56.1 |
| 2016-12-14 | 48.8 |
| 2018-11-01 | 48 |
| 2016-11-21 | 47.8 |
| 2017-08-21 | 62.1 |
| 2017-12-10 | 34.9 |
| 2016-06-04 | 59.9 |
| 2019-11-08 | 40.7 |
| 2018-10-05 | 60.5 |
| 2017-02-14 | 44.6 |
| 2019-04-30 | 53.5 |
| 2018-10-06 | 50.1 |
| 2019-02-17 | 49 |
| 2018-11-28 | 53.3 |
| 2016-04-11 | 52.5 |
| 2016-12-10 | 52.8 |
| 2018-02-20 | 47.3 |
| 2016-01-17 | 36.3 |
| 2017-12-02 | 40.2 |
| 2016-09-22 | 61.4 |
| 2018-10-20 | 53.1 |
| 2019-01-11 | 45.5 |
| 2017-05-07 | 54.5 |
| 2018-03-09 | 43.6 |
| 2019-04-06 | 47.4 |
| 2018-02-28 | 24.6 |
| 2018-03-10 | 51.5 |
| 2018-05-25 | 59.5 |
| 2019-06-25 | 67.9 |
| 2019-06-13 | 54.8 |
| 2018-04-02 | 46.9 |
| 2018-12-25 | 41.5 |
| 2017-04-04 | 53.1 |
| 2018-07-29 | 64.6 |
| 2019-02-08 | 48.6 |
| 2019-01-23 | 34.8 |
| 2018-05-27 | 69.3 |
| 2018-04-22 | 61.7 |
| 2019-12-20 | 46.8 |
| 2016-06-09 | 62.7 |
| 2016-04-15 | 50 |
| 2018-09-12 | 56.4 |
| 2017-12-07 | 48 |
| 2019-11-16 | 43.1 |
| 2017-09-19 | 53.7 |
| 2016-11-03 | 43.5 |
| 2016-04-02 | 50.3 |
| 2016-05-17 | 55.3 |
| 2019-02-26 | 50.7 |
| 2018-11-26 | 43.1 |
| 2016-12-01 | 35 |
| 2019-11-18 | 43.6 |
| 2019-04-23 | 60.4 |
| 2019-08-25 | 76.3 |
| 2017-03-10 | 49.1 |
| 2018-06-02 | 65.4 |
| 2019-01-29 | 35.4 |
| 2018-10-25 | 49.6 |
| 2018-11-14 | 52.1 |
| 2018-02-22 | 37.7 |
| 2017-04-03 | 51.5 |
| 2018-05-09 | 57.1 |
| 2018-03-21 | 41.5 |
| 2016-04-01 | 45.9 |
| 2016-04-21 | 51.1 |
| 2019-01-16 | 45.6 |
| 2018-09-11 | 66.6 |
| 2018-11-09 | 50.9 |
| 2016-09-05 | 64.6 |
| 2017-10-13 | 61.4 |
| 2018-04-03 | 51.8 |
| 2019-06-08 | 58.4 |
1. Create bins
Histograms count the number of values that fall in a certain bin, so the first thing we'll do is create our bins. We'll do this using the FLOOR function, which will round down a value to the nearest integer.
If we want to create bins of 5 degrees, we can divide our temperature by 5, round down, then multiply that by 5 to get the lower end of the bin.
For example:
FLOOR(value / bin_size) * bin_size| TEMP | FLOOR_DIV_5 | BIN_START | BIN_END | BIN_RANGE |
|---|---|---|---|---|
| 24.6 | 4 | 20 | 25 | 20 - 25 |
| 27.3 | 5 | 25 | 30 | 25 - 30 |
| 27.9 | 5 | 25 | 30 | 25 - 30 |
| 30.6 | 6 | 30 | 35 | 30 - 35 |
| 31 | 6 | 30 | 35 | 30 - 35 |
| 31.4 | 6 | 30 | 35 | 30 - 35 |
| 31.7 | 6 | 30 | 35 | 30 - 35 |
| 32 | 6 | 30 | 35 | 30 - 35 |
| 32.3 | 6 | 30 | 35 | 30 - 35 |
| 32.5 | 6 | 30 | 35 | 30 - 35 |
| 32.7 | 6 | 30 | 35 | 30 - 35 |
| 33.3 | 6 | 30 | 35 | 30 - 35 |
| 33.7 | 6 | 30 | 35 | 30 - 35 |
| 33.8 | 6 | 30 | 35 | 30 - 35 |
| 34.2 | 6 | 30 | 35 | 30 - 35 |
| 34.3 | 6 | 30 | 35 | 30 - 35 |
| 34.4 | 6 | 30 | 35 | 30 - 35 |
| 34.5 | 6 | 30 | 35 | 30 - 35 |
| 34.7 | 6 | 30 | 35 | 30 - 35 |
| 34.8 | 6 | 30 | 35 | 30 - 35 |
| 34.9 | 6 | 30 | 35 | 30 - 35 |
| 35 | 7 | 35 | 40 | 35 - 40 |
| 35.1 | 7 | 35 | 40 | 35 - 40 |
| 35.2 | 7 | 35 | 40 | 35 - 40 |
| 35.3 | 7 | 35 | 40 | 35 - 40 |
| 35.4 | 7 | 35 | 40 | 35 - 40 |
| 35.5 | 7 | 35 | 40 | 35 - 40 |
| 35.6 | 7 | 35 | 40 | 35 - 40 |
| 35.7 | 7 | 35 | 40 | 35 - 40 |
| 35.8 | 7 | 35 | 40 | 35 - 40 |
| 35.9 | 7 | 35 | 40 | 35 - 40 |
| 36 | 7 | 35 | 40 | 35 - 40 |
| 36.1 | 7 | 35 | 40 | 35 - 40 |
| 36.2 | 7 | 35 | 40 | 35 - 40 |
| 36.3 | 7 | 35 | 40 | 35 - 40 |
| 36.4 | 7 | 35 | 40 | 35 - 40 |
| 36.5 | 7 | 35 | 40 | 35 - 40 |
| 36.6 | 7 | 35 | 40 | 35 - 40 |
| 36.7 | 7 | 35 | 40 | 35 - 40 |
| 36.8 | 7 | 35 | 40 | 35 - 40 |
| 36.9 | 7 | 35 | 40 | 35 - 40 |
| 37.1 | 7 | 35 | 40 | 35 - 40 |
| 37.2 | 7 | 35 | 40 | 35 - 40 |
| 37.3 | 7 | 35 | 40 | 35 - 40 |
| 37.4 | 7 | 35 | 40 | 35 - 40 |
| 37.5 | 7 | 35 | 40 | 35 - 40 |
| 37.6 | 7 | 35 | 40 | 35 - 40 |
| 37.7 | 7 | 35 | 40 | 35 - 40 |
| 37.8 | 7 | 35 | 40 | 35 - 40 |
| 37.9 | 7 | 35 | 40 | 35 - 40 |
| 38 | 7 | 35 | 40 | 35 - 40 |
| 38.1 | 7 | 35 | 40 | 35 - 40 |
| 38.3 | 7 | 35 | 40 | 35 - 40 |
| 38.4 | 7 | 35 | 40 | 35 - 40 |
| 38.5 | 7 | 35 | 40 | 35 - 40 |
| 38.7 | 7 | 35 | 40 | 35 - 40 |
| 38.8 | 7 | 35 | 40 | 35 - 40 |
| 38.9 | 7 | 35 | 40 | 35 - 40 |
| 39 | 7 | 35 | 40 | 35 - 40 |
| 39.1 | 7 | 35 | 40 | 35 - 40 |
| 39.2 | 7 | 35 | 40 | 35 - 40 |
| 39.3 | 7 | 35 | 40 | 35 - 40 |
| 39.4 | 7 | 35 | 40 | 35 - 40 |
| 39.5 | 7 | 35 | 40 | 35 - 40 |
| 39.6 | 7 | 35 | 40 | 35 - 40 |
| 39.8 | 7 | 35 | 40 | 35 - 40 |
| 39.9 | 7 | 35 | 40 | 35 - 40 |
| 40 | 8 | 40 | 45 | 40 - 45 |
| 40.1 | 8 | 40 | 45 | 40 - 45 |
| 40.2 | 8 | 40 | 45 | 40 - 45 |
| 40.3 | 8 | 40 | 45 | 40 - 45 |
| 40.4 | 8 | 40 | 45 | 40 - 45 |
| 40.5 | 8 | 40 | 45 | 40 - 45 |
| 40.6 | 8 | 40 | 45 | 40 - 45 |
| 40.7 | 8 | 40 | 45 | 40 - 45 |
| 40.8 | 8 | 40 | 45 | 40 - 45 |
| 40.9 | 8 | 40 | 45 | 40 - 45 |
| 41 | 8 | 40 | 45 | 40 - 45 |
| 41.1 | 8 | 40 | 45 | 40 - 45 |
| 41.2 | 8 | 40 | 45 | 40 - 45 |
| 41.4 | 8 | 40 | 45 | 40 - 45 |
| 41.5 | 8 | 40 | 45 | 40 - 45 |
| 41.6 | 8 | 40 | 45 | 40 - 45 |
| 41.7 | 8 | 40 | 45 | 40 - 45 |
| 41.8 | 8 | 40 | 45 | 40 - 45 |
| 41.9 | 8 | 40 | 45 | 40 - 45 |
| 42 | 8 | 40 | 45 | 40 - 45 |
| 42.1 | 8 | 40 | 45 | 40 - 45 |
| 42.2 | 8 | 40 | 45 | 40 - 45 |
| 42.3 | 8 | 40 | 45 | 40 - 45 |
| 42.4 | 8 | 40 | 45 | 40 - 45 |
| 42.5 | 8 | 40 | 45 | 40 - 45 |
| 42.6 | 8 | 40 | 45 | 40 - 45 |
| 42.7 | 8 | 40 | 45 | 40 - 45 |
| 42.8 | 8 | 40 | 45 | 40 - 45 |
| 42.9 | 8 | 40 | 45 | 40 - 45 |
| 43 | 8 | 40 | 45 | 40 - 45 |
| 43.1 | 8 | 40 | 45 | 40 - 45 |
| 43.2 | 8 | 40 | 45 | 40 - 45 |
| 43.3 | 8 | 40 | 45 | 40 - 45 |
2. Count values per bin
Once we have our bins, we can count the number of values in each bin using a simple COUNT function.
| COUNT | BIN_START | BIN_END | BIN_RANGE |
|---|---|---|---|
| 1 | 20 | 25 | 20 - 25 |
| 2 | 25 | 30 | 25 - 30 |
| 28 | 30 | 35 | 30 - 35 |
| 99 | 35 | 40 | 35 - 40 |
| 191 | 40 | 45 | 40 - 45 |
| 255 | 45 | 50 | 45 - 50 |
| 244 | 50 | 55 | 50 - 55 |
| 180 | 55 | 60 | 55 - 60 |
| 224 | 60 | 65 | 60 - 65 |
| 162 | 65 | 70 | 65 - 70 |
| 45 | 70 | 75 | 70 - 75 |
| 20 | 75 | 80 | 75 - 80 |
| 1 | 80 | 85 | 80 - 85 |
Interpretation: Most days in London fall between 45 - 50 degrees Fahrenheit.
ALTERNATE: Use WIDTH_BUCKET
Snowflake has a handy function width_bucket that will also automatically assign your data into buckets based on a pre-determined range and number of buckets:
exprThe expression for which the histogram is created. This expression must evaluate to a numeric value or to a value that can be implicitly converted to a numeric value.min_valueandmax_valueThe low and high end points of the acceptable range for the expression. The end points must also evaluate to numeric values and not be equal.num_bucketsThe desired number of buckets; must be a positive integer value. A value from the expression is assigned to each bucket, and the function then returns the corresponding bucket number.When an expression falls outside the range, the function returns:0if the expression is less thanmin_value,num_buckets + 1if the expression is greater than or equal tomax_value.
| COUNT | BUCKET |
|---|---|
| 446 | 5 |
| 424 | 6 |
| 127 | 4 |
| 386 | 7 |
| 3 | 3 |
| 1 | 9 |
| 65 | 8 |
How we built this
This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.
This makes not only for far more readable reports (like this one) but also a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.