Histograms
Histograms are essential tools in understanding how our data is shaped or distributed. Mastering the art of building and customizing histograms in SQL can be tricky at first, but once you've got it down it's a recipe you'll be coming back to again and again.
Step-by-step
The data
We'll use some historical weather data for London for this example. We can see the temperature (in Fahrenheit) for each day below:
Select DATE, TEMP from PUBLIC.LONDON_WEATHER
1. Create bins
Histograms count the number of values that fall in a certain bin, so the first thing we'll do is create our bins. We'll do this using the FLOOR
function, which will round down a value to the nearest integer.
If we want to create bins of 5 degrees, we can divide our temperature by 5, round down, then multiply that by 5 to get the lower end of the bin.
FLOOR(value / bin_size) * bin_size
For example:
SELECT distinct
A.TEMP,
floor(A.TEMP / 5) floor_div_5,
floor(A.TEMP/ 5) * 5 bin_start,
floor(A.TEMP / 5) * 5 + 5 bin_end,
concat(
cast(floor(A.TEMP / 5) * 5 as STRING),
' - ',
cast(floor(A.TEMP / 5) * 5 + 5 as STRING)
) bin_range
FROM A
ORDER BY TEMP
2. Count values per bin
Once we have our bins, we can count the number of values in each bin using a simple COUNT
function.
SELECT
COUNT(DATE) COUNT,
floor(A.TEMP / 5) * 5 bin_start,
floor(A.TEMP / 5) * 5 + 5 bin_end,
concat(
cast(floor(A.TEMP / 5) * 5 as STRING),
' - ',
cast(floor(A.TEMP / 5) * 5 + 5 as STRING)
) bin_range
FROM A
GROUP BY 2, 3, 4
ORDER BY BIN_START

Interpretation: Most days in London fall between 45 - 50 degrees Fahrenheit.
ALTERNATE: Use WIDTH_BUCKET
Snowflake has a handy function width_bucket
that will also automatically assign your data into buckets based on a pre-determined range and number of buckets:
WIDTH_BUCKET( <expr> , <min_value> , <max_value> , <num_buckets> )
expr
The expression for which the histogram is created. This expression must evaluate to a numeric value or to a value that can be implicitly converted to a numeric value.min_value
and max_value
The low and high end points of the acceptable range for the expression. The end points must also evaluate to numeric values and not be equal.num_buckets
The desired number of buckets; must be a positive integer value. A value from the expression is assigned to each bucket, and the function then returns the corresponding bucket number.When an expression falls outside the range, the function returns:0
if the expression is less than min_value
, num_buckets + 1
if the expression is greater than or equal to max_value
.SELECT COUNT(DATE) count, width_bucket(TEMP, 0, 100, 10) bucket
FROM A
GROUP BY bucket

How we built this
This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.
This makes not only for far more readable reports (like this one) but also a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.