SQL Resources
SQL
Histograms

Histograms

Histograms are essential tools in understanding how our data is shaped or distributed. Mastering the art of building and customizing histograms in SQL can be tricky at first, but once you've got it down it's a recipe you'll be coming back to again and again.

Step-by-step

The data

We'll use some historical weather data for London for this example. We can see the temperature (in Fahrenheit) for each day below:

Select DATE, TEMP from PUBLIC.LONDON_WEATHER 
DATETEMP
2018-04-1148.5
2016-10-3154.7
2018-11-2344.2
2018-01-0941.6
2017-01-2631
2017-02-1342.8
2016-12-3142.8
2018-05-3165.7
2017-01-2737.8
2018-03-0230.6
2019-02-1647.4
2018-02-1947.7
2017-01-3145
2019-12-0438.9
2017-11-0247.5
2017-03-1154.2
2018-05-2959.5
2017-03-1648.7
2018-03-2446.2
2016-12-2836
2019-11-0940.4
2018-05-3060
2018-04-1455.2
2019-12-2846.4
2017-02-0639.2
2019-04-0242.6
2019-02-2848.8
2018-03-0443
2017-10-1856.1
2016-12-1448.8
2018-11-0148
2016-11-2147.8
2017-08-2162.1
2017-12-1034.9
2016-06-0459.9
2019-11-0840.7
2018-10-0560.5
2017-02-1444.6
2019-04-3053.5
2018-10-0650.1
2019-02-1749
2018-11-2853.3
2016-04-1152.5
2016-12-1052.8
2018-02-2047.3
2016-01-1736.3
2017-12-0240.2
2016-09-2261.4
2018-10-2053.1
2019-01-1145.5
2017-05-0754.5
2018-03-0943.6
2019-04-0647.4
2018-02-2824.6
2018-03-1051.5
2018-05-2559.5
2019-06-2567.9
2019-06-1354.8
2018-04-0246.9
2018-12-2541.5
2017-04-0453.1
2018-07-2964.6
2019-02-0848.6
2019-01-2334.8
2018-05-2769.3
2018-04-2261.7
2019-12-2046.8
2016-06-0962.7
2016-04-1550
2018-09-1256.4
2017-12-0748
2019-11-1643.1
2017-09-1953.7
2016-11-0343.5
2016-04-0250.3
2016-05-1755.3
2019-02-2650.7
2018-11-2643.1
2016-12-0135
2019-11-1843.6
2019-04-2360.4
2019-08-2576.3
2017-03-1049.1
2018-06-0265.4
2019-01-2935.4
2018-10-2549.6
2018-11-1452.1
2018-02-2237.7
2017-04-0351.5
2018-05-0957.1
2018-03-2141.5
2016-04-0145.9
2016-04-2151.1
2019-01-1645.6
2018-09-1166.6
2018-11-0950.9
2016-09-0564.6
2017-10-1361.4
2018-04-0351.8
2019-06-0858.4

1. Create bins

Histograms count the number of values that fall in a certain bin, so the first thing we'll do is create our bins. We'll do this using the FLOOR function, which will round down a value to the nearest integer.

If we want to create bins of 5 degrees, we can divide our temperature by 5, round down, then multiply that by 5 to get the lower end of the bin.

FLOOR(value / bin_size) * bin_size

For example:

SELECT distinct
  A.TEMP,
  floor(A.TEMP / 5) floor_div_5, 
  floor(A.TEMP/ 5) * 5 bin_start,
  floor(A.TEMP / 5) * 5 + 5 bin_end, 
  concat(
    cast(floor(A.TEMP / 5) * 5 as STRING),
    ' - ',
    cast(floor(A.TEMP / 5) * 5 + 5 as STRING)
  ) bin_range
FROM A
ORDER BY TEMP
TEMPFLOOR_DIV_5BIN_STARTBIN_ENDBIN_RANGE
24.64202520 - 25
27.35253025 - 30
27.95253025 - 30
30.66303530 - 35
316303530 - 35
31.46303530 - 35
31.76303530 - 35
326303530 - 35
32.36303530 - 35
32.56303530 - 35
32.76303530 - 35
33.36303530 - 35
33.76303530 - 35
33.86303530 - 35
34.26303530 - 35
34.36303530 - 35
34.46303530 - 35
34.56303530 - 35
34.76303530 - 35
34.86303530 - 35
34.96303530 - 35
357354035 - 40
35.17354035 - 40
35.27354035 - 40
35.37354035 - 40
35.47354035 - 40
35.57354035 - 40
35.67354035 - 40
35.77354035 - 40
35.87354035 - 40
35.97354035 - 40
367354035 - 40
36.17354035 - 40
36.27354035 - 40
36.37354035 - 40
36.47354035 - 40
36.57354035 - 40
36.67354035 - 40
36.77354035 - 40
36.87354035 - 40
36.97354035 - 40
37.17354035 - 40
37.27354035 - 40
37.37354035 - 40
37.47354035 - 40
37.57354035 - 40
37.67354035 - 40
37.77354035 - 40
37.87354035 - 40
37.97354035 - 40
387354035 - 40
38.17354035 - 40
38.37354035 - 40
38.47354035 - 40
38.57354035 - 40
38.77354035 - 40
38.87354035 - 40
38.97354035 - 40
397354035 - 40
39.17354035 - 40
39.27354035 - 40
39.37354035 - 40
39.47354035 - 40
39.57354035 - 40
39.67354035 - 40
39.87354035 - 40
39.97354035 - 40
408404540 - 45
40.18404540 - 45
40.28404540 - 45
40.38404540 - 45
40.48404540 - 45
40.58404540 - 45
40.68404540 - 45
40.78404540 - 45
40.88404540 - 45
40.98404540 - 45
418404540 - 45
41.18404540 - 45
41.28404540 - 45
41.48404540 - 45
41.58404540 - 45
41.68404540 - 45
41.78404540 - 45
41.88404540 - 45
41.98404540 - 45
428404540 - 45
42.18404540 - 45
42.28404540 - 45
42.38404540 - 45
42.48404540 - 45
42.58404540 - 45
42.68404540 - 45
42.78404540 - 45
42.88404540 - 45
42.98404540 - 45
438404540 - 45
43.18404540 - 45
43.28404540 - 45
43.38404540 - 45

2. Count values per bin

Once we have our bins, we can count the number of values in each bin using a simple COUNT function.

SELECT
  COUNT(DATE) COUNT,
  floor(A.TEMP / 5) * 5 bin_start,
  floor(A.TEMP / 5) * 5 + 5 bin_end, 
  concat(
    cast(floor(A.TEMP / 5) * 5 as STRING),
    ' - ',
    cast(floor(A.TEMP / 5) * 5 + 5 as STRING)
  ) bin_range
FROM A 
GROUP BY 2, 3, 4
ORDER BY BIN_START
COUNTBIN_STARTBIN_ENDBIN_RANGE
1202520 - 25
2253025 - 30
28303530 - 35
99354035 - 40
191404540 - 45
255455045 - 50
244505550 - 55
180556055 - 60
224606560 - 65
162657065 - 70
45707570 - 75
20758075 - 80
1808580 - 85
Interpretation: Most days in London fall between 45 - 50 degrees Fahrenheit.

ALTERNATE: Use WIDTH_BUCKET

Snowflake has a handy function width_bucket that will also automatically assign your data into buckets based on a pre-determined range and number of buckets:

WIDTH_BUCKET( <expr> , <min_value> , <max_value> , <num_buckets> )
  • expr The expression for which the histogram is created. This expression must evaluate to a numeric value or to a value that can be implicitly converted to a numeric value.
  • min_value and max_valueThe low and high end points of the acceptable range for the expression. The end points must also evaluate to numeric values and not be equal.
  • num_buckets The desired number of buckets; must be a positive integer value. A value from the expression is assigned to each bucket, and the function then returns the corresponding bucket number.When an expression falls outside the range, the function returns:0 if the expression is less than min_value, num_buckets + 1 if the expression is greater than or equal to max_value.
  • SELECT COUNT(DATE) count, width_bucket(TEMP, 0, 100, 10) bucket
    FROM A
    GROUP BY bucket
    COUNTBUCKET
    4465
    4246
    1274
    3867
    33
    19
    658

    How we built this

    This page was built using Count, the first notebook built around SQL. It combines the best features of a SQL IDE, Data Visualization Tool, and Computational Notebooks. In the Count notebook, each cell acts like a CTE, meaning you can reference any other cell in your queries.

    This makes not only for far more readable reports (like this one) but also a much faster and more powerful way to do your analysis, essentially turning your analysis into a connected graph of data frames rather than one-off convoluted queries and CSV files. And with a built-in visualization framework, you won't have to export your data to make your charts. Go from raw data to interactive report in one document.

    Subscribe to newsletter

    Subscribe to receive the latest blog posts to your inbox every week.

    By subscribing you agree to our Privacy Policy.
    Thank you! Your submission has been received!
    Oops! Something went wrong while submitting the form.

    Start solving your organization's biggest problems with Count today.

    Stay up to date with all things count, data and problem solving.