Dates and Times
Introduction
BigQuery offers no shortage of functionality to help you get the most out of date and time data, but it can be hard to know what to use when. This notebook will review and compare BigQuery's date and time functionality.
The Basics
Data types
In BigQuery, dates and times can be one of the following datatypes:
DATE
: calendar date (e.g. 2020-01-01)DATETIME
: calendar date and time (e.g. 2020-01-01 13:04:11)TIMEZONE
: a particular moment in time, can include timezone but defaults to UTC (e.g. 2020-01-01 13:04:11-5:00)TIME
: a time as seen on a watch (e.g. 13:04:11)In general, if you want to work with time zones, you'll need to stick with TIMESTAMP
s. Otherwise DATETIME
is the most flexible since you can take advantage of date and time functionality.
Converting to date/time data types
If your data isn't in one of these data types, you can convert them by using CAST
, one of the PARSE
functions, or unix time conversion functions.
Converting using CAST
To convert your STRING
to one of the Date data types, your STRING
must be in the following formats:
DATE
: YYYY-MM-DD
DATETIME
: YYYY-MM-DD HH:MM:SS
TIMESTAMP
: YYYY-MM-DD HH:MM:SS [timezone]
TIME
: HH:MM:SS
SELECT
CAST('2017-06-04 14:44:00' AS DATETIME) AS datetime,
CAST('2017-06-04 14:44:00 Europe/Berlin' AS TIMESTAMP) AS timestamp,
CAST('2017-06-04' AS DATE) AS date,
CAST( '14:44:00' as TIME) time
Converting from STRING
using PARSE
To use one of the PARSE
functions, your STRING
can be formatted any way you like, you'll just tell the function how it should read it. There is a PARSE
function for each Date/Time Data type:
DATE
: PARSE_DATE(format_string, date_string)
DATETIME
: PARSE_DATETIME(format_string, datetime_string)
TIMESTAMP
: PARSE_TIMESTAMP(format_string, timestamp_string[, timezone])
TIME
: PARSE_TIME(format_string, time_string)
.You can see the full list of format stringshere
For example, if my date was in this format: Thursday, January 7, 2021 12:44:02
, I could use the following:
SELECT
PARSE_DATETIME('%A %B %d, %Y %H:%M:%S', 'Thursday January 7, 2021 12:04:33') AS parsed_datetime
Converting from Unix time
BigQuery offers several helper functions to get dates represented as numbers converted to a Date/Time date type.
These include:
DATE
:DATE_FROM_UNIX_DATE(days since 1970-01-01 00:00:00 UTC)
TIMESTAMP
:TIMESTAMP_SECONDS(seconds since 1970-01-01 00:00:00 UTC)
TIMESTAMP_MILLIS(milliseconds since 1970-01-01 00:00:00 UTC)
TIMESTAMP_MICROS(microseconds since 1970-01-01 00:00:00 UTC)
SELECT
TIMESTAMP_SECONDS(1577836800)
If you need to BigQuery also offers a series of functions to convert your Date/Time data types into Unix time.
Formatting your date/times
When working with dates and times, it's often handy (and helpful for anyone else looking at the data) so see the data in a more approachable format. To do that we can make use of the FORMAT
functions in BigQuery.
There is a FORMAT
function for each Date/Time Datatype:
DATE
: FORMAT_DATE(format_string, date)
DATETIME
: FORMAT_DATETIME(format_string, datetime)
TIMESTAMP
: FORMAT_TIMESTAMP(format_string, timestamp[, timezone])
TIME
: FORMAT_TIME(format_string, time)
The format strings here are the same as for the PARSE
function here.
So if I want to show my dates in a more common format of: YYYY/DD/MM
then I could do FORMAT_DATETIME('%Y/%d/%m', datetime_column
SELECT
FORMAT_DATETIME('%Y/%d/%m', CAST('2017-06-30 14:44:00' AS DATETIME)) AS day_first,
FORMAT_DATETIME('%A', CAST('2017-06-30 14:44:00' AS DATETIME)) AS weekday,
FORMAT_DATETIME('%r', CAST('2017-06-30 14:44:00' AS DATETIME)) AS time_of_day
⚠️ It's worth noting that FORMAT
functions return STRING
s so if you wanted to use the result of FORMAT
as a DATE
, it won't work.
For example, this query:
SELECT
DATE_DIFF(CAST('2017-06-04' AS DATE), FORMAT_DATE('%Y/%m/%d', CAST('2017-06-04' AS DATE)), DAY)
will return:
No matching signature for function DATE_DIFF for argument types: DATE, STRING, DATE_TIME_PART. Supported signatures: DATE_DIFF(DATE, DATE, DATE_TIME_PART); DATE_DIFF(DATETIME, DATETIME, DATE_TIME_PART); DATE_DIFF(TIMESTAMP, TIMESTAMP, DATE_TIME_PART)
Comparing Dates
One of the most common things we use dates for is to compare them. We'll do this to filter for the last week of data, or only for data that was within certain dates.
Comparing with operators
The easiest way to compare dates is to use a comparison operator:
<
, <=
, >
, >=
, =
, !=
or <>
, [NOT] BETWEEN
, [NOT] LIKE
, [NOT] IN
SELECT
date
FROM
(
SELECT
CAST('2021-01-01' AS DATE) AS date
UNION ALL
( SELECT
CAST('2021-01-15' AS DATE) AS date)
UNION ALL
( SELECT
CAST('2021-02-01' AS DATE) AS date)
) AS table_3
WHERE
(date BETWEEN '2021-01-01' AND '2021-01-31')
BETWEEN is inclusive, so will include all dates between both dates, including the dates specified
Alternatively, you can use >=
,<
to achieve the same result:
SELECT
date
FROM
(
SELECT
CAST('2021-01-01' AS DATE) AS date
UNION ALL
( SELECT
CAST('2021-01-15' AS DATE) AS date)
UNION ALL
( SELECT
CAST('2021-02-01' AS DATE) AS date)
) AS table_3
WHERE
((date >= '2021-01-01') AND (date < '2021-02-01'))
For filtering dates you can useSTRING
s in the formatYYYY-MM-DD
Dynamic comparisons
Sometimes we want our query to always pull the last n days so comparing to a single date means we'll have to keep updating our query. For that we can use dynamic comparisons instead.
The most common functions we'll use for dynamic comparisons are CURRENT_[date_part]
and [date_part]_DIFF
To return the current date, datetime, time, or timestamp, you can use the CURRENT_[date part]
function in BigQuery. They will return the type you've specified, so you can use it to compare against other dates.
CURRENT_DATE([time_zone])
CURRENT_DATETIME([time_zone])
CURRENT_TIMESTAMP()
CURRENT_TIME([time_zone])
SELECT
CURRENT_DATE() AS current_date,
CURRENT_DATETIME() AS current_datetime,
CURRENT_TIMESTAMP() AS current_timestamp,
CURRENT_TIME() AS current_time
To find the difference in two dates, use [date_part]_DIFF
:
DATE_DIFF(date_expression_a, date_expression_b, part)
(parts available)DATETIME_DIFF(datetime_expression_a, datetime_expression_b, part)
(parts available)TIMESTAMP_DIFF(timestamp_expression_a, timestamp_expression_b, part)
(parts available)TIME_DIFF(time_expression_a, time_expression_b, part)
(parts available)What if we wanted to filter our data for the last 12 months? To do that, we'll compare our date column to the CURRENT_DATE
and make sure the difference is 12 or fewer months.
WHERE date_diff(current_date(),date,DAY) <= X
SELECT
date
FROM
(
SELECT
CAST('2021-04-06' AS DATE) AS date
UNION ALL
( SELECT
CAST('2021-06-05' AS DATE) AS date)
UNION ALL
( SELECT
CAST('2020-04-04' AS DATE) AS date)
) AS table_3
WHERE
(DATE_DIFF(CURRENT_DATE(), date, MONTH) < 12)
What if we wanted to find the difference between 2 dates- the expected arrival date, and the actual arrival date, but we wanted to exclude Sundays when no deliveries take place?
SELECT
(DATE_DIFF(actual_delivery, est_delivery, DAY) - DATE_DIFF(actual_delivery, est_delivery, WEEK)) AS days_late
FROM
(
SELECT
CAST('2021-01-22' AS DATE) AS est_delivery,
CAST('2021-01-25' AS DATE) AS actual_delivery
) AS table_1
To do this we can take the total days (3) and subtract the weeks difference (1) which will include Sundays.
Adding/Subtracting Dates
To perform any kind of data transformations like adding a year to a date, or subtracting 2 weeks, then we can use the [date_part]_ADD
and [date_part]_SUB
functions.
To add an interval to a date/time in BigQuery we can use any of:
DATE_ADD(date_expression, INTERVAL int64_expression part)
DATETIME_ADD(datetime_expression, INTERVAL int64_expression part)
TIMESTAMP_ADD(timestamp_expression, INTERVAL int64_expression part)
TIME_ADD(time_expression, INTERVAL int64_expression part)
SELECT
CAST('2020-08-05 12:00:00' AS DATETIME) AS original_date,
DATETIME_ADD(CAST('2020-08-05 12:00:00' AS DATETIME), INTERVAL 1 HOUR) AS one_hour_later,
DATETIME_ADD(CAST('2020-08-05 12:00:00' AS DATETIME), INTERVAL 1 WEEK) AS one_week_later,
DATETIME_ADD(CAST('2020-08-05 12:00:00' AS DATETIME), INTERVAL 1 QUARTER) AS one_quarter_later
What if we had contract start dates, contract durations, and we wanted to calculate the contract end date?
SELECT
*,
DATE_ADD(CAST(contract_start AS DATE), INTERVAL duration_months MONTH) AS contract_end
FROM
(
SELECT
'2019-05-20' AS contract_start,
8 AS duration_months
UNION ALL
( SELECT
'2019-09-18' AS contract_start,
3 AS duration_months)
UNION ALL
( SELECT
'2019-12-11' AS contract_start,
18 AS duration_months)
) AS table_3
To subtract an interval from a date/time in BigQuery, we can use any of:
DATE_SUB(date_expression, INTERVAL int64_expression part)
DATETIME_SUB(datetime_expression, INTERVAL int64_expression part)
TIMESTAMP_SUB(timestamp_expression, INTERVAL int64_expression part)
TIME_SUB(time_expression, INTERVAL int64_expression part)
SELECT
CAST('2020-08-05 12:00:00' AS DATETIME) AS original_date,
DATETIME_SUB(CAST('2020-08-05 12:00:00' AS DATETIME), INTERVAL 1 HOUR) AS one_hour_earlier,
DATETIME_SUB(CAST('2020-08-05 12:00:00' AS DATETIME), INTERVAL 1 WEEK) AS one_week_earlier,
DATETIME_SUB(CAST('2020-08-05 12:00:00' AS DATETIME), INTERVAL 1 QUARTER) AS one_quarter_earlier
How to subtract 3 business days from a date?
For this we'll use different DATE_SUB
intervals depending on what day of the week it is.
SELECT
due_date,
FORMAT_DATE('%A', due_date) AS Weekday,
CASE WHEN EXTRACT(DAYOFWEEK FROM due_date) IN (2, 3, 4) THEN DATE_SUB(due_date, INTERVAL 5 DAY) WHEN (EXTRACT(DAYOFWEEK FROM due_date) = 1) THEN DATE_SUB(due_date, INTERVAL 4 DAY) ELSE DATE_SUB(due_date, INTERVAL 3 DAY) END AS three_days_ago,
FORMAT_DATE('%A', CASE WHEN EXTRACT(DAYOFWEEK FROM due_date) IN (2, 3, 4) THEN DATE_SUB(due_date, INTERVAL 5 DAY) WHEN (EXTRACT(DAYOFWEEK FROM due_date) = 1) THEN DATE_SUB(due_date, INTERVAL 4 DAY) ELSE DATE_SUB(due_date, INTERVAL 3 DAY) END) AS weekday_3d_ago
FROM
(
SELECT
CAST('2019-05-20' AS DATE) AS due_date
UNION ALL
( SELECT
CAST('2019-09-18' AS DATE) AS due_date)
UNION ALL
( SELECT
CAST('2019-12-15' AS DATE) AS due_date)
) AS table_3
Grouping Dates
When we're analysing date/time data we want to group our data by a different date part (e.g. Yearly, Quarterly, etc.). There are a few ways to group our Date/Times BigQuery.
Truncating dates and times
Truncating a date / time means you group the date by a specific date part. For example truncating Tuesday 15 December 2020
to the WEEK
would return the first day of the week: Sunday 13 December 2020
, to the YEAR
would return Wednesday 1 Jan 2020
, etc.
DATE_TRUNC(date_expression, part)
(parts available)DATETIME_TRUNC(datetime_expression, part)
(parts available)TIMESTAMP_TRUNC(timestamp_expression, part)
(parts available)TIME_TRUNC(time_expression, part)
(parts available)SELECT
CAST('2020-12-15' AS DATE) AS original_date,
DATE_TRUNC(CAST('2020-12-15' AS DATE), WEEK) AS first_day_of_week,
DATE_TRUNC(CAST('2020-12-15' AS DATE), MONTH) AS first_day_of_month
⚠️ It's important to note that TRUNC
functions return another Date / Time object. So you can use the results to compare to other date / times.
To get the LAST day of each date_part you can use the LAST_DAY
function instead of the TRUNC
functions above.
SELECT
CAST('2020-12-15' AS DATE) AS original_date,
LAST_DAY(CAST('2020-12-15' AS DATE), WEEK) AS last_day_of_week,
LAST_DAY(CAST('2020-12-15' AS DATE), MONTH) AS last_day_of_month
Extracting date and time parts
Alternatively, you can EXTRACT
a date_part
from a Date / Time. This is helpful if you want to do some arithmetic with your date_part
, like comparing the number of visitors on your website regardless of the date.
EXTRACT(part FROM date_expression)
EXTRACT(part FROM datetime_expression)
EXTRACT(part FROM timestamp_expression)
EXTRACT(part FROM time_expression)
SELECT
date,
EXTRACT(YEAR FROM date) AS year,
EXTRACT(WEEK FROM date) AS week,
EXTRACT(DAYOFWEEK FROM date) AS weekday
FROM
UNNEST(GENERATE_DATE_ARRAY('2015-12-23', '2016-01-09')) AS date
ORDER BY
date ASC
The difference between EXTRACT
and TRUNC
The key difference is the data types returned. TRUNC
will return a DATE
, DATETIME
, TIMESTAMP
, TIME
object, and in most cases EXTRACT
returns an INT64
.
SELECT
CAST('2020-04-02 13:22:44' AS DATETIME) AS original,
EXTRACT(HOUR FROM CAST('2020-04-02 13:22:44' AS DATETIME)) AS hour_extacted,
DATETIME_TRUNC(CAST('2020-04-02 13:22:44' AS DATETIME), HOUR) AS hour_truncated