Dates and Times
Introduction
BigQuery offers no shortage of functionality to help you get the most out of date and time data, but it can be hard to know what to use when. This notebook will review and compare BigQuery's date and time functionality.
The Basics
Data types
In BigQuery, dates and times can be one of the following datatypes:
DATE
: calendar date (e.g. 2020-01-01)
DATETIME
: calendar date and time (e.g. 2020-01-01 13:04:11)
TIMEZONE
: a particular moment in time, can include timezone but defaults to UTC (e.g. 2020-01-01 13:04:11-5:00)
TIME
: a time as seen on a watch (e.g. 13:04:11)
In general, if you want to work with time zones, you'll need to stick with TIMESTAMPs. Otherwise DATETIME is the most flexible since you can take advantage of date and time functionality.
Converting to date/time data types
If your data isn't in one of these data types, you can convert them by using CAST , one of the PARSE functions, or unix time conversion functions.
Converting using CAST
To convert your STRING to one of the Date data types, your STRING must be in the following formats:
DATE
:
YYYY-MM-DD
DATETIME
:
YYYY-MM-DD HH:MM:SS
TIMESTAMP
:
YYYY-MM-DD HH:MM:SS [timezone]
TIME
:
HH:MM:SS
| datetime | timestamp | date | time |
|---|---|---|---|
| 2017-06-04T14:44:00 | 2017-06-04T12:44:00.000Z | 2017-06-04 | 14:44:00 |
Converting from STRING using PARSE
To use one of the PARSE functions, your STRING can be formatted any way you like, you'll just tell the function how it should read it. There is a PARSE function for each Date/Time Data type:
DATE
:
PARSE_DATE(format_string, date_string)
DATETIME
:
PARSE_DATETIME(format_string, datetime_string)
TIMESTAMP
:
PARSE_TIMESTAMP(format_string, timestamp_string[, timezone])
TIME
:
PARSE_TIME(format_string, time_string)
.You can see the full list of format stringshere
For example, if my date was in this format: Thursday, January 7, 2021 12:44:02, I could use the following:
| parsed_datetime |
|---|
| 2021-01-07T12:04:33 |
Converting from Unix time
BigQuery offers several helper functions to get dates represented as numbers converted to a Date/Time date type.
These include:
DATE
:
DATE_FROM_UNIX_DATE(days since 1970-01-01 00:00:00 UTC)TIMESTAMP
:
TIMESTAMP_SECONDS(seconds since 1970-01-01 00:00:00 UTC)TIMESTAMP_MILLIS(milliseconds since 1970-01-01 00:00:00 UTC)TIMESTAMP_MICROS(microseconds since 1970-01-01 00:00:00 UTC)SELECT
TIMESTAMP_SECONDS(1577836800)| f0_ |
|---|
| 2020-01-01T00:00:00.000Z |
If you need to BigQuery also offers a series of functions to convert your Date/Time data types into Unix time.
Formatting your date/times
When working with dates and times, it's often handy (and helpful for anyone else looking at the data) so see the data in a more approachable format. To do that we can make use of the FORMAT functions in BigQuery.
There is a FORMAT function for each Date/Time Datatype:
DATE
:
FORMAT_DATE(format_string, date)
DATETIME
:
FORMAT_DATETIME(format_string, datetime)
TIMESTAMP
:
FORMAT_TIMESTAMP(format_string, timestamp[, timezone])
TIME
:
FORMAT_TIME(format_string, time)
The format strings here are the same as for the PARSE function here.
So if I want to show my dates in a more common format of: YYYY/DD/MM then I could do FORMAT_DATETIME('%Y/%d/%m', datetime_column
| day_first | weekday | time_of_day |
|---|---|---|
| 2017/30/06 | Friday | 02:44:00 PM |
⚠️ It's worth noting that FORMAT functions return STRINGs so if you wanted to use the result of FORMAT as a DATE, it won't work.
For example, this query:
will return:
Comparing Dates
One of the most common things we use dates for is to compare them. We'll do this to filter for the last week of data, or only for data that was within certain dates.
Comparing with operators
The easiest way to compare dates is to use a comparison operator:
<
,
<=
,
>
,
>=
,
=
,
!=
or
<>
,
[NOT] BETWEEN
,
[NOT] LIKE
,
[NOT] IN
| date |
|---|
| 2021-01-01 |
| 2021-01-15 |
BETWEEN is inclusive, so will include all dates between both dates, including the dates specified
Alternatively, you can use >=,< to achieve the same result:
| date |
|---|
| 2021-01-01 |
| 2021-01-15 |
For filtering dates you can use
STRING
s in the format
YYYY-MM-DD
Dynamic comparisons
Sometimes we want our query to always pull the last n days so comparing to a single date means we'll have to keep updating our query. For that we can use dynamic comparisons instead.
The most common functions we'll use for dynamic comparisons are CURRENT_[date_part] and [date_part]_DIFF
To return the current date, datetime, time, or timestamp, you can use the CURRENT_[date part] function in BigQuery. They will return the type you've specified, so you can use it to compare against other dates.
CURRENT_DATE([time_zone])
CURRENT_DATETIME([time_zone])
CURRENT_TIMESTAMP()
CURRENT_TIME([time_zone])
| current_date | current_datetime | current_timestamp | current_time |
|---|---|---|---|
| 2021-08-19 | 2021-08-19T12:54:39.642536 | 2021-08-19T12:54:39.642Z | 12:54:39.642536 |
To find the difference in two dates, use [date_part]_DIFF:
DATE_DIFF(date_expression_a, date_expression_b, part)
(
parts available
)
DATETIME_DIFF(datetime_expression_a, datetime_expression_b, part)
(
parts available
)
TIMESTAMP_DIFF(timestamp_expression_a, timestamp_expression_b, part)
(
parts available
)
TIME_DIFF(time_expression_a, time_expression_b, part)
(
parts available
)
What if we wanted to filter our data for the last 12 months? To do that, we'll compare our date column to the CURRENT_DATE and make sure the difference is 12 or fewer months.
WHERE date_diff(current_date(),date,DAY) <= X| date |
|---|
| 2021-04-06 |
| 2021-06-05 |
What if we wanted to find the difference between 2 dates- the expected arrival date, and the actual arrival date, but we wanted to exclude Sundays when no deliveries take place?
| days_late |
|---|
| 2 |
To do this we can take the total days (3) and subtract the weeks difference (1) which will include Sundays.
Adding/Subtracting Dates
To perform any kind of data transformations like adding a year to a date, or subtracting 2 weeks, then we can use the [date_part]_ADD and [date_part]_SUB functions.
To add an interval to a date/time in BigQuery we can use any of:
DATE_ADD(date_expression, INTERVAL int64_expression part)
DATETIME_ADD(datetime_expression, INTERVAL int64_expression part)
TIMESTAMP_ADD(timestamp_expression, INTERVAL int64_expression part)
TIME_ADD(time_expression, INTERVAL int64_expression part)
| original_date | one_hour_later | one_week_later | one_quarter_later |
|---|---|---|---|
| 2020-08-05T12:00:00 | 2020-08-05T13:00:00 | 2020-08-12T12:00:00 | 2020-11-05T12:00:00 |
What if we had contract start dates, contract durations, and we wanted to calculate the contract end date?
| contract_start | duration_months | contract_end |
|---|---|---|
| 2019-05-20 | 8 | 2020-01-20 |
| 2019-09-18 | 3 | 2019-12-18 |
| 2019-12-11 | 18 | 2021-06-11 |
To subtract an interval from a date/time in BigQuery, we can use any of:
DATE_SUB(date_expression, INTERVAL int64_expression part)
DATETIME_SUB(datetime_expression, INTERVAL int64_expression part)
TIMESTAMP_SUB(timestamp_expression, INTERVAL int64_expression part)
TIME_SUB(time_expression, INTERVAL int64_expression part)
| original_date | one_hour_earlier | one_week_earlier | one_quarter_earlier |
|---|---|---|---|
| 2020-08-05T12:00:00 | 2020-08-05T11:00:00 | 2020-07-29T12:00:00 | 2020-05-05T12:00:00 |
How to subtract 3 business days from a date?
For this we'll use different DATE_SUB intervals depending on what day of the week it is.
| due_date | Weekday | three_days_ago | weekday_3d_ago |
|---|---|---|---|
| 2019-05-20 | Monday | 2019-05-15 | Wednesday |
| 2019-09-18 | Wednesday | 2019-09-13 | Friday |
| 2019-12-15 | Sunday | 2019-12-11 | Wednesday |
Grouping Dates
When we're analysing date/time data we want to group our data by a different date part (e.g. Yearly, Quarterly, etc.). There are a few ways to group our Date/Times BigQuery.
Truncating dates and times
Truncating a date / time means you group the date by a specific date part. For example truncating Tuesday 15 December 2020 to the WEEK would return the first day of the week: Sunday 13 December 2020, to the YEAR would return Wednesday 1 Jan 2020, etc.
DATE_TRUNC(date_expression, part)
(
parts available
)
DATETIME_TRUNC(datetime_expression, part)
(
parts available
)
TIMESTAMP_TRUNC(timestamp_expression, part)
(
parts available
)
TIME_TRUNC(time_expression, part)
(
parts available
)
| original_date | first_day_of_week | first_day_of_month |
|---|---|---|
| 2020-12-15 | 2020-12-13 | 2020-12-01 |
⚠️ It's important to note that TRUNC functions return another Date / Time object. So you can use the results to compare to other date / times.
To get the LAST day of each date_part you can use the LAST_DAY function instead of the TRUNC functions above.
| original_date | last_day_of_week | last_day_of_month |
|---|---|---|
| 2020-12-15 | 2020-12-19 | 2020-12-31 |
Extracting date and time parts
Alternatively, you can EXTRACT a date_part from a Date / Time. This is helpful if you want to do some arithmetic with your date_part, like comparing the number of visitors on your website regardless of the date.
EXTRACT(part FROM date_expression)
EXTRACT(part FROM datetime_expression)
EXTRACT(part FROM timestamp_expression)
EXTRACT(part FROM time_expression)
| date | year | week | weekday |
|---|---|---|---|
| 2015-12-23 | 2015 | 51 | 4 |
| 2015-12-24 | 2015 | 51 | 5 |
| 2015-12-25 | 2015 | 51 | 6 |
| 2015-12-26 | 2015 | 51 | 7 |
| 2015-12-27 | 2015 | 52 | 1 |
| 2015-12-28 | 2015 | 52 | 2 |
| 2015-12-29 | 2015 | 52 | 3 |
| 2015-12-30 | 2015 | 52 | 4 |
| 2015-12-31 | 2015 | 52 | 5 |
| 2016-01-01 | 2016 | 0 | 6 |
| 2016-01-02 | 2016 | 0 | 7 |
| 2016-01-03 | 2016 | 1 | 1 |
| 2016-01-04 | 2016 | 1 | 2 |
| 2016-01-05 | 2016 | 1 | 3 |
| 2016-01-06 | 2016 | 1 | 4 |
| 2016-01-07 | 2016 | 1 | 5 |
| 2016-01-08 | 2016 | 1 | 6 |
| 2016-01-09 | 2016 | 1 | 7 |
The difference between EXTRACT and TRUNC
The key difference is the data types returned. TRUNC will return a DATE, DATETIME, TIMESTAMP, TIME object, and in most cases EXTRACT returns an INT64.