Dates
Introduction
Dates and times can often trip up new SQL users due to the surprising amount of functionality exposed by most SQL databases. Getting the manipulation of dates and times correct when writing a query is often crucial in generating the right results, so it's important to understand the basics.
Date and time functionality and syntax differs between databases. In this article the code snippets are written in the Google BigQuery Standard SQL syntax
The basics
Data types
Most databases support a variety of date and time data types appropriate for different situations.
Most often you'll see
DATE: calendar date (e.g. 2020-01-01)DATETIME: calendar date and time (e.g. 2020-01-01 13:04:11)TIMESTAMP: a particular moment in time, can include timezone but defaults to UTC (e.g. 2020-01-01 13:04:11-5:00)TIME: a time as seen on a watch (e.g. 13:04:11)
For example,
| date | datetime | timestamp | time |
|---|---|---|---|
| 2020-01-01 | 2020-01-01T00:00:00 | 2020-01-01T00:00:00.000Z | 00:00:00 |
The 'Z' you see in a timestamp refers to the 'Zulu' timezone, the old nautical name for GMT or UTC.
Converting between date & time data types
There are many ways to convert between these data types, often a source of confusion. Here are the common categories of conversion functions.
The CAST function is an explicit instruction to the database to interpret a value as the given type (as opposed to implicit type conversions which commonly occur, for example, when using the SQLite database).
| date | datetime | timestamp | date_2 |
|---|---|---|---|
| 2020-01-01 | 2020-01-01T00:00:00 | 2020-01-01T00:00:00.000Z | 2020-01-01 |
Converting from a string
This function has different names in different databases, e.g. PARSE / PARSE_DATE / CONVERT / DATE_PARSE. These functions all have something in common though - they take a string and return some form of date / time type.
Some also require the explicit date/time format string too - note how the following all return the same date:
| iso_date | uk_date | ill_allow_it | just_no |
|---|---|---|---|
| 2020-01-01 | 2020-01-01 | 2020-01-01 | 2020-01-01 |
Converting from a number
Dates and times are most commonly represented as a number in the form of unix time - the number of seconds (or milliseconds) since the start of 1970. Again, these conversion functions have a variety of different names, such as TIMESTAMP_SECONDS / FROM_UNIXTIME / UNIX_TIMESTAMP.
It is important to know what format the arguments need to take (e.g. seconds / milliseconds / days), and what format the output takes (e.g. date / timestamp).
| timestamp | date |
|---|---|
| 2020-01-01T00:00:00.000Z | 2020-01-01 |
Formatting dates and times
This is often the opposite of the process above - converting a date/time object into a human-readable string. These functions will have different names (e.g. TO_CHAR / DATE_FORMAT / FORMAT) but have one thing in common - you'll have to provide a format string to tell the database exactly how to format the date/time.
Some common parts of format strings include:
%Y- the full year%d- the day of the month%m- the month as a number%b- the month as an abbreviated name
| year_only | year_month | month_day | oh_do_grow_up |
|---|---|---|---|
| 2020 | 2020-01 | Jan-01 | January 1 (💩) |
Date arithmetic
Dates and times represent a single point in time, so directly adding or subtracting two dates doesn't really make sense. Instead, you should think of adding or subtracting intervals
Date + Interval = DateDate - Interval = DateInterval + Interval = IntervalDate + Date = ??
The function names you may see include DATE_ADD / DATETIME_ADD / TIMESTAMP_ADD / DATEADD / ADDDATE / DATE_SUB / DATESUB / SUBDATE etc.
| tomorrow | yesterday | all_our_troubles_seemed_so_far_away |
|---|---|---|
| 2020-01-02 | 2019-12-31 | 2030-01-01 |
Grouping dates and times
When grouping by a date/time column in a SQL query, it is often appropriate to round the date/time to the nearest day/month/week etc - there tend not to be that many events occurring in the same millisecond!
There are multiple ways to do this:
- Truncation - removing the extra, finer time information and creating a new date/time object
- Extraction - extracting the relevant part of the date/time as a number
Truncating
Truncation rounds date/times down to the nearest 'datetime part' that you specify. These functions often have 'TRUNC' in the name, for example DATE_TRUNC / DATETIME_TRUNC / TRUNC.
Note in the following that the output timestamp retains the same precision to the millisecond, but has been rounded down.
| to_year | to_month | to_day |
|---|---|---|
| 2007-01-01T00:00:00.000Z | 2007-06-01T00:00:00.000Z | 2007-06-05T00:00:00.000Z |
Extracting
Extraction takes a part of a date/time and returns it as a simple number which has no notion of coming from a date/time object. For this reason it is best to leave any use of the EXTRACT / DATEPART function until the end of your analysis.
| year | month | day |
|---|---|---|
| 2020 | 2 | 1 |