Dates and times can often trip up new SQL users due to the surprising amount of functionality exposed by most SQL databases. Getting the manipulation of dates and times correct when writing a query is often crucial in generating the right results, so it's important to understand the basics.
Most databases support a variety of date and time data types appropriate for different situations.
Most often you'll see
select date('2020-01-01') as date, datetime(2020, 1, 1, 0, 0, 0) as datetime, timestamp('2020-01-01T00:00:00.000Z') as timestamp, time(0, 0, 0) as time
There are many ways to convert between these data types, often a source of confusion. Here are the common categories of conversion functions.
The CAST function is an explicit instruction to the database to interpret a value as the given type (as opposed to implicit type conversions which commonly occur, for example, when using the SQLite database).
select -- string -> date cast('2020-01-01' as date) as date, -- string -> datetime, cast('2020-01-01' as datetime) as datetime, -- string -> timestamp, cast('2020-01-01' as timestamp) as timestamp, -- string -> datetime -> date. Note that the time part has disappeared 😟 cast(cast('2020-01-01 00:00:00' as datetime) as date) as date_2,
This function has different names in different databases, e.g. PARSE / PARSE_DATE / CONVERT / DATE_PARSE. These functions all have something in common though - they take a string and return some form of date / time type.
Some also require the explicit date/time format string too - note how the following all return the same date:
select parse_date('%Y-%m-%d', '2020-01-01') as iso_date, parse_date('%d-%m-%Y', '01-01-2020') as uk_date, parse_date('%d/%m/%Y', '01/01/2020') as ill_allow_it, parse_date('%d💩%m💩%Y', '01💩01💩2020') as just_no,
Dates and times are most commonly represented as a number in the form of unix time - the number of seconds (or milliseconds) since the start of 1970. Again, these conversion functions have a variety of different names, such as TIMESTAMP_SECONDS / FROM_UNIXTIME / UNIX_TIMESTAMP.
It is important to know what format the arguments need to take (e.g. seconds / milliseconds / days), and what format the output takes (e.g. date / timestamp).
select -- This is the number of seconds between the start of 1970 and 2020 -- This function happens to return a timestamp timestamp_seconds(1577836800) as timestamp, -- This is the number of days between the start of 1970 and 2020 -- ... and this function returns a date date_from_unix_date(18262) as date,
This is often the opposite of the process above - converting a date/time object into a human-readable string. These functions will have different names (e.g. TO_CHAR / DATE_FORMAT / FORMAT) but have one thing in common - you'll have to provide a format string to tell the database exactly how to format the date/time.
Some common parts of format strings include:
with my_date as ( select date('2020-01-01') as date, ) select format_date('%Y', date) as year_only, format_date('%Y-%m', date) as year_month, format_date('%b-%d', date) as month_day, format_date('%B %e (💩)', date) as oh_do_grow_up, from my_date
Dates and times represent a single point in time, so directly adding or subtracting two dates doesn't really make sense. Instead, you should think of adding or subtracting intervals
The function names you may see include DATE_ADD / DATETIME_ADD / TIMESTAMP_ADD / DATEADD / ADDDATE / DATE_SUB / DATESUB / SUBDATE etc.
with my_date as ( select date('2020-01-01') as date, ) select -- This is special BigQuery syntax for intervals date_add(date, interval 1 day) as tomorrow, date_sub(date, interval 1 day) as yesterday, date_add(date, interval 10 year) as all_our_troubles_seemed_so_far_away, from my_date
When grouping by a date/time column in a SQL query, it is often appropriate to round the date/time to the nearest day/month/week etc - there tend not to be that many events occurring in the same millisecond!
There are multiple ways to do this:
Truncation rounds date/times down to the nearest 'datetime part' that you specify. These functions often have 'TRUNC' in the name, for example DATE_TRUNC / DATETIME_TRUNC / TRUNC.
Note in the following that the output timestamp retains the same precision to the millisecond, but has been rounded down.
select timestamp_trunc('2007-06-05T04:03:02.100Z', year) as to_year, timestamp_trunc('2007-06-05T04:03:02.100Z', month) as to_month, timestamp_trunc('2007-06-05T04:03:02.100Z', day) as to_day,
Extraction takes a part of a date/time and returns it as a simple number which has no notion of coming from a date/time object. For this reason it is best to leave any use of the EXTRACT / DATEPART function until the end of your analysis.
with my_date as ( select date('2020-02-01') as date, ) select extract(year from date) as year, extract(month from date) as month, extract(day from date) as day, from my_date