Data Types
BigQuery supports several data types, some of which are standard (e.g. STRING, INT64), and others that are more complex (e.g. ARRAY, STRUCT).
In general though they easily fall into the following categories or Supertypes:
SELECT cast('12' as INT64) int
NUMERIC
&BIGNUMERIC
Integers are always whole numbers; they have no decimals or fractional components.
- Range:
-9,223,372,036,854,775,808
to9,223,372,036,854,775,807
Properties
- ✅ Can be NULL
- ✅ Can be used in an ORDER BY clause
- ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
- ✅ Can be compared to other
INT64
s - CAST to:
BOOL
,INT64
,NUMERIC
,BIGNUMERIC
,FLOAT64
,STRING
- Coercion To*:
NUMERIC
,BIGNUMERIC
,FLOAT64
Floating point numbers are approximate numeric values with decimal/fractional components.
float |
---|
12.44 |
- There are non-numeric values to consider:
NaN
and+/-inf
- The order of values from least to greatest is:
- Negative numbers
- Positive numbers
Properties
- ✅ Can be NULL
- ✅ Can be used in an ORDER BY clause
- ✅ Can be used in a GROUP BY , DISTINCT, but NOT PARTITION BY
- ✅ Can be compared to other
FLOAT64
s - CAST to:
INT64
,NUMERIC
,BIGNUMERIC
,FLOAT64
,STRING
- Coercion To*: N/A
NUMERIC
and BIGNUMERIC
Decimals are numeric values with fixed precision and scale. Precision = the number of digits, and scale = the number of these digits appear after the decimal point.
numeric | bignumeric |
---|---|
1240000000000000000 | 124000000000000000000000000000000000000 |
- Often used for financial calculations
- Precision:
38
- Min :
-9.9999999999999999999999999999999999999E+29
- Max:
9.9999999999999999999999999999999999999E+29
- BIGNUMERIC
- Precision:
76.76
(the 77th digit is partial) - Min:
-5.7896044618658097711785492504343953926634992332820282019728792003956564819968E+38
- Max:
5.7896044618658097711785492504343953926634992332820282019728792003956564819968E+38
Properties
- ✅ Can be NULL
- ✅ Can be used in an ORDER BY clause
- ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
- ✅ Can be compared to other
NUMERICS
s andBIGNUMERIC
s - CAST to:
INT64
,NUMERIC
,BIGNUMERIC
,FLOAT64
,STRING
- Coercion To*:
BIGNUMERIC
,FLOAT64
Booleans are True or False
boolean |
---|
true |
- Represented by keywords: TRUE and FALSE
- They are case insensitive
- Boolean values are sorted from least to greatest:
Properties
- ✅ Can be NULL
- ✅ Can be used in an ORDER BY clause
- ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
- ✅ Can be compared to other
BOOL
s - CAST to:
INT64
,BOOL
,STRING
- Coercion To*: N/A
Variable-length text (unicode) data
- Uses UTF-8 encoding
- All STRING functions operate on Unicode characters rather than bytes
- Most STRING functions are also defined on BYTES
- Casting from STRING-> BYTES does UTF-8 encoding and decoding
- BYTES -> STRING will return in error if BYTES are not valid UTF-8
Properties
- ✅ Can be NULL
- ✅ Can be used in an ORDER BY clause
- ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
- ✅ Can be compared to other
STRING
s - CAST to:
BOOL
,INT64
,NUMERIC
,BIGNUMERIC
,FLOAT64
,STRING
,BYTES
,DATE
,DATETIME
,TIME
,TIMESTAMP
- Coercion To*: N/A
Variable-length binary data
bytes |
---|
104,105 |
- Most STRING functions are also defined on BYTES
- Casting from STRING-> BYTES does UTF-8 encoding and decoding
- BYTES -> STRING will return in error if BYTES are not valid UTF-8
Properties
- ✅ Can be NULL
- ✅ Can be used in an ORDER BY clause
- ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
- ✅ Can be compared to other
BYTE
s - CAST to:
STRING
,BYTES
- Coercion To*: N/A
Represents a logical calendar date (e.g. 2020-01-03)
date |
---|
2020-08-24 |
- Independent of time zone.
- Represents a calendar day, not a specific 24 hour window.
- Range:
0001-01-01
to9999-12-31
. - Format:
YYYY:[M]M:[D]D
or4-digit year : one or two digit month : one or two digit day
Properties
- ✅ Can be NULL
- ✅ Can be used in an ORDER BY clause
- ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
- ✅ Can be compared to other
DATE
s - CAST to:
STRING
,DATE
,DATETIME
,TIMESTAMP
- Coercion To*: N/A
Represents a logical calendar datetime (e.g. 2020-01-03 12:04:22)
datetime |
---|
2020-08-24T13:11:05 |
- Independent of time zone.
- Represents a date and time as they would be displayed on a calendar or clock, not a specific 24 hour window.
- Range:
0001-01-01 00:00:00
to9999-12-31 23:59:59.999999
- Format:
YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]]
YYYY
: Four-digit year[M]M
: One or two digit month[D]D
: One or two digit day( |T)
: A space or a `T` separator[H]H
: One or two digit hour (valid values from 00 to 23)[M]M
: One or two digit minutes (valid values from 00 to 59)[S]S
: One or two digit seconds (valid values from 00 to 59)[.DDDDDD]
: Up to six fractional digits (microsecond precision)
Properties
- ✅ Can be NULL
- ✅ Can be used in an ORDER BY clause
- ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
- ✅ Can be compared to other
DATETIME
s - CAST to:
STRING
,DATE
,DATETIME
,TIMESTAMP
,TIME
- Coercion To*: N/A
Represents an absolute point in time (e.g. 2020-01-03 12:04:22 UTC)
timestamp |
---|
2020-08-24T08:11:05.000Z |
- Independent of time zone.
- Represents a date and time as they would be displayed on a calendar or clock, not a specific 24 hour window.
- Range:
0001-01-01 00:00:00
to9999-12-31 23:59:59.999999 UTC
- Format:
YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]][time zone]
YYYY
: Four-digit year[M]M
: One or two digit month[D]D
: One or two digit day( |T)
: A space or a `T` separator[H]H
: One or two digit hour (valid values from 00 to 23)[M]M
: One or two digit minutes (valid values from 00 to 59)[S]S
: One or two digit seconds (valid values from 00 to 59)[.DDDDDD]
: Up to six fractional digits (microsecond precision)[time zone]
: String representing the time zone. When a time zone is not explicitly specified, the default time zone, UTC, is used. See the time zones section for details.
Properties
- ✅ Can be NULL
- ✅ Can be used in an ORDER BY clause
- ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
- ✅ Can be compared to other
TIMESTAMP
s - CAST to:
STRING
,DATE
,DATETIME
,TIMESTAMP
,TIME
- Coercion To*: N/A
Represents a time as might be displayed on a watch (e.g. 10:14:00)
time |
---|
13:11:05 |
- Independent of a specific date.
- Independent of a timezone.
- Range:
00:00:00
to23:59:59.99999
. - Format:
[H]H:[M]M:[S]S[.DDDDDD]
[H]H
: One or two digit hour (valid values from 00 to 23)[M]M
: One or two digit minutes (valid values from 00 to 59)[S]S
: One or two digit seconds (valid values from 00 to 59)[.DDDDDD]
: Up to six fractional digits (microsecond precision)
Properties
- ✅ Can be NULL
- ✅ Can be used in an ORDER BY clause
- ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
- ✅ Can be compared to other
TIME
's - CAST to:
STRING
,TIME
- Coercion To*: N/A
Ordered list of 0+ elements of non-ARRAY type
array_ |
---|
1,2,3 |
- ARRAYs of ARRAYs are not allowed (use STRUCT instead)
- There are currently 2 limitations with respect to NULLs and ARRAYs:
- BigQuery raises an error if a query result contains ARRAYs which contain
NULL
elements, although such ARRAYs can be used inside the query. - BigQuery translates
NULL
into an empty ARRAY in the query result, although inside the queryNULL
and empty ARRAYs are two distinct values. - Declaring an ARRAY type:
ARRAY
, where the type of elements in the ARRAY are contained inside the<
and>
. ARRAY
= Simple ARRAY of integers.ARRAY>
: ARRAY of STRUCTs that contain 2 integers.
Properties
- 🚫 Cannot be NULL
- 🚫 Cannot be used in an ORDER BY clause
- 🚫 Cannot be used in a GROUP BY , DISTINCT, PARTITION BY
- 🚫 Can be compared to other
ARRAY
s - CAST to:
ARRAY
- Coercion To*: N/A
Container of ordered fields, each with a type (required) and field name (optional).
struct_ |
---|
[object Object] |
- Declaring a STRUCT type:
STRUCT
, where the type of elements in the STRUCT are contained inside the<
and>
. STRUCT
= Simple STRUCT of integers.STRUCT>
: A STRUCT with a nested STRUCT namedx
inside it. The STRUCTx
has two fields,y
andz
, both of which are 64-bit integers.- Constructing a STRUCT:
(expr1, expr2 [, ... ])
- There must be at least 2 expressions specified, otherwise it's indistinguishable from the element that is defined
- If column names are used(e.g.
(x, x+y)
), the data types are derived from the column types - Comparing STRUCTs:
- You can use any of the following operators:
- Not Equal (
!=
or<>
) - These compare pairwise in ordinal order so you cannot be sure that you're comparing the same fields. To do that, compare the fields directly.
Properties
- ✅ Can be NULL
- 🚫 Cannot be used in an ORDER BY clause
- 🚫 Cannot be used in a GROUP BY , DISTINCT, PARTITION BY
- 🟡 LIMITED comparisons to other
STRUCT
s - CAST to:
STRUCT
- Coercion To*: N/A
A collection of points, lines, and polygons, which is represented as a point set or a subset of the surface of the Earth.
geography |
---|
POINT(1 1) |
- Based on the OGC Simple Features specification (SFS).
- A
GEOGRAPHY
is the result of, or an argument to, a Geography Function. - To compare GEOGRAPHY values, use
ST_Equals
Properties
- ✅ Can be NULL
- 🚫 Cannot be used in an ORDER BY clause
- 🚫 Cannot be used in a GROUP BY , DISTINCT, PARTITION BY
- 🟡 LIMITED comparisons to other
GEOGRAPHY
s
Coercion is an implicit type conversion, which BigQuery performs automatically between certain data types.
SELECT cast('12.44' as FLOAT64) float
SELECT CAST(123 AS STRING) AS string
SELECT cast('hi' as BYTES) bytes
SELECT CAST((1,'a') as STRUCT) struct_
SELECT CAST(ST_GEOGFROMTEXT('POINT(1 1)') as GEOGRAPHY) geography