SUBSTR function. Definition, syntax, examples and common errors using BigQuery Standard SQL. The substring allows you to extract a section of a longer string in BigQuery.
The SUBSTR function allows you to extract a section of a longer string in BigQuery.
| location | state |
|---|---|
| New York, NY | NY |
| Miami, FL | FL |
The query above gets the last 2 characters of the location string.
SUBSTR works with STRING or BYTES data typesposition argument is 1-based, so the first character is at position 1-1 to indicate the last character positionlength < 0 then the function returns an errorSUBSTR(value, position[, length])What if we have a series of email addresses: abc@gmail.com, xy@yahoo.com, etc. and we want to find the domain (e.g. gmail.com, yahoo.com), we can use SUBSTR in combination with STRPOS and LENGTH to dynamically extract everything after the @.
| domain | |
|---|---|
| abc@gmail.com | gmail.com |
| xy@yahoo.com | yahoo.com |
Alternatively, you can use REGEXP_EXTRACT to achieve the same result:
SELECT
location,
SUBSTR(location, (-2), 2) AS state
FROM
(
SELECT
'New York, NY' AS location
UNION ALL
( SELECT
'Miami, FL' AS location)
) AS table_2| domain | |
|---|---|
| abc@gmail.com | gmail.com |
| xy@yahoo.com | yahoo.com |
It's rare SUBSTR returns an error unless you call it with the wrong data types (e.g. pass a STRING to the position parameter), but it's often that you don't get what you want.
The best thing to do here is to either use REGEXP functions, or start get your query working with 1-2 example rows before trying to apply it to the entire column.
SELECT
email,
SUBSTR(email, (STRPOS(email, '@') + 1), (LENGTH(email) - STRPOS(email, '@'))) AS domain
FROM
(
SELECT
'abc@gmail.com' AS email
UNION ALL
( SELECT
'xy@yahoo.com' AS email)
) AS table_2SELECT
email,
REGEXP_EXTRACT(email, r'@([a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)') AS domain
FROM
(
SELECT
'abc@gmail.com' AS email
UNION ALL
( SELECT
'xy@yahoo.com' AS email)
) AS table_2