The SUBSTR function allows you to extract a section of a longer string in BigQuery.
SUBSTR(value, position[, length])
SELECT location, SUBSTR(location, (-2), 2) AS state FROM ( SELECT 'New York, NY' AS location UNION ALL ( SELECT 'Miami, FL' AS location) ) AS table_2
What if we have a series of email addresses: [email protected], [email protected], etc. and we want to find the domain (e.g. gmail.com, yahoo.com), we can use SUBSTR in combination with STRPOS and LENGTH to dynamically extract everything after the @.
SELECT email, SUBSTR(email, (STRPOS(email, '@') + 1), (LENGTH(email) - STRPOS(email, '@'))) AS domain FROM ( SELECT '[email protected]' AS email UNION ALL ( SELECT '[email protected]' AS email) ) AS table_2
Alternatively, you can use REGEXP_EXTRACT to achieve the same result:
SELECT email, REGEXP_EXTRACT(email, r'@([a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)') AS domain FROM ( SELECT '[email protected]' AS email UNION ALL ( SELECT '[email protected]' AS email) ) AS table_2
It's rare SUBSTR returns an error unless you call it with the wrong data types (e.g. pass a STRING to the position parameter), but it's often that you don't get what you want.
The best thing to do here is to either use REGEXP functions, or start get your query working with 1-2 example rows before trying to apply it to the entire column.