When dealing with categories it is common to encounter data that may be under such format :
id
toppings
1
salad;tomatoes;onions
2
cheese;bacon
3
null
Output
Ideally, the structure we would like to have to enable data analysis on such data format would be the following :
id
toppings
1
salad
1
tomatoes
1
onions
2
cheese
2
bacon
Example query
Let's see an example in order to achieve this result :
-- this is our fake databaseWITHdatabaseAS ( SELECT1AS id,"salad;tomatoes;onions"AS toppings UNION ALL SELECT2AS id,"cheese;bacon"AS toppings UNION ALL SELECT3AS id,nullAS toppings )-- this is the output of the model-- 1. we start by selecting our two columns "id" and "toppings"-- 2. then we split the toppings using the ; delimiter character-- this will generatate an array-- https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#split-- 3. Then we unnest the array, and flatten it using unnest and cross join-- this will give us the expected output format-- https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#flattening_arraysSELECT id, toppingsFROMdatabase CROSS JOIN UNNEST(SPLIT(toppings, ";")) as toppings
withdatabase as (select1 as id,'salad;tomatoes;onions' as toppingsunion allselect2 as id,'cheese;bacon' as toppingsunion allselect3 as id,null as toppings )select d.id, f.value::string as toppingsfromdatabase d,lateral flatten(input=>split(d.toppings, ';')) f