When dealing with categories it is common to encounter data that may be under such format :
id
toppings
1
salad;tomatoes;onions
2
cheese;bacon
3
null
Output
Ideally, the structure we would like to have to enable data analysis on such data format would be the following :
id
toppings
1
salad
1
tomatoes
1
onions
2
cheese
2
bacon
Example query
Let's see an example in order to achieve this result :
-- this is our fake databaseWITHdatabaseAS (SELECT1AS id,"salad;tomatoes;onions"AS toppingsUNION ALLSELECT2AS id,"cheese;bacon"AS toppingsUNION ALLSELECT3AS id,nullAS toppings )-- this is the output of the model-- 1. we start by selecting our two columns "id" and "toppings"-- 2. then we split the toppings using the ; delimiter character-- this will generatate an array-- https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#split-- 3. Then we unnest the array, and flatten it using unnest and cross join-- this will give us the expected output format-- https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#flattening_arraysSELECT id, toppingsFROMdatabaseCROSS JOIN UNNEST(SPLIT(toppings, ";")) as toppings
withdatabaseas (select1as id,'salad;tomatoes;onions'as toppingsunion allselect2as id,'cheese;bacon'as toppingsunion allselect3as id,nullas toppings )select d.id, f.value::string as toppingsfromdatabase d,lateral flatten(input=>split(d.toppings, ';')) f