# Flattening categories

## Input

When dealing with categories it is common to encounter data that may be under such format :&#x20;

| id | toppings              |
| -- | --------------------- |
| 1  | salad;tomatoes;onions |
| 2  | cheese;bacon          |
| 3  | null                  |

## Output

Ideally, the structure we would like to have to enable data analysis on such data format would be the following :&#x20;

| id | toppings |
| -- | -------- |
| 1  | salad    |
| 1  | tomatoes |
| 1  | onions   |
| 2  | cheese   |
| 2  | bacon    |

\
Example query
-------------

Let's see an example in order to achieve this result :&#x20;

{% tabs %}
{% tab title="BigQuery" %}

```sql
-- this is our fake database
WITH
  database AS (
    SELECT
      1 AS id,
      "salad;tomatoes;onions" AS toppings
    UNION ALL
    SELECT
      2 AS id,
      "cheese;bacon" AS toppings
    UNION ALL
    SELECT
      3 AS id,
      null AS toppings
  )
  
-- this is the output of the model
-- 1. we start by selecting our two columns "id" and "toppings"

-- 2. then we split the toppings using the ; delimiter character
--    this will generatate an array
--    https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#split

-- 3. Then we unnest the array, and flatten it using unnest and cross join
--    this will give us the expected output format
--    https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#flattening_arrays
SELECT
  id,
  toppings
FROM
  database
  CROSS JOIN UNNEST(SPLIT(toppings, ";")) as toppings
```

{% endtab %}

{% tab title="Snowflake" %}

```sql
with
  database as (
    select
      1 as id,
      'salad;tomatoes;onions' as toppings
    union all
    select
      2 as id,
      'cheese;bacon' as toppings
    union all
    select
      3 as id,
      null as toppings
  )
select d.id, f.value::string as toppings
from database d,
lateral flatten
(input=>split(d.toppings, ';')) f
```

{% endtab %}
{% endtabs %}
