# SQL Fanout

If you're a SQL user, some of the first SQL concepts you probably learned about were joins and aggregate functions (such as `COUNT` and `SUM`). One thing that is not always taught is how these two concepts can interact and sometimes produce incorrect results. In this article we'll discuss what to look out for, the concept of a "fanout," and why it matters to SQL writers.

This is one of the reason why there is an old wisdom that "JOINs are tricky to get right", even for experience SQL writer.

## Starting with a simple join

Let's start off with a simple example, where we'll join together a couple of tables. Our first table will show our customers' names and the number of visits each customer has made to our e-commerce website:\
&#x20;

**customer**

| customer\_id | first\_name | last\_name     | visits |
| ------------ | ----------- | -------------- | ------ |
| 1            | Jean        | de la Fontaine | 2      |
| 2            | Émile       | Zola           | 2      |
| 3            | René        | Descartes      | 4      |

\
Our second table will include all the orders that those customers have placed. You can see that each order is linked to the customer who placed it by the customer's ID.\
&#x20;

**order**

| order\_id | amount | customer\_id |
| --------- | ------ | ------------ |
| 1         | 25.00  | 1            |
| 2         | 50.00  | 1            |
| 3         | 75.00  | 2            |
| 4         | 100.00 | 3            |

\
Joining these tables together in SQL would be pretty simple:

```
SELECT
    *
FROM
    customer
LEFT JOIN order
USING (customer_id)
```

\
The result of that query would be this table:

| customer\_id | first\_name | last\_name     | visits | order\_id | amount |
| ------------ | ----------- | -------------- | ------ | --------- | ------ |
| 1            | Jean        | de la Fontaine | 2      | 1         | 25.00  |
| 1            | Jean        | de la Fontaine | 2      | 2         | 50.00  |
| 2            | Émile       | Zola           | 2      | 3         | 75.00  |
| 3            | René        | Descartes      | 4      | 4         | 100.00 |

## Aggregate functions gone bad

\
Now that we have a joined table, we need to be careful about how we use aggregate functions like `COUNT` and `SUM`.\
&#x20;

### Aggregate functions on a single table

\
Let's consider the customer table all by itself again. If we want to know the total number of customers, we can execute a simple query like this:

```sql
SELECT
    COUNT(*) as count
FROM   customer
```

SQL will count up the rows in the table as follows:\
&#x20;

**customer**

| `count`        | customer\_id | first\_name | last\_name     | visits |
| -------------- | ------------ | ----------- | -------------- | ------ |
| + 1            | 1            | Jean        | de la Fontaine | 2      |
| + 1            | 2            | Émile       | Zola           | 2      |
| + 1            | 3            | René        | Descartes      | 4      |
| **Results ⤵️** |              |             |                |        |
| `count`        | 3 (✅)        |             |                |        |

\
We'll get a count of 3, which is correct 👍

Or, if we want to know the total number of customer visits, we can execute another straightforward query like this:

```sql
SELECT
    SUM(visits) as sum
FROM
    customer
```

SQL will add up the number of visits in the table as follows:

**customer**

| customer\_id   | visits | sum | first\_name | last\_name     |
| -------------- | ------ | --- | ----------- | -------------- |
| 1              | 2      | + 2 | Jean        | de la Fontaine |
| 2              | 2      | + 2 | Émile       | Zola           |
| 3              | 4      | + 4 | René        | Descartes      |
| **Results ⤵️** |        |     |             |                |
| `sum`          | 8 (✅)  |     |             |                |

\
We'll get a result of 8, which is also correct 👍\
&#x20;

### Aggregate functions on the joined table

\
So far, so good. However, if we try to use the same aggregate functions on either of our joined tables, we'll start to see incorrect results.

Running a basic count on the joined table, we will no longer get the correct number of customers:

```sql
SELECT
    COUNT(*) as count
FROM
    customer
LEFT JOIN order
USING (customer_id)
```

SQL will count up the rows in the table as follows:

<table data-header-hidden><thead><tr><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr></thead><tbody><tr><td><pre><code>count
</code></pre></td><td>customer_id</td><td>first_name</td><td>last_name</td><td>visits</td><td>order_id</td><td>amount</td><td>customer_id</td></tr><tr><td>+ 1</td><td>1</td><td>Jean</td><td>de la Fontaine</td><td>2</td><td>1</td><td>25.00</td><td>1</td></tr><tr><td>+ 1</td><td>1</td><td>Jean</td><td>de la Fontaine</td><td>2</td><td>2</td><td>50.00</td><td>1</td></tr><tr><td>+ 1</td><td>2</td><td>Émile</td><td>Zola</td><td>2</td><td>3</td><td>75.00</td><td>2</td></tr><tr><td>+ 1</td><td>3</td><td>René</td><td>Descartes</td><td>4</td><td>4</td><td>100.00</td><td>3</td></tr><tr><td><strong>Results ⤵️</strong></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr><tr><td><code>count</code></td><td>4 (❌)</td><td></td><td></td><td></td><td></td><td></td><td></td></tr></tbody></table>

\
We'll get a result of 4, even though there are really only 3 customers. You can see that Jean is counted twice.

Similarly, if we try to sum the number of visits, we will no longer get the correct result:

```sql
SELECT
    SUM(visits) as sum
FROM
    customer
LEFT JOIN order
USING (customer_id)
```

SQL will add up the number of visits in the table as follows:

<table><thead><tr><th width="176">customer_id</th><th>visits</th><th>sum</th><th>first_name</th><th>last_name</th><th>order_id</th><th>amount</th><th>customer_id</th></tr></thead><tbody><tr><td>1</td><td>2</td><td>+ 2</td><td>Jean</td><td>de la Fontaine</td><td>1</td><td>25.00</td><td>1</td></tr><tr><td>1</td><td>2</td><td>+ 2</td><td>Jean</td><td>de la Fontaine</td><td>2</td><td>50.00</td><td>1</td></tr><tr><td>2</td><td>2</td><td>+ 2</td><td>Émile</td><td>Zola</td><td>3</td><td>75.00</td><td>2</td></tr><tr><td>3</td><td>4</td><td>+ 4</td><td>René</td><td>Descartes</td><td>4</td><td>100.00</td><td>3</td></tr><tr><td><strong>Results ⤵️</strong></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr><tr><td><code>SUM</code></td><td>10 (❌)</td><td></td><td></td><td> </td><td></td><td></td><td></td></tr></tbody></table>

\
We'll get a result of 10, even though there are only 8 visits. Jean's 2 visits are added twice.\
&#x20;

## A fanout happened while you weren't looking

\
In the example we've been looking at, the primary table (customer) had only three rows. The "primary table" is the table that is in the `FROM` clause of our SQL queries. After the join, we now have 4 rows. Since the joined table has more rows than the primary table, we say that a *fanout* has occurred.

To avoid a fanout, we can write the join in the opposite order. So, instead of this:

```sql
SELECT *
FROM customer
LEFT JOIN order
USING (customer_id)
```

We would do this:

```sql
SELECT *
FROM order
LEFT JOIN customer
USING (customer_id)
```

Now our joined table looks like this:

| order\_id | amount | customer\_id | first\_name | last\_name     | visits |
| --------- | ------ | ------------ | ----------- | -------------- | ------ |
| 1         | 25.00  | 1            | Jean        | de la Fontaine | 2      |
| 2         | 50.00  | 1            | Jean        | de la Fontaine | 2      |
| 3         | 75.00  | 2            | Émile       | Zola           | 2      |
| 4         | 100.00 | 3            | René        | Descartes      | 4      |

\
The eagle-eyed observer will recognize that this is exactly like our other joined table, except for the order of the columns. That is true, and it's why most writers of SQL think of our two different joins as exactly the same.

However, when we do the join in this order, we do not have a fanout. Our original table (order) had four rows, and our joined table also has four rows, so there has been no fanout.&#x20;

{% hint style="info" %}
Herein lies a key point: To help avoid fanouts, begin your joins with the most granular table.
{% endhint %}

### Fanouts, schmanouts, who cares?

\
In both the fanout and the non-fanout cases, we still need to worry about the accuracy of aggregate functions. However, there is a subtle difference in the type of problems we're going to see:

* **No fanout** 😨 — You can trust aggregate functions on your primary table but not necessarily on your joined tables.
* **Fanout 😱** — You cannot necessarily trust aggregate functions on either your primary table or your joined tables.

To drive home this point, let's take a look at our previous examples.

### Fanout example

\
If you'll recall from the previous example, the following join resulted in a fanout, because while the customer table only had three rows, the joined table had four rows.

```sql
SELECT *
FROM customer
LEFT JOIN order
USING (customer_id)
```

| customer\_id | first\_name | last\_name     | visits | order\_id | amount |
| ------------ | ----------- | -------------- | ------ | --------- | ------ |
| 1            | Jean        | de la Fontaine | 2      | 1         | 25.00  |
| 1            | Jean        | de la Fontaine | 2      | 2         | 50.00  |
| 2            | Émile       | Zola           | 2      | 3         | 75.00  |
| 3            | René        | Descartes      | 4      | 4         | 100.00 |

\
Since we are in a fanout situation, we cannot trust that aggregate functions will work on the primary table (customer). As we saw previously, `SUM(visits)` will give us a value of 10, even though only 8 visits have actually occurred.\
&#x20;

### No fanout example

\
When we reversed the join, we did not get a fanout, **because the order table had four rows, and the joined table also had four rows.**

```sql
SELECT *
FROM order
LEFT JOIN customer
USING (customer_id)
```

| order\_id | amount | customer\_id | first\_name | last\_name     | visits |
| --------- | ------ | ------------ | ----------- | -------------- | ------ |
| 1         | 25.00  | 1            | Jean        | de la Fontaine | 2      |
| 2         | 50.00  | 1            | Jean        | de la Fontaine | 2      |
| 3         | 75.00  | 2            | Émile       | Zola           | 2      |
| 4         | 100.00 | 3            | René        | Descartes      | 4      |

\
Without any fanout, we can trust that aggregate functions will work on the primary table (order).&#x20;

For example, `SUM(amount)` will give us a value of 250.00, which is the correct amount of money collected 👍

However, we can't trust any aggregation done on the secondary table (customer), such as `SUM(visits)` = 10 (❌)\
&#x20;

## Two friendly joins, one frenemy join

\
If we want to avoid fanouts, it's important that we understand the three different types of joins.

### One-to-one (friendly)

If one row of your primary table only ever matches up with one and only one row of your joined table, you have a one-to-one join. This type of join will not result in a fanout, and aggregate functions will be accurate no matter where you use them.

**Example**: Suppose you have a person table and a DNA table. Since only one person can be matched with one DNA record and one DNA record with one person, this is a one-to-one join.

### Many-to-one / Belongs to (friendly)

If many rows of your primary table match up with the same row in your joined table, you have a many-to-one join. This type of join will also not result in a fanout, and aggregate functions will at least be accurate on the primary table.

**Example**: Suppose you have a order table and a customer table. Since many order can be made from a single customer, you have a many-to-one relationship.

### One-to-many (frenemy)

If one row of your primary table can match up with multiple rows in your joined table, you have a one-to-many join. This type of join can result in a fanout, and aggregate functions are not necessarily accurate anywhere.

**Example**: Suppose you have a customer table and an order table. Since one customer can have more than one order, this is a one-to-many join.

## A fanout witch hunt

### Understand your join type

The first, and preferred, method to check for a fanout is to understand the type of join that is occurring. One-to-one and many-to-one joins won't ever result in a fanout. However, if you know you are in a one-to-many situation, then there will always be the risk of a fanout. Even if a fanout has not already occurred, it will be a risk in the future if new rows are added.

### Count rows before and after the join

The second method you can use to check for a fanout is to query a `COUNT` before and after the join. The queries would look like this:

```
SELECT COUNT(*)
FROM   my_primary_table

SELECT    COUNT(*)
FROM      my_primary_table
LEFT JOIN my_joined_table
USING (my_join_column)
```

If the count increases between the two queries, we know that a fanout has occurred. Since we're looking for an increase, it's important that the second query use a `LEFT JOIN`. We don't want to artificially decrease the number of rows being reported just because a row in the primary table doesn't have a corresponding row in the joined table.

Unfortunately, this method cannot tell you if there is a risk of a future fanout. To know that, you need to understand the type of join that is occurring. This is lies in how defined is your business models and what is the functional relationships between your tables.

## Whaly to the rescue

\
If you're one of the lucky folks who use Whaly, we got you covered.

Thanks to:

* the type of the relationships (Has Many / Belongs To / One to One) that you defined when creating your models
* &#x20;to the Primary Keys that can help deduplicate the records in your tables

Whenever Whaly needs to generate a SQL query that will contains a fan-out, our SQL engine generator will avoid it by generating dynamics sub-queries that are using the DISTINCT keywords to avoid getting false results for your aggregates calculations.

This way, any JOINs that are generated through Explorations and Related Data configured inside it will be safe 🤗

## Quick summary

\
To summarize everything we've just covered:

* Aggregate functions like `SUM` and `COUNT` can misbehave if used against joined tables.
* If a join has a one-to-one relationship, aggregate functions will work just fine.
* If a join has a many-to-one relationship, aggregate functions will work on the primary table but might not work on the joined tables.
* If a join has a one-to-many relationship, aggregate functions may not work anywhere.
* If you're using Whaly and joining tables in your Explore, you don't need to think about any of that again 🤗 🐳


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://help.whaly.io/misc/sql-fanout.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
