Event schema

Introduction

The event schema is a data modeling paradigm to make data modeling simple, fast, and reliable.

The activity schema aims for these design goals

  • only one definition for each concept - know exactly where to find the right data

  • simple definitions - no thousand-line SQL queries to model data

  • analyses can be run anywhere — analyses can be shared and reused across companies with vastly different data.

  • incremental updates - no more rebuilding data models on every update

At its core an event schema consists of transforming raw tables into a single, time series table called an event schema.

Conceptual Overview

An event schema models an entity taking a sequence of events over time. For example, a customer (entity) viewed a web page (event).

An event schema is implemented as a time series table with (optionally) a set of enrichment tables for additional metadata.

Each row in the table represents a single activity taken by the entity at a point in time. In other words, it has an event identifier, an entity identifier, a timestamp, and some related enrichment tables. The specific structure is covered in the next section.

Entities

Entities are the subject, or actor in the data. Every activity in the activity schema is an action taken by a specific entity with a unique identifier

The most common entity is a customer, but there can be other types as well. For example, a bike-sharing company could also have a bike entity to analyze things like repair frequency, mileage over time, etc.

An activity schema table will only have one entity type and is typically named <entity>_stream. For example, an activity schema implementation for customers would be customer_stream, and one for bikes would be bike_stream

Events

Events are specific actions taken by an entity. For example, if the entity is a customer an activity could be 'opened an email' or 'submitted a support ticket'. Each row in a table modeled by an event schema is a single instance of an activity taken by a specific entity.

Events are intended to model real business processes. Taken together, the series of event for a given entity would represent all relevant interactions that entity has had with a company.

Metadata

Every event has metadata associated with it beyond the customer, the activity, and the timestamp. A 'viewed page' activity will want to store the actual page viewed, while an 'invoice paid' activity will store the total amount paid.

The stream tables of an event schema have a finite number of metadata columns that can be associated with each event.

Structure

The primary benefit of an event schema is that all data is in a consistent format. This means that it requires tables with specific names, types, and numbers of columns.

Tables

An event schema uses a single table to store all events. No new tables are created as the data evolves — any future events will create rows in the same table.

There are three types of tables in an Activity Schema

  1. event stream (one per event schema)

  2. entity table (optional - one per event schema)

  3. enrichment tables (optional - as needed)

The activity stream table, (typically called <entity> - Stream) stores all events, their timestamps, the entity's identifier, and some metadata.

The entity table (typically called <entities> - Entity) stores metadata for each entity. For example, a Customers - Entity table can store date of birth, first and last name, etc.

The enrichment tables (typically called <enrichments> - Enrichment) stores metadata for each event types. For example, a Orders - Enrichment table can store the source, status, etc.

Event Stream

The event stream table is the primary table in an activity schema and is the only one required. It houses the bulk of the modeled data in the warehouse.

Column

Description

Type

ts

Timestamp in UTC for when the activity occurred

timestamp

entity

Globally unique identifier for the entity (ex: customer id)

string

event

Name of the activity (ex. 'completed_order')

string

value

Value for the activity. (ex: 1 if the customer just completed one order)

number

Entity Table

The entity table stores an unlimited number of metadata columns. By convention it takes its name directly from the entity. For example, for an activity schema with an entity named 'customer', the table would be called Customers - Entity. For an activity schema about bikes the table would be called Bikes - Entity.

Enrichment Tables

Enrichment tables serve the same purpose as the entity table, but for specific activities. They allow adding any arbitrary amount of metadata to an activity.

An enrichment table is typically named after the activity it enriches, taking the structure <name> - Enrichment

An enrichment table has one required columns - enriched_object_id, which is used to join it into the event stream. From there it can have any number of additional columns of any type.

Column

Description

Type

enriched_object_id

Identifier of the object that this row will enrich

string

feature columns

(optional) These columns are the additional features used to enrich an activity or activities.

various

Last updated