Event schema
Introduction
The event schema is a data modeling paradigm to make data modeling simple, fast, and reliable.
The activity schema aims for these design goals
only one definition for each concept - know exactly where to find the right data
simple definitions - no thousand-line SQL queries to model data
analyses can be run anywhere — analyses can be shared and reused across companies with vastly different data.
incremental updates - no more rebuilding data models on every update
At its core an event schema consists of transforming raw tables into a single, time series table called an event schema.
Conceptual Overview
An event schema models an entity taking a sequence of events over time. For example, a customer (entity) viewed a web page (event).
An event schema is implemented as a time series table with (optionally) a set of enrichment tables for additional metadata.
Each row in the table represents a single activity taken by the entity at a point in time. In other words, it has an event identifier, an entity identifier, a timestamp, and some related enrichment tables. The specific structure is covered in the next section.
Entities
Entities are the subject, or actor in the data. Every activity in the activity schema is an action taken by a specific entity with a unique identifier
The most common entity is a customer, but there can be other types as well. For example, a bike-sharing company could also have a bike entity to analyze things like repair frequency, mileage over time, etc.
An activity schema table will only have one entity type and is typically named <entity>_stream
. For example, an activity schema implementation for customers would be customer_stream
, and one for bikes would be bike_stream
Events
Events are specific actions taken by an entity. For example, if the entity is a customer an activity could be 'opened an email' or 'submitted a support ticket'. Each row in a table modeled by an event schema is a single instance of an activity taken by a specific entity.
Events are intended to model real business processes. Taken together, the series of event for a given entity would represent all relevant interactions that entity has had with a company.
Metadata
Every event has metadata associated with it beyond the customer, the activity, and the timestamp. A 'viewed page' activity will want to store the actual page viewed, while an 'invoice paid' activity will store the total amount paid.
The stream tables of an event schema have a finite number of metadata columns that can be associated with each event.
Structure
The primary benefit of an event schema is that all data is in a consistent format. This means that it requires tables with specific names, types, and numbers of columns.
Tables
An event schema uses a single table to store all events. No new tables are created as the data evolves — any future events will create rows in the same table.
There are three types of tables in an Activity Schema
event stream (one per event schema)
entity table (optional - one per event schema)
enrichment tables (optional - as needed)
The activity stream table, (typically called <entity> - Stream
) stores all events, their timestamps, the entity's identifier, and some metadata.
The entity table (typically called <entities> - Entity
) stores metadata for each entity. For example, a Customers - Entity
table can store date of birth, first and last name, etc.
The enrichment tables (typically called <enrichments> - Enrichment
) stores metadata for each event types. For example, a Orders - Enrichment
table can store the source, status, etc.
Event Stream
The event stream table is the primary table in an activity schema and is the only one required. It houses the bulk of the modeled data in the warehouse.
Column | Description | Type |
ts | Timestamp in UTC for when the activity occurred | timestamp |
entity | Globally unique identifier for the entity (ex: customer id) | string |
event | Name of the activity (ex. 'completed_order') | string |
value | Value for the activity. (ex: 1 if the customer just completed one order) | number |
Entity Table
The entity table stores an unlimited number of metadata columns. By convention it takes its name directly from the entity. For example, for an activity schema with an entity named 'customer', the table would be called Customers - Entity
. For an activity schema about bikes the table would be called Bikes - Entity
.
Enrichment Tables
Enrichment tables serve the same purpose as the entity table, but for specific activities. They allow adding any arbitrary amount of metadata to an activity.
An enrichment table is typically named after the activity it enriches, taking the structure <name> - Enrichment
An enrichment table has one required columns - enriched_object_id, which is used to join it into the event stream. From there it can have any number of additional columns of any type.
Column | Description | Type |
enriched_object_id | Identifier of the object that this row will enrich | string |
feature columns | (optional) These columns are the additional features used to enrich an activity or activities. | various |
Last updated