What is dimensional modeling?
Dimensional modeling is a pattern for optimizing the structure of a data warehouse around fact and dimension tables.
This pattern was developed in the 1990’s when database storage and processing was expensive but joining data was inexpensive. Dimensional modeling ensures that the warehouse is efficient to operate.
Dimensional modeling was popularized through the publishing of The Data Warehouse Toolkit in 1996 by Ralph Kimball. It is also known as Kimball modeling as a result.
The dbt-GA4 package loosely follows dimensional modeling. However, now storage and processing are also inexpensive, so you will see some deviation from the pattern for convenience’s sake.
Facts and Dimensions
Facts are like verbs, actions people take when interacting with and events that are important to your organization, and fact tables mostly contain the measures that quantify the verbs.
With GA4 data, your fact tables will be your event tables; purchase and the view in page view are both verbs and they are both important events for ecommerce sites.
A purchase
table should contain things like gross and net revenue, shipping, tax, item quantity, and time.
Dimensions are like adjectives. They describe the facts. The main characteristic of a dimension table is that it has a key that links to the fact table so that you can break down your facts by that dimension.
In dbt-GA4, the main keys are session_key
, client_key
, user_id
, and event_key
.
The session_key
, for example, lets you link session data, like session_source
and session_medium
, to your purchase and other events.
The client_key
works similarly with the cid
, client ID, set by the Google Tag in the browser and the user_id
with any custom user ID that you add to your events.
It is worth noting that Google Analytics decides how to select user ID or client ID; when using the dbt-GA4 package, you’ll need to decide when and how to select each ID when you join client or user tables to event tables.
The event_key
is used mostly to link ecommerce item data, which is another dimension, to the relevant ecommerce event.
Star Schema
You will sometimes hear dimensional modeling referred to as star schema and that comes from the pattern of multiple dimensional models linking to fact models.
In the case of dbt-GA4, you have session, client, user, and item dimensional tables linking with each event fact table giving you a four-pointed star.
The star schema image starts to break down when you realize that the central event node is just one of many events that those four dimensional tables link to making the central node fan out kind of like a diamond with internal facets.
The image works as long as we do not ask too much of it.
Data Marts
Your base fact and dimension models are the core of what data professionals would call your data mart. A data mart is like a market for data where everything is packaged nicely for the people to use it.
The dbt-GA4 package includes a number of staging models. These models are like the various manufacturing steps to get your nicely packaged data mart tables ready for consumption.
You do not want people to use these models so they are not part of your data mart. They do things like transform the user_pseudo_id
field, which seems like it is a unique ID but in fact is not and is thus not suitable for joining to other tables, into the client_key
which is a unique ID that should be used to join to other tables.
Building on Dimensional Models
As you work through the various dbt-GA4 courses, you will follow a bottom-up model where you build the core fact and dimension models and then iterate on top of them based on business requirements or needs driven by infrastructure performance.
In the advanced course, we will take a dashboard mock-up and map those requirements to your base dimensional models and build tables as needed. Then we will remap those requirements to tables optimized to serve your dashboard as fast and inexpensively as possible and build out those optimized tables.
There is also a top-down model for designing your data warehouse first popularized by Bill Inmon where you identify the key business objects and then model your data to support that structure.
A non-profit, for example, would typically want a “donor” object which would be served by a table that pulls data from various data sources like GA4 and a CRM. This table might include information like the donor’s name and address alongside metrics like number of donations, total donation amount, and number of visits to the web site.
The Inmon approach is not incompatible with dimensional modeling. Your fact and dimension tables serve as a solid base for building Inmon tables on top. If you were to take an Inmon approach, you could begin by building out your fact and dimension tables while interviewing stakeholders to decide on what the final Inmon models should look like.
The dbt-GA4 package itself mostly does not go beyond your base fact and dimension tables and, while the advanced course does teach some optimization that is handled differently in an Inmon warehouse, the optimization lessons also teach important concepts like BigQuery merge strategies so there is still value there even if you prefer the idea of Inmon models.