Why use dbt-GA4?
The dbt-GA4 package takes the raw tables from the native Google Analytics 4 export to BigQuery and turns them into report-ready tables.
By using the package, you will unlock a whole host of benefits.
Easier to query models
The raw GA4 tables exported to BigQuery include nested and repeated data structures that require intermediate knowledge of BigQuery-specific SQL. The dbt-GA4 package flattens these structures, so you do not need to know any BigQuery-specific SQL to query them.
Even if you already know how to query those structures, querying the tables built by the dbt-GA4 package requires fewer keystrokes and is less error-prone.
The flat data model is more accessible than the raw export, allowing more people to use your data.
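To make "flattening" concrete, here is a minimal Python sketch of the idea. It is not the package's implementation, which does this work in SQL. The event_params layout (a repeated record of key and typed value slots) follows the GA4 export schema; the sample values and the flatten_event helper are invented for illustration.

```python
# Hypothetical event shaped like a row of the GA4 BigQuery export.
# The values are invented; only the nesting mirrors the export.
raw_event = {
    "event_name": "page_view",
    "event_params": [
        {"key": "page_location",
         "value": {"string_value": "https://example.com/", "int_value": None}},
        {"key": "ga_session_id",
         "value": {"string_value": None, "int_value": 1700000000}},
    ],
}


def flatten_event(event):
    """Turn the repeated key/value records into ordinary top-level columns."""
    flat = {k: v for k, v in event.items() if k != "event_params"}
    for param in event["event_params"]:
        # Each parameter keeps its value in one typed slot; take the first non-null one.
        value = next((v for v in param["value"].values() if v is not None), None)
        flat[param["key"]] = value
    return flat


flat_event = flatten_event(raw_event)
```

After flattening, page_location and ga_session_id can be read like any other column, with no unnesting step in the query.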
Wider data visualization tool support
The flat data structure is not just easier to work with; it is also supported by more data visualization tools.
Most data visualization tools that support BigQuery cannot query BigQuery Record field types.
It is difficult and sometimes buggy to query BigQuery Repeated Record fields in Looker Studio, and Looker uses special syntax to query those fields. These are both Google-owned data visualization products.
If you want to use pretty much any other product, or keep your options open to use different products, then you will need flattened tables like the ones that we create in the dbt-GA4 package.
Save time building models
I estimate that the dbt-GA4 package saves me about 40-50 hours of implementation time per project compared with building GA4 tables from scratch.
The package completely handles the session and channel logic and adds last non-direct attribution. These two features alone should save you at least 10 hours. If your dbt and BigQuery skills are not very strong, they will probably save you 20 hours or more.
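For readers unfamiliar with last non-direct attribution, the rule can be sketched in a few lines of Python. This is an illustration of the general technique only, not the package's SQL: it assumes direct traffic is labeled (direct) / (none), it ignores any lookback window, and the last_non_direct helper and sample sessions are hypothetical.

```python
DIRECT = ("(direct)", "(none)")  # assumed label for direct traffic


def last_non_direct(sessions):
    """sessions: (source, medium) tuples for one user, oldest first.

    A direct session inherits the most recent non-direct source seen so far.
    If there is none yet, it stays direct.
    """
    attributed = []
    last_seen = None
    for source_medium in sessions:
        if source_medium != DIRECT:
            last_seen = source_medium
        attributed.append(last_seen or source_medium)
    return attributed


# Hypothetical session history for a single user.
sessions = [
    ("google", "cpc"),
    DIRECT,
    ("newsletter", "email"),
    DIRECT,
]
result = last_non_direct(sessions)
```

Here the second session is credited to google / cpc and the fourth to newsletter / email, rather than both being reported as direct.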
The package also provides pre-built models for standard and recommended events, configuration settings for custom parameters and query parameter filtering, and many more features that will save you time getting your GA4 production data up and running.
Improve data quality
As I write this, the package has 110 forks and 261 stars on GitHub. You do not need to fork the package in order to use it or even to customize it and most people who use the package do not star the GitHub repository.
With the addition of multi-site, it seems likely that the package is used in tens of thousands of dbt projects.
That is a lot of eyes on the package, which means that data quality issues get found and fixed quickly.
Issues like the ga_session_id parameter not being a unique session identifier, which the package augments with other data to make it unique, or Google attributing google / cpc traffic to google / organic in the raw data, have already been fixed in the package.
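A small Python sketch shows why ga_session_id alone is not enough. It assumes the "other data" is the export's user_pseudo_id and that the combined value is hashed with MD5; both choices, along with the session_key helper and the sample values, are assumptions for illustration, and the package's exact recipe may differ.

```python
import hashlib


def session_key(user_pseudo_id, ga_session_id):
    """Combine the user identifier with ga_session_id so the key is unique per session."""
    raw = f"{user_pseudo_id}-{ga_session_id}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


# Two hypothetical users whose sessions carry the same ga_session_id.
events = [
    {"user_pseudo_id": "1111.2222", "ga_session_id": 1700000000},
    {"user_pseudo_id": "3333.4444", "ga_session_id": 1700000000},
]

session_ids = {e["ga_session_id"] for e in events}
session_keys = {session_key(e["user_pseudo_id"], e["ga_session_id"]) for e in events}
```

Counting distinct ga_session_id values would report one session here, while counting distinct session keys reports two.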
You do not even need to know that we have fixed these things in order to benefit from the fixes. There are many more issues and edge cases that have been surfaced and fixed by the developer community.
As one of the original developers, I can attest that we are seeing fewer and less-serious new issues now that we are approaching two years since the package’s inception.
When you use the package, you benefit from all of this attention in both large and subtle ways, with the corollary benefit that better data quality earns greater trust in the data from end users.
Faster and cheaper to query
The dbt-GA4 package includes partitioned models, which make the data cheaper to query. There are several other things that the package does to improve query performance, but partitioning is the most important one.
Analytics warehouses like BigQuery are designed to read whole columns of data so you can create metrics from that data. Because they are optimized for columns, a query reads all of the data in each column it references unless something narrows the scan.
However, querying what could be several years' worth of data just to get the number of page views yesterday is wasteful. It can also take a lot of time if your table is large enough.
Partitioning splits up the table so that you only need to query the relevant partition.
The source data is partitioned daily, and dbt-GA4 also partitions the relevant tables by day. Most people want web analytics in batches demarcated by day, like the last month or the last seven days of data, and these are usually further grouped by day, like a graph of visits by day over the last month. Partitioning by day is therefore almost always the most efficient way to partition analytics data.
The result is faster queries that scan less data and cost less.
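The effect of partitioning can be illustrated with a toy Python model. This is a simulation of the idea, not BigQuery's storage engine: the table is a dictionary with one entry per day, and "cost" is simply the number of rows a query touches. All names and figures are invented for illustration.

```python
from datetime import date, timedelta

# Hypothetical table: 30 daily partitions of 100 page_view events each.
today = date(2024, 1, 31)
partitions = {
    today - timedelta(days=n): ["page_view"] * 100
    for n in range(1, 31)
}


def page_views_full_scan(partitions, day):
    """Without partition pruning: every row is read, then filtered by day."""
    scanned = 0
    views = 0
    for partition_day, events in partitions.items():
        scanned += len(events)
        if partition_day == day:
            views += events.count("page_view")
    return views, scanned


def page_views_pruned(partitions, day):
    """With partition pruning: only the requested day's partition is read."""
    events = partitions.get(day, [])
    return events.count("page_view"), len(events)


yesterday = today - timedelta(days=1)
full = page_views_full_scan(partitions, yesterday)    # (views, rows scanned)
pruned = page_views_pruned(partitions, yesterday)     # (views, rows scanned)
```

Both functions return the same page view count, but the full scan touches 3,000 rows while the pruned query touches 100.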
Benefits of dbt-GA4
The dbt-GA4 package provides a solid base for getting value from the raw GA4 export to BigQuery.
It will save you both time and money during set-up and operation while delivering other advantages, like wider tool support and better data quality, that in turn help your data users get more out of your data.