StepChange is a consultancy that works with you to modernize your data infrastructure. Often this involves optimizing cloud spend.
Recently there has been a lot of conversation about dbt. This article aims to clarify what dbt is and how teams are responding to dbt Labs' price increases.
What is dbt?
Data build tool (dbt) is an open source transformation tool that lets data analysts and engineers transform, test, and document data in a data warehouse more efficiently using SQL.
An ecosystem exists around dbt: dbt Core is its foundational open source component, and dbt Cloud is a closed-source paid service offered by dbt Labs, the creators of dbt.
Companies are migrating away from dbt Cloud because dbt Labs has sharply increased its prices. One way they are finding savings is by moving from dbt Cloud to self-hosted dbt Core running on open source infrastructure on hyperscalers like AWS.
What does dbt Core do?
dbt Core adds a layer on top of SQL and Jinja, a Python-based templating engine. This layer codifies software engineering best practices such as version control, testing, repeatability, and modularity. All of this functionality is open source.
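To make the templating layer concrete, here is a minimal sketch of what dbt's compile step does to a model's `ref()` calls. Real dbt renders each model with Jinja and resolves `ref('...')` to the relation configured for that model in your warehouse; this stdlib-only sketch swaps Jinja for a regular expression and assumes every model lives in one hypothetical schema named `analytics`. The model names are also hypothetical.

```python
import re

# Matches {{ ref('model_name') }} or {{ ref("model_name") }}, allowing whitespace.
# Simplification: real dbt evaluates full Jinja, not a regex.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"](\w+)['"]\s*\)\s*\}\}""")


def find_refs(sql: str) -> list:
    """Return the model names a SQL template depends on, in order of appearance."""
    return REF_PATTERN.findall(sql)


def compile_refs(sql: str, schema: str = "analytics") -> str:
    """Replace each ref() call with a schema-qualified table name.

    The default schema "analytics" is a hypothetical stand-in for whatever
    database/schema your dbt profile and model configs resolve to.
    """
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", sql)


model_sql = """
select c.customer_id, count(o.order_id) as order_count
from {{ ref('stg_customers') }} as c
left join {{ ref("stg_orders") }} as o on o.customer_id = c.customer_id
group by 1
"""

print(find_refs(model_sql))  # ['stg_customers', 'stg_orders']
print(compile_refs(model_sql))
```

The list returned by `find_refs` is exactly the dependency information dbt needs to decide which models must run first.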
It lets users write modular SQL queries, which dbt then runs in the correct order while tracking dependencies and outcomes. dbt turns SQL from something purely analytical into something that can be engineered like code.
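The ordering behavior can be illustrated with a small sketch. Each model is mapped to the models it `ref()`s, the graph is sorted so upstream models run first, and any model whose upstream did not succeed is skipped rather than run, which mirrors how dbt reports skipped downstream models. The model names and the `run` callback are hypothetical; dbt builds this graph itself from your project, so this illustrates the idea rather than dbt's implementation.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# Hypothetical project: each model maps to the set of models it ref()s.
MODEL_DEPS: Dict[str, Set[str]] = {
    "stg_customers": set(),
    "stg_orders": set(),
    "customer_orders": {"stg_customers", "stg_orders"},
    "customer_lifetime_value": {"customer_orders"},
}


def run_models(deps: Dict[str, Set[str]], run: Callable[[str], bool]) -> Dict[str, str]:
    """Run models upstream-first and record an outcome for each one.

    `run` executes one model and returns True on success. A model is marked
    "skipped" if any of its upstream models did not succeed.
    TopologicalSorter raises graphlib.CycleError if the refs form a cycle.
    """
    results: Dict[str, str] = {}
    for model in TopologicalSorter(deps).static_order():
        upstream = deps.get(model, set())
        if any(results.get(parent) != "success" for parent in upstream):
            results[model] = "skipped"
        else:
            results[model] = "success" if run(model) else "error"
    return results


# Simulate a failure in stg_orders: everything downstream of it is skipped.
outcomes = run_models(MODEL_DEPS, run=lambda model: model != "stg_orders")
for model, status in outcomes.items():
    print(f"{model}: {status}")
```

Because `stg_customers` does not depend on `stg_orders`, it still succeeds; only the models that transitively `ref()` the failed one are skipped.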
What does dbt Cloud do?
dbt Cloud builds on dbt Core by providing a hosted service with additional features such as a web-based IDE, automated job scheduling, and advanced team collaboration tools.
Why is dbt popular?
In recent years, dbt Core has exploded in popularity, largely because it is easy to install and set up for your project, and it gives you a framework for a better analytics process out of the box.
Ben Rogojan has an in-depth video on the history of dbt and how it is used today. He argues that one of dbt's most attractive features, from a team perspective, is that it simplifies the data transformation process.
Instead of requiring your team to master a variety of tools such as SSIS or Azure Data Factory for data transformation tasks, dbt lets a single person manage the transformation framework centrally, while the rest of the team focuses on becoming proficient in SQL.
How expensive is dbt Cloud now?
In 2023, dbt Labs raised prices on dbt Cloud, its most popular product. Specifically:
- They added consumption-based pricing to their plans, charging for the number of successful model builds per month above a given threshold.
- They changed which features are offered at each plan tier, pushing users toward more expensive enterprise plans.
Paradime has an excellent video that contrasts the old and new pricing plans in detail. To give one example from the video: under the old pricing model, you paid a fixed $100 per seat per month for a Team plan, so a team of 8 paid $800 per month. Under the new consumption-based model, the monthly bill is ($100 × seats) + (models built − 15,000) × $0.01, where the second term applies only to builds above the 15,000 included in the plan. At 95,000 successful model builds per month, 80,000 above the threshold, the same team's bill would double to $1,600.
Under this consumption-based model, dbt Cloud charges based on the number of SQL transformations (models) you successfully build through dbt Cloud each month.
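The Team plan formula quoted above is easy to get wrong by hand, so here is a small calculator for it. It encodes only the figures given in this article ($100 per seat per month, 15,000 successful model builds included, $0.01 per build beyond that); actual dbt Cloud prices may have changed since, so treat the constants as assumptions rather than a current price list.

```python
import math
from decimal import Decimal

# Figures as quoted in this article; check dbt Labs' current pricing before relying on them.
SEAT_PRICE = Decimal("100")      # USD per seat per month
INCLUDED_BUILDS = 15_000         # successful model builds included per month
OVERAGE_PRICE = Decimal("0.01")  # USD per successful model build above the included amount


def old_monthly_cost(seats: int) -> Decimal:
    """Old model: a flat price per seat."""
    return SEAT_PRICE * seats


def new_monthly_cost(seats: int, models_built: int) -> Decimal:
    """New model: price per seat plus a charge for builds above the included amount."""
    overage = max(0, models_built - INCLUDED_BUILDS)
    return SEAT_PRICE * seats + OVERAGE_PRICE * overage


def builds_to_double(seats: int) -> int:
    """Smallest monthly build count at which the new bill is twice the old flat bill."""
    # Solve SEAT_PRICE * seats + OVERAGE_PRICE * (b - INCLUDED_BUILDS) == 2 * SEAT_PRICE * seats.
    extra_builds = (SEAT_PRICE * seats) / OVERAGE_PRICE
    return INCLUDED_BUILDS + math.ceil(extra_builds)


print(f"${old_monthly_cost(8):,.2f}")          # $800.00
print(f"${new_monthly_cost(8, 15_000):,.2f}")  # $800.00
print(f"${new_monthly_cost(8, 95_000):,.2f}")  # $1,600.00
print(builds_to_double(8))                     # 95000
```

Using `Decimal` keeps the cent-level arithmetic exact, which matters when multiplying a $0.01 rate by tens of thousands of builds.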
If you are curious to learn more about the differences between dbt Core and dbt Cloud that might justify this price, Datacoves has a great article digging into this functionality, which you can read here.
How has the dbt community reacted to this?
According to a poll of over 700 people on r/dataengineering, the vast majority of respondents, whether at small, mid-sized, or large companies, do not pay for dbt Cloud.
A number of comments in that thread discuss moving from dbt Cloud to self-hosted dbt Core. Here are two examples:
“Took a day to get dbt-core set up for my team. My workflow feels very efficient through VSCode. dbt Power User extension is awesome.
Initially spun up the project through dbt-cloud. Was easy to use but money saved is always nice.”
And
"Same with us - we bumped off after the cloud price changes earlier this year around March. Self-hosting this wasn’t that hard, but setting up this stuff the first time is always annoying. The Cloud pricing change was a good forcing function to make this happen"
How do I migrate from dbt Cloud to dbt Core?
If you are curious about running dbt Core in production without dbt Cloud, on a cloud platform such as AWS, this post from r/dataengineering has some good information and example setups.
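At its simplest, running dbt Core in production means having a scheduler (cron, a container task, or an orchestrator) invoke the dbt CLI against a production target. The sketch below builds and runs such a command. `dbt build`, `--project-dir`, `--profiles-dir`, `--target`, and `--select` are standard dbt CLI options; the paths, the `prod` target name, and the `tag:nightly` selector are hypothetical and depend on your own project and `profiles.yml`.

```python
import subprocess
from typing import List, Optional


def dbt_build_command(
    project_dir: str,
    profiles_dir: str,
    target: str = "prod",          # hypothetical target name defined in profiles.yml
    select: Optional[str] = None,  # optional node selection, e.g. "tag:nightly" (hypothetical tag)
) -> List[str]:
    """Assemble a `dbt build` invocation (runs models, tests, seeds and snapshots in DAG order)."""
    cmd = [
        "dbt", "build",
        "--project-dir", project_dir,
        "--profiles-dir", profiles_dir,
        "--target", target,
    ]
    if select:
        cmd += ["--select", select]
    return cmd


def run_dbt(cmd: List[str]) -> int:
    """Run dbt as a subprocess and return its exit code; non-zero means something failed."""
    completed = subprocess.run(cmd, check=False)
    return completed.returncode


# Example with hypothetical paths; a scheduler would call this and alert on a non-zero code:
# exit_code = run_dbt(dbt_build_command("/opt/dbt/my_project", "/opt/dbt/profiles"))
```

Keeping command construction separate from execution makes it easy to test the command without a warehouse connection and to reuse it from whichever scheduler you choose.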
If you are looking for an overview of gotchas you might run into when self-hosting dbt Core, or just general handy tips, this post provides a good list of things to consider before migrating.
A number of tutorials have appeared on sites like YouTube showing how to run dbt models without dbt Cloud. A good example is George Yates's video, "How to run unlimited free dbt models without dbt Cloud using Cosmos".
Astronomer-cosmos is an open source package for Apache Airflow that lets you run your dbt Core projects as Apache Airflow DAGs and Task Groups. You can then run and manage your dbt models through Airflow and its UI; Airflow is open source and easy to set up on numerous clouds.
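Conceptually, Cosmos reads your dbt project and renders each model as its own Airflow task, wired together by the same dependencies dbt tracks. The sketch below illustrates that idea with the standard library only: it reads the `nodes` section of dbt's `manifest.json` artifact (written to the project's `target/` directory by commands such as `dbt parse` or `dbt compile`) and produces one task per model with its upstream tasks. It is not Cosmos's code, which also handles tests, seeds, and several execution modes, and the sample manifest is a hypothetical, heavily trimmed excerpt.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_manifest(path: str = "target/manifest.json") -> Dict[str, Any]:
    """Read the manifest artifact that dbt writes into the project's target/ directory."""
    return json.loads(Path(path).read_text())


def tasks_from_manifest(manifest: Dict[str, Any]) -> Dict[str, Dict[str, List[str]]]:
    """Turn each dbt model into a task: a command to run plus its upstream model tasks."""
    models = {
        unique_id: node
        for unique_id, node in manifest["nodes"].items()
        if node["resource_type"] == "model"
    }
    tasks: Dict[str, Dict[str, List[str]]] = {}
    for node in models.values():
        # depends_on.nodes lists unique ids; keep only other models (sources etc. are dropped).
        upstream = [models[dep]["name"] for dep in node["depends_on"]["nodes"] if dep in models]
        tasks[node["name"]] = {
            "command": ["dbt", "run", "--select", node["name"]],
            "upstream": sorted(upstream),
        }
    return tasks


# Hypothetical, heavily trimmed manifest excerpt for illustration.
SAMPLE_MANIFEST = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "name": "stg_orders",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.customer_orders": {
            "resource_type": "model",
            "name": "customer_orders",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
    }
}

for name, task in tasks_from_manifest(SAMPLE_MANIFEST).items():
    print(name, "<-", task["upstream"], task["command"])
```

In practice you would not write this yourself: Cosmos's `DbtDag` and `DbtTaskGroup` classes build the Airflow DAG or task group from your project, so each model shows up as a separate task in the Airflow UI with its own logs and retries.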
Self-hosting Airflow with Astronomer-cosmos can be a lot cheaper than dbt Cloud, and because it is open source, you have the freedom to move clouds without fear of being locked in.
Alternatively, you can contact StepChange, and we can work with you to explore options and optimize your data infrastructure spend.
Final Word
If your data infrastructure relies on a mix of SaaS products and you're looking to cut costs without losing functionality or getting locked in, we're here to help. Our expertise in optimizing data workflows can help you reduce your infrastructure bills while maintaining performance. Contact us to explore how we can tailor a cost-effective solution for you.