
Transform Your Data Using SQL: Turning Excel Power Users Into Data Experts

Introduction

Development of the open-source tool dbt began in 2016 with the goal of helping data teams manage transformation. dbt has since become a popular tool among data engineers and analysts, with a thriving community of 20,000+ users and contributors. Many practitioners now think of data work as 'before dbt' and 'after dbt'.

As ELT (Extract, Load, Transform) becomes increasingly popular, combined with the benefits of cloud computing and affordable storage, on-demand data transformation systems are becoming more prevalent. Tools like Fivetran enable data to be efficiently warehoused, facilitating the transformation of raw data into valuable decision-making insights.

What is dbt?

Before understanding how dbt works, you need to understand what dbt is.

dbt is short for data build tool. It is a SQL-first transformation workflow that lets teams collaborate on and deploy analytics code quickly. dbt brings software-engineering practices to SQL, turning it from a purely analytical tool into engineered, maintainable code.

In other words, dbt transforms your data into usable information and creates a new standard for data transformation.

How does dbt work?

Now that you have an idea of what dbt is, here is how it works:

  • Connecting to Source Data
    This is the first step: pointing dbt at the raw data. dbt connects to your data warehouse and reads from source tables, which are typically loaded there by an ingestion tool. 
  • Data Transformation
    Here is where dbt does its work. Writing SQL, data engineers and analysts transform raw data into a usable form. 
  • Data Loading
    Once the data is transformed, dbt materializes the results back into the data warehouse as tables or views. dbt supports many types of warehouses, including Databricks, Redshift, Snowflake, and BigQuery. This third step is important because it ensures that transformed data is available for analysis. 
  • Scheduling
    The final step: dbt can be set to run on a regular basis, such as once a day or every hour. Scheduling ensures data is always up-to-date and available for analysis, and your team will have peace of mind that data is always ready.
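To make the transformation step concrete, a dbt model is simply a SQL SELECT statement saved as a file in the project; dbt materializes its result in the warehouse. The model, source, and column names below are hypothetical, a minimal sketch rather than a real project:

```sql
-- models/staging/stg_orders.sql (hypothetical model)
-- dbt materializes the result of this SELECT as a view or table
-- named stg_orders in the warehouse.
select
    id as order_id,
    customer_id,
    lower(status) as order_status,
    cast(ordered_at as date) as order_date
from {{ source('shop', 'raw_orders') }}  -- raw table loaded by an ingestion tool
where ordered_at is not null
```

Running `dbt run` builds every model in the project; a scheduler (dbt Cloud or cron) can invoke the same command hourly or daily.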

What does dbt’s foundation consist of?

As mentioned, dbt works inside your data ecosystem to present information in a readable manner. For your marketing team, for example, dbt could pull data on customer acquisition costs across each platform and give your team an accurate cost, displayed in Snowflake, Databricks, BigQuery, AWS Redshift, etc.

dbt follows software development best practices. The three main components that make this possible are Version Control and CI/CD, Test and Document, and Develop.

  • Version Control and CI/CD are the first of the best practices for using dbt. Version control keeps everyone on track with changes to the code, while CI/CD automates testing and deployment. Together they ensure your code is always up to date and that you can roll back changes if needed. 
  • Test and Document is the second part of the best practices when using dbt. Testing catches errors before they make their way into the production environment, while documentation tracks what the code does and how it works. With dbt, tests and documentation live in the same project as your code, making everything easier to organize. 
  • Develop is the third part of the best practices: dbt makes it easy to develop your code in a collaborative manner. By using Git and other version control tools, your team can work on different parts of the same codebase at the same time without worrying about conflicts or errors.
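As an illustration of the CI/CD practice described above, a minimal continuous-integration workflow might run dbt's models and tests on every pull request. The sketch below assumes GitHub Actions, a Snowflake warehouse, and a secret named `SNOWFLAKE_PASSWORD`; all of these are hypothetical choices, not part of dbt itself:

```yaml
# .github/workflows/dbt_ci.yml (hypothetical workflow)
name: dbt CI
on: pull_request

jobs:
  dbt-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-snowflake   # adapter for the assumed warehouse
      - run: dbt deps                    # install packages from packages.yml
      - run: dbt build                   # run models and tests together
        env:
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
```

If `dbt build` fails, the pull request is blocked, which is how errors are caught before reaching production.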

Apart from these, three more components are worth a closer look:

  • Modularization: dbt encourages you to break down your data pipeline into modular pieces that can be reused and tested independently. This makes it easier to identify and fix issues, and it can help ensure that changes made to one part of the pipeline don’t impact other parts of the pipeline.
  • Documentation: dbt allows you to document your data pipeline using Markdown, which makes it easy to share information about your pipeline with others. By documenting your pipeline, you can ensure that everyone on your team understands how data is processed and how to use your data pipeline effectively.
  • Data validation: dbt allows you to write tests that validate your data as it moves through your pipeline. You can use tests to check for things like missing or duplicated data, incorrect data types, and values that fall outside of expected ranges. By using tests to validate your data, you can catch errors early and ensure that your data is accurate and consistent.
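The documentation and data-validation points above can both be sketched with a single dbt properties file. The model and column names below are hypothetical; `unique`, `not_null`, and `accepted_values` are dbt's built-in generic tests:

```yaml
# models/staging/schema.yml (hypothetical model and columns)
version: 2

models:
  - name: stg_orders
    description: "One row per order, cleaned from the raw orders feed."
    columns:
      - name: order_id
        description: "Primary key of the order."
        tests:
          - unique
          - not_null
      - name: order_status
        description: "Lower-cased order status."
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
```

Running `dbt test` compiles each entry into a SQL query against the warehouse and fails the run if any rows violate a check, while `dbt docs generate` turns the descriptions into a browsable documentation site.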

How does dbt promote community involvement?

Community is a major cornerstone for dbt, with four major ways to interact and assist other developers on their projects. It uses dbt Slack, Forums, GitHub, and Community events to bring everyone together.

dbt Slack allows your team to chat in real time with over 35,000 passionate analytics engineers across multiple channels. Even a small network of professionals adds enormous value. The forums on dbt have hundreds of threads with detailed support on issues others have encountered.

Since dbt is built on an open-source ecosystem of SQL and Python, lots of useful code is shared on GitHub. This means your organization can save time by integrating commonly used macros and models that are proven to work.

Use Cases

Airlines, one of the largest industries, have a massive amount of data to contend with. For JetBlue, that meant bottlenecks in its previously centralized data team. They wanted team members to take ownership of the data they are experts in.

JetBlue managed to move from central ownership to shared collaboration over data. dbt allowed the airline to improve reporting, with 99% uptime for its data warehouse and pipelines. Most importantly, this was achieved with a $0 increase in total cost. The migration took three months and covered 26 data sources and 1,200 models on dbt.

Its software stack is now complete, with Snowflake, Fivetran, Azure Blob Storage, Azure Data Factory, and dbt Cloud. The improved stack allowed JetBlue to redesign its entire approach to data, establishing a new foundation for its data analysts.

Pricing

When it comes to pricing, dbt is a rather cost-effective solution, broken down into three tiers: Developer, Team, and Enterprise.

The Developer option is perfect for a single user, such as your data engineer. It costs $0 for one seat with all the baseline features, including job scheduling and reporting, unlimited daily runs, logging and alerts, and GitHub support for one project.

Team offers more features at $100 per developer seat per month, adding advanced capabilities such as the SQL query proxy, browsing metrics in external tools, and dbt Cloud API access.

Enterprise includes all the features of Developer and Team. When you need custom deployment and increased security, this is the path to take. The price reflects the premium support, custom deployments, and unlimited number of projects.

Conclusion

As can be seen, data transformation tools are important to the decision-making process. By creating a data platform that is easy to use, dbt offers a cost-effective way to transform raw data into useful insights, and your data engineering and analyst teams will become more streamlined as a result.

How can I learn more?

This article is part of a greater series centred around the technologies and themes found in the first edition of the TechRadar by Devoteam. To read further into these topics, please download the TechRadar by Devoteam.

Want to know more about dbt?

Check out our TechRadar by Devoteam to see what our experts say about its viability in the market.