Deploying DBT to Optimize Aurai’s Data Transformation Workflow

Aurai Data Engineer Florian knows it all too well: if you’re going to engineer data, your algebra teacher won’t be the last to require you to show your work. “Large quantities of data need records of processing steps and commands, and businesses expect transparent and replicable work. Luckily, DBT, a data transformation workflow tool, offers shortcuts that save time and energy by inventorying and optimizing the engineering steps as you create them.” Discover how this collaborative software helps the entire data team safely contribute to production-grade data pipelines.

Create New Rules to Win the Game Faster

I play chess often. Even with fixed rules, worthy opponents can be bested with novel and subtle strategies. I compete with my chess club in an Amsterdam chess tournament to test my own game theories and mastery. With the right code and software, even the rules of light and sound can be stretched, for example by building a music visualizer along 5 meters of programmable LED light strips. Technology helps me play with the rules and create new rules and new tools to gain the advantage.

I’ve always been incurably curious about math and computer science, and inevitably plunged into a B.A. in Artificial Intelligence at the University of Amsterdam. With my degree complete, I was eager to expand into the more practical dimensions of data engineering and apply them to meaningful objectives. I was drawn to Aurai’s Data Engineering traineeship, whose comprehensive and pragmatic modules and seminars prepared me to be a true data professional. The traineeship gave me exactly the practical know-how I was seeking after my studies, along with the opportunity to apply it all while working with diverse businesses and organizations.

Changing Dataset Approaches Through DBT

I’ve reached my initial goal: I have an interim Data Engineer position as a data warehouse consultant at Renewi, a leading European waste management company operating primarily in the Benelux region. Renewi brings new life and value to used materials by creating circular economies and leading in recycling and secondary material production. It’s exciting to help a company that recycles, or uses for energy recovery, 89% of the 14 million tons of waste it processes. I work with their team at the Eindhoven office once a week, which I always look forward to because of their youthful and enthusiastic culture.

My primary objective at Renewi is implementing their new Data & Analytics platform, which will enable the company to use its data to its full potential. This means getting the platform production-ready and developing within it, such as supplying data for the dashboard and setting the platform up for the analytics team. Our platform is driven by DBT, a relatively new tool that eases data engineering by changing the approach to creating your datasets. Say, for example, you have a data platform that accumulates source data from all your organization’s systems. The goal is to enable your data analysts and data scientists to work with all this data, so the datasets must be built correctly to ensure their results are intelligible and actionable.

Florian: “At Renewi, our platform is driven by DBT, a relatively new tool that eases data engineering by changing the approach to creating your datasets.”

Built-In Solutions With DBT

Datasets have traditionally been created as a series of SQL queries, applying the required transformations every time to generate your datasets. This approach can get very messy very quickly: documentation, field definitions, and the data’s lineage all require manual upkeep to offer transparency and replicability for the business. It also requires implementing your own testing framework or adopting an external tool. Even basic version control can become chaotic when the transformations are spread across standalone SQL queries.

DBT offers built-in solutions for all of those problems. You can run its integrated tests or your own tests on the data, and DBT generates documentation on every build that describes all available tables, the models you build on them, and even which dashboards use the resulting datasets. It also generates a lineage graph that details the entire lineage, or journey, of all your project’s data.
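To make the idea of a data test concrete, here is a minimal sketch in plain Python with an in-memory SQLite database. It is not DBT itself: it only imitates the principle that a test is a query selecting the rows that violate a rule, so the test passes when no rows come back. The table `waste_collections`, its columns, and the helper `failing_rows` are hypothetical names chosen for illustration.

```python
import sqlite3


def failing_rows(conn, sql):
    """Run a test query; an empty result means the test passes."""
    return conn.execute(sql).fetchall()


# Hypothetical sample data, including one missing weight and one duplicate id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE waste_collections (collection_id INTEGER, weight_kg REAL);
    INSERT INTO waste_collections VALUES (1, 120.5), (2, NULL), (2, 80.0);
    """
)

# "Not null" style check: select rows where the column is missing.
not_null_sql = "SELECT * FROM waste_collections WHERE weight_kg IS NULL"

# "Unique" style check: select key values that occur more than once.
unique_sql = """
    SELECT collection_id, COUNT(*) AS n
    FROM waste_collections
    GROUP BY collection_id
    HAVING COUNT(*) > 1
"""

not_null_failures = failing_rows(conn, not_null_sql)
unique_failures = failing_rows(conn, unique_sql)

print("not_null failures:", not_null_failures)
print("unique failures:", unique_failures)
```

With this sample data both checks return one offending row, which is exactly the kind of problem you want flagged before a dataset reaches a dashboard.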

Florian: “DBT’s built-in solutions provide integrated tests, documentation on every build, and detailed lineage graphs of all your project’s data.”

An Automated Record of Data Systems and Transformations

Your models can be implemented simply using SQL (or even Python in recent versions of DBT) and enriched with Jinja, a templating engine. Jinja is used by many tools, such as Flask, to inject dynamic content into otherwise static documents. In DBT, it is used to add macros (reusable functions based on a template) to SQL queries and to reference other models within the lineage. DBT’s documentation feature provides a web interface that details all of the models in the lineage and offers a standard structure for adding your own documentation.
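The way references between models turn into a lineage can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DBT’s implementation: DBT renders templates with Jinja, while this sketch uses a regular expression to find `ref('...')` calls. The model names, the `analytics` schema, and the helpers `dependencies` and `render` are all hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SQL text containing Jinja-style ref() calls.
models = {
    "stg_collections": "select * from raw.collections",
    "stg_sites": "select * from raw.sites",
    "collections_per_site": (
        "select s.site_name, sum(c.weight_kg) as total_kg "
        "from {{ ref('stg_collections') }} c "
        "join {{ ref('stg_sites') }} s on s.site_id = c.site_id "
        "group by s.site_name"
    ),
    "site_dashboard": "select * from {{ ref('collections_per_site') }}",
}

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def dependencies(sql):
    """Return the names of all models referenced in a SQL template."""
    return set(REF_PATTERN.findall(sql))


def render(sql, schema="analytics"):
    """Replace each ref() call with a concrete schema.table name."""
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", sql)


# The lineage maps each model to the models it depends on.
lineage = {name: dependencies(sql) for name, sql in models.items()}

# A topological sort yields an order in which every model is built
# only after the models it references.
build_order = list(TopologicalSorter(lineage).static_order())

print(build_order)
print(render(models["site_dashboard"]))
```

Because every dependency is declared through a reference rather than a hard-coded table name, the build order and the lineage graph can both be derived from the model files themselves.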

All other DBT functionalities (documentation, tests, and lineage) can be set up using simple config files. After this, building a data system becomes a single command in the DBT CLI. Data engineers build each data structure to fit a unique situation, so they need to know exactly what they’ve built and how the end system surfaces the most relevant data and insights. Automating the record of data systems and transformations saves time and energy that is better spent on putting innovative data engineering solutions into practice.

Would you like to know more about our experienced, result-oriented, and friendly professionals like Florian? Find out how our data tech wizards can strengthen your team as interim experts.