6 March 2024 | 4 minutes of reading time
In data engineering, transforming raw data into actionable insights is essential for organizations looking to gain a competitive edge. This process relies on effective data management and the collaboration between data engineers and analytic engineers. They work together to organize data using the Medallion Architecture, which has bronze, silver, and gold layers. It helps simplify complex data management processes and improve data quality.
Data Work Roles Defined:
In modern data warehouses, we distinguish different layers in which data is processed, enriched and aggregated. We apply different naming conventions to the different layers in a data warehouse to effectively differentiate these layers and to keep databases structured.
The bronze layer is where raw data is first processed. In the bronze layer, data is stored in its untouched form. The data model is simple at this stage.
In dbt, the bronze data refers to the source tables as defined in the source.yml file. This raw data often requires preprocessing, such as using SELECT statements to limit the dataset or UNION statements to merge information, before downstream transformations can occur. The naming convention for these tables includes prefixes like stg_. This shows that these are staging models for the initial data transformation.
This layer organizes data in a structured way, moving through a multi-hop architecture with validated and enriched transformations and aggregations. It includes:
Intermediate Models
These models act as a bridge between the raw data in the bronze layer and the refined models in the silver or gold layer.
Intermediate models are named int_ to show that they process and normalize data further. This prefix helps in identifying models that are crucial for transforming data for in-depth analysis and reporting.
Marts Models
Marts models are tailored for analytics support business intelligence and decision-making.
Marts models have names starting with mrt_ or mart_, showing they have business logic and are made for creating valuable insights. These models are essential for creating data products that deliver strategic value to the organization.
In the Gold Layer, data models reach their peak refinement, fully optimized for decision-making in terms of efficiency and effectiveness. Here, data models are highly sophisticated, designed to provide direct value to business users and decision-makers. The naming conventions for the gold layer underscore the data’s readiness for high-level analytics and reporting.
We have three common options we’d like to discuss
The first option is a naming convention based on departments. This can help benefit the adoption by the business. And prefixing with department specific codes helps clarify the purpose. Examples:
Another option is to prefix with fct_ for fact tables or dim_ for dimension tables. This follows the Kimball methodology, which emphasizes distinguishing between transactional data and dimensional context. This shows how they assist with analyzing and reporting. It makes it easier for users and analysts to use the best data models in the dbt project.
Finally, we encounter the situation where marts from the silver layer, prefixed with mrt_, are exceptionally well-constructed and can be directly utilized in activation or reporting tools. This realization opens the door to a nuanced strategy: blending custom naming conventions with the strategic use of silver marts in the gold layer. The decision to either copy these marts, create views, or directly integrate them into activation tools without any duplication presents a complex choice, each with its own set of benefits and challenges.
We must carefully navigate this complexity. It involves evaluating the impact on data governance, performance, and accessibility. This process may be challenging, but it will help you to design a data architecture that aligns with your strategic goals.
So, we arrive at the following conclusion
To optimize the gold layer, we present three distinct options. Initially, department-specific codes like sal_ for Sales or fin_ for Finance provide clear business alignment and user clarity. Next, the Kimball methodology’s fct_ for facts and dim_ for dimensions introduces a structured approach for analytical depth. Finally, leveraging mrt_ prefixed silver marts directly in reporting tools achieves streamlined efficiency. All these approaches can work; of course, it depends on your specific needs and goals.
Using clear and consistent names in dbt projects helps with identifying and organizing data. It also improves data governance and makes operations more efficient. Teams can make sure they follow standards by using the same language for data models. This helps with finding data, following data rules, and meeting business goals.
Moreover, these conventions streamline the maintenance and updating of data models. Naming models systematically helps teams understand dependencies, changes, and manage data assets more easily. This helps team members collaborate effectively. It also speeds up development.
Organizations can manage the process of turning data into valuable analytics-ready information by using structured practices and data analytics engineers. This helps them effectively utilize data in BI-tools, data activation tools and machine learning models. This allows organizations to make better decisions based on the information gathered.
The Medallion Architecture, with its emphasis on bronze, silver, and gold layers, provides a clear framework for organizing data and increasing data quality. It ensures that it properly validates, enriches, and transforms the data into actionable insights. This method makes data transformation easier and follows modern data principles, helping teams create good data products quickly.