DBT Data Modeling Project

A simple dimensional modeling exercise using DBT and a dataset from an Enterprise Resource Planning system (ERP) for a fictitious company.

The goal was to find a normalized business dataset that I hadn't seen before and create logical fact and dimension tables.

Click here to see code repository: Link to repository

Background

DBT

“is an open-source command line tool that helps analysts and engineers transform data in their warehouse more effectively”

It addresses the ‘T’ in ‘ELT’ by providing an easy way to transform data within a warehouse using only SQL scripts. DBT makes it easy to define data quality tests and documentation as well as tracking revisions of transformations scripts.

Dimensional Modeling

A data modeling technique introduced by Ralph Kimball. The goal of dimensional modeling is turning raw data into logical fact and dimension tables that represent the business. Making it more intuitive for analysts as well as making queries faster (when compared to querying 3NF models).

Project

Source Data

The data comes from an Enterprise Resource Planning (ERP) system for a fictitious company called Wide World Importers.

There are a total of 4 schemas, 29 tables, and is completely normalized.

List of tables:

applicationpurchasingsaleswarehouse
citiespurchase_order_linesbuying_groupscolors
countriespurchase_orderscustomer_categoriespackage_types
delivery_methodssupplier_categoriescustomer_transactionsstock_groups
payment_methodssupplier_transactionscustomersstock_item_holdings
peoplesuppliersinvoice_linesstock_item_stock_groups
state_provincesinvoicesstock_item_transactions
system_parametersorder_linesstock_items
transaction_typesorders
special_deals


The Entity Relationship Diagram (ERD) for the entire dataset is very complex and so below is a just a sample the 'Sales' schema (taken from here).

dbt_sales_erd

The Business Requirement

The business scenario that was chosen was that the Sales team wanted insight into how much revenue was being made throughout the year and have the ability to break it down by customer, supplier, product, as well as knowing which sales person should get credit for the sale.

The Implementation

Based on the above business requirements the following dimension tables were identified:

Sample of dim_customers:

dbt_dim_customers_example

For the fact table, the ‘order_lines’ table provided the most granular details for each order such as which product, the quantity, and the unit-price. The ‘orders’ table contained the neccesary header info and thus was joined to the details table.

Sample of fct_sales_orders:

dbt_fct_sales_orders_example

The Result

The newly created tables now formed a ‘star-schema’ for the sales transactions to be easily analyzed.

dbt_erd_sales_final

Dashboard

A simple PowerBI dashboard built using the new tables:

dbt_sales_pbi

Repository

github.com/cowboychicken/dbt_wideworldimports