Skip to content

dbt Data Catalog

  • Medallion Architecture

RAW → Staging → Intermediate → Marts — every transformation fully documented and tested.

  • 22 Models

9 staging views · 4 intermediate tables · 1 fact table · 5 dimensions · 3 seeds

  • 282 Automated Tests

not_null · unique · relationships · accepted_values · singular business-rule assertions

  • 8 Source Tables

1.55M raw rows ingested from Azure Blob Storage via Snowflake COPY INTO



Model Inventory

Light-touch cleaning only — 1:1 with RAW tables, no business logic applied.

Model Source Table Key Transforms
stg_olist__orders raw_orders UPPER status, cast timestamps
stg_olist__order_items raw_order_items Rename columns, cast decimals
stg_olist__customers raw_customers Standardise state codes
stg_olist__products raw_products Cast weight/dimensions
stg_olist__sellers raw_sellers Standardise state codes
stg_olist__payments raw_payments Cast amounts, validate types
stg_olist__reviews raw_reviews Deduplicate, cast timestamps
stg_olist__geolocation raw_geolocation Cast lat/long coordinates
stg_olist__category_translations raw_category_names EN/PT name mapping

Materialization: view — zero storage cost · Schema: STAGING

Business logic, joins, and aggregations — reusable building blocks for marts.

Model Purpose
int_sales__order_items_joined Joins orders + items; adds DQ flags (is_valid, is_delayed)
int_products__enriched Adds is_verified flag, fills missing category translations
int_customers__prep Computes is_repeat_customer (≥ 2 delivered orders)
int_order_reviews Deduplicates reviews to one row per order via MIN(review_score)

Materialization: table · Schema: INTERMEDIATE

Star schema (Kimball) — production-ready for Power BI and SQL consumers.

Fact Table

Model Grain Rows Materialization
fct_order_items One row per delivered order line item ~112K incremental

Dimension Tables

Model Description Materialization
dim_customers Customer attributes + is_repeat_customer flag table
dim_products Product catalog + English category translations table
dim_sellers Seller attributes + state geography table
dim_date Full calendar spine (2016–2018) with fiscal periods table
dim_rls_bridge Row-level security bridge — state → role mapping table

Schema: MARTS · Database: OLIST_ANALYTICS_DB

Pre-joined, pre-filtered tables purpose-built for Python EDA and data science workflows — avoids complex Pandas merges downstream.

Model Grain Columns Tests Purpose
obt_logistics_diagnostics One row per delivered order item 18 35 Root-cause logistics EDA — delivery delays, freight costs, review correlations

Phase 2 — Python EDA

This layer is consumed by the ecommerce-logistics-diagnostics Python notebook project, which investigates the Amazonas 66.7% delivery delay anomaly using Pandera data contracts and cohort analysis.

Materialization: table · Schema: MARTS · Database: OLIST_ANALYTICS_DB


Key Models to Explore

  • fct_order_items

Core fact table. Grain: one row per delivered order line item. Pre-calculated measures: order_total, freight_value, is_delayed, days_to_deliver. Joins to all 5 dimensions.

View in catalog

  • dim_customers

Customer dimension with customer_unique_id for cross-order identity resolution. Includes is_repeat_customer flag (≥ 2 delivered orders).

View in catalog

  • dim_products

Product catalog enriched with English category translations. Includes is_verified quality flag for products with complete dimensions and photos.

View in catalog

  • dim_date

Full calendar spine covering 2016–2018. Includes day-of-week, week, month, quarter, and year columns for Power BI time intelligence.

View in catalog


Navigating the Lineage DAG

In the catalog, click any model name → then click "See Lineage" to see its full upstream/downstream dependency graph. Use +model_name syntax in the search to select a model with all its parents highlighted.


Phase 2 — Analytics Layer

obt_logistics_diagnostics is a One Big Table purpose-built as the single data source for the Python EDA project. It pre-joins 7 upstream models so no Pandas merges are required in the notebook.

Dimension Detail
Grain One row per delivered order item (order_id + order_item_id)
Columns 18 — identifiers, timestamps, financials, geography, product specs, review score, DQ flags
Tests 35 — generic + 3 singular business-rule assertions
Upstream deps fct_order_items · stg_olist__orders · stg_olist__order_items · int_products__enriched · dim_customers · dim_sellers · int_order_reviews
Key rule WHERE order_status = 'DELIVERED' — only logistically clean records
DQ flags is_valid_logistics + is_valid_product — must filter = 1 before computing transit time stats

View in catalog Python EDA Repo