Why This Is Asked
Data modeling is how raw transactional data becomes a reliable foundation for analytics. This question tests whether you can define the right grain, choose between schema patterns, and handle temporal complexity (slowly changing dimensions, late-arriving data).
Key Concepts to Cover
- Grain definition — what does one row in the fact table represent?
- Star vs snowflake schema — when to denormalize for query performance
- Slowly changing dimensions (SCD) — how to track historical attribute changes
- Fact table types — transaction, snapshot, accumulating snapshot
- Customer lifetime value — requires a customer dimension with historical cohort data
How to Approach This
1. Identify the Analytical Questions
Before designing, list the key questions the schema must answer:
- Revenue by product, category, and region over time
- Customer lifetime value by acquisition cohort
- Funnel: browse → add-to-cart → checkout → purchase
2. Star Schema Design
fact_orders (grain: one row per order line item)
order_id, customer_key, product_key, date_key, quantity, revenue, discount
dim_customer (SCD Type 2)
customer_key, customer_id, name, acquisition_channel, cohort_week,
valid_from, valid_to, is_current
dim_product
product_key, product_id, name, category, subcategory, cost
dim_date
date_key, full_date, week, month, quarter, year, is_weekend
3. SCD Type 2 for Customer Dimension
When a customer changes their acquisition channel attribution or segment:
- Insert a new row with the new attributes
- Set
valid_toon the old row andvalid_fromon the new row is_current = TRUEflags the active row for simple joins- Fact rows preserve the
customer_keyat time of purchase — history is preserved
Common Follow-ups
-
"How do you handle late-arriving orders that come in after the daily ETL?" Partition fact table by
order_date, re-run the daily partition when late data arrives; downstream views use MAX(ingestion_date) to get latest. -
"A product category changes retroactively — how does your schema handle that?" SCD Type 2 preserves historical categories; use
dim_product_keyat order time, not current state. -
"How would you model a subscription product instead of one-time purchases?" Add a
fact_subscriptionssnapshot table (one row per subscription per billing period) alongside the transaction fact table.