Intermediate2 min read

Design an Analytics Schema for an E-Commerce Platform

Design a data warehouse schema for an e-commerce platform that needs to answer questions about revenue by product, customer lifetime value, and cohort retention.

Asked at:Amazon

Prep for the full interview loop

Know the concepts. Now prove it. Practice GenAI, Coding, System Design, and AI/ML Design interviews with an AI that tells you exactly where you fell short.

Start a mock interview

Why This Is Asked

Data modeling is how raw transactional data becomes a reliable foundation for analytics. This question tests whether you can define the right grain, choose between schema patterns, and handle temporal complexity (slowly changing dimensions, late-arriving data).

Key Concepts to Cover

  • Grain definition — what does one row in the fact table represent?
  • Star vs snowflake schema — when to denormalize for query performance
  • Slowly changing dimensions (SCD) — how to track historical attribute changes
  • Fact table types — transaction, snapshot, accumulating snapshot
  • Customer lifetime value — requires a customer dimension with historical cohort data

How to Approach This

1. Identify the Analytical Questions

Before designing, list the key questions the schema must answer:

  • Revenue by product, category, and region over time
  • Customer lifetime value by acquisition cohort
  • Funnel: browse → add-to-cart → checkout → purchase

2. Star Schema Design

fact_orders (grain: one row per order line item)
  order_id, customer_key, product_key, date_key, quantity, revenue, discount

dim_customer (SCD Type 2)
  customer_key, customer_id, name, acquisition_channel, cohort_week,
  valid_from, valid_to, is_current

dim_product
  product_key, product_id, name, category, subcategory, cost

dim_date
  date_key, full_date, week, month, quarter, year, is_weekend

3. SCD Type 2 for Customer Dimension

When a customer changes their acquisition channel attribution or segment:

  • Insert a new row with the new attributes
  • Set valid_to on the old row and valid_from on the new row
  • is_current = TRUE flags the active row for simple joins
  • Fact rows preserve the customer_key at time of purchase — history is preserved

Common Follow-ups

  1. "How do you handle late-arriving orders that come in after the daily ETL?" Partition fact table by order_date, re-run the daily partition when late data arrives; downstream views use MAX(ingestion_date) to get latest.

  2. "A product category changes retroactively — how does your schema handle that?" SCD Type 2 preserves historical categories; use dim_product_key at order time, not current state.

  3. "How would you model a subscription product instead of one-time purchases?" Add a fact_subscriptions snapshot table (one row per subscription per billing period) alongside the transaction fact table.

Related Questions

Prep for the full interview loop

Know the concepts. Now prove it. Practice GenAI, Coding, System Design, and AI/ML Design interviews with an AI that tells you exactly where you fell short.

Start a mock interview