Intermediate2 min read

Design an A/B Test for a Checkout Flow Change

The product team wants to test a redesigned checkout flow they expect will increase conversion rate by 5%. Design the experiment.

Asked at:Amazon

Prep for the full interview loop

Know the concepts. Now prove it. Practice GenAI, Coding, System Design, and AI/ML Design interviews with an AI that tells you exactly where you fell short.

Start a mock interview

Why This Is Asked

Experiment design is the core competency of a product data analyst. This question tests statistical rigor, business judgment, and whether you know what can go wrong in practice.

Key Concepts to Cover

  • Randomization unit — user-level vs session-level vs device-level
  • Primary metric — checkout conversion rate (completed purchases / checkout initiations)
  • Sample size calculation — minimum detectable effect, statistical power, significance level
  • Novelty effect — users behave differently when they see something new, regardless of quality
  • Interference effects — shared inventory, referral loops that violate independence assumption

How to Approach This

1. Define the Randomization Unit

Randomize at the user level, not the session level. A user seeing both variants in different sessions contaminates the experiment. For logged-out users, randomize by a persistent cookie.

2. Define the Primary Metric

Checkout conversion rate = completed purchases / checkout initiations. This is the direct measure of the change's effect.

Add guardrails:

  • Average order value (the new flow should not reduce basket size)
  • Support contact rate (a confusing flow may increase help requests)
  • Load time p95 (new UI must not regress performance)

3. Calculate Sample Size

For a 5% relative lift on a 3% baseline conversion rate, 80% power, α=0.05:

  • Minimum detectable effect (absolute): 3% × 5% = 0.15 percentage points
  • Required sample per variant: ~180,000 users (use a sample size calculator)
  • At 10,000 checkouts/day, that's ~36 days per variant = ~72 days total
  • If this is too long, consider raising α to 0.1 or targeting a larger effect

4. Decide on Duration

Run for at least 2 full weeks to capture weekly seasonality. Don't stop early just because results look significant — peeking inflates false positive rate.

Common Follow-ups

  1. "How do you handle users who abandon checkout in the middle and come back later?" Count them in the denominator at checkout initiation. Their return is already captured in your randomization since they're in the same variant.

  2. "The experiment ran for 2 weeks and p-value is 0.04. Should you ship?" Check: Did you pre-register the hypothesis? Did you look multiple times during the run? Is the effect size practically meaningful? 0.04 is borderline — look for consistent results across segments before shipping.

Related Questions

Prep for the full interview loop

Know the concepts. Now prove it. Practice GenAI, Coding, System Design, and AI/ML Design interviews with an AI that tells you exactly where you fell short.

Start a mock interview