Why This Is Asked
Funnel analysis is one of the most common analytics tasks. This question tests whether you can translate a business question into correct, readable SQL — and explain your reasoning step by step.
Key Concepts to Cover
- Window functions —
ROW_NUMBER(),FIRST_VALUE(),LAG()for user event sequencing - Session attribution — joining acquisition source to conversion events
- Cohort grouping —
DATE_TRUNC('week', signup_date)for cohort bucketing - Conversion rate —
COUNT(DISTINCT converted_user_id) / COUNT(DISTINCT signed_up_user_id) - Step ordering — ensuring funnel steps are in chronological order per user
How to Approach This
1. Clarify the Funnel Steps
- What events define each funnel stage? (signup → email_verified → first_purchase)
- Is there a time window? (conversion must happen within 30 days of signup)
- How are users attributed? (last-touch, first-touch, or multi-touch)
2. Skeleton Query
WITH signups AS (
SELECT user_id,
DATE_TRUNC('week', event_time) AS cohort_week,
acquisition_channel
FROM events
WHERE event_type = 'signup'
),
first_purchases AS (
SELECT user_id, MIN(event_time) AS purchased_at
FROM events
WHERE event_type = 'first_purchase'
GROUP BY user_id
)
SELECT
s.cohort_week,
s.acquisition_channel,
COUNT(DISTINCT s.user_id) AS signed_up,
COUNT(DISTINCT fp.user_id) AS converted,
ROUND(
COUNT(DISTINCT fp.user_id)::numeric /
NULLIF(COUNT(DISTINCT s.user_id), 0),
4
) AS conversion_rate
FROM signups s
LEFT JOIN first_purchases fp
ON s.user_id = fp.user_id
AND fp.purchased_at <= s.cohort_week + INTERVAL '30 days'
GROUP BY 1, 2
ORDER BY 1, 2;
3. Explain the Choices
LEFT JOINkeeps users who never converted (numerator becomes NULL → 0 in COUNT DISTINCT)NULLIFprevents division by zero for cohorts with 0 signupsMIN(event_time)ensures we use the first purchase, not any purchaseDATE_TRUNC('week', ...)groups into cohort buckets for trend analysis
Common Follow-ups
-
"How would you add a middle step — email verification between signup and purchase?" Add a third CTE, chain the LEFT JOINs, add verified/signed_up as an intermediate conversion rate column.
-
"How would this query perform on 1 billion rows?" Partition
eventsbyevent_date, filter both CTEs on date ranges, cluster onuser_idfor efficient joins.