A dashboard query scanning a 10-billion-row events table takes 8 minutes to run. Walk through how you'd diagnose and fix it.

Learn how to optimize a slow reporting query on a massive table. Covers partitioning, clustering, execution plans, query rewrites, and materialized views.

Optimize a Slow SQL Reporting Query - Data Engineering Interview

Why This Is Asked

Query optimization is a core data engineering skill. This question tests whether you can reason about execution plans, storage layout, and query rewrites — and explain your thinking clearly.

Key Concepts to Cover

Partition pruning — query should filter on the partition key to skip irrelevant partitions
Clustering/sorting — co-locating related rows reduces bytes scanned
Execution plan reading — identify full table scans, hash joins, sort operations
Query rewrites — avoid SELECT *, push filters early, replace correlated subqueries
Materialized views / pre-aggregation — for expensive repeated aggregations
Approximate aggregations — HyperLogLog for cardinality, sampling for trends

How to Approach This

1. Diagnose First

Before optimizing, look at the execution plan:

Is it doing a full scan when it should be partition-pruning?
Are there implicit type casts preventing index use?
Is the join order causing a large intermediate result set?

2. Partitioning Fix

If the table is partitioned by event_date and the query doesn't filter on it:

-- Bad: scans all partitions
SELECT user_id, COUNT(*) FROM events WHERE event_type = 'purchase'

-- Good: partition pruning
SELECT user_id, COUNT(*) FROM events
WHERE event_date >= '2026-01-01' AND event_type = 'purchase'

3. Pre-Aggregation

If the same query runs 1000 times/day, materialize it:

-- Materialized view refreshed nightly
CREATE MATERIALIZED VIEW daily_purchase_counts AS
SELECT event_date, user_id, COUNT(*) as purchase_count
FROM events WHERE event_type = 'purchase'
GROUP BY event_date, user_id;

Common Follow-ups

"The query uses a window function over the whole table — how do you speed that up?" Partition the window function by a high-cardinality column (user_id), push a date filter before the window.
"What if you can't change the query — only the table structure?" Add a clustering key on the most common filter columns; create a pre-aggregated summary table.

Optimize a Slow Reporting Query on a 10-Billion-Row Table