Intermediate2 min read

Design a Real-Time Event Ingestion Pipeline

Design a pipeline to ingest millions of user events per second in real time, make them available for analytics within 30 seconds, and guarantee no data loss.

Asked at:

Prep for the full interview loop

Know the concepts. Now prove it. Practice GenAI, Coding, System Design, and AI/ML Design interviews with an AI that tells you exactly where you fell short.

Start a mock interview

Why This Is Asked

Real-time event pipelines are the heartbeat of modern data platforms. This question tests whether you can reason about high-throughput ingestion, stream processing semantics, and the trade-offs between latency, cost, and reliability.

Key Concepts to Cover

  • Ingestion layer — Kafka, Kinesis, or Pub/Sub as the durable event bus
  • Stream processing — Flink, Spark Streaming, or Kafka Streams for transformation
  • Exactly-once semantics — idempotent consumers, transactional writes
  • Schema evolution — Avro/Protobuf with a schema registry
  • Backpressure — what happens when consumers fall behind producers
  • Monitoring — consumer lag, end-to-end latency SLAs

How to Approach This

1. Clarify Requirements

  • What event volume? (millions/second vs thousands/second changes architecture significantly)
  • What latency SLA? (sub-second vs 30 seconds vs minutes)
  • Who are the downstream consumers? (real-time dashboards, ML features, analytics queries)
  • What durability guarantees? (at-most-once, at-least-once, exactly-once)

2. High-Level Architecture

Producers → Kafka (event bus) → Stream Processor → Hot Store (Redis/DynamoDB)
                                                 → Cold Store (S3/GCS)
                                                 → OLAP (ClickHouse/BigQuery)

3. Exactly-Once Semantics

  • Producers write with idempotent keys
  • Kafka transactions ensure atomic writes across topics
  • Consumers checkpoint offsets only after successful downstream writes
  • Deduplication at the sink using event IDs if exactly-once isn't end-to-end

4. Schema Evolution

  • Avro or Protobuf schemas registered in a Schema Registry (Confluent)
  • Backward-compatible changes: add optional fields
  • Breaking changes: new topic + migration job
  • Consumer reads schema version from message header

Common Follow-ups

  1. "What happens if your stream processor falls behind by 10 minutes?" Consumer lag alerting, auto-scaling, dead letter queue for processing failures.

  2. "How do you handle late-arriving events?" Watermarks in Flink define how late you'll tolerate; late events go to a side output or a separate late-data table.

  3. "How do you test this pipeline?" Unit tests for transformation logic, integration tests with embedded Kafka, end-to-end latency regression tests in staging.

Related Questions

Prep for the full interview loop

Know the concepts. Now prove it. Practice GenAI, Coding, System Design, and AI/ML Design interviews with an AI that tells you exactly where you fell short.

Start a mock interview