Advanced · 3 min read

Design an AI-Powered Code Review System

Design a system that uses LLMs to automatically review pull requests — identifying bugs and style issues and suggesting improvements at scale.


Why This Is Asked

Interviewers use this question to test your ability to design a complex AI system that integrates with existing engineering workflows. They want to see if you can reason about: how to represent code for LLMs, how to handle the latency constraints of PR review, how to evaluate AI-generated comments for quality, and how to build feedback loops that improve the system over time.

Key Concepts to Cover

  • Code representation — how to prepare code diffs for LLM context windows
  • Context limits — handling large PRs that exceed token limits
  • Comment generation — structured output for actionable review comments
  • Latency requirements — async vs. real-time review pipelines
  • Quality evaluation — how to measure if AI comments are helpful
  • Feedback loops — using developer reactions to improve the model
  • Integration — GitHub/GitLab webhooks, PR comment APIs
  • Cost management — controlling LLM API costs at scale

How to Approach This

1. Clarify Requirements

Start by asking:

  • What types of review? (bugs, style, security, performance, all?)
  • What scale? (10 PRs/day vs. 10,000 PRs/day changes the architecture)
  • Latency requirements? (blocking the PR merge vs. async background comments)
  • What languages/frameworks? (affects context and prompting)
  • Human-in-the-loop or fully automated comments?

2. High-Level Architecture

GitHub Webhook → Queue → Diff Processor → Context Builder → LLM Service → Comment Poster
                                                                    ↓
                                                            Feedback Collector → Fine-tuning Pipeline
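The ingestion edge of this pipeline can be sketched as a webhook handler that verifies GitHub's signature and enqueues the PR event for async processing. This is a minimal sketch: the in-process queue stands in for a real broker, and the payload fields shown are the ones GitHub sends on `pull_request` events.

```python
import hashlib
import hmac
import json
import queue

review_queue = queue.Queue()  # stand-in for a real broker (SQS, Pub/Sub, etc.)

def verify_signature(secret: bytes, payload: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against an HMAC of the raw body."""
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

def handle_webhook(secret: bytes, payload: bytes, signature_header: str) -> bool:
    """Enqueue a pull_request event for async review; reject bad signatures."""
    if not verify_signature(secret, payload, signature_header):
        return False
    event = json.loads(payload)
    # Review on open and on every new push to the PR branch.
    if event.get("action") in ("opened", "synchronize"):
        review_queue.put({
            "repo": event["repository"]["full_name"],
            "pr_number": event["number"],
            "head_sha": event["pull_request"]["head"]["sha"],
        })
    return True
```

Verifying the signature before parsing keeps forged payloads out of the queue, and returning immediately after enqueueing keeps webhook delivery fast regardless of LLM latency.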

3. Deep Dive: Handling Large Diffs

Large real-world PRs routinely exceed LLM context windows. Strategies:

  • File-level batching: process each changed file independently
  • Chunk with overlap: split large files into overlapping chunks
  • Prioritization: rank files by change size, complexity, or risk; review the top N
  • Summary pass: first pass to summarize the PR, second pass for detailed review of flagged areas
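The chunk-with-overlap strategy above can be sketched as follows. Chunking by line count is a simplification; a real system would measure chunks with the model's tokenizer, but the overlap logic is the same.

```python
def chunk_with_overlap(lines, max_lines=200, overlap=20):
    """Split a file's lines into overlapping chunks so that an issue
    straddling a chunk boundary appears whole in at least one chunk."""
    if len(lines) <= max_lines:
        return [lines]
    chunks = []
    step = max_lines - overlap  # advance leaves `overlap` lines shared
    for start in range(0, len(lines), step):
        chunks.append(lines[start:start + max_lines])
        if start + max_lines >= len(lines):
            break  # last chunk already reaches end of file
    return chunks
```

The overlap means some lines are reviewed twice, so downstream deduplication of comments by (file, line) is needed before posting.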

4. Deep Dive: Prompt Design

Structure prompts to produce actionable, structured output:

You are a senior engineer reviewing a pull request.
Given this code diff, identify issues and suggest improvements.

Output as JSON:
{
  "comments": [
    {
      "file": "string",
      "line": number,
      "severity": "bug" | "suggestion" | "nitpick",
      "comment": "string",
      "suggestion": "string (optional)"
    }
  ]
}

Code diff:
{diff}
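Structured output should be validated before posting, since models occasionally return malformed JSON or out-of-schema fields. A minimal validation pass might look like this; the field names follow the prompt above, and the drop-invalid-comments strategy is one possible choice (an alternative is re-prompting on parse failure).

```python
import json

ALLOWED_SEVERITIES = {"bug", "suggestion", "nitpick"}

def parse_review(raw: str):
    """Parse the model's JSON reply, dropping malformed comments
    rather than failing the whole review."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    valid = []
    for c in data.get("comments", []):
        if (isinstance(c.get("file"), str)
                and isinstance(c.get("line"), int)
                and c.get("severity") in ALLOWED_SEVERITIES
                and isinstance(c.get("comment"), str)):
            valid.append(c)
    return valid
```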

5. Evaluation and Feedback Loop

This is what separates strong candidates. Discuss:

  • Track thumbs up/down on AI comments
  • Monitor comment acceptance rate (did the developer act on it?)
  • Use accepted comments as positive training examples
  • Use dismissed comments as negative examples
  • Build an eval set of PRs with known issues for regression testing
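Comment acceptance rate, the central metric in the list above, reduces to simple arithmetic over an event log. The two-outcome event schema here is an illustrative assumption; a real log would also record severity, repo, and model version for slicing.

```python
from collections import Counter

def acceptance_rate(events):
    """events: iterable of (comment_id, outcome) pairs, where outcome is
    'accepted' (developer acted on the comment) or 'dismissed'."""
    counts = Counter(outcome for _, outcome in events)
    total = counts["accepted"] + counts["dismissed"]
    return counts["accepted"] / total if total else 0.0
```

Tracking this per severity level matters: a low acceptance rate on "bug" comments is a much stronger signal of a quality problem than the same rate on nitpicks.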

6. Cost and Latency

  • Cache review results for identical diffs (content hash)
  • Use smaller/faster models for initial triage, larger models for complex files
  • Process reviews asynchronously — don't block the PR; post comments within minutes
  • Rate limit per repository to control costs
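The identical-diff cache in the first bullet reduces to keying review results on a content hash. A sketch with an in-memory dict standing in for a shared store such as Redis:

```python
import hashlib

class ReviewCache:
    """Cache review results keyed by a hash of the diff content, so a
    re-pushed or duplicated diff skips the LLM call entirely."""
    def __init__(self):
        self._store = {}

    def _key(self, diff: str) -> str:
        return hashlib.sha256(diff.encode("utf-8")).hexdigest()

    def get(self, diff: str):
        return self._store.get(self._key(diff))

    def put(self, diff: str, comments) -> None:
        self._store[self._key(diff)] = comments
```

Hashing the diff rather than the commit SHA means the cache also hits when the same change lands in a different branch or a rebased PR.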

Common Follow-ups

  1. "How would you handle false positives — AI comments that are wrong or unhelpful?" Discuss confidence thresholds, filtering comments below a quality score, requiring human approval for high-severity findings, and surfacing uncertainty in the comment text itself.

  2. "How do you keep the system up to date as coding practices evolve?" Cover periodic retraining on recent accepted comments, prompt updates as new patterns emerge, and A/B testing new prompt versions against a held-out eval set.

  3. "What security concerns do you have about sending code to an external LLM API?" Address data privacy (on-prem models for sensitive code), API data retention policies, secrets scanning before sending diffs, and redacting sensitive values from diffs.
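One way to implement the confidence threshold mentioned in follow-up 1: ask the model to self-report a confidence score per comment and drop anything below a per-severity cutoff. The scores and cutoff values here are illustrative assumptions, not calibrated figures.

```python
# Stricter cutoff for higher-severity categories: a wrong "bug"
# comment costs more developer trust than a wrong nitpick.
THRESHOLDS = {"bug": 0.8, "suggestion": 0.6, "nitpick": 0.5}

def filter_comments(comments):
    """Keep only comments whose self-reported confidence clears the
    cutoff for their severity; unknown severities are dropped."""
    return [
        c for c in comments
        if c.get("confidence", 0.0) >= THRESHOLDS.get(c.get("severity"), 1.1)
    ]
```

Self-reported confidence is noisy, so in practice these cutoffs would be tuned against the eval set described in section 5.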
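The secrets scanning mentioned in follow-up 3 can start with pattern-based redaction before the diff leaves the trust boundary. The two patterns below are illustrative only; production scanners combine large rule sets with entropy checks.

```python
import re

# Illustrative patterns; real scanners use far more rules.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID format
    re.compile(r"(?i)(api[_-]?key|token|secret)\s*[:=]\s*\S+"),  # key=value assignments
]

def redact(diff: str) -> str:
    """Replace likely secrets with a placeholder so they are never
    sent to an external LLM API."""
    for pattern in SECRET_PATTERNS:
        diff = pattern.sub("[REDACTED]", diff)
    return diff
```

Redaction should run in the Diff Processor stage, before the Context Builder, so no downstream component ever holds the raw secret.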
