Why This Is Asked
Pure semantic search misses exact-match queries. Pure keyword search misses semantic meaning. Hybrid search combines both and is the current state-of-the-art for production retrieval systems.
Key Concepts to Cover
- Dense retrieval — embedding-based similarity search
- Sparse retrieval — BM25/TF-IDF keyword matching
- Reciprocal Rank Fusion (RRF) — combining ranked lists from multiple retrievers
- Score normalization — making scores from different systems comparable
- Re-ranking — using a cross-encoder to re-score the merged result set
- Infrastructure — vector stores, inverted indexes, dual-store architecture
How to Approach This
1. Why Hybrid?
| Query Type | Dense (semantic) | Sparse (BM25) |
|------------|------------------|---------------|
| "How do I authenticate?" | Great | May miss if docs say "login" |
| "Error code 404" | May miss exact code | Exact match |
| "SKU-98432 price" | Poor for IDs/codes | Exact match |
| "What is the return policy?" | Understands intent | Keyword match |
Hybrid search captures both semantic intent and exact term matches.
2. High-Level Architecture
```
Query
 ├──> Dense Retriever (embedding model + vector DB) ──> top-k results
 └──> Sparse Retriever (BM25 / Elasticsearch)       ──> top-k results
                        |
        Score Fusion (RRF or weighted sum)
                        |
        Merged & Ranked Results (top-k)
                        |
        Cross-Encoder Re-ranker (optional)
                        |
                 Final Results
```
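The fusion stage can be sketched with score normalization plus a weighted sum (one of the two fusion options above). This is a minimal pure-Python sketch; the document IDs, cosine similarities, and BM25 scores are invented for illustration:

```python
def min_max_normalize(scores):
    """Rescale raw scores to [0, 1] so dense and sparse scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def weighted_fusion(dense_scores, sparse_scores, alpha=0.5):
    """Normalize each retriever's scores, then blend; alpha weights the dense side."""
    dense = min_max_normalize(dense_scores)
    sparse = min_max_normalize(sparse_scores)
    docs = set(dense) | set(sparse)
    fused = {d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Invented scores: cosine similarities from the vector DB, raw BM25 scores
dense_scores = {"doc1": 0.82, "doc2": 0.74, "doc3": 0.40}
sparse_scores = {"doc2": 12.1, "doc4": 9.3, "doc1": 2.2}
ranked = weighted_fusion(dense_scores, sparse_scores, alpha=0.6)
# doc2 ranks first: strong in both retrievers after normalization
```

Normalization matters here because BM25 scores are unbounded while cosine similarities live in [-1, 1]; without it, one retriever silently dominates the sum.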
3. Score Fusion: Reciprocal Rank Fusion (RRF)
For each document:
RRF_score(d) = Σ_i 1 / (k + rank_i(d))
Where rank_i(d) is the document's rank in each retriever's result list, and k is a constant (typically 60, from Cormack et al. 2009). Documents not appearing in a retriever's result list are assigned rank = ∞, contributing 0 to the sum. RRF is robust to score magnitude differences across retrievers — only rank positions matter, not raw scores.
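The formula above is a few lines of pure Python; the ranked lists here are toy data standing in for each retriever's top-k output:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion over several retrievers' ranked doc-id lists.

    A document absent from one list simply contributes nothing for that
    list -- equivalent to rank = infinity in the formula.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense_top = ["doc1", "doc2", "doc3"]
sparse_top = ["doc2", "doc4", "doc1"]
fused = rrf_fuse([dense_top, sparse_top])
# doc2 ranks first: it sits near the top of both lists
```

Note that only rank positions enter the computation, which is why RRF needs no score normalization step.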
4. Cross-Encoder Re-ranking
After fusion, take top-20 results and re-rank with a cross-encoder:
- Sees both query and document together (not separately embedded)
- Much more accurate than embedding similarity alone
- Too slow for all documents — only use on the shortlist
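The shortlist pattern can be sketched as below. `overlap_score` is a deliberately crude stand-in for a real cross-encoder model (in practice you would call something like a MiniLM cross-encoder on each query-document pair); it is used here only so the example runs without model weights:

```python
def rerank(query, shortlist, score_pair, top_n=5):
    """Re-score only the fused shortlist with a (query, document) scorer.

    `score_pair` plays the role of a cross-encoder, which sees query and
    document jointly. It is too slow to run over the full corpus, so it
    only sees the top results from fusion.
    """
    scored = [(doc, score_pair(query, doc)) for doc in shortlist]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return [doc for doc, _ in scored[:top_n]]

# Toy stand-in scorer: fraction of query tokens appearing in the document
def overlap_score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

shortlist = [
    "reset your password from the login page",
    "error code 404 means the page was not found",
    "how to authenticate with an API token",
]
top = rerank("how do I authenticate", shortlist, overlap_score, top_n=2)
```

The design point is the interface, not the scorer: fusion produces a cheap shortlist, and the expensive joint model only ever sees that shortlist.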
5. Infrastructure
Two indexes running in parallel:
- Vector store: Pinecone, Weaviate, Qdrant, pgvector
- Inverted index: Elasticsearch, OpenSearch, or BM25 library
Some databases support both natively (Weaviate, Qdrant, Elasticsearch with embedding plugins).
Common Follow-ups
- "How do you tune the balance between dense and sparse retrieval?" Sweep the weight ratio on your evaluation dataset and measure retrieval metrics such as recall@k or nDCG. If most queries are semantic, weight dense higher; if many queries need exact matches, weight sparse higher.
- "When would you skip sparse retrieval entirely?" When your query distribution is purely semantic (no IDs, codes, or exact phrases), or when you lack the infrastructure for a dual-store system.
- "How does this scale to 100M documents?" Vector search uses approximate nearest neighbor (ANN) algorithms such as HNSW or IVF rather than exact search, and the sparse index scales with distributed Elasticsearch. The bottleneck shifts to index update latency, so discuss incremental indexing tradeoffs.
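The weight-tuning sweep from the first follow-up can be prototyped on a toy labeled set. Everything below (scores, document IDs, relevance labels) is invented; a real sweep would run against a held-out query set with recall@k or nDCG:

```python
def fuse(dense, sparse, alpha):
    """Rank docs by a weighted sum of (already normalized) retriever scores."""
    docs = set(dense) | set(sparse)
    return sorted(docs,
                  key=lambda d: alpha * dense.get(d, 0.0)
                                + (1 - alpha) * sparse.get(d, 0.0),
                  reverse=True)

def recall_at_k(ranked, relevant, k=2):
    """Fraction of relevant docs found in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

# Toy eval set: (dense scores, sparse scores, relevant doc ids) per query
eval_set = [
    ({"a": 0.9, "b": 0.5}, {"b": 1.0, "c": 0.8}, {"b", "c"}),
    ({"d": 0.7, "e": 0.6}, {"e": 0.9, "f": 0.4}, {"e"}),
]

# Sweep the dense weight alpha over [0, 1] and keep the best total recall
best = max(
    (a / 10 for a in range(11)),
    key=lambda alpha: sum(recall_at_k(fuse(dn, sp, alpha), rel)
                          for dn, sp, rel in eval_set),
)
```

On this tiny set, low alpha values win because both queries are answered by the sparse side; the point is the loop structure, which stays the same at real scale.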