Advanced · 3 min read

Design a Hybrid Search System Combining Semantic and Keyword Search

Design a search system that combines dense vector search with sparse keyword search — outperforming either approach alone through intelligent score fusion.


Why This Is Asked

Pure semantic search misses exact-match queries. Pure keyword search misses semantic meaning. Hybrid search combines both and is the current state-of-the-art for production retrieval systems.

Key Concepts to Cover

  • Dense retrieval — embedding-based similarity search
  • Sparse retrieval — BM25/TF-IDF keyword matching
  • Reciprocal Rank Fusion (RRF) — combining ranked lists from multiple retrievers
  • Score normalization — making scores from different systems comparable
  • Re-ranking — using a cross-encoder to re-score the merged result set
  • Infrastructure — vector stores, inverted indexes, dual-store architecture

How to Approach This

1. Why Hybrid?

| Query Type | Dense (semantic) | Sparse (BM25) |
|------------|------------------|---------------|
| "How do I authenticate?" | Great | May miss if docs say "login" |
| "Error code 404" | May miss exact code | Exact match |
| "SKU-98432 price" | Poor for IDs/codes | Exact match |
| "What is the return policy?" | Understands intent | Keyword match |

Hybrid search captures both semantic intent and exact term matches.

2. High-Level Architecture

Query
  ├── Dense Retriever (embedding model + vector DB) -> top-k results
  └── Sparse Retriever (BM25/Elasticsearch) -> top-k results
              |
        Score Fusion (RRF or weighted sum)
              |
        Merged & Ranked Results (top-k)
              |
        Cross-Encoder Re-ranker (optional)
              |
        Final Results
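The pipeline above can be sketched as glue code with stub retrievers. All names here are illustrative; a real system would call a vector DB client and an Elasticsearch/BM25 index, and the naive fusion stands in for RRF or a weighted sum:

```python
def naive_fuse(ranked_lists):
    """Deduplicating concat; a real system would use RRF or a weighted sum."""
    seen, out = set(), []
    for ranking in ranked_lists:
        for doc_id in ranking:
            if doc_id not in seen:
                seen.add(doc_id)
                out.append(doc_id)
    return out

def hybrid_search(query, dense_retrieve, sparse_retrieve, fuse, reranker=None, k=10):
    """Run both retrievers, fuse their ranked lists, optionally re-rank."""
    dense_hits = dense_retrieve(query)    # ranked doc IDs from the vector store
    sparse_hits = sparse_retrieve(query)  # ranked doc IDs from the inverted index
    fused = fuse([dense_hits, sparse_hits])
    if reranker is not None:
        fused = reranker(query, fused)
    return fused[:k]

# Stub retrievers standing in for a vector DB and a BM25 index.
hits = hybrid_search(
    "reset password",
    dense_retrieve=lambda q: ["d1", "d2"],
    sparse_retrieve=lambda q: ["d2", "d3"],
    fuse=naive_fuse,
    k=3,
)
print(hits)  # ['d1', 'd2', 'd3']
```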

3. Score Fusion: Reciprocal Rank Fusion (RRF)

For each document:

RRF_score(d) = Σ_i 1 / (k + rank_i(d))

Where rank_i(d) is the document's rank in each retriever's result list, and k is a constant (typically 60, from Cormack et al. 2009). Documents not appearing in a retriever's result list are assigned rank = ∞, contributing 0 to the sum. RRF is robust to score magnitude differences across retrievers — only rank positions matter, not raw scores.
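A minimal pure-Python sketch of RRF (the function name is illustrative, not from any specific library):

```python
def rrf_fuse(ranked_lists, k=60):
    """Fuse multiple ranked lists of doc IDs via Reciprocal Rank Fusion.

    A document absent from a list contributes nothing to its sum,
    which matches treating its rank there as infinite.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]   # ranking from the dense retriever
sparse = ["d1", "d4", "d3"]  # ranking from BM25
print(rrf_fuse([dense, sparse]))  # ['d1', 'd3', 'd4', 'd2']
```

Note that `d1` wins despite never ranking first in the dense list: appearing near the top of both retrievers beats topping only one.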

4. Cross-Encoder Re-ranking

After fusion, take top-20 results and re-rank with a cross-encoder:

  • Sees both query and document together (not separately embedded)
  • Much more accurate than embedding similarity alone
  • Too slow for all documents — only use on the shortlist
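A sketch of the re-ranking stage. The `score_pair` callable is a hypothetical stand-in for a real cross-encoder (e.g. wrapping a sentence-transformers `CrossEncoder.predict` call); here a toy word-overlap scorer keeps the example self-contained:

```python
def rerank(query, docs, score_pair, shortlist=20):
    """Re-score only the top `shortlist` fused results with a cross-encoder.

    `score_pair(query, doc)` is assumed to return a relevance score.
    The tail beyond the shortlist keeps its original fused order.
    """
    head, tail = docs[:shortlist], docs[shortlist:]
    rescored = sorted(head, key=lambda d: score_pair(query, d), reverse=True)
    return rescored + tail

# Toy scorer: counts words shared between query and document.
def overlap_score(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = ["reset your password", "pricing plans", "how to login and authenticate"]
print(rerank("how do I authenticate", docs, overlap_score, shortlist=3))
```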

5. Infrastructure

Two indexes running in parallel:

  • Vector store: Pinecone, Weaviate, Qdrant, pgvector
  • Inverted index: Elasticsearch, OpenSearch, or BM25 library

Some databases support both natively (Weaviate, Qdrant, Elasticsearch with embedding plugins).

Common Follow-ups

  1. "How do you tune the balance between dense and sparse retrieval?" Tune on your evaluation dataset by sweeping weight ratios and measuring retrieval metrics. If most queries are semantic, weight dense higher. If many exact-match queries, weight sparse higher.
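For the weighted-sum alternative to RRF, scores from the two retrievers must first be normalized onto a comparable scale. A min-max sketch, with an illustrative `alpha` that would be swept during tuning:

```python
def minmax(scores):
    """Normalize a {doc_id: score} dict onto [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {d: (s - lo) / span for d, s in scores.items()}

def weighted_fuse(dense_scores, sparse_scores, alpha=0.7):
    """alpha weights the dense retriever; (1 - alpha) weights sparse."""
    dense, sparse = minmax(dense_scores), minmax(sparse_scores)
    docs = set(dense) | set(sparse)
    fused = {d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0)
             for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

dense = {"d1": 0.91, "d2": 0.85}   # cosine similarities in [-1, 1]
sparse = {"d2": 12.4, "d3": 3.1}   # raw BM25 scores, unbounded
print(weighted_fuse(dense, sparse, alpha=0.7))  # ['d1', 'd2', 'd3']
```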

  2. "When would you skip sparse retrieval entirely?" When your query distribution is purely semantic (no IDs or codes), or when you do not have infrastructure for a dual-store system.

  3. "How does this scale to 100M documents?" Vector search uses ANN algorithms (HNSW, IVF) rather than exact search. Sparse index scales with distributed Elasticsearch. Bottleneck shifts to index update latency — discuss incremental indexing tradeoffs.
