Why This Is Asked
Pure semantic search misses exact-match queries. Pure keyword search misses semantic meaning. Hybrid search combines both and is the current state-of-the-art for production retrieval systems.
Key Concepts to Cover
- Dense retrieval — embedding-based similarity search
- Sparse retrieval — BM25/TF-IDF keyword matching
- Reciprocal Rank Fusion (RRF) — combining ranked lists from multiple retrievers
- Score normalization — making scores from different systems comparable
- Re-ranking — using a cross-encoder to re-score the merged result set
- Infrastructure — vector stores, inverted indexes, dual-store architecture
How to Approach This
1. Why Hybrid?
| Query Type | Dense (semantic) | Sparse (BM25) |
|------------|------------------|---------------|
| "How do I authenticate?" | Great | May miss if docs say "login" |
| "Error code 404" | May miss exact code | Exact match |
| "SKU-98432 price" | Poor for IDs/codes | Exact match |
| "What is the return policy?" | Understands intent | Keyword match |
Hybrid search captures both semantic intent and exact term matches.
2. High-Level Architecture
```
Query
 ├──> Dense Retriever (embedding model + vector DB) ──> top-k results
 └──> Sparse Retriever (BM25 / Elasticsearch)       ──> top-k results
                        |
        Score Fusion (RRF or weighted sum)
                        |
        Merged & Ranked Results (top-k)
                        |
        Cross-Encoder Re-ranker (optional)
                        |
                 Final Results
```
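The fusion stage can be sketched with score normalization plus a weighted sum (one of the two fusion options above). This is a minimal pure-Python sketch; the document IDs, cosine similarities, and BM25 scores are invented for illustration:

```python
def min_max_normalize(scores):
    """Rescale raw scores to [0, 1] so dense and sparse scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def weighted_fusion(dense_scores, sparse_scores, alpha=0.5):
    """Normalize each retriever's scores, then blend; alpha weights the dense side."""
    dense = min_max_normalize(dense_scores)
    sparse = min_max_normalize(sparse_scores)
    docs = set(dense) | set(sparse)
    fused = {d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Invented scores: cosine similarities from the vector DB, raw BM25 scores
dense_scores = {"doc1": 0.82, "doc2": 0.74, "doc3": 0.40}
sparse_scores = {"doc2": 12.1, "doc4": 9.3, "doc1": 2.2}
ranked = weighted_fusion(dense_scores, sparse_scores, alpha=0.6)
# doc2 ranks first: strong in both retrievers after normalization
```

Normalization matters here because BM25 scores are unbounded while cosine similarities live in [-1, 1]; without it, one retriever silently dominates the sum.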
3. Score Fusion: Reciprocal Rank Fusion (RRF)
For each document:
RRF_score(d) = Σ_i 1 / (k + rank_i(d))
Where rank_i(d) is the document's rank in each retriever's result list, and k is a constant (typically 60, from Cormack et al. 2009). Documents not appearing in a retriever's result list are assigned rank = ∞, contributing 0 to the sum. RRF is robust to score magnitude differences across retrievers — only rank positions matter, not raw scores.
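The formula above is a few lines of pure Python; the ranked lists here are toy data standing in for each retriever's top-k output:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion over several retrievers' ranked doc-id lists.

    A document absent from one list simply contributes nothing for that
    list -- equivalent to rank = infinity in the formula.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense_top = ["doc1", "doc2", "doc3"]
sparse_top = ["doc2", "doc4", "doc1"]
fused = rrf_fuse([dense_top, sparse_top])
# doc2 ranks first: it sits near the top of both lists
```

Note that only rank positions enter the computation, which is why RRF needs no score normalization step.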
4. Cross-Encoder Re-ranking
After fusion, take top-20 results and re-rank with a cross-encoder:
- Sees both query and document together (not separately embedded)
- Much more accurate than embedding similarity alone
- Too slow for all documents — only use on the shortlist
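The shortlist pattern can be sketched as below. `overlap_score` is a deliberately crude stand-in for a real cross-encoder model (in practice you would call something like a MiniLM cross-encoder on each query-document pair); it is used here only so the example runs without model weights:

```python
def rerank(query, shortlist, score_pair, top_n=5):
    """Re-score only the fused shortlist with a (query, document) scorer.

    `score_pair` plays the role of a cross-encoder, which sees query and
    document jointly. It is too slow to run over the full corpus, so it
    only sees the top results from fusion.
    """
    scored = [(doc, score_pair(query, doc)) for doc in shortlist]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return [doc for doc, _ in scored[:top_n]]

# Toy stand-in scorer: fraction of query tokens appearing in the document
def overlap_score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

shortlist = [
    "reset your password from the login page",
    "error code 404 means the page was not found",
    "how to authenticate with an API token",
]
top = rerank("how do I authenticate", shortlist, overlap_score, top_n=2)
```

The design point is the interface, not the scorer: fusion produces a cheap shortlist, and the expensive joint model only ever sees that shortlist.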
5. Infrastructure
Two indexes running in parallel:
- Vector store: Pinecone, Weaviate, Qdrant, pgvector
- Inverted index: Elasticsearch, OpenSearch, or BM25 library
Some databases support both natively (Weaviate, Qdrant, Elasticsearch with embedding plugins).
Common Follow-ups
- "How do you tune the balance between dense and sparse retrieval?" Sweep the weight ratio on your evaluation dataset and measure retrieval metrics such as recall@k or nDCG. If most queries are semantic, weight dense higher; if many queries need exact matches, weight sparse higher.
- "When would you skip sparse retrieval entirely?" When your query distribution is purely semantic (no IDs, codes, or exact phrases), or when you lack the infrastructure for a dual-store system.
- "How does this scale to 100M documents?" Vector search uses approximate nearest neighbor (ANN) algorithms such as HNSW or IVF rather than exact search, and the sparse index scales with distributed Elasticsearch. The bottleneck shifts to index update latency, so discuss incremental indexing tradeoffs.
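The weight-tuning sweep from the first follow-up can be prototyped on a toy labeled set. Everything below (scores, document IDs, relevance labels) is invented; a real sweep would run against a held-out query set with recall@k or nDCG:

```python
def fuse(dense, sparse, alpha):
    """Rank docs by a weighted sum of (already normalized) retriever scores."""
    docs = set(dense) | set(sparse)
    return sorted(docs,
                  key=lambda d: alpha * dense.get(d, 0.0)
                                + (1 - alpha) * sparse.get(d, 0.0),
                  reverse=True)

def recall_at_k(ranked, relevant, k=2):
    """Fraction of relevant docs found in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

# Toy eval set: (dense scores, sparse scores, relevant doc ids) per query
eval_set = [
    ({"a": 0.9, "b": 0.5}, {"b": 1.0, "c": 0.8}, {"b", "c"}),
    ({"d": 0.7, "e": 0.6}, {"e": 0.9, "f": 0.4}, {"e"}),
]

# Sweep the dense weight alpha over [0, 1] and keep the best total recall
best = max(
    (a / 10 for a in range(11)),
    key=lambda alpha: sum(recall_at_k(fuse(dn, sp, alpha), rel)
                          for dn, sp, rel in eval_set),
)
```

On this tiny set, low alpha values win because both queries are answered by the sparse side; the point is the loop structure, which stays the same at real scale.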