Why This Is Asked
Choosing a vector DB is a real production decision. Interviewers want to see that you understand what's happening inside the index — not just that you've used Pinecone or Chroma — and that you can make reasoned tradeoffs given a scenario.
Key Concepts to Cover
- Exact vs. approximate nearest neighbor (ANN) — when exactness matters
- Index types — HNSW, IVF, PQ, LSH and their tradeoffs
- Filtering — pre-filter vs. post-filter and why it's hard
- Scale — how index choice changes at 1M vs. 1B vectors
- Hosted vs. self-managed — Pinecone/Weaviate/Qdrant vs. pgvector vs. FAISS
How to Approach This
1. Exact vs. Approximate Search
Exact k-NN is O(n·d) per query (n = corpus size, d = dimensions) — fine for tens of thousands of vectors, unusable at millions. ANN trades a small accuracy loss (recall@10 typically 95-99%) for orders-of-magnitude speed gains. For most RAG use cases, ANN recall is sufficient.
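To make the O(n·d) cost concrete, here is a minimal brute-force exact k-NN in NumPy. The corpus size, dimensionality, and query construction are illustrative, not from any particular system:

```python
import numpy as np

def exact_knn(corpus: np.ndarray, query: np.ndarray, k: int = 10) -> np.ndarray:
    """Brute-force k-NN: one O(n*d) pass over the whole corpus per query."""
    # Squared L2 distance from the query to every corpus vector
    dists = np.sum((corpus - query) ** 2, axis=1)
    # Indices of the k smallest distances
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 128)).astype(np.float32)
# A query that is a slightly perturbed copy of corpus vector 42
query = corpus[42] + 0.01 * rng.normal(size=128).astype(np.float32)
print(exact_knn(corpus, query, k=3))
```

At 10k × 128 this runs in milliseconds; at 100M vectors the same scan becomes seconds per query, which is the gap ANN indexes close.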
2. Index Types
HNSW (Hierarchical Navigable Small World)
- Graph-based index; builds a multi-layer proximity graph
- Best recall/latency tradeoff for most RAG workloads
- Memory-resident — the whole graph lives in RAM
- Weakness: high memory usage (roughly 4 bytes × dimensions × num_vectors for float32 vectors, plus per-vector graph links on top)
- Default choice for production RAG at moderate scale
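The core HNSW idea, greedy search over a proximity graph, can be sketched in a few lines. This toy version builds a single-layer k-NN graph and walks it greedily toward the query; real HNSW adds the multi-layer hierarchy, incremental construction, and beam search, so treat this purely as an illustration of the navigation mechanism:

```python
import numpy as np

def build_knn_graph(X: np.ndarray, m: int = 8) -> np.ndarray:
    """Toy proximity graph: each node links to its m nearest neighbors.
    (Real HNSW builds this incrementally, across multiple layers.)"""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # a node is not its own neighbor
    return np.argsort(d2, axis=1)[:, :m]

def greedy_search(X, graph, query, start=0):
    """Walk the graph, always moving to the neighbor closest to the query."""
    current = start
    cur_dist = ((X[current] - query) ** 2).sum()
    while True:
        neighbors = graph[current]
        dists = ((X[neighbors] - query) ** 2).sum(axis=1)
        if dists.min() >= cur_dist:
            return current  # local minimum: no neighbor is closer
        current = neighbors[np.argmin(dists)]
        cur_dist = dists.min()

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16)).astype(np.float32)
graph = build_knn_graph(X, m=8)
query = X[123]
found = greedy_search(X, graph, query, start=0)
```

The greedy walk can stall in a local minimum far from the true neighbor; HNSW's upper layers exist precisely to provide long-range "highway" edges that make such stalls rare.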
IVF (Inverted File Index)
- Clusters vectors at index time; at query time, searches only the nearest clusters
- Lower memory than HNSW; can be combined with PQ for disk storage
- Weakness: recall degrades if query falls near cluster boundary; requires training step
- Good for very large corpora where HNSW memory usage is prohibitive
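A toy IVF can be sketched with plain NumPy: cluster once at index time, then probe only the nprobe nearest clusters at query time. This illustrates the mechanism, not a production implementation; cluster counts, corpus size, and iteration counts are arbitrary:

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Minimal k-means for the IVF training step."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (assign == c).any():
                centroids[c] = X[assign == c].mean(axis=0)
    # Final assignment against the final centroids
    assign = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    return centroids, assign

def ivf_search(X, centroids, assign, query, nprobe=2, k=5):
    """Search only the nprobe clusters whose centroids are nearest the query."""
    probed = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    cand = np.flatnonzero(np.isin(assign, probed))
    dists = ((X[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(dists)[:k]]

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 32)).astype(np.float32)
centroids, assign = kmeans(X, k=16)
exact_hits = ivf_search(X, centroids, assign, X[7], nprobe=16, k=5)  # probe all: exact
fast_hits = ivf_search(X, centroids, assign, X[7], nprobe=2, k=5)    # probe 2 of 16
```

With nprobe equal to the cluster count the search is exhaustive; shrinking nprobe cuts the candidate set (here roughly 8x) and introduces the boundary-miss failure mode described above.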
PQ (Product Quantization)
- Compresses vectors by splitting dimensions into sub-vectors and quantizing each
- Massive memory savings (10-25x) at the cost of recall
- Usually used as IVF+PQ (FAISS IndexIVFPQ) for billion-scale search
- Not ideal for small corpora where full precision is cheap
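The compression mechanism can be illustrated with a minimal NumPy product quantizer: split each vector into sub-vectors, learn a small codebook per sub-space, and score with per-subspace lookup tables (asymmetric distance). Real implementations use 256 centroids per sub-space; this sketch uses 16 to stay small, and all sizes are illustrative:

```python
import numpy as np

def train_pq(X, m=4, ksub=16, iters=5, seed=0):
    """Learn one codebook per sub-space via a tiny k-means."""
    n, d = X.shape
    dsub = d // m
    rng = np.random.default_rng(seed)
    books = []
    for j in range(m):
        sub = X[:, j * dsub:(j + 1) * dsub]
        cent = sub[rng.choice(n, ksub, replace=False)].copy()
        for _ in range(iters):
            a = np.argmin(((sub[:, None] - cent[None]) ** 2).sum(-1), axis=1)
            for c in range(ksub):
                if (a == c).any():
                    cent[c] = sub[a == c].mean(0)
        books.append(cent)
    return books

def pq_encode(X, books):
    """Replace each sub-vector with the index of its nearest centroid."""
    m, dsub = len(books), books[0].shape[1]
    codes = np.empty((len(X), m), dtype=np.uint8)
    for j, cent in enumerate(books):
        sub = X[:, j * dsub:(j + 1) * dsub]
        codes[:, j] = np.argmin(((sub[:, None] - cent[None]) ** 2).sum(-1), axis=1)
    return codes

def pq_distances(query, codes, books):
    """Asymmetric distance: query stays exact, DB vectors stay compressed."""
    dsub = books[0].shape[1]
    # Per-subspace tables: distance from each query chunk to every centroid
    tables = [np.sum((cent - query[j * dsub:(j + 1) * dsub]) ** 2, axis=1)
              for j, cent in enumerate(books)]
    return sum(tables[j][codes[:, j]] for j in range(len(books)))

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 64)).astype(np.float32)
books = train_pq(X)
codes = pq_encode(X, books)  # 4 bytes/vector vs 256 bytes uncompressed
dists = pq_distances(X[5], codes, books)
```

Each 64-dim float32 vector (256 bytes) is stored as 4 one-byte codes, a 64x reduction; distances are then approximate, which is the recall cost the bullet points describe.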
LSH (Locality-Sensitive Hashing)
- Hashes similar vectors to the same bucket with high probability
- Theoretically elegant but in practice HNSW dominates for dense vectors
- Still useful for Jaccard/Hamming similarity (sparse or binary vectors)
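Random-hyperplane (SimHash-style) LSH for cosine similarity is simple enough to sketch directly; similar vectors agree on most sign bits, while unrelated ones differ on roughly half. The dimensions and noise level below are arbitrary:

```python
import numpy as np

def simhash(X, planes):
    """Random-hyperplane LSH: the sign pattern across the planes is the bucket key."""
    return (X @ planes.T > 0).astype(np.uint8)

rng = np.random.default_rng(0)
planes = rng.normal(size=(16, 64))       # 16 random hyperplanes -> 16-bit hashes
a = rng.normal(size=64)
b = a + 0.05 * rng.normal(size=64)       # near-duplicate of a
c = rng.normal(size=64)                  # unrelated vector
ha, hb, hc = (simhash(v[None], planes)[0] for v in (a, b, c))
# Hamming distance between hashes tracks the angle between the vectors
print((ha != hb).sum(), (ha != hc).sum())
```

Bucketing by hash key turns search into a lookup, but getting high recall requires many hash tables, which is part of why HNSW tends to win for dense vectors in practice.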
3. The Filtering Problem
Metadata filtering (e.g., "only search documents from user_id = 123") is harder than it looks:
- Post-filter: run ANN search, then discard results that don't match filter → recall degrades severely if the filter is selective
- Pre-filter: restrict the search space to matching vectors first → can break ANN index structure, degrades to brute force
- Best approach: purpose-built vector DBs (Qdrant, Weaviate, Pinecone) implement filtered HNSW or segment-based filtering that maintains recall; pgvector naively post-filters
For high-selectivity filters (e.g., per-user document isolation), this is a key architectural consideration.
4. Choosing a Vector Database: Decision Framework
| Requirement | Recommendation |
|---|---|
| Simple prototype, small corpus | pgvector (already in Postgres) |
| Production RAG, < 50M vectors | Qdrant or Weaviate (self-hosted) or Pinecone |
| Need strong metadata filtering | Qdrant (best-in-class filtered search) |
| Billion-scale, cost-sensitive | FAISS (IVF+PQ) self-managed |
| Need full-text + vector hybrid | Weaviate or Elasticsearch with dense retrieval |
| Minimal ops overhead, managed | Pinecone |
5. Similarity Metrics
- Cosine similarity: angle between vectors; most common for text embeddings; scale-invariant
- Dot product: faster, but scale-sensitive; used when embeddings are normalized (equivalent to cosine)
- L2 (Euclidean): sensitive to vector magnitude; less common for text but standard in some CV applications
Match the metric to how your embedding model was trained — most sentence transformers are trained with cosine.
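The normalization equivalence is worth verifying once. For unit-length vectors, dot product equals cosine similarity, and squared L2 distance is a monotone function of it, so all three metrics produce the same ranking:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=64), rng.normal(size=64)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# After L2-normalization, dot product equals cosine similarity...
an, bn = a / np.linalg.norm(a), b / np.linalg.norm(b)
assert np.isclose(an @ bn, cosine(a, b))

# ...and squared L2 distance becomes a monotone function of cosine:
# ||u - v||^2 = 2 - 2*cos(u, v) for unit vectors
assert np.isclose(((an - bn) ** 2).sum(), 2 - 2 * cosine(a, b))
```

This is why many vector DBs normalize at ingest and use the cheaper dot product internally regardless of the metric you configure.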
Common Follow-ups
- "How does clustering reduce the search space, and when does it fail?" IVF assigns each vector to a cluster centroid at index time. At query time, it searches only the top nprobe nearest clusters. It fails when query vectors fall near cluster boundaries — the relevant neighbors may be in an adjacent cluster that wasn't probed. Mitigation: increase nprobe (improves recall at the cost of speed) or use HNSW.
- "How would you handle vector index updates in a production system?" HNSW supports incremental inserts but not efficient deletes (removing a node means repairing the graph around it, so implementations typically tombstone instead and rebuild periodically). For high-churn corpora, segment-based approaches (write new segments, merge periodically) or managed databases (Pinecone, Qdrant) that handle this internally are preferable.
- "What's the difference between a vector index, a vector DB, and a vector plugin?"
  - Vector index (FAISS, hnswlib): just the search index, no storage layer
  - Vector DB (Qdrant, Weaviate, Pinecone): full database with storage, metadata, filtering, APIs
  - Vector plugin (pgvector, Redis VSS): adds vector search to an existing DB; convenient but less optimized for pure-vector workloads