
How Do You Choose a Vector Index and Vector Database for a RAG System?

Compare vector index types — HNSW, IVF, PQ, LSH — and explain how to choose the right vector database given scale, latency, filtering, and cost requirements.

Prep for the full interview loop

Know the concepts. Now prove it. Practice GenAI, Coding, System Design, and AI/ML Design interviews with an AI that tells you exactly where you fell short.

Start a mock interview

Why This Is Asked

Choosing a vector DB is a real production decision. Interviewers want to see that you understand what's happening inside the index — not just that you've used Pinecone or Chroma — and that you can make reasoned tradeoffs given a scenario.

Key Concepts to Cover

  • Exact vs. approximate nearest neighbor (ANN) — when exactness matters
  • Index types — HNSW, IVF, PQ, LSH and their tradeoffs
  • Filtering — pre-filter vs. post-filter and why it's hard
  • Scale — how index choice changes at 1M vs. 1B vectors
  • Hosted vs. self-managed — Pinecone/Weaviate/Qdrant vs. pgvector vs. FAISS

How to Approach This

1. Exact vs. Approximate Nearest Neighbor

Exact k-NN is O(n·d) per query (n = corpus size, d = dimensions) — fine for tens of thousands of vectors, unusable at millions. ANN trades a small accuracy loss (recall@10 typically 95-99%) for orders-of-magnitude speed gains. For most RAG use cases, ANN recall is sufficient.
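The O(n·d) cost is easy to see in code: exact search is one dot product per corpus vector. A minimal NumPy sketch (random vectors standing in for real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 384)).astype(np.float32)  # n=10k docs, d=384
query = rng.standard_normal(384).astype(np.float32)

# Exact k-NN: one dot product per corpus vector -> O(n * d) work per query.
scores = corpus @ query             # (n,) similarity scores
top10 = np.argsort(-scores)[:10]    # indices of the 10 nearest neighbors
```

At 10k vectors this runs in milliseconds; at 100M vectors the same loop is a full scan per query, which is why ANN indexes exist.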

2. Index Types

HNSW (Hierarchical Navigable Small World)

  • Graph-based index; builds a multi-layer proximity graph
  • Best recall/latency tradeoff for most RAG workloads
  • Memory-resident — the whole graph lives in RAM
  • Weakness: high memory usage (roughly 4 bytes × dimensions × num_vectors for float32 vectors, plus per-node graph-link overhead on top)
  • Default choice for production RAG at moderate scale
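The core idea, a greedy walk over a proximity graph, can be sketched in a single layer. Real HNSW adds coarser layers for long-range entry points and explores a beam of candidates (the ef parameter) rather than one node at a time; this toy NumPy version is for intuition only:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 2_000, 32, 8
data = rng.standard_normal((n, d)).astype(np.float32)

# Build a k-nearest-neighbor proximity graph (brute force here for clarity;
# HNSW builds it incrementally and stacks coarser layers on top).
sq = (data ** 2).sum(axis=1)
pair = sq[:, None] + sq[None, :] - 2.0 * (data @ data.T)  # squared L2 distances
np.fill_diagonal(pair, np.inf)
neighbors = np.argsort(pair, axis=1)[:, :k]               # (n, k) graph edges

def greedy_search(query, entry=0):
    """Walk the graph, always moving to the closest neighbor of the current node."""
    cur = entry
    cur_dist = np.linalg.norm(data[cur] - query)
    while True:
        cand = neighbors[cur]
        cand_dist = np.linalg.norm(data[cand] - query, axis=1)
        best = cand_dist.argmin()
        if cand_dist[best] >= cur_dist:
            return cur, cur_dist  # no neighbor is closer -> stop at a local minimum
        cur, cur_dist = cand[best], cand_dist[best]

query = rng.standard_normal(d).astype(np.float32)
approx_id, approx_dist = greedy_search(query)
exact_dist = np.linalg.norm(data - query, axis=1).min()
```

The walk can stop at a local minimum that is not the true nearest neighbor; HNSW's extra layers and candidate beam are exactly what push recall into the high 90s.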

IVF (Inverted File Index)

  • Clusters vectors at index time; at query time, searches only the nearest clusters
  • Lower memory than HNSW; can be combined with PQ for disk storage
  • Weakness: recall degrades if query falls near cluster boundary; requires training step
  • Good for very large corpora where HNSW memory usage is prohibitive
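The cluster-then-probe mechanics fit in a few lines of NumPy. This toy version picks random data points as centroids where real IVF runs k-means, but the query-time behavior (probe nprobe clusters, scan only their inverted lists) is the same:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, n_clusters, nprobe = 5_000, 32, 50, 5
data = rng.standard_normal((n, d)).astype(np.float32)

# "Training": choose centroids (real IVF runs k-means; sampling keeps this short).
centroids = data[rng.choice(n, n_clusters, replace=False)]

# Index time: assign every vector to its nearest centroid (the inverted lists).
assign = np.linalg.norm(data[:, None] - centroids[None], axis=-1).argmin(axis=1)
lists = {c: np.where(assign == c)[0] for c in range(n_clusters)}

def ivf_search(query, k=10):
    # Query time: probe only the nprobe nearest clusters, then scan their lists.
    order = np.linalg.norm(centroids - query, axis=1).argsort()[:nprobe]
    cand = np.concatenate([lists[c] for c in order])
    dist = np.linalg.norm(data[cand] - query, axis=1)
    return cand[dist.argsort()[:k]]

query = rng.standard_normal(d).astype(np.float32)
approx = ivf_search(query)
exact = np.linalg.norm(data - query, axis=1).argsort()[:10]
recall = len(set(approx.tolist()) & set(exact.tolist())) / 10
```

Raising nprobe scans more lists: recall climbs toward exact search while latency climbs toward a full scan.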

PQ (Product Quantization)

  • Compresses vectors by splitting dimensions into sub-vectors and quantizing each
  • Massive memory savings (10-25x) at the cost of recall
  • Usually used as IVF+PQ (Faiss IndexIVFPQ) for billion-scale search
  • Not ideal for small corpora where full precision is cheap
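The compression arithmetic is the whole point, so here is a toy PQ encoder in NumPy. Real PQ trains each sub-space codebook with k-means; this sketch samples codebook entries from the data to stay short, but the memory math is identical:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m = 5_000, 128, 8            # split each vector into 8 sub-vectors of 16 dims
ksub = 256                          # 256 centroids per sub-space -> 1 byte per code
data = rng.standard_normal((n, d)).astype(np.float32)
subs = data.reshape(n, m, d // m)

# "Training": one codebook per sub-space (sampled here; real PQ uses k-means).
codebooks = np.stack([subs[rng.choice(n, ksub, replace=False), j] for j in range(m)])

# Encoding: each vector becomes m uint8 codes instead of d float32 values.
codes = np.empty((n, m), dtype=np.uint8)
for j in range(m):
    dist = np.linalg.norm(subs[:, j, None, :] - codebooks[j][None], axis=-1)
    codes[:, j] = dist.argmin(axis=1)

raw_bytes = data.nbytes             # n * d * 4  (float32)
pq_bytes = codes.nbytes             # n * m * 1  (uint8 codes)
compression = raw_bytes / pq_bytes  # 512 bytes -> 8 bytes per vector
```

Each 512-byte vector shrinks to 8 bytes of codes (plus the shared codebooks), which is why IVF+PQ is the standard recipe at billion scale; the lost precision is what eats into recall.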

LSH (Locality-Sensitive Hashing)

  • Hashes similar vectors to the same bucket with high probability
  • Theoretically elegant but in practice HNSW dominates for dense vectors
  • Still useful for Jaccard/Hamming similarity (sparse or binary vectors)
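The classic random-hyperplane variant (SimHash) is small enough to show directly: each bit of the hash is the sign of the projection onto a random hyperplane, so nearby vectors agree on most bits. A NumPy sketch with synthetic vectors:

```python
import numpy as np

rng = np.random.default_rng(6)
d, n_bits = 64, 16
planes = rng.standard_normal((n_bits, d))   # random hyperplanes define the hash

def simhash(v):
    # Sign of the projection onto each hyperplane gives one bit of the hash.
    return tuple((planes @ v > 0).astype(int))

a = rng.standard_normal(d)
near = a + 0.05 * rng.standard_normal(d)    # small perturbation of a
far = rng.standard_normal(d)                # unrelated vector

# Nearby vectors collide on most bits; unrelated vectors agree on ~half.
same_near = sum(x == y for x, y in zip(simhash(a), simhash(near)))
same_far = sum(x == y for x, y in zip(simhash(a), simhash(far)))
```

Bucketing by hash prefix then gives sub-linear candidate lookup, but tuning the number of tables and bits for good recall is exactly where HNSW tends to win for dense embeddings.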

3. The Filtering Problem

Metadata filtering (e.g., "only search documents from user_id = 123") is harder than it looks:

  • Post-filter: run ANN search, then discard results that don't match filter → recall degrades severely if the filter is selective
  • Pre-filter: restrict the search space to matching vectors first → can break ANN index structure, degrades to brute force
  • Best approach: purpose-built vector DBs (Qdrant, Weaviate, Pinecone) implement filtered HNSW or segment-based filtering that maintains recall; pgvector naively post-filters

For high-selectivity filters (e.g., per-user document isolation), this is a key architectural consideration.
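The recall collapse under post-filtering is easy to demonstrate. In this sketch the user_id values are hypothetical, and a brute-force dot product stands in for whatever scores an ANN index would return; with a ~1% selective filter, the global top-k rarely contains matching documents:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 20_000, 32, 10
data = rng.standard_normal((n, d)).astype(np.float32)
user_id = rng.integers(0, 100, size=n)   # ~1% of vectors per user (selective filter)
query = rng.standard_normal(d).astype(np.float32)
scores = data @ query                    # stand-in for the index's similarity scores

# Post-filter: take the global top-k, then drop results that fail the filter.
global_top = np.argsort(-scores)[:k]
post = [i for i in global_top if user_id[i] == 7]   # usually far fewer than k survive

# Pre-filter: restrict to matching vectors first, then rank (exact, but brute force).
match = np.where(user_id == 7)[0]
pre = match[np.argsort(-scores[match])[:k]]
```

Post-filtering returns almost nothing for user 7; pre-filtering returns a full top-10 but degenerates to a scan over the matching set, which is the tension filtered-HNSW designs try to resolve.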

4. Choosing a Vector Database: Decision Framework

| Requirement | Recommendation |
|---|---|
| Simple prototype, small corpus | pgvector (already in Postgres) |
| Production RAG, < 50M vectors | Qdrant or Weaviate (self-hosted) or Pinecone |
| Need strong metadata filtering | Qdrant (best-in-class filtered search) |
| Billion-scale, cost-sensitive | FAISS (IVF+PQ) self-managed |
| Need full-text + vector hybrid | Weaviate or Elasticsearch with dense retrieval |
| Minimal ops overhead, managed | Pinecone |

5. Similarity Metrics

  • Cosine similarity: angle between vectors; most common for text embeddings; scale-invariant
  • Dot product: faster, but scale-sensitive; used when embeddings are normalized (equivalent to cosine)
  • L2 (Euclidean): sensitive to vector magnitude; less common for text but standard in some CV applications

Match the metric to how your embedding model was trained — most sentence-transformer models are trained with cosine similarity.
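The equivalence mentioned above is worth internalizing: once vectors are L2-normalized, the dot product and cosine similarity are the same number, which is why many systems normalize at ingest and then use the cheaper dot product. A quick NumPy check:

```python
import numpy as np

rng = np.random.default_rng(5)
a, b = rng.standard_normal(384), rng.standard_normal(384)

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# After L2-normalization, the plain dot product gives the same score.
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
dot = a_n @ b_n
```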

Common Follow-ups

  1. "How does clustering reduce the search space, and when does it fail?" IVF assigns each vector to a cluster centroid at index time. At query time, it searches only the top nprobe nearest clusters. It fails when query vectors fall near cluster boundaries — the relevant neighbors may be in an adjacent cluster that wasn't probed. Mitigation: increase nprobe (trades recall for speed) or use HNSW.

  2. "How would you handle vector index updates in a production system?" HNSW supports incremental inserts but not efficient deletes (requires rebuilding affected nodes). For high-churn corpora, segment-based approaches (write new segments, merge periodically) or managed databases (Pinecone, Qdrant) that handle this internally are preferable.

  3. "What's the difference between a vector index, a vector DB, and a vector plugin?"

    • Vector index (FAISS, HNSW lib): just the search index, no storage layer
    • Vector DB (Qdrant, Weaviate, Pinecone): full database with storage, metadata, filtering, APIs
    • Vector plugin (pgvector, Redis VSS): adds vector search to an existing DB; convenient but less optimized for pure-vector workloads
