
How Do You Choose a Vector Index and Vector Database for a RAG System?

Compare vector index types — HNSW, IVF, PQ, LSH — and explain how to choose the right vector database given scale, latency, filtering, and cost requirements.

Prep for the full interview loop

Know the concepts. Now prove it. Practice GenAI, Coding, System Design, and AI/ML Design interviews with an AI that tells you exactly where you fell short.

Start a mock interview

Why This Is Asked

Choosing a vector DB is a real production decision. Interviewers want to see that you understand what's happening inside the index — not just that you've used Pinecone or Chroma — and that you can make reasoned tradeoffs given a scenario.

Key Concepts to Cover

  • Exact vs. approximate nearest neighbor (ANN) — when exactness matters
  • Index types — HNSW, IVF, PQ, LSH and their tradeoffs
  • Filtering — pre-filter vs. post-filter and why it's hard
  • Scale — how index choice changes at 1M vs. 1B vectors
  • Hosted vs. self-managed — Pinecone/Weaviate/Qdrant vs. pgvector vs. FAISS

How to Approach This

1. Exact vs. Approximate Nearest Neighbor

Exact k-NN is O(n·d) per query (n = corpus size, d = dimensions) — fine for tens of thousands of vectors, unusable at millions. ANN trades a small accuracy loss (recall@10 typically 95-99%) for orders-of-magnitude speed gains. For most RAG use cases, ANN recall is sufficient.
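The O(n·d) cost is easy to see in code: exact search is one dot product per corpus vector. A minimal NumPy sketch (random vectors standing in for real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 384)).astype(np.float32)  # n=10k docs, d=384
query = rng.standard_normal(384).astype(np.float32)

# Exact k-NN: one dot product per corpus vector -> O(n * d) work per query.
scores = corpus @ query             # (n,) similarity scores
top10 = np.argsort(-scores)[:10]    # indices of the 10 nearest neighbors
```

At 10k vectors this runs in milliseconds; at 100M vectors the same loop is a full scan per query, which is why ANN indexes exist.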

2. Index Types

HNSW (Hierarchical Navigable Small World)

  • Graph-based index; builds a multi-layer proximity graph
  • Best recall/latency tradeoff for most RAG workloads
  • Memory-resident — the whole graph lives in RAM
  • Weakness: high memory usage (roughly 4 bytes × dimensions × num_vectors for float32 vectors, plus per-node graph-link overhead on top)
  • Default choice for production RAG at moderate scale
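The core idea, a greedy walk over a proximity graph, can be sketched in a single layer. Real HNSW adds coarser layers for long-range entry points and explores a beam of candidates (the ef parameter) rather than one node at a time; this toy NumPy version is for intuition only:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 2_000, 32, 8
data = rng.standard_normal((n, d)).astype(np.float32)

# Build a k-nearest-neighbor proximity graph (brute force here for clarity;
# HNSW builds it incrementally and stacks coarser layers on top).
sq = (data ** 2).sum(axis=1)
pair = sq[:, None] + sq[None, :] - 2.0 * (data @ data.T)  # squared L2 distances
np.fill_diagonal(pair, np.inf)
neighbors = np.argsort(pair, axis=1)[:, :k]               # (n, k) graph edges

def greedy_search(query, entry=0):
    """Walk the graph, always moving to the closest neighbor of the current node."""
    cur = entry
    cur_dist = np.linalg.norm(data[cur] - query)
    while True:
        cand = neighbors[cur]
        cand_dist = np.linalg.norm(data[cand] - query, axis=1)
        best = cand_dist.argmin()
        if cand_dist[best] >= cur_dist:
            return cur, cur_dist  # no neighbor is closer -> stop at a local minimum
        cur, cur_dist = cand[best], cand_dist[best]

query = rng.standard_normal(d).astype(np.float32)
approx_id, approx_dist = greedy_search(query)
exact_dist = np.linalg.norm(data - query, axis=1).min()
```

The walk can stop at a local minimum that is not the true nearest neighbor; HNSW's extra layers and candidate beam are exactly what push recall into the high 90s.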

IVF (Inverted File Index)

  • Clusters vectors at index time; at query time, searches only the nearest clusters
  • Lower memory than HNSW; can be combined with PQ for disk storage
  • Weakness: recall degrades if query falls near cluster boundary; requires training step
  • Good for very large corpora where HNSW memory usage is prohibitive
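The cluster-then-probe mechanics fit in a few lines of NumPy. This toy version picks random data points as centroids where real IVF runs k-means, but the query-time behavior (probe nprobe clusters, scan only their inverted lists) is the same:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, n_clusters, nprobe = 5_000, 32, 50, 5
data = rng.standard_normal((n, d)).astype(np.float32)

# "Training": choose centroids (real IVF runs k-means; sampling keeps this short).
centroids = data[rng.choice(n, n_clusters, replace=False)]

# Index time: assign every vector to its nearest centroid (the inverted lists).
assign = np.linalg.norm(data[:, None] - centroids[None], axis=-1).argmin(axis=1)
lists = {c: np.where(assign == c)[0] for c in range(n_clusters)}

def ivf_search(query, k=10):
    # Query time: probe only the nprobe nearest clusters, then scan their lists.
    order = np.linalg.norm(centroids - query, axis=1).argsort()[:nprobe]
    cand = np.concatenate([lists[c] for c in order])
    dist = np.linalg.norm(data[cand] - query, axis=1)
    return cand[dist.argsort()[:k]]

query = rng.standard_normal(d).astype(np.float32)
approx = ivf_search(query)
exact = np.linalg.norm(data - query, axis=1).argsort()[:10]
recall = len(set(approx.tolist()) & set(exact.tolist())) / 10
```

Raising nprobe scans more lists: recall climbs toward exact search while latency climbs toward a full scan.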

PQ (Product Quantization)

  • Compresses vectors by splitting dimensions into sub-vectors and quantizing each
  • Massive memory savings (10-25x) at the cost of recall
  • Usually used as IVF+PQ (Faiss IndexIVFPQ) for billion-scale search
  • Not ideal for small corpora where full precision is cheap
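The compression arithmetic is the whole point, so here is a toy PQ encoder in NumPy. Real PQ trains each sub-space codebook with k-means; this sketch samples codebook entries from the data to stay short, but the memory math is identical:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m = 5_000, 128, 8            # split each vector into 8 sub-vectors of 16 dims
ksub = 256                          # 256 centroids per sub-space -> 1 byte per code
data = rng.standard_normal((n, d)).astype(np.float32)
subs = data.reshape(n, m, d // m)

# "Training": one codebook per sub-space (sampled here; real PQ uses k-means).
codebooks = np.stack([subs[rng.choice(n, ksub, replace=False), j] for j in range(m)])

# Encoding: each vector becomes m uint8 codes instead of d float32 values.
codes = np.empty((n, m), dtype=np.uint8)
for j in range(m):
    dist = np.linalg.norm(subs[:, j, None, :] - codebooks[j][None], axis=-1)
    codes[:, j] = dist.argmin(axis=1)

raw_bytes = data.nbytes             # n * d * 4  (float32)
pq_bytes = codes.nbytes             # n * m * 1  (uint8 codes)
compression = raw_bytes / pq_bytes  # 512 bytes -> 8 bytes per vector
```

Each 512-byte vector shrinks to 8 bytes of codes (plus the shared codebooks), which is why IVF+PQ is the standard recipe at billion scale; the lost precision is what eats into recall.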

LSH (Locality-Sensitive Hashing)

  • Hashes similar vectors to the same bucket with high probability
  • Theoretically elegant but in practice HNSW dominates for dense vectors
  • Still useful for Jaccard/Hamming similarity (sparse or binary vectors)
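The classic random-hyperplane variant (SimHash) is small enough to show directly: each bit of the hash is the sign of the projection onto a random hyperplane, so nearby vectors agree on most bits. A NumPy sketch with synthetic vectors:

```python
import numpy as np

rng = np.random.default_rng(6)
d, n_bits = 64, 16
planes = rng.standard_normal((n_bits, d))   # random hyperplanes define the hash

def simhash(v):
    # Sign of the projection onto each hyperplane gives one bit of the hash.
    return tuple((planes @ v > 0).astype(int))

a = rng.standard_normal(d)
near = a + 0.05 * rng.standard_normal(d)    # small perturbation of a
far = rng.standard_normal(d)                # unrelated vector

# Nearby vectors collide on most bits; unrelated vectors agree on ~half.
same_near = sum(x == y for x, y in zip(simhash(a), simhash(near)))
same_far = sum(x == y for x, y in zip(simhash(a), simhash(far)))
```

Bucketing by hash prefix then gives sub-linear candidate lookup, but tuning the number of tables and bits for good recall is exactly where HNSW tends to win for dense embeddings.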

3. The Filtering Problem

Metadata filtering (e.g., "only search documents from user_id = 123") is harder than it looks:

  • Post-filter: run ANN search, then discard results that don't match filter → recall degrades severely if the filter is selective
  • Pre-filter: restrict the search space to matching vectors first → can break ANN index structure, degrades to brute force
  • Best approach: purpose-built vector DBs (Qdrant, Weaviate, Pinecone) implement filtered HNSW or segment-based filtering that maintains recall; pgvector naively post-filters

For high-selectivity filters (e.g., per-user document isolation), this is a key architectural consideration.
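The recall collapse under post-filtering is easy to demonstrate. In this sketch the user_id values are hypothetical, and a brute-force dot product stands in for whatever scores an ANN index would return; with a ~1% selective filter, the global top-k rarely contains matching documents:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 20_000, 32, 10
data = rng.standard_normal((n, d)).astype(np.float32)
user_id = rng.integers(0, 100, size=n)   # ~1% of vectors per user (selective filter)
query = rng.standard_normal(d).astype(np.float32)
scores = data @ query                    # stand-in for the index's similarity scores

# Post-filter: take the global top-k, then drop results that fail the filter.
global_top = np.argsort(-scores)[:k]
post = [i for i in global_top if user_id[i] == 7]   # usually far fewer than k survive

# Pre-filter: restrict to matching vectors first, then rank (exact, but brute force).
match = np.where(user_id == 7)[0]
pre = match[np.argsort(-scores[match])[:k]]
```

Post-filtering returns almost nothing for user 7; pre-filtering returns a full top-10 but degenerates to a scan over the matching set, which is the tension filtered-HNSW designs try to resolve.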

4. Choosing a Vector Database: Decision Framework

| Requirement | Recommendation |
|---|---|
| Simple prototype, small corpus | pgvector (already in Postgres) |
| Production RAG, < 50M vectors | Qdrant or Weaviate (self-hosted) or Pinecone |
| Need strong metadata filtering | Qdrant (best-in-class filtered search) |
| Billion-scale, cost-sensitive | FAISS (IVF+PQ) self-managed |
| Need full-text + vector hybrid | Weaviate or Elasticsearch with dense retrieval |
| Minimal ops overhead, managed | Pinecone |

5. Similarity Metrics

  • Cosine similarity: angle between vectors; most common for text embeddings; scale-invariant
  • Dot product: faster, but scale-sensitive; used when embeddings are normalized (equivalent to cosine)
  • L2 (Euclidean): sensitive to vector magnitude; less common for text but standard in some CV applications

Match the metric to how your embedding model was trained — most sentence-transformer models are trained with cosine similarity.
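The equivalence mentioned above is worth internalizing: once vectors are L2-normalized, the dot product and cosine similarity are the same number, which is why many systems normalize at ingest and then use the cheaper dot product. A quick NumPy check:

```python
import numpy as np

rng = np.random.default_rng(5)
a, b = rng.standard_normal(384), rng.standard_normal(384)

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# After L2-normalization, the plain dot product gives the same score.
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
dot = a_n @ b_n
```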

Common Follow-ups

  1. "How does clustering reduce the search space, and when does it fail?" IVF assigns each vector to a cluster centroid at index time. At query time, it searches only the top nprobe nearest clusters. It fails when query vectors fall near cluster boundaries — the relevant neighbors may be in an adjacent cluster that wasn't probed. Mitigation: increase nprobe (trades recall for speed) or use HNSW.

  2. "How would you handle vector index updates in a production system?" HNSW supports incremental inserts but not efficient deletes (requires rebuilding affected nodes). For high-churn corpora, segment-based approaches (write new segments, merge periodically) or managed databases (Pinecone, Qdrant) that handle this internally are preferable.

  3. "What's the difference between a vector index, a vector DB, and a vector plugin?"

    • Vector index (FAISS, HNSW lib): just the search index, no storage layer
    • Vector DB (Qdrant, Weaviate, Pinecone): full database with storage, metadata, filtering, APIs
    • Vector plugin (pgvector, Redis VSS): adds vector search to an existing DB; convenient but less optimized for pure-vector workloads
