Advanced · 5 min read

How Do You Architect a Multi-Tenant LLM Deployment with Role-Based Data Access?

Enterprise AI products serve multiple customers from shared infrastructure. Walk through how to design tenant isolation, role-based access control, and data governance for a multi-tenant LLM deployment.

Prep for the full interview loop

Know the concepts. Now prove it. Practice GenAI, Coding, System Design, and AI/ML Design interviews with an AI that tells you exactly where you fell short.

Start a mock interview

Why This Is Asked

This is a flagship question at Microsoft (Copilot/Azure OpenAI), Salesforce, and any company building B2B AI products. It tests enterprise engineering instincts: data isolation, compliance, RBAC, and cost attribution — none of which come up in consumer product interviews.

Key Concepts to Cover

  • Tenant isolation models — silo vs. pool vs. bridge
  • RBAC for AI — how permissions propagate to retrieved content
  • Data residency — where tenant data lives and how it's governed
  • Cross-tenant data leakage — the specific risks in LLM/RAG systems
  • Cost attribution — tracking usage per tenant
  • Rate limiting and quotas — per-tenant resource management

How to Approach This

1. Clarify the Tenancy Model

Before designing, establish:

  • Is each tenant a company, a department, or an end user?
  • Do tenants bring their own data, or do they share a common corpus?
  • Are there regulatory requirements? (HIPAA, GDPR, SOC 2)
  • What's the scale? (10 tenants vs. 10,000 tenants)

2. Tenant Isolation Models

Silo model (one stack per tenant)

  • Separate vector DB, LLM endpoint, and storage per tenant
  • Maximum isolation — zero cross-tenant data risk
  • Cost: high; operational burden: high
  • Best for: high-compliance tenants (healthcare, government, finance), small number of large enterprise customers

Pool model (shared stack with logical separation)

  • All tenants in shared infrastructure, partitioned by tenant_id
  • Vector DB namespaces or collections per tenant; all metadata tagged with tenant_id
  • Enforce tenant filter on every query — never execute a vector search without a tenant scope
  • Lower cost; still requires careful implementation to prevent leakage
  • Best for: SaaS products with many SMB tenants

Bridge model (hybrid)

  • Shared compute and LLM access, but isolated storage per tenant
  • Tenant-owned databases, centrally managed API layer
  • Good balance for mid-market customers with data sovereignty requirements
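The three isolation models can coexist behind one routing layer. A minimal sketch of a per-tenant registry that resolves each tenant to its isolation model and storage/endpoint configuration — all names and the schema here are illustrative assumptions, not a specific product's API:

```python
# Hypothetical tenant registry: maps each tenant to its isolation model
# and where its data and LLM endpoint live. Schema is illustrative.
TENANT_REGISTRY = {
    "acme-health": {  # silo: dedicated stack for a high-compliance tenant
        "model": "silo",
        "vector_db": "acme-health-dedicated",
        "llm_endpoint": "https://acme-health.private.example.com/llm",
    },
    "smb-42": {  # pool: shared stack, logical separation by namespace
        "model": "pool",
        "vector_db": "shared",
        "namespace": "smb-42",
    },
    "midcorp": {  # bridge: tenant-owned storage, shared compute
        "model": "bridge",
        "vector_db": "midcorp-owned",
        "llm_endpoint": "https://shared.example.com/llm",
    },
}

def resolve_tenant_config(tenant_id: str) -> dict:
    """Look up where a tenant's data lives and which endpoints serve it.
    Fails loudly on unknown tenants rather than falling back to a default."""
    config = TENANT_REGISTRY.get(tenant_id)
    if config is None:
        raise KeyError(f"Unknown tenant: {tenant_id}")
    return config
```

Failing closed on unknown tenants matters: a silent fallback to a shared default is exactly the kind of bug that turns into cross-tenant exposure.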

3. RBAC in the Retrieval Layer

This is the hardest part. A user with "viewer" access to the sales team's docs shouldn't retrieve HR documents — even if they're in the same tenant.

Design:

  • Every document in the vector DB has metadata: tenant_id, owner_team, classification, allowed_roles
  • Every retrieval query includes the user's tenant_id and roles as mandatory filters
  • The application layer constructs the filter — never trust the user query to scope itself
# Example: every vector search must include authorization scope
results = vector_db.search(
    query_vector=embedded_query,
    filter={
        "tenant_id": {"eq": current_user.tenant_id},
        "allowed_roles": {"contains_any": current_user.roles}
    },
    top_k=10
)

Risk: prompt injection bypassing RBAC

A malicious user could craft a query that tricks the LLM into revealing information it retrieved but shouldn't surface. Mitigations:

  • Apply RBAC at retrieval — never retrieve unauthorized docs in the first place
  • Post-process: scan LLM output for content from unauthorized sources
  • Least-privilege: retrieve the minimum needed, not everything that matches
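The post-processing mitigation can be sketched as a defense-in-depth check that runs after generation. This is an illustrative sketch, not a production filter: it assumes each retrieved document carries `allowed_roles` and `text` metadata (a hypothetical schema), and it only catches verbatim echoes — the primary control remains RBAC at retrieval:

```python
def audit_response(response_text: str, retrieved_docs: list[dict], user_roles: set[str]) -> bool:
    """Defense-in-depth: flag a response that echoes content from any
    document the user's roles do not authorize. Returns False if the
    response should be blocked."""
    for doc in retrieved_docs:
        if not user_roles & set(doc["allowed_roles"]):
            # An unauthorized doc slipped past retrieval filtering;
            # block the response if it reproduces that doc's content.
            snippet = doc["text"][:50]
            if snippet and snippet in response_text:
                return False
    return True
```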

4. Data Residency and Compliance

Enterprises in regulated industries require data to stay within geographic boundaries.

Design considerations:

  • Deploy regional stacks (US, EU, APAC) with tenant routing
  • Route EU tenants exclusively to EU-hosted LLM endpoints (Azure EU regions, not US endpoints)
  • Encrypt tenant data at rest and in transit with tenant-managed keys (BYOK) for high-compliance tenants
  • Audit log every LLM request and retrieved document for compliance reporting
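Regional routing is simple to express but must fail closed: if a tenant's region has no provisioned endpoint, the request should be rejected, never silently served from another region. A minimal sketch (endpoint URLs are placeholders):

```python
# Illustrative mapping from a tenant's residency region to its LLM endpoint.
REGIONAL_ENDPOINTS = {
    "us": "https://us.llm.example.com",
    "eu": "https://eu.llm.example.com",
    "apac": "https://apac.llm.example.com",
}

def endpoint_for_tenant(tenant_region: str) -> str:
    """Route a tenant strictly to its home-region endpoint. Raises instead
    of falling back to another region, so residency violations fail closed."""
    try:
        return REGIONAL_ENDPOINTS[tenant_region]
    except KeyError:
        raise ValueError(f"No endpoint provisioned for region {tenant_region!r}")
```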

5. Cross-Tenant Leakage Risks

Specific to LLM/RAG systems:

  • Prompt cache sharing: if two tenants share an LLM inference node, prompt caches could leak. Use separate inference endpoints or disable cross-request prompt sharing.
  • Embedding space proximity: documents from different tenants could be near each other in embedding space. A missing tenant filter would retrieve the wrong tenant's documents. This is the most common production bug.
  • LLM fine-tuned on shared data: if you fine-tune on multi-tenant data, model weights can memorize specific tenant content. Always fine-tune on per-tenant datasets or anonymized aggregates.
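The "missing tenant filter" bug above is worth guarding against structurally: wrap the vector store so that an unscoped query cannot execute at all. A sketch, assuming the same filter schema as the earlier retrieval example:

```python
class TenantScopeError(Exception):
    """Raised when a vector search is attempted without a tenant scope."""

def require_tenant_scope(query_filter: dict) -> dict:
    """Guardrail for the pool model: refuse to run any vector search whose
    filter does not pin an exact tenant_id. Filter schema is illustrative."""
    tenant_clause = query_filter.get("tenant_id")
    if not tenant_clause or "eq" not in tenant_clause:
        raise TenantScopeError("vector search rejected: missing tenant_id filter")
    return query_filter
```

Calling this at a single chokepoint (the shared search wrapper) turns the most common production bug into a loud, testable failure instead of a silent data leak.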

6. Cost Attribution and Quotas

  • Tag every LLM request with tenant_id — enables cost reporting per tenant
  • Enforce per-tenant token quotas (monthly budget caps)
  • Rate limit at the API gateway level per tenant key
  • Expose usage dashboards in the tenant admin portal
API Gateway → authenticate → extract tenant_id
           → check quota (Redis counter) → reject if exceeded
           → forward request with tenant_id header
           → log tokens consumed → decrement quota
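The quota step in the flow above can be sketched as follows. A plain dict stands in for the Redis counter so the example is self-contained; in production you'd want an atomic increment (e.g. a Redis counter with a monthly expiry) rather than this non-thread-safe sketch:

```python
class QuotaGate:
    """Sketch of a per-tenant token quota check at the API gateway."""

    def __init__(self, monthly_token_budget: dict[str, int]):
        self.budget = monthly_token_budget   # tenant_id -> monthly token cap
        self.used: dict[str, int] = {}       # tenant_id -> tokens consumed

    def check_and_record(self, tenant_id: str, tokens: int) -> bool:
        """Record usage and return True if the tenant stays under budget;
        return False (reject the request) if the cap would be exceeded."""
        cap = self.budget.get(tenant_id, 0)  # unknown tenants get zero quota
        spent = self.used.get(tenant_id, 0)
        if spent + tokens > cap:
            return False
        self.used[tenant_id] = spent + tokens
        return True
```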

Common Follow-ups

  1. "How would you handle a tenant that wants to bring their own LLM (e.g., a fine-tuned Llama)?" Abstract the LLM layer behind an interface; route per-tenant to different model endpoints. Store endpoint configuration per tenant. The retrieval, RBAC, and context-assembly layers remain shared.

  2. "What's your strategy for onboarding a new enterprise tenant?" Provision a namespace/collection in the vector DB, configure RBAC policies from their identity provider (SAML/OIDC), ingest their documents through the standard pipeline with tenant tagging, run a smoke test with retrieval quality checks, and set up their quota and billing.

  3. "How do you test that RBAC is actually working?" Write integration tests that: (a) create two tenants with overlapping document content, (b) query as tenant A and assert no tenant B documents appear, (c) test cross-role queries within a tenant. Run these in CI on every deployment.

