Why This Is Asked
This is a flagship question at Microsoft (Copilot/Azure OpenAI), Salesforce, and any company building B2B AI products. It tests enterprise engineering instincts: data isolation, compliance, RBAC, and cost attribution — none of which come up in consumer product interviews.
Key Concepts to Cover
- Tenant isolation models — silo vs. pool vs. bridge
- RBAC for AI — how permissions propagate to retrieved content
- Data residency — where tenant data lives and how it's governed
- Cross-tenant data leakage — the specific risks in LLM/RAG systems
- Cost attribution — tracking usage per tenant
- Rate limiting and quotas — per-tenant resource management
How to Approach This
1. Clarify the Tenancy Model
Before designing, establish:
- Is each tenant a company, a department, or an end user?
- Do tenants bring their own data, or do they share a common corpus?
- Are there regulatory requirements? (HIPAA, GDPR, SOC 2)
- What's the scale? (10 tenants vs. 10,000 tenants)
2. Tenant Isolation Models
Silo model (one stack per tenant)
- Separate vector DB, LLM endpoint, and storage per tenant
- Maximum isolation — zero cross-tenant data risk
- Cost: high; operational burden: high
- Best for: high-compliance tenants (healthcare, government, finance), small number of large enterprise customers
Pool model (shared stack with logical separation)
- All tenants in shared infrastructure, partitioned by tenant_id
- Vector DB namespaces or collections per tenant; all metadata tagged with tenant_id
- Enforce a tenant filter on every query — never execute a vector search without a tenant scope
- Lower cost; still requires careful implementation to prevent leakage
- Best for: SaaS products with many SMB tenants
Bridge model (hybrid)
- Shared compute and LLM access, but isolated storage per tenant
- Tenant-owned databases, centrally managed API layer
- Good balance for mid-market customers with data sovereignty requirements
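The tier-to-model mapping above can be sketched as a small resolver. This is a minimal illustration, not a real framework API; TenantConfig, ISOLATION_BY_TIER, and resolve_isolation are hypothetical names, and the tier labels are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class TenantConfig:
    tenant_id: str
    tier: str                      # assumed tiers: "enterprise", "mid_market", "smb"
    compliance: set = field(default_factory=set)  # e.g. {"HIPAA"}, {"GDPR"}

# Default mapping from commercial tier to isolation model.
ISOLATION_BY_TIER = {"enterprise": "silo", "mid_market": "bridge", "smb": "pool"}

def resolve_isolation(t: TenantConfig) -> str:
    # High-compliance tenants always get a dedicated stack, regardless of tier.
    if t.compliance & {"HIPAA", "FedRAMP"}:
        return "silo"
    return ISOLATION_BY_TIER.get(t.tier, "pool")
```

Centralizing this decision in one function keeps provisioning, billing, and routing consistent about which model a tenant is on.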
3. RBAC in the Retrieval Layer
This is the hardest part. A user with "viewer" access to the sales team's docs shouldn't retrieve HR documents — even if they're in the same tenant.
Design:
- Every document in the vector DB has metadata: tenant_id, owner_team, classification, allowed_roles
- Every retrieval query includes the user's tenant_id and roles as mandatory filters
- The application layer constructs the filter — never trust the user query to scope itself
# Example: every vector search must include authorization scope
results = vector_db.search(
    query_vector=embedded_query,
    filter={
        "tenant_id": {"eq": current_user.tenant_id},
        "allowed_roles": {"contains_any": current_user.roles},
    },
    top_k=10,
)
Risk: prompt injection bypassing RBAC
A malicious user could craft a query that tricks the LLM into revealing information it retrieved but shouldn't surface. Mitigations:
- Apply RBAC at retrieval — never retrieve unauthorized docs in the first place
- Post-process: scan LLM output for content from unauthorized sources
- Least-privilege: retrieve the minimum needed, not everything that matches
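A defense-in-depth sketch of the post-processing mitigation: re-check authorization on retrieved chunks before they ever reach the prompt, so a missed retrieval filter cannot leak content. The chunk dictionary shape and field names here are assumptions for illustration, matching the metadata fields listed above.

```python
def authorize_chunks(chunks, user_tenant, user_roles):
    """Drop any retrieved chunk the current user is not authorized to see."""
    allowed = []
    for c in chunks:
        if c["tenant_id"] != user_tenant:
            continue  # wrong tenant: should never happen — drop and alert
        if not set(c["allowed_roles"]) & set(user_roles):
            continue  # user holds none of the roles this document permits
        allowed.append(c)
    return allowed
```

This layer is a backstop, not a substitute for filtering at retrieval time: unauthorized documents still occupy top-k slots if they reach this point.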
4. Data Residency and Compliance
Enterprises in regulated industries require data to stay within geographic boundaries.
Design considerations:
- Deploy regional stacks (US, EU, APAC) with tenant routing
- Route EU tenants exclusively to EU-hosted LLM endpoints (Azure EU regions, not US endpoints)
- Encrypt tenant data at rest and in transit with tenant-managed keys (BYOK) for high-compliance tenants
- Audit log every LLM request and retrieved document for compliance reporting
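Residency-aware routing can be reduced to a strict region-to-endpoint lookup that fails closed. The endpoint URLs below are placeholders, not real service addresses, and endpoint_for is an illustrative name.

```python
# Map each serving region to its in-region LLM endpoint (placeholder URLs).
REGION_ENDPOINTS = {
    "eu": "https://llm.eu.example.internal",
    "us": "https://llm.us.example.internal",
}

def endpoint_for(tenant_region: str) -> str:
    try:
        return REGION_ENDPOINTS[tenant_region]
    except KeyError:
        # Fail closed: no silent cross-region fallback for residency-bound tenants.
        raise ValueError(f"no in-region endpoint for {tenant_region!r}")
```

The key design choice is raising rather than falling back: a misconfigured region should surface as an error, never as a quiet request to the wrong jurisdiction.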
5. Cross-Tenant Leakage Risks
Specific to LLM/RAG systems:
- Prompt cache sharing: if two tenants share an LLM inference node, prompt caches could leak. Use separate inference endpoints or disable cross-request prompt sharing.
- Embedding space proximity: documents from different tenants could be near each other in embedding space. A missing tenant filter would retrieve the wrong tenant's documents. This is the most common production bug.
- LLM fine-tuned on shared data: if you fine-tune on multi-tenant data, model weights can memorize specific tenant content. Always fine-tune on per-tenant datasets or anonymized aggregates.
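Because the missing tenant filter is the most common production bug, one guard worth sketching is a wrapper that refuses any vector search lacking a tenant scope. ScopedSearch is a hypothetical wrapper; backend stands in for whatever vector DB client is actually in use.

```python
class ScopedSearch:
    """Wrap a vector DB client so unscoped searches are impossible to run."""

    def __init__(self, backend):
        self.backend = backend

    def search(self, query_vector, filter, top_k=10):
        if not filter or "tenant_id" not in filter:
            # Fail loudly rather than returning cross-tenant results.
            raise PermissionError("vector search without tenant scope refused")
        return self.backend.search(
            query_vector=query_vector, filter=filter, top_k=top_k
        )
```

Routing every search through one chokepoint like this turns the bug from "silent data leak" into "immediate exception in tests".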
6. Cost Attribution and Quotas
- Tag every LLM request with tenant_id — enables cost reporting per tenant
- Enforce per-tenant token quotas (monthly budget caps)
- Rate limit at the API gateway level per tenant key
- Expose usage dashboards in the tenant admin portal
API Gateway → authenticate → extract tenant_id
→ check quota (Redis counter) → reject if exceeded
→ forward request with tenant_id header
→ log tokens consumed → decrement quota
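The quota gate in the flow above can be sketched as follows. A plain dict stands in for the Redis counter; in production this would be an atomic per-tenant increment, and the admit/QUOTA names are illustrative.

```python
usage = {}                      # tenant_id -> tokens consumed this period
QUOTA = {"t1": 1_000_000}       # tenant_id -> monthly token budget

def admit(tenant_id: str, estimated_tokens: int) -> bool:
    """Return True and record usage if the request fits the tenant's budget."""
    budget = QUOTA.get(tenant_id, 0)
    if usage.get(tenant_id, 0) + estimated_tokens > budget:
        return False  # reject before the LLM call, not after spending tokens
    usage[tenant_id] = usage.get(tenant_id, 0) + estimated_tokens
    return True
```

Note the check runs on an estimate before the call; actual consumed tokens are reconciled against the counter afterward, as the flow's final step shows.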
Common Follow-ups
- "How would you handle a tenant that wants to bring their own LLM (e.g., a fine-tuned Llama)?" Abstract the LLM layer behind an interface; route each tenant to its configured model endpoint. Store endpoint configuration per tenant. The retrieval, RBAC, and context-assembly layers remain shared.
- "What's your strategy for onboarding a new enterprise tenant?" Provision a namespace/collection in the vector DB, configure RBAC policies from their identity provider (SAML/OIDC), ingest their documents through the standard pipeline with tenant tagging, run a smoke test with retrieval quality checks, and set up their quota and billing.
- "How do you test that RBAC is actually working?" Write integration tests that: (a) create two tenants with overlapping document content, (b) query as tenant A and assert no tenant B documents appear, (c) test cross-role queries within a tenant. Run these in CI on every deployment.