Why This Is Asked
This is a flagship question at Microsoft (Copilot/Azure OpenAI), Salesforce, and any company building B2B AI products. It tests enterprise engineering instincts: data isolation, compliance, RBAC, and cost attribution — none of which come up in consumer product interviews.
Key Concepts to Cover
- Tenant isolation models — silo vs. pool vs. bridge
- RBAC for AI — how permissions propagate to retrieved content
- Data residency — where tenant data lives and how it's governed
- Cross-tenant data leakage — the specific risks in LLM/RAG systems
- Cost attribution — tracking usage per tenant
- Rate limiting and quotas — per-tenant resource management
How to Approach This
1. Clarify the Tenancy Model
Before designing, establish:
- Is each tenant a company, a department, or an end user?
- Do tenants bring their own data, or do they share a common corpus?
- Are there regulatory requirements? (HIPAA, GDPR, SOC 2)
- What's the scale? (10 tenants vs. 10,000 tenants)
2. Tenant Isolation Models
Silo model (one stack per tenant)
- Separate vector DB, LLM endpoint, and storage per tenant
- Maximum isolation — zero cross-tenant data risk
- Cost: high; operational burden: high
- Best for: high-compliance tenants (healthcare, government, finance), small number of large enterprise customers
Pool model (shared stack with logical separation)
- All tenants in shared infrastructure, partitioned by tenant_id
- Vector DB namespaces or collections per tenant; all metadata tagged with tenant_id
- Enforce a tenant filter on every query — never execute a vector search without a tenant scope
- Lower cost; still requires careful implementation to prevent leakage
- Best for: SaaS products with many SMB tenants
Bridge model (hybrid)
- Shared compute and LLM access, but isolated storage per tenant
- Tenant-owned databases, centrally managed API layer
- Good balance for mid-market customers with data sovereignty requirements
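The tier-to-model mapping above can be sketched as a small resolver. This is a minimal illustration, not a real framework API; TenantConfig, ISOLATION_BY_TIER, and resolve_isolation are hypothetical names, and the tier labels are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class TenantConfig:
    tenant_id: str
    tier: str                      # assumed tiers: "enterprise", "mid_market", "smb"
    compliance: set = field(default_factory=set)  # e.g. {"HIPAA"}, {"GDPR"}

# Default mapping from commercial tier to isolation model.
ISOLATION_BY_TIER = {"enterprise": "silo", "mid_market": "bridge", "smb": "pool"}

def resolve_isolation(t: TenantConfig) -> str:
    # High-compliance tenants always get a dedicated stack, regardless of tier.
    if t.compliance & {"HIPAA", "FedRAMP"}:
        return "silo"
    return ISOLATION_BY_TIER.get(t.tier, "pool")
```

Centralizing this decision in one function keeps provisioning, billing, and routing consistent about which model a tenant is on.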
3. RBAC in the Retrieval Layer
This is the hardest part. A user with "viewer" access to the sales team's docs shouldn't retrieve HR documents — even if they're in the same tenant.
Design:
- Every document in the vector DB has metadata: tenant_id, owner_team, classification, allowed_roles
- Every retrieval query includes the user's tenant_id and roles as mandatory filters
- The application layer constructs the filter — never trust the user query to scope itself
# Example: every vector search must include authorization scope
results = vector_db.search(
    query_vector=embedded_query,
    filter={
        "tenant_id": {"eq": current_user.tenant_id},
        "allowed_roles": {"contains_any": current_user.roles},
    },
    top_k=10,
)
Risk: prompt injection bypassing RBAC
A malicious user could craft a query that tricks the LLM into revealing information it retrieved but shouldn't surface. Mitigations:
- Apply RBAC at retrieval — never retrieve unauthorized docs in the first place
- Post-process: scan LLM output for content from unauthorized sources
- Least-privilege: retrieve the minimum needed, not everything that matches
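A defense-in-depth sketch of the post-processing mitigation: re-check authorization on retrieved chunks before they ever reach the prompt, so a missed retrieval filter cannot leak content. The chunk dictionary shape and field names here are assumptions for illustration, matching the metadata fields listed above.

```python
def authorize_chunks(chunks, user_tenant, user_roles):
    """Drop any retrieved chunk the current user is not authorized to see."""
    allowed = []
    for c in chunks:
        if c["tenant_id"] != user_tenant:
            continue  # wrong tenant: should never happen — drop and alert
        if not set(c["allowed_roles"]) & set(user_roles):
            continue  # user holds none of the roles this document permits
        allowed.append(c)
    return allowed
```

This layer is a backstop, not a substitute for filtering at retrieval time: unauthorized documents still occupy top-k slots if they reach this point.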
4. Data Residency and Compliance
Enterprises in regulated industries require data to stay within geographic boundaries.
Design considerations:
- Deploy regional stacks (US, EU, APAC) with tenant routing
- Route EU tenants exclusively to EU-hosted LLM endpoints (Azure EU regions, not US endpoints)
- Encrypt tenant data at rest and in transit with tenant-managed keys (BYOK) for high-compliance tenants
- Audit log every LLM request and retrieved document for compliance reporting
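Residency-aware routing can be reduced to a strict region-to-endpoint lookup that fails closed. The endpoint URLs below are placeholders, not real service addresses, and endpoint_for is an illustrative name.

```python
# Map each serving region to its in-region LLM endpoint (placeholder URLs).
REGION_ENDPOINTS = {
    "eu": "https://llm.eu.example.internal",
    "us": "https://llm.us.example.internal",
}

def endpoint_for(tenant_region: str) -> str:
    try:
        return REGION_ENDPOINTS[tenant_region]
    except KeyError:
        # Fail closed: no silent cross-region fallback for residency-bound tenants.
        raise ValueError(f"no in-region endpoint for {tenant_region!r}")
```

The key design choice is raising rather than falling back: a misconfigured region should surface as an error, never as a quiet request to the wrong jurisdiction.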
5. Cross-Tenant Leakage Risks
Specific to LLM/RAG systems:
- Prompt cache sharing: if two tenants share an LLM inference node, prompt caches could leak. Use separate inference endpoints or disable cross-request prompt sharing.
- Embedding space proximity: documents from different tenants could be near each other in embedding space. A missing tenant filter would retrieve the wrong tenant's documents. This is the most common production bug.
- LLM fine-tuned on shared data: if you fine-tune on multi-tenant data, model weights can memorize specific tenant content. Always fine-tune on per-tenant datasets or anonymized aggregates.
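Because the missing tenant filter is the most common production bug, one guard worth sketching is a wrapper that refuses any vector search lacking a tenant scope. ScopedSearch is a hypothetical wrapper; backend stands in for whatever vector DB client is actually in use.

```python
class ScopedSearch:
    """Wrap a vector DB client so unscoped searches are impossible to run."""

    def __init__(self, backend):
        self.backend = backend

    def search(self, query_vector, filter, top_k=10):
        if not filter or "tenant_id" not in filter:
            # Fail loudly rather than returning cross-tenant results.
            raise PermissionError("vector search without tenant scope refused")
        return self.backend.search(
            query_vector=query_vector, filter=filter, top_k=top_k
        )
```

Routing every search through one chokepoint like this turns the bug from "silent data leak" into "immediate exception in tests".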
6. Cost Attribution and Quotas
- Tag every LLM request with tenant_id — enables cost reporting per tenant
- Enforce per-tenant token quotas (monthly budget caps)
- Rate limit at the API gateway level per tenant key
- Expose usage dashboards in the tenant admin portal
API Gateway → authenticate → extract tenant_id
→ check quota (Redis counter) → reject if exceeded
→ forward request with tenant_id header
→ log tokens consumed → decrement quota
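The quota gate in the flow above can be sketched as follows. A plain dict stands in for the Redis counter; in production this would be an atomic per-tenant increment, and the admit/QUOTA names are illustrative.

```python
usage = {}                      # tenant_id -> tokens consumed this period
QUOTA = {"t1": 1_000_000}       # tenant_id -> monthly token budget

def admit(tenant_id: str, estimated_tokens: int) -> bool:
    """Return True and record usage if the request fits the tenant's budget."""
    budget = QUOTA.get(tenant_id, 0)
    if usage.get(tenant_id, 0) + estimated_tokens > budget:
        return False  # reject before the LLM call, not after spending tokens
    usage[tenant_id] = usage.get(tenant_id, 0) + estimated_tokens
    return True
```

Note the check runs on an estimate before the call; actual consumed tokens are reconciled against the counter afterward, as the flow's final step shows.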
Common Follow-ups
- "How would you handle a tenant that wants to bring their own LLM (e.g., a fine-tuned Llama)?" Abstract the LLM layer behind an interface; route each tenant to its configured model endpoint. Store endpoint configuration per tenant. The retrieval, RBAC, and context-assembly layers remain shared.
- "What's your strategy for onboarding a new enterprise tenant?" Provision a namespace/collection in the vector DB, configure RBAC policies from their identity provider (SAML/OIDC), ingest their documents through the standard pipeline with tenant tagging, run a smoke test with retrieval quality checks, and set up their quota and billing.
- "How do you test that RBAC is actually working?" Write integration tests that: (a) create two tenants with overlapping document content, (b) query as tenant A and assert no tenant B documents appear, (c) test cross-role queries within a tenant. Run these in CI on every deployment.