Why This Is Asked
LLM providers regularly update models — sometimes silently. A model update can change output format, tone, reasoning quality, or safety behavior in ways that break your application. Interviewers want to see if you treat model upgrades with the same rigor as software deployments.
Key Concepts to Cover
- Eval gate — run your test suite against the new model before any production traffic
- Shadow testing — run the new model in parallel, compare outputs, but do not serve them to users
- Canary deployment — route a small % of traffic to the new model, monitor metrics
- Prompt compatibility — new models may respond differently to the same prompts
- Rollback plan — always be able to revert to the previous model version
- Pinned model versions — use explicit version IDs, not "latest"
How to Approach This
1. Never Use "latest" in Production
Always pin to a specific model version:
- Bad: model: "latest" (or any floating alias) — this can change without notice
- Good: model: "provider-model-YYYY-MM-DD" (or exact immutable version ID) — pinned and reproducible
Monitor provider announcements and deprecation notices.
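A minimal sketch of enforcing pinned versions, assuming a simple config dict; the model IDs and the `resolve_model` helper are hypothetical placeholders:

```python
# Hypothetical config: pin the exact model versions your app was evaluated against.
MODEL_CONFIG = {
    "primary": "provider-model-2024-06-01",   # pinned, immutable version ID
    "fallback": "provider-model-2024-03-15",  # previous version, kept for rollback
}

def resolve_model(role: str = "primary") -> str:
    """Return a pinned model ID; refuse floating aliases like 'latest'."""
    model_id = MODEL_CONFIG[role]
    if model_id in ("latest", "stable", "default"):
        raise ValueError(f"Floating alias {model_id!r} is not allowed in production")
    return model_id
```

Resolving through one chokepoint like this also gives you a single place to log which version served each request.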
2. Pre-Upgrade: Eval Gate
Before touching production, run your full eval suite against the new model version:
```python
old_results = run_eval_suite(model="current_pinned_model_version")
new_results = run_eval_suite(model="candidate_model_version")
regression = compare_results(old_results, new_results)

if regression.any_critical_failures:
    raise Exception("New model failed critical eval cases — block upgrade")
```
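The comparison step above could look like the following sketch, assuming eval results are dicts mapping case name to pass/fail and that `critical_cases` lists the must-pass cases; all names here are illustrative, not a real API:

```python
from dataclasses import dataclass, field

@dataclass
class Regression:
    failed_cases: list = field(default_factory=list)
    any_critical_failures: bool = False

def compare_results(old: dict, new: dict, critical_cases=()) -> Regression:
    """Flag eval cases that passed on the old model but fail on the candidate."""
    failed = [case for case, passed in new.items()
              if old.get(case, False) and not passed]
    critical = any(case in critical_cases for case in failed)
    return Regression(failed_cases=failed, any_critical_failures=critical)
```

Note the asymmetry: a case the old model also failed is a known gap, not a regression, so only old-pass/new-fail cases block the upgrade.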
3. Shadow Testing
Run the new model in parallel with production, but do not serve its responses to users:
- Route 100% of traffic through the current model (users see this)
- Also route 100% through the new model (log results, do not serve)
- Compare outputs side by side
- Run for 24-48 hours to get a representative sample
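The shadow pattern above can be sketched as a request handler, assuming both models are callables; the logging shape is illustrative:

```python
import logging

def handle_request(prompt: str, current_model, candidate_model) -> str:
    """Serve from the current model; shadow the candidate for offline comparison."""
    served = current_model(prompt)           # users see only this response
    try:
        shadow = candidate_model(prompt)     # logged for comparison, never served
        logging.info("shadow_compare prompt=%r served=%r shadow=%r",
                     prompt, served, shadow)
    except Exception:
        # A broken candidate must never affect the user-facing path.
        logging.exception("shadow model failed")
    return served
```

The try/except around the shadow call is the key design choice: the candidate model's latency or errors are isolated from the production response.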
4. Canary Deployment
If shadow testing looks good, route a small percentage to the new model:
- Start at 1-5% for 24 hours
- Monitor quality metrics, error rates, user satisfaction
- Gradually increase: 5% → 10% → 25% → 50% → 100%
- Roll back immediately if any metric degrades
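A minimal sketch of canary routing, hashing user IDs into buckets so each user consistently sees the same model during the ramp (names are hypothetical):

```python
import hashlib

CANARY_PERCENT = 5  # ramp schedule: 5 -> 10 -> 25 -> 50 -> 100

def pick_model(user_id: str, canary_percent: int = CANARY_PERCENT) -> str:
    """Deterministically bucket users: stable assignment across requests."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    if bucket < canary_percent:
        return "candidate_model_version"
    return "current_pinned_model_version"
```

Deterministic hashing beats random sampling here: a user who hits the canary keeps hitting it, so per-user quality metrics stay comparable across the ramp.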
5. Rollback Plan
Always have an instant rollback path:
- Feature flag to switch model version without a code deploy
- Keep the previous model pinned in your config for 30+ days after upgrade
- Document which model version was used when, for debugging historical issues
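The feature-flag rollback above can be sketched as follows, assuming an in-process flag store (a real system would read flags from a config service so no deploy is needed; all names here are illustrative):

```python
# Hypothetical flag store: swapping the active model requires no code deploy.
FLAGS = {"active_model": "provider-model-2024-06-01"}
PREVIOUS_MODEL = "provider-model-2024-03-15"  # kept pinned for 30+ days post-upgrade

def rollback() -> str:
    """Instantly revert production traffic to the previous pinned version."""
    FLAGS["active_model"] = PREVIOUS_MODEL
    return FLAGS["active_model"]
```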
Common Follow-ups
- "How do you handle provider-forced upgrades when an old model is deprecated?" Start the upgrade process 2-3 months before deprecation. Use the deadline as a forcing function for the eval gate and canary process.
- "What if the new model is better on most metrics but worse on one critical dimension?" The critical dimension wins. Fix the regression (update the prompt, add guardrails) or do not upgrade.
- "How do you manage model upgrades across multiple features using the same LLM?" Each feature should have its own eval suite. Run all evals in parallel before any upgrade. Different features may be ready to upgrade on different timelines.