Why This Is Asked
Interviewers want to see your decision-making framework — not just that you know both techniques, but that you can reason about when each is appropriate based on data availability, cost, latency, and maintenance overhead.
Key Concepts to Cover
- Few-shot prompting — including examples in the prompt to guide behavior
- Fine-tuning — updating model weights on a task-specific dataset
- Data requirements — how much labeled data each approach needs
- Latency and cost — few-shot sends more tokens per request; a fine-tuned model needs shorter prompts and can often be smaller
- Starting point — in most cases, start with few-shot and fine-tune only when necessary
How to Approach This
1. Few-Shot Prompting
Include 3-20 labeled examples directly in the prompt before the test input.
Pros: Zero training required, easy to update, works immediately, interpretable.
Cons: Examples take up context window, higher per-request cost and latency, performance ceiling.
Best for: New tasks, tasks with few examples, frequently changing requirements, prototyping.
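To make the mechanics concrete, here is a minimal sketch of few-shot prompt construction for a sentiment task. The instruction, examples, and labels are illustrative, not from any real dataset.

```python
# Hypothetical labeled examples included in-context to guide the model.
EXAMPLES = [
    ("The battery dies within an hour.", "negative"),
    ("Setup took thirty seconds and it just works.", "positive"),
    ("It arrived on time, nothing special.", "neutral"),
]

def build_few_shot_prompt(test_input: str) -> str:
    """Prepend labeled examples to the test input, one block per example."""
    parts = ["Classify each review as positive, negative, or neutral.\n"]
    for text, label in EXAMPLES:
        parts.append(f"Review: {text}\nSentiment: {label}\n")
    # The prompt ends mid-pattern so the model completes the label.
    parts.append(f"Review: {test_input}\nSentiment:")
    return "\n".join(parts)

prompt = build_few_shot_prompt("The screen cracked on day two.")
print(prompt)
```

Note the trade-off stated above: every one of those example tokens is re-sent (and re-billed) on every single request.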
2. Fine-Tuning
Continue training a model on your labeled dataset, updating its weights.
Pros: Better performance on well-defined tasks, lower per-request cost, lower latency, better consistency.
Cons: Requires labeled data (100-10,000+ examples), training cost, risk of overfitting, re-training when task shifts.
Best for: High-volume tasks with stable definitions, tasks where examples alone don't achieve needed accuracy.
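The main engineering work in fine-tuning is usually data preparation. Below is a sketch that converts (text, label) pairs into the JSONL chat format used by hosted fine-tuning APIs such as OpenAI's; the field names follow that format, and the task and labels are hypothetical.

```python
import json

# Hypothetical labeled data for a customer-message classifier.
labeled_data = [
    ("Refund has not arrived after two weeks.", "complaint"),
    ("How do I change my shipping address?", "question"),
]

def to_jsonl(pairs, path="train.jsonl"):
    """Write one JSON record per line: system prompt, input, target label."""
    with open(path, "w") as f:
        for text, label in pairs:
            record = {
                "messages": [
                    {"role": "system", "content": "Classify the customer message."},
                    {"role": "user", "content": text},
                    {"role": "assistant", "content": label},
                ]
            }
            f.write(json.dumps(record) + "\n")

to_jsonl(labeled_data)
```

The resulting file is uploaded to the provider's training endpoint; the per-example format is the same idea for self-hosted supervised fine-tuning, just with different field names.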
3. Decision Framework
| Condition | Recommendation |
|-----------|----------------|
| < 100 labeled examples | Few-shot |
| Task changes frequently | Few-shot |
| Prototyping | Few-shot |
| > 1,000 examples, stable task | Consider fine-tuning |
| Cost/latency critical at scale | Fine-tuning |
| Few-shot accuracy is sufficient | Stay with few-shot |
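The decision table can be encoded as a simple rule chain. This is a sketch: the 100 and 1,000 thresholds come straight from the table and should be tuned to your setting, not treated as hard limits.

```python
def choose_approach(n_examples: int, task_stable: bool,
                    few_shot_sufficient: bool, scale_critical: bool) -> str:
    """Rule-of-thumb chooser between few-shot prompting and fine-tuning."""
    # If few-shot already meets the bar and scale isn't a concern, stop there.
    if few_shot_sufficient and not scale_critical:
        return "few-shot"
    # Too little data, or a moving target: fine-tuning isn't viable yet.
    if n_examples < 100 or not task_stable:
        return "few-shot"
    # Stable task with enough data, and accuracy or cost pressure remains.
    if scale_critical or n_examples >= 1000:
        return "fine-tuning"
    return "few-shot"
```

For example, a stable high-volume task with 5,000 labels where few-shot falls short routes to fine-tuning, while a prototype with 50 labels stays few-shot regardless of other pressures.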
4. Practical Recommendation
In most cases, start with few-shot. Fine-tune when:
- Few-shot accuracy doesn't meet requirements after optimization
- You have enough labeled data
- The task is stable enough
- Cost/latency savings justify the engineering investment
Common Follow-ups
- "What about RLHF?" RLHF and its modern variants (DPO, RLAIF, Constitutional AI) shape model behavior based on human or AI feedback. They were initially associated with general alignment (safety, helpfulness) but are now widely applied to task-specific adaptation as well. For a classification task, standard supervised fine-tuning is usually the right starting point. RLHF-style techniques become relevant when you want to optimize for nuanced human preferences that are hard to capture with simple labeled examples, such as response quality in open-ended tasks or avoiding specific failure modes.
- "How much data do you need to fine-tune?" Rough rules of thumb: simple binary classification, 200-500 examples; multi-class, 1,000-10,000; highly specialized domains, 10,000+. Always evaluate on a held-out validation set.
- "What is LoRA and when would you use it?" LoRA freezes most model weights and trains small low-rank adapter matrices, dramatically reducing training cost and memory. It is the standard approach for fine-tuning large models without a full-training compute budget.
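The parameter arithmetic behind LoRA is easy to reproduce and makes a good interview aside. For a weight matrix W of shape (d, d), full fine-tuning updates d*d parameters, while LoRA trains only two adapters A (r, d) and B (d, r) and keeps W frozen, so d*d becomes 2*d*r trainable parameters. The hidden size and rank below are typical but illustrative values.

```python
# Trainable-parameter comparison for one (d, d) weight matrix.
d, r = 4096, 8               # hidden size and LoRA rank (illustrative)
full_params = d * d          # full fine-tuning: every weight updates
lora_params = 2 * d * r      # LoRA: only adapters A (r, d) and B (d, r)
print(f"full: {full_params:,}  lora: {lora_params:,}  "
      f"ratio: {full_params / lora_params:.0f}x fewer")
```

At rank 8 this is a 256x reduction per matrix, which is why LoRA adapters train on a single GPU where full fine-tuning would not fit.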