
Compare Few-Shot Prompting vs. Fine-Tuning for a Classification Task

Understand when to use few-shot prompting versus fine-tuning for classification — covering cost, data requirements, latency, and when each approach wins.


Why This Is Asked

Interviewers want to see your decision-making framework — not just that you know both techniques, but that you can reason about when each is appropriate based on data availability, cost, latency, and maintenance overhead.

Key Concepts to Cover

  • Few-shot prompting — including examples in the prompt to guide behavior
  • Fine-tuning — updating model weights on a task-specific dataset
  • Data requirements — how much labeled data each approach needs
  • Latency and cost — few-shot sends more tokens on every request; a fine-tuned model needs only a short prompt and can often be smaller
  • Starting point — in most cases, start with few-shot and fine-tune only when necessary

How to Approach This

1. Few-Shot Prompting

Include 3-20 labeled examples directly in the prompt before the test input.

Pros: Zero training required, easy to update, works immediately, interpretable.

Cons: Examples consume context-window space, higher per-request cost and latency, performance ceiling on harder tasks.

Best for: New tasks, tasks with few examples, frequently changing requirements, prototyping.
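The idea can be sketched in a few lines. This is a minimal illustration with hypothetical support-ticket examples: labeled pairs are prepended to the prompt so the model infers the task in-context, with no training step.

```python
# Hypothetical labeled examples for a support-ticket classifier.
EXAMPLES = [
    ("The checkout page crashes on submit.", "bug"),
    ("Please add dark mode.", "feature_request"),
    ("How do I reset my password?", "question"),
]

def build_few_shot_prompt(ticket: str) -> str:
    """Assemble instructions + labeled examples + the new input to classify."""
    parts = ["Classify each support ticket as bug, feature_request, or question."]
    for text, label in EXAMPLES:
        parts.append(f"Ticket: {text}\nLabel: {label}")
    parts.append(f"Ticket: {ticket}\nLabel:")  # the model completes this line
    return "\n\n".join(parts)

prompt = build_few_shot_prompt("The app freezes when I upload a photo.")
```

Note that every request pays for the example tokens again, which is exactly the cost/latency trade-off discussed below.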

2. Fine-Tuning

Continue training a model on your labeled dataset, updating its weights.

Pros: Better performance on well-defined tasks, lower per-request cost, lower latency, better consistency.

Cons: Requires labeled data (100-10,000+ examples), training cost, risk of overfitting, re-training when task shifts.

Best for: High-volume tasks with stable definitions, tasks where examples alone don't achieve needed accuracy.
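Concretely, supervised fine-tuning for classification usually starts with converting labeled examples into a training file. The chat-style JSONL schema below mirrors common provider formats but is an assumption; check your provider's documentation for the exact fields it expects.

```python
import json

# Hypothetical labeled records for a support-ticket classifier.
records = [
    {"text": "The checkout page crashes on submit.", "label": "bug"},
    {"text": "Please add dark mode.", "label": "feature_request"},
]

def to_training_line(record: dict) -> str:
    """One JSON object per line: input as the user turn, label as the assistant turn."""
    return json.dumps({
        "messages": [
            {"role": "system", "content": "Classify the support ticket."},
            {"role": "user", "content": record["text"]},
            {"role": "assistant", "content": record["label"]},
        ]
    })

with open("train.jsonl", "w") as f:
    for record in records:
        f.write(to_training_line(record) + "\n")
```

After training, the per-request prompt shrinks to just the instruction and the new input, because the examples now live in the weights.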

3. Decision Framework

| Condition | Recommendation |
|-----------|----------------|
| < 100 labeled examples | Few-shot |
| Task changes frequently | Few-shot |
| Prototyping | Few-shot |
| > 1,000 examples, stable task | Consider fine-tuning |
| Cost/latency critical at scale | Fine-tuning |
| Few-shot accuracy is sufficient | Stay with few-shot |
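The table can be read as ordered rules. The hypothetical helper below encodes one reasonable ordering of them; the parameter names and thresholds are taken from the table, not from any library.

```python
def recommend(n_labeled: int, task_stable: bool,
              few_shot_accuracy_ok: bool, scale_sensitive: bool) -> str:
    """Encode the decision table above as rules, checked in order."""
    if few_shot_accuracy_ok:
        return "few-shot"                      # accuracy sufficient: stay put
    if n_labeled < 100 or not task_stable:
        return "few-shot"                      # too little data or a moving target
    if n_labeled > 1000 and scale_sensitive:
        return "fine-tune"                     # stable, data-rich, cost matters
    return "few-shot, keep collecting labels"  # in between: default to few-shot
```

The key property to preserve in any such rule set is the asymmetry: few-shot is the default, and fine-tuning must earn its place.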

4. Practical Recommendation

In most cases, start with few-shot. Fine-tune when:

  1. Few-shot accuracy doesn't meet requirements after optimization
  2. You have enough labeled data
  3. The task is stable enough
  4. Cost/latency savings justify the engineering investment
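Point 4 can be made concrete with a back-of-envelope break-even calculation: how many requests before fine-tuning's shorter prompts pay back a one-time training cost? All numbers below are illustrative assumptions, not real prices.

```python
# Assumed figures -- substitute your provider's actual pricing.
few_shot_tokens = 1500      # instructions + ~10 examples sent on every request
fine_tuned_tokens = 100     # short instruction only; examples live in the weights
price_per_1k_input = 0.002  # assumed USD per 1k input tokens
training_cost = 50.0        # assumed one-time fine-tuning cost in USD

saving_per_request = (few_shot_tokens - fine_tuned_tokens) / 1000 * price_per_1k_input
break_even = training_cost / saving_per_request
print(round(break_even))  # -> 17857 requests under these assumptions
```

At tens of thousands of requests, fine-tuning pays for itself quickly; at a few hundred, it never does.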

Common Follow-ups

  1. "What about RLHF?" RLHF (and its modern variants — DPO, RLAIF, Constitutional AI) are used to shape model behavior based on human or AI feedback. They were initially associated with general alignment (safety, helpfulness), but are now widely applied for task-specific adaptation too. For a classification task, standard supervised fine-tuning is usually the right starting point. RLHF-style techniques become relevant when you want to optimize for nuanced human preferences that are hard to capture with simple labeled examples — for example, response quality in open-ended tasks or avoiding specific failure modes.

  2. "How much data do you need to fine-tune?" Simple binary: 200-500 examples. Multi-class: 1,000-10,000. Highly specialized: 10,000+. Always evaluate on a held-out validation set.

  3. "What is LoRA and when would you use it?" LoRA freezes most model weights and trains small adapter matrices, dramatically reducing training cost. Standard approach for fine-tuning large models without full compute budget.
