The Claude 4.x family
Three model sizes, one philosophy: same family, different speed/cost/capability sliders. Pick by task. Mix in production.
On this page
1. TL;DR
| Model | ID | Best at | Cost | Context |
|---|---|---|---|---|
| Opus 4.7 | claude-opus-4-7 | Hard reasoning, multi-step plans, ambiguous specs. | $$$ | 200k (1M variant available) |
| Sonnet 4.6 | claude-sonnet-4-6 | The daily driver. 90% of pro work. | $$ | 200k |
| Haiku 4.5 | claude-haiku-4-5-20251001 | High-volume, latency-critical, bounded tasks. | $ | 200k |
2. Claude Opus 4.7
The reasoning flagship. Best at multi-step plans, large refactors, design decisions, and anything where the model needs to think before answering. Extended thinking is supported and recommended for hard problems.
Use when
- The task is open-ended ("redesign this auth flow"), ambiguous ("figure out why X is slow"), or has lots of state.
- You're refactoring across many files at once.
- You need long-horizon agentic behavior (Claude Code on a hard ticket).
- The cost of a wrong answer is high (production code, contracts, security review).
Avoid when
- The task is bounded and well-defined — Sonnet is just as good for 90% of well-scoped prompts and 5× cheaper.
- You're doing high-volume bulk work — use Haiku or batch Sonnet.
- Latency matters more than depth — Sonnet is faster.
1M context variant
For huge codebases or massive document workloads, claude-opus-4-7[1m] exposes a 1M-token context window. Use sparingly — token cost is the same per-token but you'll easily ship 500k tokens per request. Pairs perfectly with prompt caching.
3. Claude Sonnet 4.6
The default model for almost everything. Strong reasoning, fast enough for interactive UX, ~5× cheaper than Opus. Most production apps should start here.
Use when
- You're building a chat UX — users feel the latency.
- The task is well-scoped: "summarize this doc", "answer this question with citations", "rewrite this paragraph".
- You want a strong agent for routine coding work.
- You're optimizing for cost at moderate volume.
Promote to Opus when
- Sonnet visibly struggles on your evals — wrong tool selection, missed edge cases, weak plans.
- The task starts to need multi-step thinking and Sonnet's plans are shallow.
4. Claude Haiku 4.5
The cheap, fast model for high-volume tasks. Strong enough for classification, routing, intent detection, autocomplete, simple Q&A. Not the model for complex reasoning.
Use when
- You'll make a lot of calls — millions/day.
- The task is narrow and bounded.
- Latency is critical (autocomplete, live UX).
- You're routing or filtering before handing off to Sonnet / Opus.
Common patterns
- Triage router — Haiku decides which downstream model to call.
- Bulk classification — feed a million records through Haiku + Batch API for ~$0.50/M tokens.
- Autocomplete — sub-second response times.
- Guardrail check — pre-filter prompts for safety before invoking Opus.
5. How to pick (decision flow)
Is the task open-ended / multi-step / ambiguous?
└─ Yes → Opus 4.7
└─ No → Is latency critical OR are you making >100k calls/day?
└─ Yes → Haiku 4.5
└─ No → Sonnet 4.6
Validate the choice with an eval set of ~50 representative inputs. If Sonnet hits your bar, stay there. If it doesn't, escalate.
6. Mixing models in one product
Most pro deployments use multiple models. Common topology:
| Stage | Model | Why |
|---|---|---|
| Route the user's intent | Haiku 4.5 | Fast, cheap, accurate enough for "is this support / sales / engineering?" |
| Filter / guardrail | Haiku 4.5 | Same — cheap precheck before paying for Sonnet. |
| Execute the user's request | Sonnet 4.6 | Quality + cost sweet spot. |
| Handle the gnarly edge case | Opus 4.7 | Escalation path when Sonnet's confidence is low. |
| Generate the final polished output | Sonnet 4.6 or Opus 4.7 | Quality matters at the end. |
Cost-wise, this beats "always Opus" by 5–20×, often with no quality drop.
7. Migration notes (4.5 / 4.6 → 4.7)
Sonnet 4.6 → Opus 4.7 / Sonnet 4.7 is largely a model-string change. The API surface is stable. Things to know:
- Extended thinking is more useful in 4.x than it was in 3.x — bump
budget_tokensfor hard tasks. - Tool selection is more aggressive about parallel calls. If your handler doesn't support concurrency, set
disable_parallel_tool_use: true. - Prompt caching works the same; existing cache breakpoints continue to apply.
- System prompt sensitivity — 4.x is better at following nuanced instructions. You may be able to trim verbose "always do X" lists.
- Re-run your evals. Higher capability sometimes means new failure modes (over-confidence, more aggressive refactoring). Don't ship without a validation sweep.
8. Deprecated / retired models
Anthropic retires older models on a published schedule. If your code still references one of these, plan a migration:
| Old | Replace with |
|---|---|
claude-3-opus-* | claude-opus-4-7 |
claude-3-5-sonnet-* | claude-sonnet-4-6 |
claude-3-haiku-* | claude-haiku-4-5-20251001 |
claude-3-5-haiku-* | claude-haiku-4-5-20251001 |
Check docs.claude.com/about-claude/models for the live retirement schedule.