Claude Models Explained (2026): Opus 4.7 vs Sonnet 4.6 vs Haiku 4.5 — Which One for Your Business
Ampliflow
Advanced AI frontier lab and business growth agency. Helping UK businesses deploy agentic AI systems.

There are three Claude models that matter in 2026, and choosing between them is a cost-and-capability decision, not a technical one. Opus 4.7 is the most capable. Sonnet 4.6 is the workhorse. Haiku 4.5 is the fast one. The skill is knowing which job goes to which.
Last updated: May 2026 · Covers Claude Opus 4.7, Sonnet 4.6, Haiku 4.5
TL;DR: Anthropic ships three current Claude models. Opus 4.7 (released 16 April 2026) is the frontier model — 87.6% on SWE-bench Verified, a 1M-token context window, and a step-change in agentic coding, at $5 per million input tokens and $25 per million output. Sonnet 4.6 is the best balance of speed and intelligence at $3/$15 — the model most production systems should default to. Haiku 4.5 is the fastest, with near-frontier quality at $1/$5, built for high-volume work like classification and triage. The expensive mistake UK businesses make is running everything on the biggest model. The teams that win route each task to the cheapest model that can do it well. This guide explains each model, gives you a decision framework, and shows how we route models in production.
Contents
- What are the current Claude models?
- Claude Opus 4.7: when you need the best
- Claude Sonnet 4.6: the workhorse
- Claude Haiku 4.5: speed and scale
- Which Claude model should you actually use?
- How much do the Claude models cost?
- What changed with Opus 4.7, and what is being retired?
- How does Ampliflow choose models in production?
- Frequently asked questions
What are the current Claude models?
Claude is a family of three current models from Anthropic, separated by capability, speed, and price. They share the same skills — text and image input, vision, multilingual reasoning, tool use — and differ in how hard they can think, how fast they respond, and what they cost per token.
Here is the whole family in one table, with the numbers that actually drive a decision.
| Claude Opus 4.7 | Claude Sonnet 4.6 | Claude Haiku 4.5 | |
|---|---|---|---|
| Role | Most capable | Best speed + intelligence | Fastest, near-frontier |
| API model ID | `claude-opus-4-7` | `claude-sonnet-4-6` | `claude-haiku-4-5` |
| Context window | 1M tokens | 1M tokens | 200k tokens |
| Max output | 128k tokens | 64k tokens | 64k tokens |
| Input price | $5 / M tokens | $3 / M tokens | $1 / M tokens |
| Output price | $25 / M tokens | $15 / M tokens | $5 / M tokens |
| Relative latency | Moderate | Fast | Fastest |
| Reliable knowledge cutoff | Jan 2026 | Aug 2025 | Feb 2025 |
A "token" is roughly three-quarters of a word. A million tokens is around 750,000 words — the entire Lord of the Rings trilogy, with room to spare. That is the size of context Opus 4.7 and Sonnet 4.6 can hold in working memory at once.
The headline most people miss: the price gap between the top and bottom of the range is 5×. The same task costs five times more on Opus than on Haiku. Whether that 5× buys you anything depends entirely on the task. For most tasks, it does not. That is the entire game.
Claude Opus 4.7: when you need the best
Opus 4.7 is Anthropic's most capable generally available model, and the gap it opened in agentic coding is the reason to care. Released on 16 April 2026, it scores 87.6% on SWE-bench Verified — the industry-standard test of an AI resolving real GitHub issues — up from 80.8% on the previous Opus, and ahead of both Gemini 3.1 Pro (80.6%) and GPT-5.4.
The headline number undersells the practical one. On Rakuten's internal engineering benchmark, Opus 4.7 resolved three times as many production tasks as Opus 4.6. On the harder, more industrially honest SWE-bench Pro, it jumped from 53.4% to 64.3% — a ten-point gain that puts daylight between it and every competitor currently shipping.
What that means in a business: the class of work an AI agent can finish on its own, without a human untangling it, just got meaningfully larger. Multi-step refactors. Cross-file debugging. The tickets that used to need a senior engineer's full attention.
Opus 4.7 holds a 1M-token context window — large enough to read an entire mid-sized codebase, a full quarter of customer transcripts, or a stack of legal contracts in a single pass. It uses adaptive thinking, deciding for itself how much reasoning a problem deserves rather than burning tokens on questions that do not need them.
One sharp edge worth knowing: Opus 4.7 ships with a new tokenizer. Your token counts — and therefore your costs — will not map one-to-one onto older models. If you are migrating a cost estimate from Opus 4.6, re-measure rather than assume.
Use Opus 4.7 when the cost of a wrong answer is high and the problem is genuinely hard: production code changes, complex reasoning over long documents, financial or legal analysis, and the reviewing-model step that checks other models' work. Do not use it to summarise an email.
Claude Sonnet 4.6: the workhorse
Sonnet 4.6 is the model most production systems should default to. Anthropic describes it as the best combination of speed and intelligence in the range, and that is the honest summary: it is fast enough for interactive use, smart enough for the overwhelming majority of real tasks, and priced at $3 input / $15 output — 40% of Opus's cost.
It carries the same 1M-token context window as Opus 4.7, so long-document work is not off the table. It supports both extended thinking and adaptive thinking, which means you can dial reasoning up for the harder requests and leave it lean for the rest.
In practice, Sonnet 4.6 is where the work lives. Customer-facing chat. Content drafting. Data extraction and structuring. Routine code. Internal tools. Research synthesis. The model is good enough that the question is usually not "is Sonnet capable of this?" but "is this rare task hard enough to justify paying for Opus?"
If you are building one thing and want one model to build it with, this is the one. Start on Sonnet. Promote the genuinely hard sub-tasks to Opus only when you can measure that it matters.
Claude Haiku 4.5: speed and scale
Haiku 4.5 is the fastest Claude model, with near-frontier intelligence at a fifth of Opus's price. At $1 input / $5 output and the lowest latency in the family, it is built for the work that is high in volume and low in individual stakes.
Think classification at scale: routing 50,000 support tickets into categories, scoring leads, tagging documents, moderating user content, extracting fields from forms. The intelligence required per item is modest. The number of items is enormous. On Opus, that workload would be financially absurd. On Haiku, it is a rounding error.
Its 200k-token context window is smaller than its siblings — large by any normal standard, but not the million-token canvas of Opus and Sonnet. Its reliable knowledge cutoff (February 2025) is also older, so for questions about recent events, feed it the facts rather than relying on its memory.
Use Haiku where speed and volume dominate and each individual decision is simple. The art is recognising that a great many "AI tasks" in a business are exactly this kind of task — and have been quietly overpaying for Opus to do them.
Which Claude model should you actually use?
Match the model to the cost of being wrong, not to the prestige of the name. The wrong default — "use the best one, to be safe" — is how AI bills balloon while the value stays flat.
The model stopped being the bottleneck this year. Your judgement about which model to point at which job became it. Intelligence got cheap. Knowing where to spend it did not.
Here is the routing logic we apply to every workload.
| If the task is… | Use | Why |
|---|---|---|
| High-volume classification, tagging, routing, moderation | Haiku 4.5 | Simple per item, enormous in aggregate. Speed and price dominate. |
| Customer chat, content drafting, data extraction, routine code | Sonnet 4.6 | The capable default. Smart enough for ~90% of real work. |
| Production code changes, hard multi-step reasoning, long-document analysis | Opus 4.7 | The cost of a wrong answer is high and the problem is genuinely hard. |
| Checking another model's output before it ships | Opus 4.7 | A fresh, more capable reviewer catches what the worker missed. |
| Anything you are unsure about | Sonnet 4.6 | Start here. Promote to Opus only when you can measure the gain. |
The pattern underneath the table: most systems should run a mix, not a single model. A well-built customer-support workflow might triage incoming messages on Haiku, draft replies on Sonnet, and escalate the rare genuinely-complex case to Opus — paying frontier prices only for the 5% of work that needs frontier intelligence.
This is the difference between an AI experiment and an AI system. The experiment uses one big model for everything and quietly costs three times what it should. The system routes deliberately.
How much do the Claude models cost?
You pay per token — input and output billed separately — and output is always the expensive half. That single fact changes how you design a system. A model that reasons concisely costs far less than one that thinks out loud, even at the same headline rate.
Here is the API pricing again, with a worked example underneath.
| Model | Input ($/M tokens) | Output ($/M tokens) |
|---|---|---|
| Opus 4.7 | $5 | $25 |
| Sonnet 4.6 | $3 | $15 |
| Haiku 4.5 | $1 | $5 |
The maths that matters: A support-triage agent classifies 100,000 messages a month. Each message is ~500 input tokens and ~100 output tokens. On Haiku 4.5, that is 50M input + 10M output = roughly $100/month. Run the identical workload on Opus 4.7 and it is 50M × $5 + 10M × $25 = $500/month — five times the cost, for a classification task Haiku handles perfectly. The model choice is a £4,800-a-year decision on one workflow alone.
Two levers cut these numbers further, and both are worth knowing before you build:
- Prompt caching lets you reuse a large, fixed context (a system prompt, a knowledge base, a codebase) across many requests at a steep discount, instead of paying full input price every time. For agents that re-read the same context repeatedly, this is the single biggest saving available.
- The Batch API processes non-urgent jobs at a substantial discount in exchange for slower turnaround. If the work does not need to happen this second — overnight report generation, bulk enrichment, large backfills — batch it.
For most UK businesses, the practical path is not the raw API at all. Claude Code (the subscription coding tool) bundles model access into a flat per-seat price, and consumer Claude.ai plans start around £17/month. The raw API is for when you are building a product or an agent on top of the models. We cover the full breakdown — including how to get set up and what it really costs — in our companion guide on getting an Anthropic API key and its true cost, and the Claude Code subscription economics in Claude Code Pricing 2026.
What changed with Opus 4.7, and what is being retired?
The 4.7 release moved the agentic-coding ceiling and quietly reset the pricing of the top tier. Two things are worth flagging for anyone who made decisions on an older model.
Agentic coding took a real step up. The jump from 80.8% to 87.6% on SWE-bench Verified, and the 3× improvement on Rakuten's production benchmark, are not marketing increments. They expand the set of tasks you can hand to an agent and trust it to finish. If you evaluated agentic coding six months ago and concluded it was not ready for your hardest work, that conclusion has an expiry date on it.
The top tier got dramatically cheaper over the last year. Opus once cost $15 input / $75 output. Opus 4.7 costs $5 / $25 — a two-thirds price cut for the frontier model in roughly a year. The cost case for using the best model on the work that needs it has never been stronger.
And two models are being retired. The original Claude Opus 4 (claude-opus-4-20250514) and Claude Sonnet 4 (claude-sonnet-4-20250514) reach end of life on 15 June 2026. If any of your systems still pin those IDs, migrate to Sonnet 4.6 and Opus 4.7 before the deadline — pinned-snapshot model IDs do not silently upgrade themselves, so this will not happen automatically. A model retirement is the kind of dependency that breaks a production system at the worst possible moment if nobody is watching the calendar.
How does Ampliflow choose models in production?
We treat model selection as an engineering discipline, not a default. Inside Amplex — our agentic orchestration framework — every step in a workflow is assigned the cheapest model that can do that step reliably, and the assignment is measured, not assumed.
In practice that looks like a hierarchy. Haiku 4.5 runs the high-volume, low-stakes layer: classification, routing, first-pass extraction. Sonnet 4.6 runs the body of the work: drafting, structuring, the bulk of code generation, customer-facing responses. Opus 4.7 is reserved for two jobs — the genuinely hard reasoning that nothing cheaper can finish, and the reviewing-model verification step, where a more capable model reads another model's output with fresh context and grades it against a rubric before a human ever sees it.
That last pattern is the one most teams miss. Using your most capable model as a reviewer rather than only as a worker catches a large share of errors before they reach production — and it is often cheaper than the rework a missed bug would have cost.
The discipline pays for itself twice. Costs stay proportional to the difficulty of the work rather than the ambition of the system. And reliability goes up, because each model is operating inside the envelope where it performs best instead of being stretched across tasks it is over- or under-qualified for.
This is how the same orchestration runs underneath Cellbot, our Claude-powered platform for UK repair businesses, and the operational automation in Hermes Agent. Different products, same principle: route deliberately.
Frequently asked questions
What is the difference between Opus, Sonnet, and Haiku?
They are three tiers of the same Claude family. Opus 4.7 is the most capable and most expensive, for the hardest work. Sonnet 4.6 balances speed and intelligence and handles the majority of real tasks. Haiku 4.5 is the fastest and cheapest, built for high-volume, lower-complexity work. The names stay constant across generations; the version number (4.7, 4.6, 4.5) tells you the release.
Which Claude model is best for coding?
Claude Opus 4.7, by a clear margin — 87.6% on SWE-bench Verified at release, the highest public score among major models. For everyday coding inside a tight budget, Sonnet 4.6 is excellent and far cheaper; many teams run Sonnet by default and route only the hardest changes to Opus. Most UK businesses access this through Claude Code rather than the raw API.
Is Sonnet good enough, or do I need Opus?
For roughly 90% of business tasks, Sonnet 4.6 is good enough — and you will not notice the difference except on the bill, where you save 40%. Reach for Opus 4.7 when the task is genuinely hard (complex multi-step reasoning, production code, long-document analysis) or when the cost of a wrong answer is high. Start on Sonnet, measure, and promote specific tasks to Opus only when the gain is real.
How big is the Claude context window?
Both Opus 4.7 and Sonnet 4.6 have a 1M-token context window — around 750,000 words, enough to hold an entire codebase or a large stack of documents at once. Haiku 4.5 has a 200k-token window, which is still very large but not the full million. Maximum output per response is 128k tokens for Opus and 64k for Sonnet and Haiku.
How much does the Claude API cost?
Per million tokens: Opus 4.7 is $5 input / $25 output, Sonnet 4.6 is $3 / $15, and Haiku 4.5 is $1 / $5. Output is billed at five times the input rate across the range, so concise systems cost less. Prompt caching and the Batch API both cut costs substantially for the right workloads. Full setup and cost detail is in our Anthropic API key guide.
Which models are being retired and when?
The original Claude Opus 4 and Claude Sonnet 4 (both with 20250514 model IDs) reach end of life on 15 June 2026. Migrate any systems pinned to those IDs to Opus 4.7 and Sonnet 4.6 before then. Opus 4.6 and Sonnet 4.5 remain available as legacy models but Anthropic recommends moving to the current generation.
Do I need the API, or is a subscription enough?
If you want Claude for chat, research, or coding, a subscription is enough — Claude.ai plans start around £17/month, and Claude Code bundles model access for engineering work at a flat per-seat price. You only need the raw API when you are building a product, an automation, or an agent on top of the models, where you want to control routing, caching, and integration yourself.
Related reading
- ↔ How to Get an Anthropic API Key — And What It Really Costs (2026) — the practical setup guide, with the cost mechanics this article only summarised
- ↔ The Latest Claude Code Features: Plan Mode, Hooks, Subagents & GitHub Actions — what the newest Claude Code release lets your team do
- ↔ The Claude Agent SDK: Building Production Agents — when to build on the models directly instead of using Claude Code
- ↔ What Is MCP (Model Context Protocol)? — the standard that connects these models to your tools and data
- ↔ What Is Claude Code? A UK Business Guide — the coding tool most UK teams use to access these models
What should you do next?
Most UK businesses do not have a model problem. They have a routing problem — paying frontier prices for work a cheaper model would do perfectly, or running everything on a cautious default that costs three times what it should.
If you are building anything on Claude — an agent, an automation, a customer-facing product — the fastest way to find the savings is a free audit. We map your workloads to the right models, estimate the bill before you commit, and return a specific plan within 48 hours. No obligation, no sales pitch.
See how Ampliflow builds on Claude in production →
Or start with a free audit of your stack, your workloads, and your highest-impact opportunities: Book a free AI audit →
The teams that treat model selection as a discipline in 2026 will run AI systems that scale profitably. The teams that default to "the best one" will spend 2027 wondering where the budget went.
Ampliflow is a UK AI frontier lab and growth agency based in Solihull, West Midlands. We ship production AI systems for UK SMEs and enterprises using Claude, the Amplex orchestration framework, and reviewing-model verification. Our case studies are named, our methodology is published, and our team builds with Claude daily.