Claude Code vs Codex: Anthropic vs OpenAI Agentic Coding (2026)
Ampliflow
Advanced AI frontier lab and business growth agency. Helping UK businesses deploy agentic AI systems.

Codex writes code faster and more cheaply. Claude Code writes higher-quality, more contextually aware code. Your job is knowing which one your team actually needs, and when the right answer is to use both. This comparison takes an angle developer-press articles rarely take: not benchmark by benchmark, but the business decision a UK SME engineering lead actually has to make in 2026. Where does the code run? Who pays per token? What does your DPO think? And, most underrated, which one matches the way your team actually ships software?
Last updated: May 2026 · Covers Claude Code v2 (Opus 4.7 / Sonnet 4.6 / Haiku 4.5) + OpenAI Codex CLI v0.130 (GPT-5.5)
TL;DR:
- Codex's GPT-5.5 leads SWE-bench Verified by 1.1pp (88.7% vs 87.6%) — effectively parity at the top
- Claude Code (Opus 4.7) leads SWE-bench Pro by 5.7pp on multi-file complex work — meaningful for production-scale
- Codex runs your code in OpenAI's cloud containers; Claude Code runs locally — material for UK GDPR / IP-sensitive codebases
- Codex uses 3-4× fewer tokens per equivalent task; Claude Code generates more thorough output
- The Reddit-validated production answer: use both — Claude Code for architecture and complex multi-file work, Codex for autonomous DevOps and CI tasks
What each tool actually is
OpenAI Codex (May 2026)
The current OpenAI Codex is CLI v0.130.0 (released 8 May 2026). It is written in Rust, Apache-2.0 licensed, and installable via `npm install -g @openai/codex` or `brew install --cask codex`. It also ships as a macOS desktop app and as IDE extensions for VS Code, Cursor, and Windsurf.
The default model is GPT-5.5 (launched 23 April 2026), available across ChatGPT Plus, Pro, Business, and Enterprise plans. The defining architectural choice: tasks run in isolated cloud containers. Codex clones your repo into an OpenAI-managed container, disables internet inside it, runs the task autonomously, and returns a diff. The developer reviews the result before merge.
Sub-agents exist but run as independent parallel threads with no shared task list. MCP support is mature and has been enabled by default since December 2025. GitHub integration is native: tasks can be triggered directly from issues and PRs.
Claude Code (May 2026)
Claude Code v2 is installable via `npm install -g @anthropic-ai/claude-code`. The defining architectural choice is local-first execution: it reads your actual filesystem, runs real shell commands, edits files in place, and executes tests against your local environment. The Anthropic API is called only for model processing, so the repository is never cloned to a remote environment; only the prompts, including any code excerpts they contain, leave your machine.
Models: Opus 4.7 (1M context window, released 16 April 2026), Sonnet 4.6 (default), Haiku 4.5 (sub-agent delegation).
Agent Teams (research preview) is the major differentiator: spawn multiple sub-agents that share a task list with dependency tracking, pass messages, work in parallel on separate git worktrees. The parent agent owns planning + integration; specialist sub-agents handle bounded tasks.
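To make "a shared task list with dependency tracking" concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Claude Code's actual implementation: the task names, the `TASKS` mapping, and the `parallel_batches` helper are all hypothetical. It uses the standard library's `graphlib.TopologicalSorter` to group tasks into batches that could run in parallel once their dependencies are done.

```python
from graphlib import TopologicalSorter

# Hypothetical task list: each task maps to the set of tasks it depends on.
TASKS = {
    "design-schema": set(),
    "write-migration": {"design-schema"},
    "update-api": {"design-schema"},
    "write-tests": {"write-migration", "update-api"},
}


def parallel_batches(tasks):
    """Return batches of tasks; every task in a batch has all dependencies met."""
    sorter = TopologicalSorter(tasks)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished, unblocking dependents
    return batches


for number, batch in enumerate(parallel_batches(TASKS), start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

In this toy example the two middle tasks land in the same batch, which is the situation where separate sub-agents on separate worktrees could plausibly work side by side.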
The `CLAUDE.md` project memory file is unique to Claude Code — defines project conventions, tooling rules, and team preferences that persist across every session.
The architectural difference that changes the business answer
| Dimension | OpenAI Codex (GPT-5.5) | Claude Code (Opus 4.7) |
|---|---|---|
| Where does your code run | OpenAI cloud containers | Your local machine |
| Code stays on-device | ❌ No | ✅ Yes |
| Context window | 200K (1.05M long-context @ 2× billing) | 1M standard |
| MCP support | Yes (default since Dec 2025) | Yes (mature, full sub-agent inheritance) |
| Sub-agents | Independent parallel threads | Coordinated, shared task list + dependencies |
| Sandbox enforcement | Kernel-level (Seatbelt / Landlock) | Application-layer hooks (26 lifecycle events) |
| GitHub integration | Native (issue/PR trigger) | Requires setup |
| Persistent project context | `AGENTS.md` (per-profile) | `CLAUDE.md` (project-wide, hierarchical) |
| Token efficiency | Baseline | 3-4× more tokens per equivalent task |
| Generation speed | 1,000+ tok/s (Cerebras inference) | ~200 tok/s |
| Interaction style | Autonomous — does first, asks rarely | Collaborative — asks first, then executes |
| VS Code rating | 3.4/5 | 4.0/5 |
The big one is where the code runs. For non-sensitive open-source work, this barely matters. For UK businesses with proprietary code touching customer data, business logic, or IP, it matters a lot. Cloud-container execution sends your repository to OpenAI infrastructure (US-based by default) for processing; local execution keeps the repository on your machine and sends only prompt content to the model API.
Benchmark reality (May 2026)
SWE-bench Verified and SWE-bench Pro are different benchmark sets, so a score on one cannot be compared with a score on the other. Both Anthropic and OpenAI typically headline Verified, but the harder Pro benchmark is where the real production-quality signal lives.
| Benchmark | Codex (GPT-5.5) | Claude Code (Opus 4.7) | Winner |
|---|---|---|---|
| SWE-bench Verified | 88.7% | 87.6% | Codex (+1.1pp) |
| SWE-bench Pro | 58.6% | 64.3% | Claude Code (+5.7pp) |
| Terminal-Bench 2.0 | 82.7% | 65.4% | Codex (+17pp) |
| Blind code quality (community) | 33% win rate | 67% win rate | Claude Code |
How to read this in practice:
- SWE-bench Verified — isolated bug fixes, well-defined scope. Effectively parity.
- SWE-bench Pro — complex multi-file GitHub issue resolution. Claude Code leads materially.
- Terminal-Bench 2.0 — DevOps / CLI / scripting tasks. Codex leads by 17 points — a real gap for ops automation.
- Blind code quality — humans rating output without knowing which tool produced it. Claude Code wins 2:1.
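As a quick arithmetic check on the table, the sketch below recomputes each gap in percentage points and also expresses it relative to the trailing score. The `gap` helper and the "relative" framing are illustrative additions, not something the benchmarks themselves report.

```python
# Scores copied from the benchmark table: (Codex, Claude Code), in percent.
SCORES = {
    "SWE-bench Verified": (88.7, 87.6),
    "SWE-bench Pro": (58.6, 64.3),
    "Terminal-Bench 2.0": (82.7, 65.4),
}


def gap(codex, claude):
    """Return (leader, gap in percentage points, gap as % of the trailing score)."""
    leader = "Codex" if codex > claude else "Claude Code"
    points = abs(codex - claude)
    relative = points / min(codex, claude) * 100
    return leader, round(points, 1), round(relative, 1)


for name, (codex, claude) in SCORES.items():
    leader, points, relative = gap(codex, claude)
    print(f"{name}: {leader} +{points}pp ({relative}% relative)")
```

On these figures the Verified gap is roughly 1% of the trailing score, while the Pro and Terminal-Bench gaps are roughly 10% and 26%, which is why the first reads as parity and the other two as real differences.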
Translation for an engineering lead: Codex is the better tool when the work is bounded, scriptable, and CI-shaped. Claude Code is the better tool when the work spans multiple files, needs architectural reasoning, or requires the output to look like senior-engineer code rather than pattern-matched code.
Pricing (May 2026, verified)
Claude Code via Anthropic plans
| Plan | Price | Includes Claude Code? | Real-world capacity |
|---|---|---|---|
| Pro | $20/mo (~£17) | ✅ Yes | ~44K tokens/5h window — hits limits fast on heavy work |
| Max 5x | $100/mo (~£85) | ✅ Yes (Opus 4.7 unlocked) | Full daily use without throttling |
| Max 20x | $200/mo (~£170) | ✅ Yes | Heavy use, parallel sessions, agent teams |
| Team Premium | $100/seat (annual) | ✅ Yes | Includes Claude Code; Team Standard at $25/seat does NOT |
Important pricing event: on 21 April 2026, Anthropic briefly removed Claude Code from the Pro plan for ~2% of new signups, then reversed the change within 24 hours after public backlash. The episode signals real pricing pressure from Claude Code's token-intensive sessions. The full context is in our Claude Code pricing 2026 guide.
Codex via ChatGPT plans
| Plan | Price | Includes Codex? | Real-world capacity |
|---|---|---|---|
| Plus | $20/mo (~£17) | ✅ Yes | 30-150 messages per 5h window |
| Pro ($100) | $100/mo | ✅ Yes | New mid-tier added April 2026 |
| Pro ($200) | $200/mo | ✅ Yes | 5× Plus usage |
| Business | $30/seat | ✅ Yes | Team access |
| Enterprise | Custom | ✅ Yes | Full enterprise terms |
API direct rates (per 1M tokens):
- Opus 4.7: $5 input / $25 output
- Sonnet 4.6: $3 input / $15 output
- GPT-5.5: $5 input / $30 output
Opus 4.7 is about 17% cheaper than GPT-5.5 on output tokens ($25 vs $30) and identical on input, so the per-token cost is comparable. But Claude Code uses 3-4× more tokens per equivalent task, so the net effect is that Codex is meaningfully cheaper at the per-task level.
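A worked example makes the per-task arithmetic easier to see. The sketch below uses the API rates listed above; the baseline task size (100K input tokens, 20K output tokens on Codex) is a made-up figure for illustration, and it assumes the 3-4× overhead applies equally to input and output tokens, which the source does not specify.

```python
# API list prices from the article, in USD per 1M tokens: (input, output).
RATES = {
    "GPT-5.5": (5.0, 30.0),
    "Opus 4.7": (5.0, 25.0),
    "Sonnet 4.6": (3.0, 15.0),
}


def task_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one task at the listed per-million-token rates."""
    input_rate, output_rate = RATES[model]
    cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return round(cost, 2)


# Hypothetical baseline task on Codex: 100K input tokens, 20K output tokens.
BASE_INPUT, BASE_OUTPUT = 100_000, 20_000

print("Codex (GPT-5.5):", task_cost("GPT-5.5", BASE_INPUT, BASE_OUTPUT))
for multiplier in (3, 4):  # the article's 3-4x token overhead for Claude Code
    for model in ("Opus 4.7", "Sonnet 4.6"):
        cost = task_cost(model, BASE_INPUT * multiplier, BASE_OUTPUT * multiplier)
        print(f"Claude Code ({model}, {multiplier}x tokens):", cost)
```

Under these assumptions the Codex task costs $1.10, the same task on Opus 4.7 costs $3.00 to $4.00, and on Sonnet 4.6 it costs $1.80 to $2.40. Subscription plans change the picture, since usage there is capped by windows rather than billed per token.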
For a typical UK SME engineering team, both tools land at £100-300/month per heavy-use developer. The cost differential is rarely the deciding factor; capability fit and trust signals are.
Data residency — the question your DPO will ask
This is the single biggest practical difference between the two tools for UK businesses. Most comparison articles skip it.
Codex's cloud-container execution model means:
- Your repository contents are cloned into OpenAI infrastructure
- Internet inside the container is disabled (good — reduces exfiltration risk)
- The container runs in OpenAI's regions (US-based by default)
- The cloned copy of your code is ephemeral, but the prompt, diff, and reasoning context are processed at OpenAI
For non-sensitive code (open source, marketing sites, public APIs) this is fine. For codebases that touch customer PII, financial data, health information, or proprietary algorithms — your DPO will want to know where the processing happens. OpenAI's DPA covers this; the answer is generally "compliant with appropriate contractual controls" but the conversation is non-trivial.
Claude Code's local execution model means:
- Your repository stays on your local machine; it is not cloned into a vendor-managed environment
- Only the prompts (which can include code excerpts) go to Anthropic's API
- For UK data residency, route via AWS Bedrock (eu-west-2 London) — covered in our Claude Code vs Cursor comparison
For FCA-regulated firms, NHS suppliers, law firms, or healthcare businesses, Claude Code via Bedrock-eu-west-2 is currently the defensible answer. Codex's cloud-container model can be made defensible but requires more documentation effort with your DPO.
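For teams exploring the Bedrock route, Claude Code is pointed at Amazon Bedrock through environment variables. The sketch below builds such an environment in Python. `CLAUDE_CODE_USE_BEDROCK` and `AWS_REGION` are the documented variable names; the `bedrock_env` helper is hypothetical, AWS credentials are assumed to be configured separately, and you should confirm that the model you need is actually offered in `eu-west-2` before relying on it.

```python
import os


def bedrock_env(region="eu-west-2", base_env=None):
    """Return a copy of the environment with Claude Code pointed at Amazon Bedrock."""
    env = dict(os.environ if base_env is None else base_env)
    env["CLAUDE_CODE_USE_BEDROCK"] = "1"  # use Bedrock instead of the Anthropic API
    env["AWS_REGION"] = region            # London region named in the article
    return env


env = bedrock_env()
print({key: env[key] for key in ("CLAUDE_CODE_USE_BEDROCK", "AWS_REGION")})
```

The returned mapping could be passed as the `env` argument to `subprocess.run` when launching the CLI from a wrapper script, which keeps the region choice in one reviewable place for your DPO.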
The decision tree for UK engineering teams
Three branches based on your team's actual situation, not the benchmark table.
Branch 1 — Speed and DevOps automation matter more than code quality
Choose Codex. The 17-point lead on Terminal-Bench 2.0 and the cheaper per-task cost make it the right pick when the work is scripting, CI/CD, deployment automation, or throwaway prototypes. The cloud-container model is also genuinely safer for arbitrary script execution: if Codex generates `rm -rf`, it deletes a container, not your `~/`.
Branch 2 — Multi-file production work in a UK business with sensitive data
Choose Claude Code. The 5.7pp lead on SWE-bench Pro and the local execution model make it the better fit for production engineering on real codebases. The `CLAUDE.md` pattern compounds: by month three, your team's setup carries accumulated project context that a fresh Codex session would not have.
Branch 3 — Run both, route by task type
The Reddit-validated answer. Senior engineers use Claude Code for architecture, complex refactors, multi-file features. They use Codex for one-shot scripting tasks, CI scripts, and DevOps runbook automation. Total monthly spend: £200-300 across both subscriptions for a single developer — the productivity stack pays for itself in week one.
This is what we do at Ampliflow: Claude Code for client product engineering, Codex for internal automation scripts and CI cleanup work. The two tools rarely compete — they serve different parts of the workflow.
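The three branches can be written down as a small routing rule. The sketch below is one possible encoding, not a tool either vendor ships: the `Task` fields, the `CODEX_KINDS` labels, and the two-file threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a piece of work to route."""
    kind: str             # e.g. "ci-script", "refactor", "feature"
    files_touched: int    # rough estimate of how many files the change spans
    sensitive_code: bool  # touches customer data, regulated data, or core IP


# Task kinds the article assigns to Codex: bounded, scriptable, CI-shaped work.
CODEX_KINDS = {"ci-script", "devops-runbook", "one-shot-script", "prototype"}


def route(task):
    """Suggest a tool for a task, following the article's three branches."""
    if task.sensitive_code:
        return "Claude Code"  # sensitive code stays on local execution
    if task.kind in CODEX_KINDS and task.files_touched <= 2:
        return "Codex"        # bounded scripting and DevOps automation
    return "Claude Code"      # multi-file and architectural work


print(route(Task("ci-script", 1, False)))
print(route(Task("refactor", 12, False)))
print(route(Task("ci-script", 1, True)))
```

Checking sensitivity first is a deliberate choice in this sketch: it means the data-residency question overrides speed and cost, which matches the priority a DPO would likely set.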
What none of the comparison articles say honestly
The token-cost gap is real. Claude Code generates 3-4× more tokens per equivalent task. At Pro plan limits, this is the difference between "8 hours of focused work before throttling" (Codex) and "3 hours of focused work before throttling" (Claude Code Pro). Heavy users need Max 5x or higher on Anthropic; the same pattern doesn't apply on OpenAI plans.
Cloud sandbox is a security feature when it benefits you. Codex's cloud-container isolation means a compromised prompt can't access your `~/.ssh/` or write to your codebase outside the cloned scope. Claude Code's local execution means a compromised prompt can, which is why the Claude Code skills governance pattern emphasises `disable-model-invocation: true` for any skill with side effects.
The `CLAUDE.md` is the moat. Six months in, your `CLAUDE.md` becomes the most valuable file in your project. Codex's `AGENTS.md` is similar but the ecosystem and patterns are less developed. If you switch from Claude Code to Codex after a year, you lose that compounded context, and the migration cost is real.
Anthropic's pricing pressure is signal. The 21 April 2026 attempt to remove Claude Code from Pro tells you something about token economics under the hood. Plan for 20-30% price moves over the next year on either platform.
Frequently asked questions
Is Codex better than Claude Code?
Depends on the task. Codex wins on speed (5× faster generation), per-task cost (3-4× cheaper), and Terminal-Bench (DevOps work). Claude Code wins on multi-file production work (SWE-bench Pro +5.7pp), code quality (67% blind-rating win rate), and data residency for UK businesses with sensitive code.
Can I use both Codex and Claude Code on the same project?
Yes — they don't conflict. Many UK engineering teams run both: Claude Code for architecture and feature work, Codex for one-shot scripts and DevOps automation. Total monthly cost ~£200-300 per heavy developer; the productivity stack pays for itself.
Why does Claude Code use so many more tokens than Codex?
Claude Code reads more context per session (the whole `CLAUDE.md`, relevant files, often the test suite output), generates more thorough output (verbose explanations, multiple file edits, test additions), and runs more reasoning passes per response. The output is generally better; the cost per task is higher.
Is my code safe with Codex?
OpenAI runs your code in an isolated cloud container with internet disabled. The container is ephemeral. The DPA covers data handling. For non-regulated work this is fine. For UK businesses with sensitive code (financial, healthcare, legal, IP-sensitive), Claude Code's local execution model is generally easier to defend with a DPO.
Is Claude Code still in the $20 (~£17) Pro plan?
Yes, as of May 2026. Anthropic briefly removed it for ~2% of new signups on 21 April and reversed the change within 24 hours after backlash. Plan for the possibility that this changes: Claude Code's token intensity is creating real pricing pressure for Anthropic.
What does Codex's AGENTS.md do compared to CLAUDE.md?
Both files give the AI persistent project context. `AGENTS.md` is per-Codex-profile; `CLAUDE.md` is project-wide and hierarchical (parent + child files compose). Both establish coding conventions, tooling rules, forbidden patterns, etc. The Claude Code ecosystem around `CLAUDE.md` is more mature, with more shared patterns, more tooling, and more community examples.
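To illustrate what "hierarchical" composition means in practice, the sketch below gathers every `CLAUDE.md` from a working directory up through its parents and joins them so the most specific file comes last. It is a simplified model of the idea, not Claude Code's actual loading logic, which may differ; `collect_context_files` and `compose` are hypothetical helper names.

```python
from pathlib import Path


def collect_context_files(start_dir, filename="CLAUDE.md"):
    """Return matching files ordered from the outermost parent down to start_dir."""
    start = Path(start_dir).resolve()
    found = []
    for directory in [start, *start.parents]:  # walk upward, child first
        candidate = directory / filename
        if candidate.is_file():
            found.append(candidate)
    return list(reversed(found))  # parent files first, most specific last


def compose(paths):
    """Concatenate files in order so more specific guidance comes last."""
    return "\n\n".join(path.read_text(encoding="utf-8") for path in paths)


for path in collect_context_files("."):
    print(path)
```

Run from a service subdirectory of a monorepo, this would list the repository-level file first and the service-level file after it, which is the ordering that lets a child file refine or narrow its parent's rules.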
Which has better GitHub integration?
Codex has native GitHub integration: open an issue, mention `@codex`, and get a PR back. Claude Code has GitHub Actions v1.0 GA: open a PR, mention `@claude`, and get a code review. Different workflows; both work well in the right pattern.
What about a UK business that's all-in on Microsoft 365 / Azure?
Codex's GPT-5.5 has the closest organic fit (OpenAI is Microsoft's strategic partner). Claude Code via Azure-mirror infrastructure is possible but requires more setup. For a Microsoft-heavy stack, Codex is the lower-friction default.
Related reading
- ↑ What is Claude Code? A UK Business Guide — the foundational pillar
- ↔ Claude Code vs Cursor for UK Businesses — the other major comparison your team is asking about
- ↔ Claude Code Pricing 2026 — Real Cost for UK Businesses — the deeper pricing analysis
- ↔ How to Install Claude Code — UK Business Guide — the install guide if Claude Code is your pick
- ↔ What is Hermes Agent? A UK Business Guide — the operational-automation companion to either coding tool
What should you do next?
Most UK engineering teams that have tried both end up running both, with senior engineers routing tasks by complexity. The cost is small relative to engineer salary; the productivity multiplier compounds.
See how Ampliflow uses Claude Code and Codex in production →
Or to scope your team's specific tooling decision — including the data-residency posture for your codebase — book a free working session.