Claude Code · 25 March 2026 · Updated 14 May 2026 · 12 min read

Claude Code vs Codex: Anthropic vs OpenAI Agentic Coding (2026)

Sajad Saleem

Co-founder of Ampliflow. Builds AI automation, websites, SEO/AEO, and growth systems for UK SMEs.

Codex writes faster, cheaper code. Claude Code writes higher-quality, more contextually aware code. Your job is knowing which one your team actually needs, and when the right answer is using both. This is the comparison no developer-press article frames properly: not benchmark by benchmark, but as the business decision a UK SME engineering lead actually has to make in 2026. Where does the code run? Who pays per token? What does your DPO think? And, most underrated, which one matches the way your team actually ships software?

Last updated: May 2026 · Covers Claude Code v2 (Opus 4.7 / Sonnet 4.6 / Haiku 4.5) + OpenAI Codex CLI v0.130 (GPT-5.5)

TL;DR:

Codex's GPT-5.5 leads SWE-bench Verified by 1.1pp (88.7% vs 87.6%) — effectively parity at the top
Claude Code (Opus 4.7) leads SWE-bench Pro by 5.7pp on multi-file complex work — meaningful for production-scale
Codex runs your code in OpenAI's cloud containers; Claude Code runs locally — material for UK GDPR / IP-sensitive codebases
Codex uses 3-4× fewer tokens per equivalent task; Claude Code generates more thorough output
The Reddit-validated production answer: use both — Claude Code for architecture and complex multi-file work, Codex for autonomous DevOps and CI tasks

What each tool actually is

OpenAI Codex (May 2026)

The current OpenAI Codex is CLI v0.130.0 (released 8 May 2026) — written in Rust, Apache-2.0 licensed, installable via npm install -g @openai/codex or brew install --cask codex. Also ships as a macOS desktop app and IDE extensions for VS Code, Cursor, and Windsurf.

The default model is GPT-5.5 (launched 23 April 2026), available across ChatGPT Plus, Pro, Business, and Enterprise plans. The defining architectural choice: tasks run in isolated cloud containers. Codex clones your repo into an OpenAI-managed container, disables internet inside it, runs the task autonomously, and returns a diff. The developer reviews the result before merge.

Sub-agents exist but run as independent parallel threads with no shared task list. MCP support is mature — added as default in December 2025. Native GitHub integration: triggered directly from issues/PRs.

Claude Code (May 2026)

Claude Code v2 is installable via npm install -g @anthropic-ai/claude-code. The defining architectural choice: local-first execution. It reads your actual filesystem, runs real shell commands, edits files in place, and executes tests against your local environment. The Anthropic API is called only for model processing: your repository stays on your machine, although the prompts sent to the API can include code excerpts.

Models: Opus 4.7 (1M context window, released 16 April 2026), Sonnet 4.6 (default), Haiku 4.5 (sub-agent delegation).

Agent Teams (research preview) is the major differentiator: spawn multiple sub-agents that share a task list with dependency tracking, pass messages, work in parallel on separate git worktrees. The parent agent owns planning + integration; specialist sub-agents handle bounded tasks.
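To make "a shared task list with dependency tracking" concrete, here is a minimal Python sketch. It is not Claude Code's implementation or API: the `PlannedTask` and `SharedTaskList` classes, the `run_in_waves` helper, and the task names in the usage note are all hypothetical. It only illustrates the idea that a sub-agent can claim a task once everything that task depends on is finished, which is the property independent parallel threads lack.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedTask:
    """One unit of work on the shared list (hypothetical structure)."""
    name: str
    depends_on: set[str] = field(default_factory=set)
    done: bool = False


class SharedTaskList:
    """Toy shared task list: a task is claimable once its dependencies are done."""

    def __init__(self, tasks: list[PlannedTask]) -> None:
        self.tasks = {t.name: t for t in tasks}
        self.claimed: set[str] = set()

    def ready(self) -> list[str]:
        """Names of unclaimed tasks whose dependencies have all completed."""
        return sorted(
            name
            for name, task in self.tasks.items()
            if name not in self.claimed
            and all(self.tasks[dep].done for dep in task.depends_on)
        )

    def claim(self, name: str) -> None:
        """Mark a ready task as taken by some sub-agent."""
        if name not in self.ready():
            raise ValueError(f"{name} is not ready to be claimed")
        self.claimed.add(name)

    def complete(self, name: str) -> None:
        """Mark a previously claimed task as finished."""
        if name not in self.claimed:
            raise ValueError(f"{name} was never claimed")
        self.tasks[name].done = True


def run_in_waves(task_list: SharedTaskList) -> list[list[str]]:
    """Simulate parallel sub-agents: each wave claims and finishes every ready task."""
    waves: list[list[str]] = []
    while True:
        batch = task_list.ready()
        if not batch:
            break
        for name in batch:
            task_list.claim(name)
        for name in batch:
            task_list.complete(name)
        waves.append(batch)
    return waves
```

With four hypothetical tasks where `api` and `ui` both depend on `schema`, and `integration` depends on both, `run_in_waves` yields three waves: the schema first, then the API and UI in parallel, then the integration step.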

The `CLAUDE.md` project memory file is unique to Claude Code — defines project conventions, tooling rules, and team preferences that persist across every session.

The architectural difference that changes the business answer

Dimension	OpenAI Codex (GPT-5.5)	Claude Code (Opus 4.7)
Where does your code run	OpenAI cloud containers	Your local machine
Code stays on-device	❌ No	✅ Yes
Context window	200K (1.05M long-context @ 2× billing)	1M standard
MCP support	Yes (default since Dec 2025)	Yes (mature, full sub-agent inheritance)
Sub-agents	Independent parallel threads	Coordinated, shared task list + dependencies
Sandbox enforcement	Kernel-level (Seatbelt / Landlock)	Application-layer hooks (26 lifecycle events)
GitHub integration	Native (issue/PR trigger)	Requires setup
Persistent project context	`AGENTS.md` (per-profile)	`CLAUDE.md` (project-wide, hierarchical)
Token efficiency	Baseline	3-4× more tokens per equivalent task
Generation speed	1,000+ tok/s (Cerebras inference)	~200 tok/s
Interaction style	Autonomous — does first, asks rarely	Collaborative — asks first, then executes
VS Code rating	3.4/5	4.0/5

The big one: where the code runs. For non-sensitive open-source work, this barely matters. For UK businesses with proprietary code touching customer data, business logic, or IP, it matters a lot. Cloud-container execution sends your code to US-based OpenAI infrastructure for processing; local execution keeps it on your machine.

Benchmark reality (May 2026)

SWE-bench Verified and SWE-bench Pro are different benchmark sets, and cross-comparing them is methodologically invalid. Both Anthropic and OpenAI typically report Verified, but the harder Pro benchmark is where the real production-quality signal lives.

Benchmark	Codex (GPT-5.5)	Claude Code (Opus 4.7)	Winner
SWE-bench Verified	88.7%	87.6%	Codex (+1.1pp)
SWE-bench Pro	58.6%	64.3%	Claude Code (+5.7pp)
Terminal-Bench 2.0	82.7%	65.4%	Codex (+17pp)
Blind code quality (community)	33% win rate	67% win rate	Claude Code

How to read this in practice:

SWE-bench Verified — isolated bug fixes, well-defined scope. Effectively parity.
SWE-bench Pro — complex multi-file GitHub issue resolution. Claude Code leads materially.
Terminal-Bench 2.0 — DevOps / CLI / scripting tasks. Codex leads by 17 points — a real gap for ops automation.
Blind code quality — humans rating output without knowing which tool produced it. Claude Code wins 2:1.
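The percentage-point gaps quoted in the table can be recomputed directly from the scores. The short Python sketch below does that; the `SCORES` dictionary and the `gap` helper are illustrative names, and the numbers are copied from the table above rather than independently verified.

```python
# Scores as (Codex, Claude Code), copied from the benchmark table.
SCORES = {
    "SWE-bench Verified": (88.7, 87.6),
    "SWE-bench Pro": (58.6, 64.3),
    "Terminal-Bench 2.0": (82.7, 65.4),
}


def gap(codex: float, claude: float) -> tuple[str, float]:
    """Return the leading tool and its lead in percentage points."""
    diff = round(codex - claude, 1)
    if diff > 0:
        return "Codex", diff
    if diff < 0:
        return "Claude Code", abs(diff)
    return "Tie", 0.0


for name, (codex, claude) in SCORES.items():
    leader, points = gap(codex, claude)
    print(f"{name}: {leader} +{points}pp")
```

Note that the Terminal-Bench gap works out to 17.3 points; the table rounds it to 17.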

Translation for an engineering lead: Codex is the better tool when the work is bounded, scriptable, and CI-shaped. Claude Code is the better tool when the work spans multiple files, needs architectural reasoning, or requires the output to look like senior-engineer code rather than pattern-matched code.

Pricing (May 2026, verified)

Claude Code via Anthropic plans

Plan	Price	Includes Claude Code?	Real-world capacity
Pro	$20/mo (~£17)	✅ Yes	~44K tokens/5h window — hits limits fast on heavy work
Max 5x	$100/mo (~£85)	✅ Yes (Opus 4.7 unlocked)	Full daily use without throttling
Max 20x	$200/mo (~£170)	✅ Yes	Heavy use, parallel sessions, agent teams
Team Premium	$100/seat (annual)	✅ Yes	Includes Claude Code; Team Standard at $25/seat does NOT

Important pricing event: On 21 April 2026, Anthropic briefly removed Claude Code from the Pro plan for ~2% of new signups. Reversed within 24 hours after public backlash. Signals real pricing pressure on Claude Code's token-intensive sessions. The full context is in our Claude Code pricing 2026 guide.

Codex via ChatGPT plans

Plan	Price	Includes Codex?	Real-world capacity
Plus	$20/mo (~£17)	✅ Yes	30-150 messages per 5h window
Pro ($100)	$100/mo	✅ Yes	New mid-tier added April 2026
Pro ($200)	$200/mo	✅ Yes	5× Plus usage
Business	$30/seat	✅ Yes	Team access
Enterprise	Custom	✅ Yes	Full enterprise terms

API direct rates (per 1M tokens):

Opus 4.7: $5 input / $25 output
Sonnet 4.6: $3 input / $15 output
GPT-5.5: $5 input / $30 output

Opus 4.7 is about 17% cheaper than GPT-5.5 on output tokens ($25 vs $30 per 1M) and identical on input, so the per-token cost is comparable. But Claude Code uses 3-4× more tokens per equivalent task. Net effect: Codex is meaningfully cheaper at the per-task level.
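A quick worked example shows how the token multiplier outweighs the cheaper output rate. The Python sketch below uses the API rates listed above; the baseline task size (20K input and 5K output tokens) is a made-up figure for illustration, and applying the 3-4× multiplier equally to input and output tokens is an assumption, not something the benchmarks specify.

```python
# USD per 1M tokens as (input, output), from the API rate list above.
RATES = {
    "Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}

# Hypothetical baseline task size on Codex (assumption for illustration only).
BASE_INPUT_TOKENS = 20_000
BASE_OUTPUT_TOKENS = 5_000


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """API cost in USD for one task at the listed per-1M-token rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


codex_cost = task_cost("GPT-5.5", BASE_INPUT_TOKENS, BASE_OUTPUT_TOKENS)

# Assumption: the 3-4x multiplier applies to both input and output tokens.
claude_costs = {
    multiplier: task_cost(
        "Opus 4.7",
        BASE_INPUT_TOKENS * multiplier,
        BASE_OUTPUT_TOKENS * multiplier,
    )
    for multiplier in (3, 4)
}

print(f"Codex: ${codex_cost:.3f}")
for multiplier, cost in claude_costs.items():
    print(f"Claude Code at {multiplier}x tokens: ${cost:.3f} ({cost / codex_cost:.1f}x Codex)")
```

Under these assumptions the hypothetical task costs $0.25 on Codex and between $0.675 and $0.90 on Claude Code, roughly 2.7× to 3.6× more. The exact ratio depends on the input/output mix, so treat the figures as directional.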

For a typical UK SME engineering team, both tools land at £100-300/month per heavy-use developer. The cost differential is rarely the deciding factor; capability fit and trust signals are.

Data residency — the question your DPO will ask

This is the single biggest practical difference between the two tools for UK businesses. Most comparison articles skip it.

Codex's cloud-container execution model means:

Your repository contents are cloned into OpenAI infrastructure
Internet inside the container is disabled (good — reduces exfiltration risk)
The container runs in OpenAI's regions (US-based by default)
Your code is ephemeral but the prompt + diff + reasoning context are processed at OpenAI

For non-sensitive code (open source, marketing sites, public APIs) this is fine. For codebases that touch customer PII, financial data, health information, or proprietary algorithms — your DPO will want to know where the processing happens. OpenAI's DPA covers this; the answer is generally "compliant with appropriate contractual controls" but the conversation is non-trivial.

Claude Code's local execution model means:

Your repository stays on your local machine and is never cloned into a vendor container
Only the prompts (which can include code excerpts) go to Anthropic's API
For UK data residency, route via AWS Bedrock (eu-west-2 London) — covered in our Claude Code vs Cursor comparison

For FCA-regulated firms, NHS suppliers, law firms, or healthcare businesses, Claude Code via Bedrock-eu-west-2 is currently the defensible answer. Codex's cloud-container model can be made defensible but requires more documentation effort with your DPO.

The decision tree for UK engineering teams

Three branches based on your team's actual situation, not the benchmark table.

Branch 1 — Speed and DevOps automation matter more than code quality

Choose Codex. The 17-point lead on Terminal-Bench 2.0 and the cheaper per-task cost make it the right pick when the work is scripting, CI/CD, deployment automation, or throwaway prototypes. Plus the cloud-container model is genuinely safer for arbitrary script execution — if Codex generates rm -rf it deletes a container, not your ~/.

Branch 2 — Multi-file production work in a UK business with sensitive data

Choose Claude Code. The 5.7pp lead on SWE-bench Pro + the local execution model make it the better fit for production engineering on real codebases. The CLAUDE.md pattern compounds — by month three, your team's deployment is doing work no fresh Codex session could match because the project context isn't there.

Branch 3 — Run both, route by task type

The Reddit-validated answer. Senior engineers use Claude Code for architecture, complex refactors, multi-file features. They use Codex for one-shot scripting tasks, CI scripts, and DevOps runbook automation. Total monthly spend: £200-300 across both subscriptions for a single developer — the productivity stack pays for itself in week one.

This is what we do at Ampliflow: Claude Code for client product engineering, Codex for internal automation scripts and CI cleanup work. The two tools rarely compete — they serve different parts of the workflow.
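The three branches can be written down as a small routing rule. The Python sketch below is one possible encoding, not a policy either vendor publishes: the `WorkItem` fields, the `route` function, and the choice to let data sensitivity override everything else are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    """A task to be routed (hypothetical fields)."""
    description: str
    touches_sensitive_data: bool
    multi_file: bool
    devops_or_scripting: bool


def route(item: WorkItem) -> str:
    """Pick a tool for a task, following the three branches above.

    Assumption: sensitive data and multi-file scope take priority over speed.
    """
    if item.touches_sensitive_data or item.multi_file:
        return "Claude Code"  # Branch 2: production work, sensitive codebases
    if item.devops_or_scripting:
        return "Codex"  # Branch 1: bounded, scriptable, CI-shaped work
    return "either"  # Small, non-sensitive task: use whichever is already open


if __name__ == "__main__":
    backlog = [
        WorkItem("Refactor billing module", True, True, False),
        WorkItem("Rotate CI cache keys", False, False, True),
        WorkItem("Rename a local variable", False, False, False),
    ]
    for item in backlog:
        print(f"{item.description}: {route(item)}")
```

A team running both tools would tune the rule order to its own risk appetite; the point is only that the routing decision is simple enough to state explicitly.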

What none of the comparison articles say honestly

The token-cost gap is real. Claude Code generates 3-4× more tokens per equivalent task. At Pro plan limits, this is the difference between "8 hours of focused work before throttling" (Codex) and "3 hours of focused work before throttling" (Claude Code Pro). Heavy users need Max 5x or higher on Anthropic; the same pattern doesn't apply on OpenAI plans.

Cloud sandbox is a security feature when it benefits you. Codex's cloud-container isolation means a compromised prompt can't access your ~/.ssh/ or write to your codebase outside the cloned scope. Claude Code's local execution means a compromised prompt can — which is why the Claude Code skills governance pattern emphasises disable-model-invocation: true for any skill with side effects.

The CLAUDE.md is the moat. Six months in, your CLAUDE.md becomes the most valuable file in your project. Codex's AGENTS.md is similar but the ecosystem and patterns are less developed. If you switch from Claude Code to Codex after a year, you lose that compounded context — the migration cost is real.

Anthropic's pricing pressure is signal. The 21 April 2026 attempt to remove Claude Code from Pro tells you something about token economics under the hood. Plan for 20-30% price moves over the next year on either platform.

Frequently asked questions

Is Codex better than Claude Code?

Depends on the task. Codex wins on speed (5× faster generation), per-task cost (3-4× cheaper), and Terminal-Bench (DevOps work). Claude Code wins on multi-file production work (SWE-bench Pro +5.7pp), code quality (67% blind-rating win rate), and data residency for UK businesses with sensitive code.

Can I use both Codex and Claude Code on the same project?

Yes — they don't conflict. Many UK engineering teams run both: Claude Code for architecture and feature work, Codex for one-shot scripts and DevOps automation. Total monthly cost ~£200-300 per heavy developer; the productivity stack pays for itself.

Why does Claude Code use so many more tokens than Codex?

Claude Code reads more context per session (the whole CLAUDE.md, relevant files, often the test suite output), generates more thorough output (verbose explanations, multiple file edits, test additions), and runs more reasoning passes per response. The output is generally better; the cost per task is higher.

Is my code safe with Codex?

OpenAI runs your code in an isolated cloud container with internet disabled. The container is ephemeral. The DPA covers data handling. For non-regulated work this is fine. For UK businesses with sensitive code (financial, healthcare, legal, IP-sensitive), Claude Code's local execution model is generally easier to defend with a DPO.

Is Claude Code still in the $20 (~£17) Pro plan?

Yes as of May 2026. Anthropic briefly removed it for ~2% of new signups on 21 April; reversed within 24 hours after backlash. Plan for the possibility that this changes — Claude Code's token intensity is creating real pricing pressure for Anthropic.

What does Codex's `AGENTS.md` do compared to `CLAUDE.md`?

Both files give the AI persistent project context. AGENTS.md is per-Codex-profile; CLAUDE.md is project-wide and hierarchical (parent + child files compose). Both establish coding conventions, tooling rules, forbidden patterns, etc. The Claude Code ecosystem around CLAUDE.md is more mature — more shared patterns, more tooling, more community examples.
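One way to picture "hierarchical (parent + child files compose)" is a walk from the project root down to the directory you are working in, collecting each context file along the way. The Python sketch below is a toy model of that idea only. It is not Claude Code's actual loader; the function names are hypothetical, and the root-first ordering is an assumption.

```python
from pathlib import Path


def collect_context_files(
    project_root: Path,
    working_dir: Path,
    filename: str = "CLAUDE.md",
) -> list[Path]:
    """Return existing context files from the project root down to working_dir."""
    root = project_root.resolve()
    current = working_dir.resolve()
    if current != root and root not in current.parents:
        raise ValueError("working_dir must be inside project_root")
    # Walk upward from the working directory, keeping only directories in the project.
    chain = [current, *current.parents]
    dirs = [d for d in chain if d == root or root in d.parents]
    dirs.reverse()  # Root first, so more specific files come later.
    return [d / filename for d in dirs if (d / filename).is_file()]


def compose(paths: list[Path]) -> str:
    """Concatenate context files in order, separated by blank lines."""
    return "\n\n".join(p.read_text(encoding="utf-8") for p in paths)
```

In this model a repo-level file sets the shared conventions and a file in, say, a service subdirectory adds rules that apply only there. Directories without a context file are simply skipped.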

Which has better GitHub integration?

Codex has native GitHub integration — open an issue, mention @codex, get a PR back. Claude Code has GitHub Actions v1.0 GA — open a PR, mention @claude, get a code review. Different workflows; both work well in the right pattern.

What about a UK business that's all-in on Microsoft 365 / Azure?

Codex's GPT-5.5 has the closest organic fit (OpenAI is Microsoft's strategic partner). Claude Code via Azure-mirror infrastructure is possible but requires more setup. For a Microsoft-heavy stack, Codex is the lower-friction default.

Related reading

↑ What is Claude Code? A UK Business Guide — the foundational pillar
↔ Claude Code vs Cursor for UK Businesses — the other major comparison your team is asking about
↔ Claude Code Pricing 2026 — Real Cost for UK Businesses — the deeper pricing analysis
↔ How to Install Claude Code — UK Business Guide — the install guide if Claude Code is your pick
↔ What is Hermes Agent? A UK Business Guide — the operational-automation companion to either coding tool

What should you do next?

Most UK engineering teams that have tried both end up running both, with senior engineers routing tasks by complexity. The cost is small relative to engineer salary; the productivity multiplier compounds.

See how Ampliflow uses Claude Code and Codex in production →

Or to scope your team's specific tooling decision — including the data-residency posture for your codebase — book a free working session.

Book a free working session →
