Claude Opus 4.7: Anthropic's software engineering flagship gets sharper
- Bastien
- 17 Apr, 2026
What is Claude Opus 4.7?
On April 16, 2026, Anthropic released Claude Opus 4.7 — a targeted upgrade to its flagship model focused on one theme: rigor in long-running software engineering work. Where Opus 4.6 was already strong on agentic coding, 4.7 pushes further by optimizing sustained reasoning, tightening instruction following, and adding a new xhigh effort level that sits between high and max.
Pricing is unchanged at $5 per million input tokens and $25 per million output tokens. The API identifier is claude-opus-4-7.
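At those rates, per-request cost is easy to estimate. A minimal sketch (the helper name is ours, not part of any SDK):

```python
# Estimate API cost at Opus 4.7 list pricing:
# $5 per million input tokens, $25 per million output tokens.
INPUT_PER_MTOK = 5.00
OUTPUT_PER_MTOK = 25.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request (illustrative helper)."""
    return (input_tokens / 1_000_000) * INPUT_PER_MTOK \
         + (output_tokens / 1_000_000) * OUTPUT_PER_MTOK

# Example: a long agentic turn with 200k input and 30k output tokens.
print(round(estimate_cost(200_000, 30_000), 2))  # → 1.75
```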
What’s new
Four shifts define this release:
xhigh effort level — a new intermediate setting between high and max. In Claude Code, xhigh is now the default for all plans, reflecting Anthropic’s view that harder coding tasks benefit from more deliberate reasoning budget.
Higher-resolution vision — images up to 2,576 pixels on the long edge (roughly 3.75 megapixels), a 3x+ increase over prior capability. This unlocks dense technical diagrams, high-resolution screenshots, and chemical structures.
Task budgets — a public beta feature that lets developers guide token spend on autonomous tasks, preventing runaway costs on long-horizon jobs.
/ultrareview in Claude Code — a dedicated slash command for thorough code review sessions. Pro and Max users get 3 free reviews.
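As a rough illustration of how the effort and budget controls might surface in an API call, here is a sketch that builds a Messages payload. The `effort` and `task_budget_tokens` field names are assumptions inferred from the feature descriptions above, not confirmed parameter names; check the API reference before relying on them.

```python
# Sketch of a Messages API payload using the new controls.
# NOTE: "effort" and "task_budget_tokens" are assumed field names
# based on the announcement, not verified against the API docs.

def build_request(prompt, effort="xhigh", task_budget_tokens=None):
    payload = {
        "model": "claude-opus-4-7",
        "max_tokens": 8192,
        "effort": effort,  # assumed values: low / high / xhigh / max
        "messages": [{"role": "user", "content": prompt}],
    }
    if task_budget_tokens is not None:
        # Task budgets (public beta): cap total spend on long-horizon jobs.
        payload["task_budget_tokens"] = task_budget_tokens  # assumed name
    return payload

req = build_request("Refactor the billing module.", task_budget_tokens=500_000)
```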
Benchmark results
Software engineering and coding
| Benchmark | Opus 4.7 | Opus 4.6 |
|---|---|---|
| CursorBench | 70% | 58% |
| Rakuten-SWE-Bench (production tasks) | 3x tasks resolved | baseline |
| CodeRabbit recall | +10% | baseline |
| Terminal Bench | passes tasks prior models failed | — |
The 12-point CursorBench jump (58% → 70%) is the headline metric. Rakuten-SWE-Bench — a production-realistic coding benchmark — shows 4.7 resolving three times more tasks than 4.6. On CodeRabbit, precision remains stable while recall improves by 10%+.
Finance and professional work
| Benchmark | Opus 4.7 | Opus 4.6 |
|---|---|---|
| General Finance module | 0.813 | 0.767 |
| GDPval-AA | state-of-the-art | — |
| Finance Agent | state-of-the-art | — |
| Harvey legal (high effort) | 90.9% | — |
Agentic and long-context
| Benchmark | Opus 4.7 | Detail |
|---|---|---|
| Research-agent | 0.715 | tied top score |
| Notion Agent | +14% | vs Opus 4.6, fewer token errors |
| Genspark Super Agent | best | quality-per-tool-call ratio measured |
Vision
| Benchmark | Opus 4.7 | Opus 4.6 |
|---|---|---|
| XBOW visual-acuity | 98.5% | 54.5% |
The XBOW jump from 54.5% to 98.5% is the single largest gain in the release — a direct consequence of the resolution increase and improved multimodal training.
What makes it different: rigor over breadth
Anthropic frames Opus 4.7 as less broadly capable than the Mythos Preview but more reliable on the tasks it is optimized for. This is an unusual positioning — explicitly choosing depth over generality.
The practical implication: 4.7 follows instructions more strictly than 4.6. Prompts tuned for 4.6 may produce different behavior on 4.7 and may need re-tuning. Anthropic also notes the model produces more output tokens at higher effort levels, trading cost for reliability.
The updated tokenizer maps input text with 1.0–1.35x variance relative to 4.6, which can affect token budgets on existing integrations.
Claude Code integration
Opus 4.7 ships with several Claude Code enhancements:
- Default effort raised to xhigh across all plans
- /ultrareview slash command for dedicated review sessions (3 free for Pro/Max)
- Auto mode extended to Max users for autonomous decision-making
- Improved file-system memory across multi-session work
- Recommended starting effort for coding: high or xhigh
For teams running Claude Code in production, the combination of the xhigh default and improved long-horizon coherence means fewer mid-task derailments during hour-long autonomous runs.
Availability
Opus 4.7 is available across Anthropic’s full deployment surface:
- Anthropic API (platform.claude.com)
- Amazon Bedrock
- Google Cloud Vertex AI
- Microsoft Foundry
- All Claude products (Claude.ai, Claude Code, etc.)
Launched alongside: Claude Design (new Anthropic Labs product for collaborative visual design work) and the Cyber Verification Program for legitimate security researchers.
Safety and limitations
Anthropic describes 4.7 as “largely well-aligned and trustworthy, though not fully ideal.” Key notes:
- Improved: honesty, resistance to prompt injection
- Similar to 4.6: low rates of deception, sycophancy, misuse cooperation
- Weakness: overly detailed harm-reduction advice on controlled substances
- Intentional reduction: cyber capabilities are deliberately limited versus the Mythos Preview; automatic detection blocks prohibited high-risk cybersecurity requests
The Mythos Preview remains Anthropic’s best-aligned model, per their internal evaluations. Opus 4.7 is positioned as the production flagship — more reliable, more tightly scoped.
Conclusion
Claude Opus 4.7 is not a broad capability jump. It is a targeted refinement: better software engineering, better vision, stricter instruction following, better long-horizon coherence. For teams using Claude Code as an autonomous engineer, the CursorBench jump (58% → 70%) and 3x improvement on Rakuten-SWE-Bench translate directly into more tasks completed per session.
The new xhigh effort level and /ultrareview command are the developer-facing features most likely to change daily workflows. At unchanged pricing, 4.7 is a clear upgrade for coding workloads — with the caveat that prompts optimized for 4.6 may need re-tuning.
Announcement: anthropic.com/news/claude-opus-4-7 · API: claude-opus-4-7
Tags:
- AI
- Anthropic
- Claude
- Coding
- Claude Code
- Agentic