Lineage — Claude Sonnet 4.6

The Sonnet Family · 2024–2026

From First to Best

Every step in the journey, with the milestone that defined each release.

OCTOBER 2024

Claude Sonnet 3.5

14.9% OSWorld

History was made. Anthropic released the world's first general-purpose computer-using AI model. The score was humble — and the company was honest about it, calling the capability "still experimental — at times cumbersome and error-prone." But a new era began here.

First computer use model · General purpose

FEBRUARY 2025

Claude Sonnet 3.7

28.0% OSWorld

The first proof that computer use wasn't a one-time trick — it was improvable. +13.1 percentage points in four months. Nearly doubled the starting score. The trajectory was unmistakable.

+13.1pp gain · Confirmed trajectory

JUNE 2025

Claude Sonnet 4

42.2% OSWorld

Crossed 40%. This was the threshold where computer use started becoming genuinely useful for routine office automation. Enterprise adoption began accelerating. Another +14.2pp in four months.

+14.2pp gain · Enterprise adoption begins

OCTOBER 2025

Claude Sonnet 4.5

61.4% OSWorld-Verified

Sonnet 4.5 was the first Sonnet release scored on OSWorld-Verified, the upgraded benchmark with stricter task quality and grading. Despite the harder benchmark, the model scored 61.4%. Early users began reporting human-level performance on specific tasks: spreadsheet navigation, web forms, multi-tab workflows.

OSWorld-Verified · Human-level on specific tasks

FEBRUARY 17, 2026

Claude Sonnet 4.6 ✦ NOW

72.5% OSWorld-Verified

The culmination of sixteen months of relentless improvement. Computer use nearly 5× the starting score. Coding at 79.6% SWE-bench. Math at 89%. Matching Opus 4.6 on OfficeQA. A 1M token context window. And the same price as the model it replaced.

~5× from start · Opus-level intelligence · Same price

OSWorld Score Growth: Percentage Point Gains per Release

Sonnet 3.5 → 3.7 +13.1pp

Sonnet 3.7 → Sonnet 4 +14.2pp

Sonnet 4 → Sonnet 4.5 +19.2pp*

Sonnet 4.5 → 4.6 +11.1pp

* Partially affected by OSWorld → OSWorld-Verified methodology change
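The per-release gains above are simple differences between consecutive scores. A minimal sketch that reproduces them from the figures quoted in the timeline follows; the variable and function names are illustrative, and the Sonnet 4 to Sonnet 4.5 step mixes two benchmark versions, so treat that gain as approximate.

```python
# OSWorld scores (percent) as quoted in the timeline above.
# Sonnet 4.5 and 4.6 are OSWorld-Verified scores; earlier entries use the original OSWorld.
SCORES = [
    ("Sonnet 3.5", 14.9),
    ("Sonnet 3.7", 28.0),
    ("Sonnet 4", 42.2),
    ("Sonnet 4.5", 61.4),
    ("Sonnet 4.6", 72.5),
]


def gains_per_release(scores):
    """Return (previous, current, gain in percentage points) for each consecutive pair."""
    return [
        (prev_name, name, round(score - prev_score, 1))
        for (prev_name, prev_score), (name, score) in zip(scores, scores[1:])
    ]


for prev_name, name, gain in gains_per_release(SCORES):
    print(f"{prev_name} -> {name}: +{gain}pp")
```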


The Methodology Upgrade

OSWorld-Verified, released July 2025, upgraded the original benchmark with better task quality, improved grading, and updated infrastructure. Sonnet 4.5 and 4.6 use this harder version — meaning their scores are held to a stricter standard than earlier models.


The Pace of Change

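Five major releases in sixteen months. The Sonnet line went from being called "experimental" to achieving 72.5% on one of the hardest AI benchmarks in existence. That is 57.6 percentage points gained overall, or roughly 3.6 per month on average, though the figure spans the switch from OSWorld to OSWorld-Verified and is not a strict like-for-like comparison. Few AI benchmarks have seen sustained improvement at this pace.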
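The headline rate can be checked with a few lines of arithmetic. This is a sketch using only the scores and dates quoted on this page; the constant names are illustrative, and because the start and end scores come from different benchmark versions, the result is a rough average rather than a measured trend.

```python
START_SCORE = 14.9   # Sonnet 3.5, October 2024 (original OSWorld)
LATEST_SCORE = 72.5  # Sonnet 4.6, February 2026 (OSWorld-Verified)

# Whole months between October 2024 and February 2026.
MONTHS_ELAPSED = (2026 - 2024) * 12 + (2 - 10)

total_gain_pp = LATEST_SCORE - START_SCORE
avg_gain_per_month = total_gain_pp / MONTHS_ELAPSED
multiple_of_start = LATEST_SCORE / START_SCORE

print(f"Months elapsed: {MONTHS_ELAPSED}")
print(f"Total gain: {total_gain_pp:.1f}pp")
print(f"Average gain: {avg_gain_per_month:.1f}pp per month")
print(f"Multiple of starting score: {multiple_of_start:.2f}x")
```

The multiple comes out just under five, which is where the "nearly 5×" figure in the timeline comes from.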

Claude 4.6 Family

Where Sonnet Sits Today

Three tiers. Different strengths. Knowing which to reach for.

CLAUDE HAIKU 4.5

Speed & Volume

The fastest and most cost-efficient model in the family. Built for high-volume tasks where throughput matters more than depth.

High-volume queries · Real-time applications · Cost-sensitive pipelines · Simple classification

CLAUDE SONNET 4.6 ✦ DEFAULT

The Daily Driver

The model for the vast majority of real work. Coding, computer use, document reasoning, agent workflows, design — at $3/$15 per million tokens.

Default for Free + Pro plans. No change needed.

Coding · Computer use · Document analysis · Agent workflows · Frontend design · Financial analysis

CLAUDE OPUS 4.6

Maximum Depth

The deepest reasoning. The highest precision. For tasks where getting it exactly right matters more than cost or speed. $5/$25 per million tokens.

Codebase refactoring · Multi-agent coordination · Terminal-Bench 2.0 · Humanity's Last Exam · Precision-critical work

Claude 4.6 Family: Pricing Comparison (cost per million tokens, by model tier)

Sonnet 4.6 pricing unchanged from Sonnet 4.5. Haiku pricing approximate. Source: claude.com/pricing
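To see what the price gap means for a concrete workload, here is a rough cost estimator. It assumes the "$3/$15" and "$5/$25" figures quoted above are input/output prices per million tokens, which is the usual convention but is not spelled out on this page. The model keys are illustrative labels, not API model identifiers, and Haiku is omitted because no exact figure is given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # USD per million input tokens (assumed meaning of the first figure)
    output_per_mtok: float  # USD per million output tokens (assumed meaning of the second figure)


# Prices as quoted on this page; keys are illustrative labels only.
PRICES = {
    "sonnet-4.6": Price(3.00, 15.00),
    "opus-4.6": Price(5.00, 25.00),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of a workload from its token counts."""
    price = PRICES[model]
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


# Hypothetical workload: 2M input tokens and 500k output tokens.
for model in PRICES:
    print(f"{model}: ${estimate_cost(model, 2_000_000, 500_000):.2f}")
```

For this hypothetical workload the estimate is $13.50 on Sonnet 4.6 and $22.50 on Opus 4.6.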

Task Type	Recommended Model	Why
Everyday coding, bug fixes, code review	Sonnet 4.6	79.6% SWE-bench; 70% developer preference vs. predecessor
Computer use / UI automation	Sonnet 4.6	72.5% OSWorld-Verified; 94% on insurance tasks
Enterprise document analysis	Sonnet 4.6	Matches Opus 4.6 on OfficeQA
Multi-step agentic workflows	Sonnet 4.6	Adaptive thinking; improved orchestration evals
Full codebase refactoring	Opus 4.6	Opus retains top spot on Terminal-Bench 2.0
Coordinating multiple AI agents	Opus 4.6	Deepest reasoning for coordination complexity
High-volume classification / tagging	Haiku 4.5	Fastest, most cost-efficient for volume tasks
Real-time API responses	Haiku 4.5	Speed-optimized
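The table above can be treated as a simple lookup. The sketch below encodes it as a dictionary; the task keys are shortened, hypothetical labels rather than an official taxonomy, and falling back to Sonnet 4.6 for unlisted tasks is an assumption based on its "default" positioning on this page.

```python
# Recommendations transcribed from the table above; keys are shortened, illustrative labels.
RECOMMENDATIONS = {
    "everyday coding": "Sonnet 4.6",
    "computer use": "Sonnet 4.6",
    "enterprise document analysis": "Sonnet 4.6",
    "multi-step agentic workflows": "Sonnet 4.6",
    "full codebase refactoring": "Opus 4.6",
    "coordinating multiple ai agents": "Opus 4.6",
    "high-volume classification": "Haiku 4.5",
    "real-time api responses": "Haiku 4.5",
}

# Assumed fallback: the page describes Sonnet 4.6 as the default for most work.
DEFAULT_MODEL = "Sonnet 4.6"


def recommend_model(task_type):
    """Return the recommended model for a task label, ignoring case and surrounding spaces."""
    return RECOMMENDATIONS.get(task_type.strip().lower(), DEFAULT_MODEL)


print(recommend_model("Full codebase refactoring"))
print(recommend_model("High-volume classification"))
print(recommend_model("Summarizing meeting notes"))  # not in the table, falls back to the default
```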