The Sonnet Code Blog

Engineering notes from the field.

Essays and field notes on AI, software engineering, design, and the craft of building product teams that ship. Written by the engineers doing the work.

Featured · AI & Machine Learning · May 6, 2026

97% Deployed, 23% See ROI: The Agent ROI Gap Is the Real Story of Q2 2026

Deloitte's State of AI in the Enterprise puts agent deployment at 97% of organizations, but only 23% report significant ROI from those agents and just one in five has a mature governance model for autonomous workflows. The gap isn't model capability — it's the integration, eval, and governance scaffolding that nobody buys and everyone needs.

AI & Machine Learning · 6 min read

Claude Code's Plugin Ecosystem Crossed 9,000 — and the Integration Layer Quietly Became the Product

As of May 6, the Claude Code marketplace lists 4,200+ skills, 770+ MCP servers, 2,500+ marketplaces, and 9,000+ plugins in total. Prismatic Skills shipped May 4 explicitly to make integration work — auth, multi-tenant deployment, webhooks, connectors — feel native inside Claude Code. The ecosystem has tipped: the IDE is the model's runtime, but differentiation now lives in the integration layer, and most of it ships outside Anthropic.

Sonnet Code Editorial Team · May 6, 2026
AI & Machine Learning · 7 min read

Ten Finance Agent Templates and a Microsoft 365 Bridge: Anthropic's Vertical Move and What It Standardizes

On May 5 Anthropic shipped ten reference agent templates for financial services and insurance — Pitch builder, Earnings reviewer, Month-end closer, KYC screener, and seven more — bundled with Excel/PowerPoint/Word/Outlook context-sharing and a Moody's data partnership. Read alongside Vals AI's 64.4% Finance Agent benchmark for Opus 4.7, the move is less about ten more agents and more about making the template — skills + connectors + subagents bundled together — the unit of vertical procurement.

Sonnet Code Editorial Team · May 6, 2026
AI & Machine Learning · 7 min read

200 Models, One Platform: Gemini Enterprise and the End of "Pick a Model" as a Procurement Question

At Google Cloud Next '26, Google folded Vertex AI into the Gemini Enterprise Agent Platform — one surface for 200+ models, including direct access to Anthropic's Claude Opus, Sonnet, and Haiku. Multi-model is now the platform default. The integration question (which 46% of orgs still cite as their #1 deployment blocker) doesn't go away — it just moves up the stack to routing and governance.

Sonnet Code Editorial Team · May 5, 2026
AI & Machine Learning · 6 min read

Apple Ships Agentic Coding Natively in Xcode 26.3 — and Quietly Standardizes MCP for iOS

Xcode 26.3 added native integration for Anthropic's Claude Agent and OpenAI's Codex inside the IDE, and exposed Xcode's own capabilities as an open Model Context Protocol (MCP) server. Apple is the most conservative IDE vendor in the industry. When the conservative vendor ships agentic coding as a default surface, the category isn't experimental anymore — it's platform infrastructure.

Sonnet Code Editorial Team · May 5, 2026
AI & Machine Learning · 7 min read

Cursor 3 Reframes the IDE as an Agent Runtime — and the Scaffolding Bill Just Came Due

Cursor 3 replaced the Composer pane with a full-screen Agents Window built for parallel execution across local, worktree, SSH, and cloud sessions. Cursor 3.2 added /multitask sub-agents two weeks later. The IDE is no longer where a developer types — it's the runtime where a fleet of agents executes. The teams that win this cycle are the ones who already have a scaffold, a routing layer, and an eval suite to point at it.

Sonnet Code Editorial Team · May 5, 2026
AI & Machine Learning · 8 min read

Why Your Agent Looks Great on SWE-bench and Wobbles in Production: The 37% Lab-to-Prod Gap

A new analysis of 12 major agentic benchmarks found a 37% gap between lab scores and production performance, with up to 50x cost variation for similar accuracy, and validity issues affecting 7 of 10 widely cited evals. The work that closes that gap is not bigger benchmarks; it's task-specific eval suites grounded in domain expertise.

Sonnet Code Editorial Team · May 4, 2026