Sonnet Code
The Sonnet Code Blog · Page 2

Engineering notes from the field.

Essays and field notes on AI, software engineering, design, and the craft of building product teams that ship. Written by the engineers doing the work.

AI & Machine Learning · 6 min read

Surge AI at $25B: What the Data-Labeling Valuation Tells You About the AI Training Layer

Surge AI generates roughly $1.2B annually selling expert humans who write the data that trains frontier models — and just opened its first capital raise at a reported $25B valuation. Read against Anthropic's $30B run rate, the message is unambiguous: the human-in-the-loop layer is not commoditizing, and the premium isn't going away.

Sonnet Code Editorial Team · May 4, 2026

AI & Machine Learning · 7 min read

Live-SWE-agent at 79.2%: The Scaffold Just Caught Up to the Lab

An open-source scaffold from OpenAutoCoder hit 79.2% on SWE-bench Verified when paired with Claude Opus 4.5, 1.7 points behind Anthropic's own internal harness, and posted the leading open-source result on SWE-Bench Pro. The model layer is no longer where the alpha lives. The scaffold is.

Sonnet Code Editorial Team · May 4, 2026

AI & Machine Learning · 8 min read

Only 31% of Enterprises Have an Agent Eval Framework. The Other 69% Will Pay for It in Q3.

Adobe says 31% of enterprises have a measurement framework for agentic AI. LangChain says 57% have agents in production and 32% cite quality as the top deployment barrier. The math doesn't add up — and the gap is the most expensive line item on the next eighteen months of AI roadmaps.

Sonnet Code Editorial Team · May 3, 2026

AI & Machine Learning · 8 min read

DeepSeek V4: A Frontier-Class Open-Weight Model at One-Tenth the Output Cost

DeepSeek V4 Pro matches GPT-5.5 and Claude Opus 4.7 on most agentic benchmarks at 10–13x lower output cost, ships a 1M-token context, and lands under a standard MIT license. The open-weight tier just stopped being a fallback — and the routing playbook needs a fourth row.

Sonnet Code Editorial Team · May 3, 2026

AI & Machine Learning · 7 min read

Xcode 26.3 Brought Claude and Codex Into the IDE — Mobile Just Caught Up to the Web

Apple shipped agentic coding in Xcode 26.3: Claude Agent and OpenAI Codex run native, and MCP is the open hook for everything else. iOS development just inherited the agentic stack the web has been quietly building for two years, and the strategic move is the MCP surface, not the partner logos.

Sonnet Code Editorial Team · May 3, 2026

AI & Machine Learning · 9 min read

Mistral Medium 3.5 and the Open-Weight Coding Tier That Can Actually Compete

On April 29, Mistral shipped Medium 3.5 (128B dense, 256k context, 77.6% on SWE-bench Verified, modified MIT license) alongside Vibe remote agents that run coding sessions in the cloud. For the first time, the open-weight option in agentic coding is close enough to the closed frontier to change the routing playbook. Here's what to do about it this month.

Sonnet Code Editorial Team · May 2, 2026