
The Sonnet Code Blog · Page 13

Engineering notes from the field.

Essays and field notes on AI, software engineering, design, and the craft of building product teams that ship. Written by the engineers doing the work.

AI Training · 9 min read

DeepSeek V4 Put Frontier-Class Coding in Open Weights — The Real Question Isn't Cost, It's Whether You Should Own the Model

DeepSeek V4 landed in April 2026 with open weights under an MIT license, 80.6% on SWE-bench Verified, and pricing 5–55x lower than Western frontier models. The headline is cost. The more important story for serious teams is ownership: open weights mean you can fine-tune on your own data and self-host with no per-token API relationship, a genuine option for data-sensitive and compliance-heavy work that an API endpoint can never offer. But owning a model is a commitment, not a download. The weights are free; the post-training data, the evals, and the human judgment that make a fine-tune actually better than the base model are where the real cost and the real moat live.

Sonnet Code Editorial Team · May 26, 2026

AI Development · 8 min read

MCP Quietly Became the Integration Standard — and Tunnels Just Removed the Last Enterprise Objection

On May 19, 2026, the Model Context Protocol shipped tunnels and self-hosted sandboxes in the same week: the two features that let an agent reach data behind a corporate firewall without opening an inbound port, and run its tools inside an auditable box the security team controls. Combined with MCP's December 2025 move under the Linux Foundation and Forrester's call that 30% of enterprise app vendors will ship an MCP server this year, the protocol has gone from a clever Anthropic spec to the default way AI connects to the systems a company already runs. For teams trying to put AI into a real product, the integration question stopped being 'which framework' and became 'how do we expose our systems as well-governed MCP servers.'

Sonnet Code Editorial Team · May 26, 2026

AI Training · 9 min read

As Agents Get Faster and Cheaper, Human Ground Truth Got More Valuable, Not Less — The 2026 Verification Moat

The dominant 2026 narrative is that AI is automating its own training: RLAIF lets a stronger model grade a weaker one, synthetic data is everywhere, and the obvious conclusion is that human evaluators are on the way out. The evidence points the other way. Frontier labs found that training models on their own outputs creates feedback loops that amplify their own mistakes, RLAIF is hard-capped by the capability of the AI annotator, and the most-repeated line in the tooling press all year is that code is now generated faster than teams can verify it. Cheap, fast generation didn't reduce the need for human ground truth — it raised the price of being wrong at scale. The scarce resource in 2026 isn't model capability. It's expert verification, and the teams that own it own the moat.

Sonnet Code Editorial Team · May 25, 2026

AI & Machine Learning · 9 min read

Gemini 3.5 Flash Made Speed the Frontier — Why 289 Tokens a Second Changes Which Agents You Can Actually Ship

Google's Gemini 3.5 Flash, launched at I/O 2026, runs at roughly 289 output tokens per second, about four times faster than Opus 4.7 or GPT-5.5, while reportedly beating last quarter's Gemini 3.1 Pro on most benchmarks. The headline is a speed number, but the consequence is architectural: a multi-step agent that reasons, calls a tool, observes, and tries again is a latency multiplier, because every iteration pays the generation time again, and below a certain tokens-per-second threshold those loops are simply too slow and too expensive to run in production. Speed just crossed that threshold for a class of agentic workloads that were demos last quarter. The catch is that 'fast and good enough' redraws your model-selection and lock-in decisions in ways the benchmark table doesn't show.

Sonnet Code Editorial Team · May 25, 2026

AI Development · 9 min read

Parallel Agents Became Table Stakes in a Single Fortnight — The New Engineering Skill Is Dispatch and Review, Not Prompting

Within a single two-week span in May 2026, Cursor 3.2 shipped /multitask, Zed 1.0 launched with parallel agents in the editor, and Antigravity 2.0 made dynamic subagents a core primitive. The convergence is the story: when three independent tools ship the same model-agnostic pattern in one fortnight, the single-agent, watch-it-type loop is genuinely over. The scarce skill stops being how well you prompt one agent and becomes how well you decompose work into parallel streams, dispatch them, and review the merge. That's an organizational change disguised as a feature release, and most teams are tooled for the world it just replaced.

Sonnet Code Editorial Team · May 25, 2026

AI Development · 9 min read

Coding Agents Got Governed by Default and Metered by the Task — The Verification and FinOps Layer Is 2026's Real Bottleneck

Two announcements that landed in the same week tell the whole story of where AI-assisted engineering actually stands in mid-2026: ServiceNow's Build Agent, "governed by default," now runs inside Cursor, Windsurf, Claude Code, and GitHub Copilot, and GitHub confirmed that Copilot moves to AI-Credits-based billing on June 1. The headline framing is "agents are everywhere now." The substance is one tier deeper: code is being generated faster than teams can verify it, Gartner still projects that 90% of enterprise engineers will use AI assistants by 2028, and the two scarce resources are no longer model access or seat licenses. They're review capacity and a credit budget nobody is governing. The teams that win the back half of 2026 aren't the ones with the most agents. They're the ones who built the verification gate and the cost-attribution layer before the credit meter started running.

Sonnet Code Editorial Team · May 24, 2026