Essays and field notes on AI, software engineering, design, and the craft of building product teams that ship. Written by the engineers doing the work.

DeepSeek V4 landed in April 2026 with open weights under an MIT license, 80.6% on SWE-bench Verified, and a price that's 5–55x cheaper than Western frontier models. The headline is cost. The more important story for serious teams is ownership: open weights mean you can fine-tune on your own data and self-host with no per-token API relationship — a genuine option for data-sensitive and compliance-heavy work that an API endpoint can never offer. But owning a model is a commitment, not a download. The weights are free; the post-training data, the evals, and the human judgment that make a fine-tune actually better than the base model are where the real cost and the real moat live.

In the week of May 19, 2026, the Model Context Protocol shipped both tunnels and self-hosted sandboxes: the two features that let an agent reach data behind a corporate firewall without opening an inbound port, and run its tools inside an auditable box the security team controls. Combined with MCP's December 2025 move under the Linux Foundation and Forrester's call that 30% of enterprise app vendors will ship an MCP server this year, the protocol has gone from a clever Anthropic spec to the default way AI connects to the systems a company already runs. For teams trying to put AI into a real product, the integration question stopped being 'which framework' and became 'how do we expose our systems as well-governed MCP servers.'

The dominant 2026 narrative is that AI is automating its own training: RLAIF lets a stronger model grade a weaker one, synthetic data is everywhere, and the obvious conclusion is that human evaluators are on the way out. The evidence points the other way. Frontier labs found that training models on their own outputs creates feedback loops that amplify their own mistakes, RLAIF is hard-capped by the capability of the AI annotator, and the most-repeated line in the tooling press all year is that code is now generated faster than teams can verify it. Cheap, fast generation didn't reduce the need for human ground truth — it raised the price of being wrong at scale. The scarce resource in 2026 isn't model capability. It's expert verification, and the teams that own it own the moat.

Google's Gemini 3.5 Flash, launched at I/O 2026, runs at roughly 289 output tokens per second, about four times faster than Opus 4.7 or GPT-5.5, while reportedly beating last quarter's Gemini 3.1 Pro on most benchmarks. The headline is a speed number, but the consequence is architectural: a multi-step agent that reasons, calls a tool, observes, and tries again is a latency multiplier, because every step waits on the one before it. Below a certain tokens-per-second threshold those loops are simply too slow and too expensive to run in production. Speed just crossed that threshold for a class of agentic workloads that were demos last quarter. The catch is that 'fast and good enough' redraws your model-selection and lock-in decisions in ways the benchmark table doesn't show.
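
To make the latency-multiplier point concrete, here is a back-of-the-envelope sketch. Only the 289 tokens-per-second figure and the "about four times" ratio come from the text above. The step count, tokens per step, and tool time are hypothetical values picked for illustration, and the model ignores time-to-first-token, network overhead, and retries.

```python
# Rough wall-clock estimate for a sequential agent loop.
# FAST_TPS is the figure quoted above; every workload number below is a
# hypothetical assumption, not a measurement.

FAST_TPS = 289.0          # output tokens per second quoted for the fast model
SLOW_TPS = FAST_TPS / 4   # "about four times" slower, per the comparison above


def loop_seconds(steps, tokens_per_step, tokens_per_second, tool_seconds=0.0):
    """Total seconds for `steps` sequential reason -> tool -> observe rounds."""
    generation_seconds = tokens_per_step / tokens_per_second
    return steps * (generation_seconds + tool_seconds)


# Hypothetical workload: 10 steps, 500 output tokens each, 1 second per tool call.
STEPS, TOKENS_PER_STEP, TOOL_SECONDS = 10, 500, 1.0

fast = loop_seconds(STEPS, TOKENS_PER_STEP, FAST_TPS, TOOL_SECONDS)
slow = loop_seconds(STEPS, TOKENS_PER_STEP, SLOW_TPS, TOOL_SECONDS)

print(f"fast model loop: {fast:.1f} s")
print(f"slow model loop: {slow:.1f} s")
print(f"end-to-end speedup: {slow / fast:.2f}x")
```

Because tool time does not shrink with a faster model, the end-to-end speedup in this sketch comes out below the raw 4x token-rate gap. That is one reason a tokens-per-second number alone does not settle model selection.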

Within the same two weeks of May 2026, Cursor 3.2 shipped /multitask, Zed 1.0 launched with parallel agents in the editor, and Antigravity 2.0 made dynamic subagents a core primitive. The convergence is the story: when three independent tools ship the same model-agnostic pattern in one fortnight, the single-agent, watch-it-type loop is genuinely over. The scarce skill stops being how well you prompt one agent and becomes how well you decompose work into parallel streams, dispatch them, and review the merge. That's an organizational change disguised as a feature release — and most teams are tooled for the world it just replaced.

Two announcements that landed in the same week tell the whole story of where AI-assisted engineering actually is in mid-2026: ServiceNow's Build Agent now runs inside Cursor, Windsurf, Claude Code, and GitHub Copilot — "governed by default" — and GitHub confirmed Copilot moves to AI-Credits-based billing on June 1. The headline framing is "agents are everywhere now." The substance is one tier deeper: code is being generated faster than teams can verify it, Gartner still projects 90% of enterprise engineers on AI assistants by 2028, and the two scarce resources are no longer model access or seat licenses — they're review capacity and a credit budget nobody is governing. The teams that win the back half of 2026 aren't the ones with the most agents. They're the ones who built the verification gate and the cost-attribution layer before the credit meter started running.