The Sonnet Code Blog · Page 19

Engineering notes from the field.

Essays and field notes on AI, software engineering, design, and the craft of building product teams that ship. Written by the engineers doing the work.

AI & Machine Learning · 7 min read

OpenAI Workspace Agents Hit Credit Pricing on May 6 — and Custom GPTs Are Officially the Old Way

On May 6 OpenAI flipped Workspace Agents from free preview to credit-based pricing. The agents are Codex-powered, run in the cloud, and connect natively to Slack, Salesforce, Microsoft 365, Notion, Google Drive, and Atlassian. They're explicitly positioned as the successor to custom GPTs — shareable, governed, persistent, and capable of long-running work. The procurement question changes today: this is no longer a free experiment, and every team that built on GPTs has a migration path to draft.

Sonnet Code Editorial Team · May 8, 2026
AI & Machine Learning · 7 min read

Dreaming, Outcomes, Multiagent: Anthropic Just Productionized the Three Things Custom Agents Were Failing At

Anthropic shipped three features in Claude Managed Agents on May 6: Dreaming (memory consolidation across sessions, in research preview), Outcomes (rubric-graded loops in which a separate grader scores output against criteria you wrote), and Multiagent Orchestration (a lead agent fanning work out to specialist subagents that run in parallel on a shared filesystem). Each one is a productionized answer to a failure mode every team running custom agents has been hitting for two quarters. The release isn't a feature drop; it's a stack-shape change.

Sonnet Code Editorial Team · May 8, 2026
AI & Machine Learning · 7 min read

From RLHF to Rubrics-as-Rewards: Why Domain-Expert Trainers Just Became the Bottleneck

The post-training stack quietly shifted in early 2026: RLHF is no longer the default, RLVR (Reinforcement Learning with Verifiable Rewards) is the new baseline for domains with checkable outcomes, and Rubrics-as-Rewards is closing the gap for everything else. The unifying thread is that the rubric — written by a senior domain expert, encoded as code — has replaced the preference label as the load-bearing artifact. The bottleneck moved from labelers to experts.

Sonnet Code Editorial Team · May 7, 2026
AI & Machine Learning · 7 min read

DeepSeek V4 at One-Sixth the Cost of Opus 4.7: Why Multi-Model Routing Is No Longer Optional

DeepSeek dropped V4 Pro on April 24: a 1.6T-parameter MoE, MIT-licensed, priced at $1.74 per million input tokens and $3.48 per million output tokens, roughly one-sixth the cost of Claude Opus 4.7 and one-seventh the cost of GPT-5.5. Frontier-class quality on open weights at this price floor changes the unit economics of every production agent. The teams that don't have a routing layer in May 2026 are the ones leaving the most money on the table.

Sonnet Code Editorial Team · May 7, 2026
AI & Machine Learning · 7 min read

Cursor 3.3 Ships Context Breakdown, Security Review, and an SDK — the IDE Just Became a Programmable Surface

Cursor 3.3 landed yesterday with a context-usage breakdown across rules, skills, MCPs, and subagents; an always-on Security Reviewer + Vulnerability Scanner on every PR; and a public-beta SDK that lets teams launch Cursor's runtime from a few lines of TypeScript. The editor is no longer the surface that matters — it's the runtime, the policy plane, and the SDK underneath it.

Sonnet Code Editorial Team · May 7, 2026
AI & Machine Learning · 7 min read

97% Deployed, 23% See ROI: The Agent ROI Gap Is the Real Story of Q2 2026

Deloitte's State of AI in the Enterprise puts agent deployment at 97% of organizations, but only 23% report significant ROI from those agents and just one in five has a mature governance model for autonomous workflows. The gap isn't model capability — it's the integration, eval, and governance scaffolding that nobody buys and everyone needs.

Sonnet Code Editorial Team · May 6, 2026