Essays and field notes on AI, software engineering, design, and the craft of building product teams that ship. Written by the engineers doing the work.

On May 6, OpenAI flipped Workspace Agents from free preview to credit-based pricing. The agents are Codex-powered, run in the cloud, and connect natively to Slack, Salesforce, Microsoft 365, Notion, Google Drive, and Atlassian. They're explicitly positioned as the successor to custom GPTs (shareable, governed, persistent, and capable of long-running work). The procurement question changes today: this is no longer a free experiment, and every team that built on custom GPTs now has a migration plan to draft.

On May 6, Anthropic shipped three features in Claude Managed Agents: Dreaming (memory consolidation across sessions, research preview), Outcomes (rubric-graded loops where a separate grader scores output against criteria you wrote), and Multiagent Orchestration (a lead agent fanning out specialist subagents in parallel over a shared filesystem). Each one is a productionized answer to a failure mode that every team running custom agents has been hitting for two quarters. The release isn't a feature drop; it's a stack-shape change.
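The fan-out shape behind multiagent orchestration can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Anthropic's API. `specialist` and `lead_agent` are hypothetical names, the "subagents" are stubs rather than model calls, and a temporary directory stands in for the shared filesystem.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def specialist(role: str, task: str, workspace: Path) -> Path:
    """Hypothetical subagent stub. A real system would call a model here;
    this one just writes its findings to the shared workspace."""
    out = workspace / f"{role}.md"
    out.write_text(f"[{role}] notes on: {task}\n")
    return out


def lead_agent(task: str, roles: list[str]) -> str:
    """Fan out one specialist per role in parallel, then merge their files."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)  # stands in for the shared filesystem
        with ThreadPoolExecutor(max_workers=len(roles)) as pool:
            futures = [pool.submit(specialist, role, task, workspace) for role in roles]
            paths = [f.result() for f in futures]  # collected in submission order
        # The lead agent reads every subagent's artifact back before the directory is cleaned up.
        return "".join(p.read_text() for p in paths)


report = lead_agent("audit the billing service", ["security", "performance", "docs"])
print(report)
```

Each subagent writes to its own file, so the sketch needs no locking. A real orchestrator that lets subagents edit shared files would need an ownership or locking convention.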

The post-training stack quietly shifted in early 2026: RLHF is no longer the default, RLVR (Reinforcement Learning with Verifiable Rewards) is the new baseline for domains with checkable outcomes, and Rubrics-as-Rewards is closing the gap for everything else. The unifying thread is that the rubric — written by a senior domain expert, encoded as code — has replaced the preference label as the load-bearing artifact. The bottleneck moved from labelers to experts.
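The two reward shapes can be contrasted in a minimal sketch, assuming each completion is plain text. `verifiable_reward` is an RLVR-style binary check against a known answer. `rubric_reward` scores weighted expert criteria encoded as code. All names and the postmortem rubric are hypothetical, and the keyword checks are toy stand-ins for the graders a real pipeline would use.

```python
import re
from dataclasses import dataclass
from typing import Callable


def verifiable_reward(completion: str, expected: str) -> float:
    """RLVR-style reward: 1.0 if the stated final answer matches ground truth, else 0.0."""
    match = re.search(r"ANSWER:\s*(\S+)", completion)
    return 1.0 if match and match.group(1) == expected else 0.0


@dataclass
class RubricItem:
    description: str
    weight: float
    check: Callable[[str], bool]  # an expert's criterion, encoded as code


def rubric_reward(completion: str, rubric: list[RubricItem]) -> float:
    """Rubrics-as-Rewards style: weighted fraction of satisfied criteria, in [0, 1]."""
    total = sum(item.weight for item in rubric)
    earned = sum(item.weight for item in rubric if item.check(completion))
    return earned / total


# Hypothetical rubric for grading incident postmortems.
rubric = [
    RubricItem("names a root cause", 2.0, lambda s: "root cause" in s.lower()),
    RubricItem("includes a timeline", 1.0, lambda s: "timeline" in s.lower()),
    RubricItem("assigns an owner", 1.0, lambda s: "owner:" in s.lower()),
]
postmortem = "Root cause: expired TLS cert. Timeline: 09:02 alert, 09:40 fix."

print(verifiable_reward("Work shown... ANSWER: 42", "42"))  # 1.0
print(rubric_reward(postmortem, rubric))                     # 0.75
```

The verifiable reward is all-or-nothing, which is why it needs checkable outcomes. The rubric reward gives partial credit instead. Here the postmortem scores 0.75 because it names a root cause and a timeline but no owner, and that partial credit is what lets rubrics reach domains without a single right answer.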

DeepSeek dropped V4 Pro on April 24 — a 1.6T-parameter MoE, MIT-licensed, priced at $1.74/M input and $3.48/M output, roughly one-sixth the cost of Claude Opus 4.7 and one-seventh the cost of GPT-5.5. Frontier-class quality on open weights at this price floor changes the unit economics of every production agent. The teams that don't have a routing layer in May 2026 are the ones leaving the most money on the table.
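The routing argument is mostly arithmetic. This minimal sketch assumes 20k input and 2k output tokens per request and a 10% escalation rate, both hypothetical. V4 Pro uses the prices quoted above. `PREMIUM` is an illustrative tier priced at 6x V4 Pro to mirror the "roughly one-sixth" comparison; it is not a published rate. The `route` rule is a placeholder for whatever difficulty classifier a team actually runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens * self.input_per_m + output_tokens * self.output_per_m) / 1_000_000


V4_PRO = Price(1.74, 3.48)           # list prices quoted above
PREMIUM = Price(1.74 * 6, 3.48 * 6)  # illustrative 6x tier, not a published price


def route(needs_premium: bool) -> Price:
    """Placeholder rule: escalate only requests a (hypothetical) classifier flags as hard."""
    return PREMIUM if needs_premium else V4_PRO


# 1,000 requests of 20k input / 2k output tokens; every tenth one needs the premium tier.
flags = [i % 10 == 0 for i in range(1000)]
all_premium = sum(PREMIUM.cost(20_000, 2_000) for _ in flags)
routed = sum(route(flag).cost(20_000, 2_000) for flag in flags)
print(f"all premium: ${all_premium:.2f}, routed: ${routed:.2f}")
# all premium: $250.56, routed: $62.64
```

Under these assumptions, routing cuts spend by 75% compared with sending everything to the premium tier. Real savings depend on how much traffic genuinely needs the frontier model.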

Cursor 3.3 landed yesterday with a context-usage breakdown across rules, skills, MCPs, and subagents; an always-on Security Reviewer + Vulnerability Scanner on every PR; and a public-beta SDK that lets teams launch Cursor's runtime from a few lines of TypeScript. The editor is no longer the surface that matters — it's the runtime, the policy plane, and the SDK underneath it.

Deloitte's State of AI in the Enterprise report finds agents deployed at 97% of organizations, yet only 23% report significant ROI from them and just one in five has a mature governance model for autonomous workflows. The gap isn't model capability; it's the integration, eval, and governance scaffolding that nobody buys and everyone needs.