The Sonnet Code Blog · Page 4

Engineering notes from the field.

Essays and field notes on AI, software engineering, design, and the craft of building product teams that ship. Written by the engineers doing the work.

AI Development · 10 min read

Two Terminal Coding Agents Shipped in Six Days — Moonshot's Kimi Code CLI on June 6 With an Open MIT TypeScript Subagent Architecture That Works Out of the Box Against Non-Moonshot Providers, and xAI's Grok Build Plugin Marketplace on June 11 With MongoDB, Vercel, Sentry, Chrome DevTools, Cloudflare, and Superpowers Plugins Each Bundling Skills, Slash Commands, Agents, Hooks, MCP Servers, and LSPs Under a Commit-SHA Pin Verified at Install Time. The CLI Coding Agent Layer Just Became a Multi-Vendor Ecosystem, the Plugin Surface Replaced the Single-Agent Lock-In That Defined the Category Through Q1, and the Procurement Conversation at the Terminal Layer Moves From 'Which Agent Do We Standardize On' to 'Which Plugin Surface Compounds Across the Routing Matrix the Engineering Org Already Runs.'

Moonshot AI shipped Kimi Code CLI on June 6, 2026 — an open MIT-licensed TypeScript terminal coding agent distributed via npm, with isolated coder/explore/plan subagents running in separated contexts, conversational MCP server configuration through `/mcp-config` rather than raw JSON, and out-of-the-box compatibility with non-Moonshot model providers. Five days later, on June 11, xAI shipped the Grok Build Plugin Marketplace in beta — MongoDB, Vercel, Sentry, Chrome DevTools, Cloudflare, and Superpowers plugins available at launch, each plugin bundling skills, slash commands, agents, hooks, MCP servers, and LSPs into one installable package, with every remote plugin pinned to a specific commit SHA that Grok Build verifies at install time. The structural read isn't 'two more CLI agents joined the cohort.' It's that the terminal coding agent layer consolidated as a multi-vendor ecosystem in a single week, that the plugin surface — bundled skills + slash commands + agents + hooks + MCP servers + LSPs with commit-SHA pinning — became the differentiating procurement object the buyer evaluates, that the single-agent lock-in story that defined the category through Q1 collapsed under the multi-vendor reality, and that the procurement conversation at the terminal layer moves from 'which agent do we standardize on' to 'which plugin surface compounds across the routing matrix the engineering org already runs at the IDE, the CI, and the agentic-workflow layer.' Here's what that does to the multi-vendor CLI agent routing matrix, the plugin-surface procurement object the marketplace dynamic surfaces, the operational discipline a commit-SHA-pinned plugin substrate enforces, and the senior-judgment work that turns the multi-CLI portfolio into compounding production capability.
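The commit-SHA pin described above can be illustrated with a minimal sketch. This is not Grok Build's implementation or manifest format: the `PluginPin` record, the `verify_pin` function, and the example source URL are hypothetical names introduced for illustration. The sketch assumes only that an installer knows the SHA recorded when the plugin was pinned and the SHA it resolved when fetching the plugin.

```python
import hmac
import re
from dataclasses import dataclass

# A full Git SHA-1 commit id is 40 hexadecimal characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class PluginPin:
    """Hypothetical pin record: where the plugin lives and which commit was approved."""
    source: str
    commit_sha: str


def verify_pin(pin: PluginPin, resolved_sha: str) -> bool:
    """Return True only if the fetched commit is exactly the pinned commit.

    Abbreviated or malformed SHAs are rejected instead of prefix-matched,
    so a pin cannot silently widen to "any commit starting with abc123".
    """
    pinned = pin.commit_sha.lower()
    resolved = resolved_sha.lower()
    if not FULL_SHA.fullmatch(pinned) or not FULL_SHA.fullmatch(resolved):
        return False
    return hmac.compare_digest(pinned, resolved)


# Illustrative values only; the URL and SHAs are placeholders.
pin = PluginPin(source="https://example.com/plugins/demo.git", commit_sha="a" * 40)
install_allowed = verify_pin(pin, "a" * 40)
install_blocked = verify_pin(pin, "b" * 40)
```

The design choice worth noting is the refusal to accept short SHAs: an exact, full-length comparison is what makes the pin a reviewable artifact, since any upstream change produces a different commit id and fails the check.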

Sonnet Code Editorial Team · June 14, 2026

AI Development · 10 min read

Zhipu Just Open-Sourced GLM-5.2 Under MIT With a Usable 1M-Token Context Window, a 744B-Parameter MoE Backbone, and Frontier-Class Coding Capability — The Sovereign In-Perimeter Coding Tier Now Has a Permanent Open-Weight Anchor That Lands on the Same Self-Hosted Substrate That Was Already Running Llama 4 and DeepSeek V4 Pro, the Long-Horizon Agentic Workloads That Need the Full Million-Token Context Can Run Against a Substrate the Buyer Owns End to End, and the Routing Portfolio That Was Carrying a Synthetic Open-Weight Allocation Through Q2 Has to Re-Score the Tier Against the New Ceiling Before the FY27 Budget Locks.

Zhipu AI released GLM-5.2 on June 13, 2026 — a 744-billion-parameter Mixture-of-Experts architecture with a usable 1-million-token context window, a 131,072-token maximum output, a dual thinking-effort system (High and Max), immediate availability for GLM Coding Plan users (Lite/Pro/Max/Team), and a public commitment to ship the weights, the standalone API, and the Z.ai chatbot under MIT License the following week. The structural read isn't 'another open-weight model dropped.' It's that the sovereign in-perimeter frontier coding tier now has a permanent open-weight anchor at 1M context with frontier-class capability, that the routing portfolio that was carrying a synthetic open-weight allocation against the aging Llama 4 and DeepSeek V4 Pro substrate has a third option to re-score the tier ceiling against, that the procurement conversation for buyers operating under data-residency or air-gap constraints moves from 'which closed flagship do we proxy through a compliance shim' to 'which open-weight tier do we self-host against the workload distribution the engineering org actually has,' and that the long-horizon agentic coding workloads that need the full 1M-context window — multi-repository refactors, full-codebase migrations, cross-module planning that internalizes the codebase across the run — can now run against a substrate the buyer owns end to end with no vendor licensing surface. Here's what that does to the open-weight routing tier, the self-hosting operational substrate the 1M-context window demands, the eval discipline that has to grade the open-weight ceiling honestly against the closed flagships on the customer's workload, and the senior-judgment work that turns the capability into compounding production capability.

Sonnet Code Editorial Team · June 14, 2026

AI Training · 10 min read

The RLHF and Human-in-the-Loop Training Market Just Resolved on a Permanent Demand Curve — $2.8B in 2025 Forecast to $18.6B by 2034, Each Frontier Lab Spending Approximately $1B per Year on Human-Generated Training Data, and 70% of Enterprise LLM Deployments Now Running Some Variant of RLHF / DPO / GRPO for Post-Training Alignment. The 'AI Training Engineer' Role Is the Senior-Pipeline Object the Hiring Conversation Will Resolve Against Through Q3 and the Sourcing Constraint That Will Bind FY27 Plans.

The RLHF and human-in-the-loop training market is forecast to grow from $2.8B in 2025 to $18.6B by 2034 — a sustained compounding shape that reflects the demand-side reality the post-training alignment conversation resolved against through 2025: each frontier lab is spending approximately $1B per year on human-generated training data, 70% of enterprise LLM deployments now run some variant of RLHF / DPO / GRPO for post-training alignment, RLHF specialists earn $50.25 to $64.97 per hour against a senior-engineering supply curve that has not gotten cheaper, and the 'AI Training Engineer' role consolidated as the senior-pipeline object the hiring conversation resolves against. The structural read isn't 'human-in-the-loop training is a growth market.' It's that the post-training alignment discipline became a non-discretionary infrastructure line for both the frontier labs and the enterprise deployments, that the hybrid DPO+GRPO stack displaced the pure PPO posture as the production-grade alignment surface, that the supply of senior judges with domain context and the calibration depth to ground the alignment loop is the binding constraint on the FY27 plans, and that the enterprise buyers who internalized human-in-the-loop training as a capability rather than a vendor procurement are the buyers compounding the post-training quality through the rest of 2026. Here's what that does to the enterprise post-training discipline, the senior-judge pipeline that grounds the alignment loop, the hybrid-stack engineering work, and the human-in-the-loop service shape the buyer who wants the capability without standing up the in-house team will procure against.
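As a quick arithmetic check on the forecast above, the implied compound annual growth rate can be computed from the two endpoints. This sketch assumes nine compounding years (2025 to 2034) and smooth year-over-year growth, neither of which the forecast itself states; the function name `implied_cagr` is introduced here for illustration.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Endpoints taken from the forecast: $2.8B in 2025, $18.6B by 2034.
rate = implied_cagr(2.8, 18.6, 2034 - 2025)
print(f"Implied CAGR: {rate:.1%}")
```

Under these assumptions the forecast works out to roughly 23% growth per year, which is the rate a buyer would need to believe in for the endpoint figure to hold.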

Sonnet Code Editorial Team · June 13, 2026

AI Development · 10 min read

Windsurf Is Now Devin Desktop — Cognition's June 2 Rebrand, Open ACP Protocol Support, Agent Command Center as Default Surface, and Cascade End-of-Life July 1 Mean the Engineering Orgs Running CI Pipelines Pinned to the Old Surface Have a 29-Day Migration Window Before the Surface They Standardized On Stops Receiving Updates and the Multi-Agent Interoperability Story That ACP Opens Resets the Procurement Conversation at the Agentic-IDE Layer.

Cognition retired the Windsurf brand on June 2, 2026 and relaunched the IDE as Devin Desktop, with Devin Local replacing the Cascade agent (up to 30% more token-efficient on the Rust rewrite with subagent support), the Agent Command Center moving to the default surface that opens before the conventional editor, and open Agent Client Protocol (ACP) support that lets Codex, Claude Agent, OpenCode, and custom in-house agents run side by side in a shared Kanban workspace — and Cascade is end-of-life July 1, 2026, so any CI pipeline, automation script, or workflow rule that explicitly invokes Cascade has a 29-day window before the surface it depends on stops receiving updates. The structural read isn't 'Windsurf got a name change.' It's that the agentic-IDE category just consolidated around a multi-agent interoperability protocol (ACP) that displaces the single-vendor agent lock-in that defined the IDE category through 2025, that the engineering org running Windsurf-pinned automation has to repoint the workflow rules before the Cascade EOL, that the procurement conversation at the IDE layer moves from 'which agent do we standardize on' to 'which agent runs which class of work inside the ACP-native workspace', and that the teams whose Q2 IDE rollout was anchored to Cascade-specific tooling have to either extend to the ACP-native posture or absorb the cost of the deprecation on the next CI cycle. Here's what that does to the multi-agent IDE category, the ACP protocol layer's procurement implications, the migration discipline the next 29 days demand, and the orchestration work that turns the open-protocol posture into compounding multi-agent productivity.

Sonnet Code Editorial Team · June 13, 2026

AI Development · 10 min read

Anthropic Just Released Claude Fable 5 — Mythos-Class Made Safe for General Use, 80.3% on SWE-Bench Pro Against Opus 4.8's 69.2%, and Stripe Reporting Five Months of Engineering Work Compressed Into Days on a 50-Million-Line Ruby Codebase. The Frontier Coding Bar Just Moved, and the Routing Portfolio That Was Tuned to the Opus-Tier Capability Ceiling Through Q2 Needs to Be Re-Evaluated Against the New One Before the FY27 Budget Locks.

Anthropic announced Claude Fable 5 and Claude Mythos 5 on June 9, 2026. Fable 5 is the Mythos-class model made safe for general use, with 80.3% on SWE-Bench Pro (vs Opus 4.8 at 69.2%, GPT 5.5 at 58.6%, Gemini 3.1 Pro at 54.2%), 29.3% on the hardest FrontierCode Diamond split (more than double Opus 4.8's 13.4%), and the operationally important real-world data point: Stripe reporting that the model compressed five months of engineering work into days, including a Ruby migration on a 50-million-line codebase that finished in one day where the prior estimate was two months for a full team. The structural read isn't 'Anthropic shipped a faster model.' It's that the frontier coding bar moved a generation in a single release, that the routing portfolio that was tuned to the Opus-tier capability ceiling through Q2 has a new ceiling against which the workload-specific eval has to be re-run, that the long-horizon execution surface (12-hour autonomous runs without intervention) reshapes the agentic-workflow design space, and that the procurement object that buyers waiting for next-generation capability have been holding out for is now on the table, with the FY27 budget conversation still six weeks from locking. Here's what that does to the agentic coding routing matrix, the long-horizon execution surface the new ceiling opens, the eval discipline that has to grade the new capability honestly, and the senior-judgment work that turns the capability into compounding production capability.

Sonnet Code Editorial Team · June 13, 2026

AI Development · 10 min read

JPMorgan Chase Just Reclassified $2B of Annual AI Spend as Core Infrastructure on Par With Fraud Detection and Cybersecurity Inside Its $19.8B 2026 Technology Budget — AI Stopped Being a Discretionary R&D Line and Became the Bank's Floor, and the Procurement Signal Will Move Through Every Tier-1 Financial Institution Inside Two Renewal Cycles, Resetting the FY27 Budget Shape Across the Sector.

JPMorgan Chase formally reclassified its AI investment from experimental R&D to core infrastructure this cycle, placing the $2B annual AI line alongside data centers, payment systems, and core risk controls inside its $19.8B 2026 technology budget — with CEO Jamie Dimon stating the investment has self-funded through $2B in operational savings across more than 150,000 employees and a 10-11% productivity gain in engineering, operations, and fraud detection. The structural read isn't 'JPMorgan increased its AI spend.' It's that the bank at the top of the tier-1 financial-services cohort just moved AI from the discretionary R&D bucket (where the CFO can defer the line in a soft quarter) to the non-discretionary core-infrastructure bucket (where the line is as defensible as the fraud-detection floor or the cybersecurity baseline), that the procurement signal will move through every tier-1 financial institution inside two renewal cycles, that the FY27 budget shape across the sector resets against the new floor, and that the supply-side curve on the senior engineering pipeline that ships these workloads is the binding constraint the procurement conversation will collide with next. Here's what that does to the financial-services AI procurement posture, the build-vs-buy boundary on the operational engineering layer, the in-house team hiring profile, and the AI training discipline that turns the spending re-classification into compounding capability rather than a deferred cost.

Sonnet Code Editorial Team · June 11, 2026