The Sonnet Code Blog · Page 15

Engineering notes from the field.

Essays and field notes on AI, software engineering, design, and the craft of building product teams that ship. Written by the engineers doing the work.

AI & Machine Learning · 9 min read

GitHub's Copilot Coding Agent Just Hit GA — Autonomous PRs Move the Bottleneck From Writing Code to Reviewing It

On May 20 GitHub announced general availability of the Copilot Coding Agent, the autonomous loop that takes an issue, investigates the repo, writes the code, opens a PR, and responds to review comments without a human at the keyboard. GA brings multi-model routing across Claude Opus 4.7, GPT-5.5, and Cursor's Composer 2.5; a new agent-PR review surface inside GitHub itself; and a billing model that meters per task rather than per token. The headline framing is "autonomous coding has arrived." The substance is one layer deeper: the bottleneck in software delivery just moved from writing code to reviewing it, and the engineering teams whose review process was already brittle are about to find that out under load.

Sonnet Code Editorial Team · May 21, 2026
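Per-task metering changes which jobs are expensive. A rough break-even sketch, assuming hypothetical prices throughout: the post does not state GitHub's GA rates, so `PER_TASK_PRICE_USD` and the per-token rates below are placeholders.

```python
# All prices are hypothetical placeholders, not GitHub's published GA rates.
PER_TASK_PRICE_USD = 0.50   # hypothetical flat fee per agent task
INPUT_PRICE_PER_M = 3.00    # hypothetical USD per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # hypothetical USD per 1M output tokens


def per_token_cost(input_tokens: int, output_tokens: int) -> float:
    """What one task would cost if it were metered per token instead of per task."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def cheaper_metering(input_tokens: int, output_tokens: int) -> str:
    """Name the metering model that is cheaper for a task of this size."""
    if PER_TASK_PRICE_USD < per_token_cost(input_tokens, output_tokens):
        return "per-task"
    return "per-token"


if __name__ == "__main__":
    for label, tokens in [("small fix", (20_000, 2_000)), ("long investigation", (400_000, 20_000))]:
        cost = per_token_cost(*tokens)
        print(f"{label}: ${cost:.2f} per-token vs ${PER_TASK_PRICE_USD:.2f} per-task -> {cheaper_metering(*tokens)}")
```

With these placeholder numbers, a small fix (20k input, 2k output tokens) costs about $0.09 per token-metered run, so per-token billing is cheaper; a long investigation (400k input, 20k output) costs about $1.50, so the flat per-task fee wins. Once agent effort stops driving the bill, the cost that still scales with task size is human review time.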

AI & Machine Learning · 9 min read

Meta Just Open-Sourced a Coding Model That Matches Opus 4.7 — Self-Hosting Is a Procurement Line Now, Not a Hobby

On May 20 Meta released Llama 4.5 and Llama Code 4.5, the first open-weight release to put a frontier coding model within one benchmark point of Claude Opus 4.7 and Cursor's Composer 2.5. It ships under an Apache 2.0-style commercial license, with weights and tokenizer on Hugging Face and hosted inference already live on Together, Groq, and Fireworks at roughly $0.30 per million input tokens and $1.20 per million output tokens. The headline framing is "open-source catches up." The substance is one tier deeper: self-hosting a frontier-grade coding agent inside a customer's VPC just became a defensible procurement decision rather than a research-team curiosity. For regulated industries, data-residency-bound enterprises, and any team whose CFO has been quietly tracking the per-token bill, that's a procurement conversation worth opening this quarter, not next year.

Sonnet Code Editorial Team · May 21, 2026
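To put the quoted hosted-inference prices in context, here is a monthly-spend comparison. The Llama Code 4.5 rates come from the post; the frontier rates and the 2,000M-input / 400M-output monthly volume are hypothetical placeholders.

```python
# Hosted Llama Code 4.5 prices quoted in the post (approximate, USD per 1M tokens).
LLAMA_INPUT_PER_M = 0.30
LLAMA_OUTPUT_PER_M = 1.20

# Hypothetical frontier-API prices, for comparison only.
FRONTIER_INPUT_PER_M = 5.00
FRONTIER_OUTPUT_PER_M = 25.00


def monthly_cost(input_m: float, output_m: float, in_price: float, out_price: float) -> float:
    """Monthly spend for token volumes given in millions at USD-per-1M prices."""
    return input_m * in_price + output_m * out_price


if __name__ == "__main__":
    volume = (2_000, 400)  # hypothetical: 2,000M input and 400M output tokens per month
    llama = monthly_cost(*volume, LLAMA_INPUT_PER_M, LLAMA_OUTPUT_PER_M)
    frontier = monthly_cost(*volume, FRONTIER_INPUT_PER_M, FRONTIER_OUTPUT_PER_M)
    print(f"hosted Llama: ${llama:,.0f}  frontier: ${frontier:,.0f}  ratio: {frontier / llama:.1f}x")
```

At these placeholder numbers the hosted open-weight bill is roughly $1,080 against $20,000. The self-hosting decision then becomes whether VPC-resident GPUs beat the hosted figure once utilization and operations staffing are counted.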

AI & Machine Learning · 9 min read

ServiceNow Build Agent Now Runs Inside Cursor, Windsurf, Claude Code, and Copilot — Gartner Says 40% of Agent Projects Will Cancel by 2027 Without Governance

On May 6 ServiceNow made Build Agent generally available and extended its governance plane into every major AI coding tool: Cursor, Windsurf, Claude Code, and GitHub Copilot. A week later Opsera shipped a similar layer for Cursor; Snyk paired with Anthropic to ship security gating directly into Claude. The pattern across all three: the same productivity surface every engineer uses, with the enterprise governance plane bolted in at the IDE rather than the deploy gate. Gartner's projection — over 40% of agentic AI projects will be canceled by end of 2027 due to escalating costs, unclear value, or inadequate risk controls — is the executive-visible version of the same story. The teams shipping agentic AI reliably aren't the ones with the best models. They're the ones with the governance layer wired into the developer surface, owned by a named engineer, refreshed on a cadence.

Sonnet Code Editorial Team · May 20, 2026
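One reading of "governance wired into the developer surface" is a check that runs on an agent's proposed change before it leaves the IDE. A minimal sketch under stated assumptions: the `Policy` fields, path prefixes, secret patterns, owner name, and 90-day review window are all hypothetical, not ServiceNow's, Opsera's, or Snyk's actual schema. It encodes the three traits the post names: a rule set, a named owner, and a refresh cadence.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Policy:
    """Hypothetical governance policy; field names are illustrative, not a vendor schema."""
    owner: str          # the named engineer accountable for this policy
    reviewed_on: date   # last time the owner refreshed the rules
    blocked_paths: tuple[str, ...] = ("infra/prod/", ".github/workflows/")
    secret_patterns: tuple[str, ...] = (
        r"AKIA[0-9A-Z]{16}",                     # shape of an AWS access key ID
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----",   # PEM private key header
    )


@dataclass
class ProposedChange:
    path: str
    added_text: str


def evaluate(policy: Policy, changes: list[ProposedChange]) -> list[str]:
    """Return violations for an agent's proposed changes; an empty list means proceed."""
    violations = []
    for change in changes:
        if change.path.startswith(policy.blocked_paths):
            violations.append(f"{change.path}: path requires human approval")
        for pattern in policy.secret_patterns:
            if re.search(pattern, change.added_text):
                violations.append(f"{change.path}: possible secret matching {pattern!r}")
    return violations


def is_stale(policy: Policy, today: date, max_age_days: int = 90) -> bool:
    """True when the policy has gone longer than the refresh cadence without review."""
    return (today - policy.reviewed_on).days > max_age_days


if __name__ == "__main__":
    policy = Policy(owner="jane.doe", reviewed_on=date(2026, 5, 1))  # hypothetical owner
    changes = [ProposedChange("src/app.py", "x = 1"), ProposedChange("infra/prod/main.tf", "")]
    print(evaluate(policy, changes))
    print("stale:", is_stale(policy, date(2026, 9, 1)))
```

A stale policy is itself a signal worth surfacing: rules nobody has refreshed in a quarter are the "inadequate risk controls" Gartner's projection points at.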

AI & Machine Learning · 9 min read

Google I/O 2026 Shipped Gemini Spark on MCP and Gemini 3.5 Flash at Half the Frontier Price — The Agentic Tier Is No Longer Optional

On May 19 Google announced Gemini Spark, a cloud-based 24/7 personal AI agent powered by Gemini 3.5 that runs across Gmail, Docs, Slides, and any service exposed through the Model Context Protocol — Canva, OpenTable, Instacart at launch, with the long tail to follow. Alongside it: Gemini 3.5 Flash at roughly one-third the price of comparable frontier models, and a new agent harness called Antigravity that Google says powers its own internal tools. The headline framing is "Google catches up on agents." The substance is one layer deeper: the agentic AI tier just stopped being a frontier-lab differentiator and became the default user surface for every major productivity stack — Google, Microsoft, Apple, Salesforce, all now committed. For product teams still shipping retrieval-augmented chat experiences, the competitive bar just moved.

Sonnet Code Editorial Team · May 20, 2026
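At the wire level, an MCP server answers JSON-RPC 2.0 requests such as `tools/list` (which tools exist, with JSON Schema inputs) and `tools/call` (run one). A stdlib-only sketch of that dispatch, assuming a hypothetical `search_restaurants` tool and omitting the transport (stdio or HTTP) and the `initialize` handshake a real server performs:

```python
import json

# Hypothetical tool for illustration; real MCP servers (Canva, OpenTable, Instacart) define their own.
TOOLS = {
    "search_restaurants": {
        "description": "Find restaurants by city (stub data).",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
}


def search_restaurants(city: str) -> str:
    return f"3 results in {city}"  # stub: a real server would query a backend here


HANDLERS = {"search_restaurants": search_restaurants}


def _reply(req_id, result=None, error=None) -> str:
    """Wrap a result or an error in a JSON-RPC 2.0 response envelope."""
    msg = {"jsonrpc": "2.0", "id": req_id}
    if error is not None:
        msg["error"] = error
    else:
        msg["result"] = result
    return json.dumps(msg)


def handle(raw: str) -> str:
    """Answer one JSON-RPC request for the MCP tools/list or tools/call methods."""
    req = json.loads(raw)
    req_id, method, params = req.get("id"), req.get("method"), req.get("params", {})
    if method == "tools/list":
        return _reply(req_id, {"tools": [{"name": name, **spec} for name, spec in TOOLS.items()]})
    if method == "tools/call":
        handler = HANDLERS.get(params.get("name"))
        if handler is None:
            return _reply(req_id, error={"code": -32602, "message": "Unknown tool"})
        text = handler(**params.get("arguments", {}))
        return _reply(req_id, {"content": [{"type": "text", "text": text}], "isError": False})
    return _reply(req_id, error={"code": -32601, "message": "Method not found"})


if __name__ == "__main__":
    call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
            "params": {"name": "search_restaurants", "arguments": {"city": "Lisbon"}}}
    print(handle(json.dumps(call)))
```

The practical upshot for product teams: per the announcement, a service reaches Gemini Spark by describing its capabilities this way, the same description any other MCP client can consume, rather than through a bespoke chat integration.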

AI & Machine Learning · 8 min read

Anthropic Just Bought the SDK Toolchain Used by OpenAI, Google, and Cloudflare — The Developer-Tooling Layer Is the Next Concentration Point

On May 18 Anthropic confirmed the acquisition of Stainless for north of $300 million. Stainless is the AI-driven SDK-generation startup that quietly produces the official client libraries shipped by OpenAI, Google, Cloudflare, and Anthropic itself. The headline framing treats the deal as a routine talent-and-tech tuck-in. The substance is one tier deeper: the layer between every LLM API and every customer codebase (the SDKs, the typed clients, the language-specific harnesses) just consolidated under a single frontier lab. For teams whose AI integration sits on top of an `anthropic`, `openai`, or `google-genai` import statement, that's not a press release. It's a procurement-risk conversation worth having before the next renewal.

Sonnet Code Editorial Team · May 20, 2026
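One common hedge against SDK concentration is to keep vendor imports behind a narrow interface the application owns, so swapping or adding a provider touches one adapter rather than every call site. A minimal sketch: `ChatProvider`, `StubProvider`, and `complete_with_fallback` are hypothetical names, and the stub stands in for adapters that would wrap the real `anthropic`, `openai`, or `google-genai` clients (those wrappers are deliberately left out).

```python
from typing import Protocol


class ChatProvider(Protocol):
    """The narrow surface application code depends on, instead of a vendor SDK."""
    name: str

    def complete(self, prompt: str) -> str: ...


class StubProvider:
    """Stand-in for an adapter that would wrap one vendor SDK behind ChatProvider."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail

    def complete(self, prompt: str) -> str:
        if self.fail:
            raise ConnectionError(f"{self.name} unavailable")
        return f"[{self.name}] {prompt}"


def complete_with_fallback(providers: list[ChatProvider], prompt: str) -> str:
    """Try each provider in order; callers never import a vendor SDK directly."""
    errors = []
    for provider in providers:
        try:
            return provider.complete(prompt)
        except ConnectionError as exc:
            errors.append(str(exc))
    raise RuntimeError("all providers failed: " + "; ".join(errors))


if __name__ == "__main__":
    chain = [StubProvider("primary", fail=True), StubProvider("secondary")]
    print(complete_with_fallback(chain, "hello"))
```

The interface is deliberately only as wide as the application needs: every SDK-specific feature that leaks through it is another dependency on one vendor's toolchain.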

AI & Machine Learning · 8 min read

Gemini 3.1 Flash-Lite Lands at $0.25 per Million Tokens — The Efficiency Tier Is Where the Volume Workloads Actually Live

Google introduced Gemini 3.1 Flash-Lite this month: an efficiency-focused model priced at $0.25 per million input tokens, with 2.5× faster response times and 45% faster output generation than earlier Gemini Flash variants. The headline framing is "Google undercuts the budget tier." The substance is one layer deeper: the high-volume workloads (classification, extraction, routing, summarization, internal embeddings preprocessing) that quietly consume around 80% of a typical production AI bill now have a credible default that costs an order of magnitude less than the frontier and a third less than the previous Flash tier. For teams whose Anthropic or OpenAI bill is dominated by volume rather than complexity, moving those workloads to this tier can pay for itself in the first month, and it forces a real conversation about which workloads should never have been on the frontier in the first place.

Sonnet Code Editorial Team · May 19, 2026
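The workload audit the post calls for can start as a routing table plus arithmetic. A sketch assuming input-token pricing only, since the post quotes only Flash-Lite's input price; the frontier price, the `code_review` label, and the monthly volumes are hypothetical.

```python
# Task types the post names as high-volume efficiency work.
EFFICIENCY_TASKS = {"classification", "extraction", "routing", "summarization", "embeddings_preprocessing"}

FLASH_LITE_INPUT_PER_M = 0.25  # quoted in the post, USD per 1M input tokens
FRONTIER_INPUT_PER_M = 3.00    # hypothetical frontier input price


def pick_tier(task: str) -> str:
    """Send named volume tasks to the efficiency tier; everything else stays on the frontier."""
    return "efficiency" if task in EFFICIENCY_TASKS else "frontier"


def routed_input_cost(workload: dict[str, float]) -> float:
    """Monthly input spend for {task: millions of input tokens} under tier routing."""
    total = 0.0
    for task, tokens_m in workload.items():
        price = FLASH_LITE_INPUT_PER_M if pick_tier(task) == "efficiency" else FRONTIER_INPUT_PER_M
        total += tokens_m * price
    return total


def all_frontier_input_cost(workload: dict[str, float]) -> float:
    """The same workload if every task ran on the frontier model."""
    return sum(workload.values()) * FRONTIER_INPUT_PER_M


if __name__ == "__main__":
    workload = {"classification": 4_000, "extraction": 3_000, "code_review": 500}  # hypothetical
    print(f"routed: ${routed_input_cost(workload):,.0f}  all-frontier: ${all_frontier_input_cost(workload):,.0f}")
```

With this hypothetical mix, routing brings monthly input spend from $22,500 down to $3,250, and the only task left on the frontier is the one that plausibly needs it.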