The Sonnet Code Blog · Page 5

Engineering notes from the field.

Essays and field notes on AI, software engineering, design, and the craft of building product teams that ship. Written by the engineers doing the work.

AI Development · 10 min read

DeepSeek V4 Pro's 75% Promotional Discount Expired May 31 and the Permanent Rate Card Settled at $0.435/$0.87 per 1M Tokens — the Open-Weight Frontier Coding Tier Now Has a Permanent Floor One Order of Magnitude Below the Western Closed Flagships, and the Routing Portfolio That Wasn't Carrying an Open-Weight Tier Through Q2 Is About to Pay for the Omission in the Q3 Cost-Per-Successful-Task Line.

DeepSeek V4 Pro's 75% promotional discount, which ran from April 24 through May 31, 2026, has expired, and the permanent rate card settled at $0.435/$0.87 per 1M input/output tokens (cache-hit $0.003625/M). That is a price floor roughly one order of magnitude below Claude Opus 4.8 ($5/$25), GPT-5.5 (broadly comparable), and Gemini 3.1 Pro for the equivalent capability tier. Kimi K2.6 sits at $0.95/$4.00 for the 1T-parameter MoE, 32B-active, 256K-context model released April 20 under a Modified MIT license; DeepSeek V4 weights are MIT-licensed, with V4-Pro at 1.6T total / ~49B active and V4-Flash at 284B / ~13B active, released April 24. The structural read isn't 'two open-weight models got cheaper.' It's threefold. First, the open-weight frontier coding tier now has a permanent price floor structurally below the closed Western flagships. Second, the routing portfolio that ran 100% closed-flagship through Q2 is about to see that omission in its cost-per-successful-task line, while the cohort that did the multi-vendor routing work captures the price arbitrage. Third, the eval discipline that hasn't graded the open-weight tier on the customer's specific codebase has to do that work this quarter, before the FY27 budget conversation resolves on the wrong axis. Here's what that does to the multi-vendor routing matrix at the model layer, the in-perimeter self-hosted deployment path the MIT/Modified-MIT licenses open up, the cost-per-successful-task attribution discipline, and the gold-set authoring work that grades a non-Western-trained model honestly.

Sonnet Code Editorial Team · June 11, 2026
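
The rate-card comparison in the entry above can be turned into a rough cost-per-successful-task figure. The sketch below is illustrative only: the list prices come from the teaser, while the token counts, the success rates, and the names `RateCard`, `cost_per_attempt`, and `cost_per_successful_task` are assumptions introduced here. It also assumes independent retries until a task passes, so the expected number of attempts is 1 / success rate, and it ignores cache-hit pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCard:
    name: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# List prices as quoted in the entry above.
DEEPSEEK_V4_PRO = RateCard("DeepSeek V4 Pro", 0.435, 0.87)
KIMI_K2_6 = RateCard("Kimi K2.6", 0.95, 4.00)
CLAUDE_OPUS_4_8 = RateCard("Claude Opus 4.8", 5.00, 25.00)


def cost_per_attempt(card: RateCard, input_tokens: int, output_tokens: int) -> float:
    """USD for one attempt at list price, ignoring cache hits."""
    return (input_tokens * card.input_per_m
            + output_tokens * card.output_per_m) / 1_000_000


def cost_per_successful_task(card: RateCard, input_tokens: int,
                             output_tokens: int, success_rate: float) -> float:
    """Expected USD per passing task, assuming independent retries."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt(card, input_tokens, output_tokens) / success_rate


if __name__ == "__main__":
    # Hypothetical workload: 100k input and 10k output tokens per attempt.
    # The success rates are placeholders, not measured results.
    scenarios = [(DEEPSEEK_V4_PRO, 0.60), (KIMI_K2_6, 0.60), (CLAUDE_OPUS_4_8, 0.80)]
    for card, rate in scenarios:
        usd = cost_per_successful_task(card, 100_000, 10_000, rate)
        print(f"{card.name}: ${usd:.4f} per successful task")
```

The useful property of this framing is that a cheaper model only wins if its success rate on the team's own gold set does not fall faster than its price, which is why the success-rate inputs should come from a real eval rather than from placeholders like the ones above.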

AI Development · 10 min read

Microsoft Just Closed the Last Procurement Gap Holding Regulated Industries Off Agentic Coding — VS Code Agents Hit Stable in 1.120, Air-Gapped BYOK Shipped in 1.122, and GitHub Copilot's April FedRAMP Moderate Authorization Pairs With Both to Make the Compliance Story Defensible for the First Time at the Engineering-Org Scale Regulated Buyers Actually Need.

The VS Code 1.120 release on May 13, 2026 moved the Agents window from experimental to Stable preview; the 1.122 release on May 28 removed the GitHub OAuth dependency for bring-your-own-key models so the IDE can run fully offline against Ollama, vLLM, or Foundry Local; and GitHub Copilot's FedRAMP Moderate authorization from April 2026 gives the regulated-industry IT administrator the compliance posture to deploy the surface across the full engineering organization rather than the small pilot cohort that's defined the agentic coding rollout in those verticals through the last eighteen months. The structural read isn't 'VS Code shipped an agents window.' It's that the last operational gap holding regulated industries — financial services, healthcare, defense, public sector — off the agentic coding surface just closed at the IDE-incumbent layer, that the compliance posture is now defensible at the engineering-org scale rather than at the pilot scale, and that the procurement object the regulated buyer has been waiting for through 2025 (an IDE-native agentic surface that runs against the in-perimeter inference path with a FedRAMP-grade compliance story behind it) is now on the table. Here's what that does to the regulated-industry AI coding rollout, the in-perimeter inference path the IDE now treats as a first-class endpoint, the policy-layer posture the enterprise admin has to author, and the eval discipline that has to grade both the cloud-hosted and the air-gapped configurations honestly side by side.

Sonnet Code Editorial Team · June 11, 2026

AI Development · 10 min read

Cursor Restructured Team Pricing Into Standard ($32/Seat) and Premium ($96/Seat) Tiers the Same Week Gartner Said 40% of Enterprise Applications Will Have Task-Specific Agents by Year-End — the AI Coding Procurement Surface Just Bifurcated Into Two Structurally Distinct Workload Classes, and the Buyer Who Sizes the Seat Mix on the Org Chart Instead of the Workload Distribution Will Pay 3x the Right Rate on a Meaningful Fraction of the Engineering Org.

Cursor's June 2026 Teams pricing restructure split the procurement surface into Standard seats at $32/seat/month on annual billing ($40 month-to-month) and the new Premium seats at $96/seat/month with 5x the Standard usage envelope and prioritized access to the heaviest agent models — landing the same week Gartner published its updated guidance that 40% of enterprise applications will integrate task-specific AI agents by the end of 2026 (up from less than 5% in 2025), and against Cursor's own consolidated reporting of roughly two out of three Fortune 500 companies with at least one team on Cursor and the revenue mix now 60% enterprise. The structural read isn't 'Cursor changed its pricing.' It's that the AI coding procurement object bifurcated into two structurally distinct workload classes — the sidekick class where the Standard envelope covers comfortably, and the long-horizon agentic class where the Premium capacity tier exists precisely because the Standard envelope is the binding constraint — and that the 3x per-seat cost delta between the two tiers, applied to a meaningful fraction of the engineering org, is the kind of FinOps mistake that doesn't surface in the aggregate monthly bill but does surface in the cost-per-successful-task per team. Here's what that does to the seat-allocation discipline (which has to anchor on workload class, not seniority), the FinOps line decomposition (which has to surface per-tier per-team, not aggregate), the re-evaluation cadence (which has to be quarterly, not annual), and the eval matrix (which now grades tier as well as model).

Sonnet Code Editorial Team · June 10, 2026
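
The seat-mix argument in the Cursor entry above comes down to simple arithmetic, sketched here under stated assumptions. The $32 and $96 per-seat prices are the annual-billing figures from the teaser. The usage numbers, the "more than one Standard envelope" threshold, and the function names `monthly_seat_cost` and `seats_by_workload` are hypothetical and not part of any Cursor tooling.

```python
STANDARD_USD = 32  # per seat per month on annual billing, from the entry above
PREMIUM_USD = 96   # per seat per month, from the entry above


def monthly_seat_cost(standard_seats: int, premium_seats: int) -> int:
    """Total monthly seat spend in USD for a given seat mix."""
    if standard_seats < 0 or premium_seats < 0:
        raise ValueError("seat counts must be non-negative")
    return standard_seats * STANDARD_USD + premium_seats * PREMIUM_USD


def seats_by_workload(usage_in_standard_envelopes: list[float]) -> tuple[int, int]:
    """Return (standard, premium) seat counts from measured usage.

    Usage is expressed as a multiple of the Standard envelope; an engineer
    who exceeds 1.0 is assumed to need a Premium seat.
    """
    premium = sum(1 for u in usage_in_standard_envelopes if u > 1.0)
    return len(usage_in_standard_envelopes) - premium, premium


if __name__ == "__main__":
    # Hypothetical org of 200 engineers: 20 run long-horizon agentic work.
    usage = [0.4] * 180 + [2.5] * 20
    by_workload = monthly_seat_cost(*seats_by_workload(usage))
    # Hypothetical org-chart allocation: all 60 senior engineers get Premium.
    by_org_chart = monthly_seat_cost(140, 60)
    print(f"workload-based: ${by_workload}/month")
    print(f"org-chart-based: ${by_org_chart}/month")
    print(f"overspend: ${by_org_chart - by_workload}/month")
```

Each misallocated Premium seat costs the $64 difference between tiers every month, so the overspend scales with the number of seats assigned by title rather than by measured usage.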

AI Development · 10 min read

ServiceNow and Accenture Launched a Joint Forward-Deployed Engineering Program to Move Agentic AI From Pilot to Production at Enterprise Scale — the Operational Engineering Layer Between Platform Purchase and Shipped Workloads, the Real Bottleneck of the Last Eighteen Months, Just Got Productized at the Largest Tier of the Enterprise Market.

ServiceNow and Accenture announced a joint forward-deployed-engineering program this quarter, with both vendors' FDE teams operating inside the customer's environment together to ship agentic AI workloads from pilot to production at scale on the ServiceNow AI Platform. The framing matters because the bottleneck on enterprise agentic AI through 2025 was never the platform availability — Claude Managed Agents, Gemini Enterprise, MAI, and the rest of the platform-tier surfaces have been credible for months — but the operational engineering layer between 'we bought the platform' and 'the workloads are in production, instrumented, governed, and producing measurable business value'. The structural read isn't 'two vendors announced a consulting partnership.' It's that the FDE-as-a-program structure, previously reserved as a bespoke arrangement for the top ten accounts of a frontier lab, has been productized — and that the procurement object for enterprise agentic AI now has to expand from 'the platform' to 'the platform plus the engineering team that deploys it', with the buyer who runs the procurement on the old shape ending up eighteen months behind on the deployment the business case priced. With 81% of enterprises planning more complex agentic use cases this year and Gartner projecting 40% of enterprise applications agent-integrated by year-end, the supply curve on the senior engineering pipeline that ships these workloads is the binding constraint — and FDE-as-a-program is the supply-side answer. Here's what that does to the build-vs-buy boundary on the operational engineering layer, the contract structure for FDE engagements, and the in-house team hiring profile that takes over after the FDE team rotates out.

Sonnet Code Editorial Team · June 10, 2026

AI Development · 10 min read

Google Just Shipped Gemini Enterprise Agent Platform as a Single End-to-End Surface for Building, Deploying, and Governing Agents — the Enterprise Agentic AI Procurement Conversation Now Has Three Platform-Tier Vendors (Anthropic, Microsoft, Google) Competing on the Governance Layer, Not on the Model Itself, and the Buyer Who Reads the Decision as 'Which Model' Has Misidentified the Lock-In.

Google's Gemini Enterprise Agent Platform, formally announced and moved into general availability across May and the first ten days of June 2026, collapses agent build, deploy, observe, and govern into a single end-to-end enterprise surface, with native multi-model routing across Gemini 3.1 Pro, Gemini 3.5 Flash, Claude Opus 4.8, GPT-5.5, Llama 4, DeepSeek V4 Pro, Qwen 3.7 Max, and the broader catalog. Combined with Microsoft's MAI-and-Foundry surface and Anthropic's Claude Platform on AWS, the enterprise agentic AI procurement object just resolved into three platform-tier vendors competing on the governance, observability, and runtime layer, with the model as a routing decision underneath. The structural read isn't 'Google shipped an agent platform.' It's that the durable lock-in moved from the model layer (where the multi-vendor routing strategies of the last two years had collapsed it) up to the governance surface (where the audit log structure, the policy DSL, the IAM bindings, the evaluation rubrics, and the cost-attribution dashboards now sit). The buyer who reads the procurement decision as 'which vendor's model do we license' has misidentified the procurement object. Here's what that does to the multi-vendor routing strategy at the platform layer, the protocol-stack posture (MCP, A2A, ACP) that keeps the platform decision reversible, and the in-house agent build-vs-buy question now that the runtime is platform-managed by default.

Sonnet Code Editorial Team · June 10, 2026

AI Training · 10 min read

The AI Training Labor Market Just Resolved Into Six Distinct Job Categories With Public Rate Cards From $15/hr Annotator to $1,000/hr Domain Expert Evaluator — Skilled Reviewers Are Now the Scarce Resource of Frontier-Tier Model Quality, and the Enterprise Buyer Who Treats Them as a Cost Line Is Pricing the Wrong Side of the Curve.

The data labeling industry resolved through 2025 and into 2026 into six structurally distinct job types — data annotators at $15-25/hr, AI tutors and trainers at $20-55/hr, RLHF specialists at $50-65/hr, prompt engineers at $40-65/hr, domain expert evaluators at $130-1,000/hr, and red-teamers at $100-200/hr — with the entire shape of the cost curve and the talent pipeline now legible in a way it wasn't a year ago. Roughly 70% of enterprise LLM deployments now ship some variant of RLHF, DPO, KTO, GRPO, or DAPO post-training; AI assistants are responsible for ~50% of freshly-written code in production while the churn rate of that code has risen 41%; and the buyers who get value from their alignment spend are the ones who treat the senior end of the reviewer pool as a managed strategic asset rather than a procurement line item. The structural read isn't 'AI labor is expensive.' It's that the supply curve of high-context human judgment — the engineers, the domain experts, the bilingual reviewers, the red-teamers who can catch the failure modes a frontier model actually produces — is the hard ceiling on enterprise alignment quality through the rest of 2026, and that the buyer who builds the talent pipeline early is the buyer whose model quality compounds while the cohort that anchors on per-annotation rates keeps watching their dashboards move slowly. Here's what that does to the make-vs-buy decision on the human-feedback layer, the FinOps shape of an honest alignment budget, and the eval-and-governance discipline that turns the talent investment into compounding model quality.

Sonnet Code Editorial Team · June 9, 2026