SONNET CODE
AI Development · June 27, 2026 · 7 min read

GitHub Copilot Goes Usage-Based: AI Credits Replace Flat Plans

What changed on June 1, 2026

GitHub flipped the switch on the largest pricing reboot the coding-assistant category has seen since Copilot launched. Every Copilot plan — Free, Pro, Pro+, Business, Enterprise — is now on usage-based billing. Premium Request Units (PRUs), the abstraction that hid the meter behind a fixed monthly count, are gone. In their place: GitHub AI Credits, billed against actual input, output, and cached token consumption at the published per-model API rate.

The headline numbers from GitHub's June 1 announcement:

  • Copilot Pro — $10/month, $10 of monthly AI Credits included.
  • Copilot Pro+ — $39/month, $39 of monthly AI Credits included.
  • Copilot Business — $19/user/month, $19 of monthly AI Credits per user.
  • Copilot Enterprise — $39/user/month, $39 of monthly AI Credits per user.
  • Overage — billed at end of cycle at the per-model API rate; admins can set per-user budget caps.
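To make the billing arithmetic concrete, here is a minimal sketch of how a monthly charge could be computed under this scheme. The per-million-token rates and the model names `example-frontier` and `example-small` are hypothetical placeholders, not GitHub's published prices. Only the plan prices and included credit amounts come from the list above.

```python
from dataclasses import dataclass

# Hypothetical per-million-token rates in USD. Placeholders, not published prices.
RATES_PER_MTOK = {
    "example-frontier": {"input": 3.00, "output": 15.00, "cached": 0.30},
    "example-small": {"input": 0.25, "output": 1.25, "cached": 0.03},
}


@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int
    cached_tokens: int


def usage_cost(u: Usage) -> float:
    """Dollar cost of one usage record at the assumed per-model rate."""
    r = RATES_PER_MTOK[u.model]
    return (
        u.input_tokens * r["input"]
        + u.output_tokens * r["output"]
        + u.cached_tokens * r["cached"]
    ) / 1_000_000


def monthly_bill(plan_price: float, included_credits: float, usage: list[Usage]) -> dict:
    """Plan price plus any consumption beyond the included credit bucket."""
    consumed = sum(usage_cost(u) for u in usage)
    overage = max(0.0, consumed - included_credits)
    return {
        "consumed": round(consumed, 2),
        "overage": round(overage, 2),
        "total": round(plan_price + overage, 2),
    }
```

Under these assumed rates, a Pro user (`monthly_bill(10.0, 10.0, usage)`) who stays inside the $10 bucket pays exactly the plan price. Anything consumed beyond the bucket shows up as overage at the end of the cycle.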

In May, a preview bill let admins compare their projected monthly meter against the previous PRU baseline before the cutover. The pattern matches what Cursor, Codex, and Claude Code already do: all four vendors on the premium coding-tool map are now uniformly metered, and the flat tier with an included credit bucket is the only simplification buyers still see.

Why the flat-plan era ended

The long-running-agent generation broke the flat-plan economics. A Pro user running an interactive autocomplete loop in 2024 generated a predictable per-day token cost; the same user in 2026 dispatching parallel agents into git worktrees, running overnight refactors, and routing through frontier models for hard subtasks generates a token cost that swings 10x to 100x between a quiet planning day and an active migration day.

Flat pricing under that distribution forces one of two outcomes:

  1. The vendor eats the long tail. Light users end up subsidizing the heavy users' marginal cost, and the vendor's gross margin compresses each quarter as frontier-model token spend climbs.
  2. The vendor caps usage. PRUs were the soft cap — they bought the vendor a year of runway by making the meter opaque, but the meter still ran and Copilot users were the ones reading throttling notices in the IDE by Q1 2026.

Usage-based billing makes the cost visible to the buyer, lets the buyer allocate their own per-feature budget, and removes the gross-margin compression that the long-running-agent generation forced. It also makes the procurement conversation honest: the FY27 line item is no longer seats × $/month. It is seats × plan price (which now buys an included credit bucket) plus the per-team variable consumption beyond that bucket, tracked monthly.

What the FY27 plan has to encode

Four shifts follow when the coding-assistant slot moves from flat pricing to metered pricing across all four vendors.

The per-team variable-cost line item becomes a first-class procurement artifact. The team that runs the standing FY27 plan against the old flat-rate spreadsheet is the team that gets surprised by the first overage invoice when an active migration sprint spikes the per-user meter. The remediation is the per-team monthly token-budget cap, the per-feature consumption attribution (autocomplete vs. agent dispatch vs. background-run), and the per-month variance band the FinOps function can underwrite against the standing budget.
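A minimal sketch of the three artifacts named above (per-feature attribution, a per-team monthly cap, and a variance band) might look like the following. The feature labels come from the paragraph above. The record shape, the function names, and the choice of a mean plus-or-minus k standard deviations band are assumptions made for illustration, not a prescribed FinOps method.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Feature labels from the article. Everything else here is illustrative.
FEATURES = ("autocomplete", "agent_dispatch", "background_run")


def attribute_by_feature(events):
    """Sum spend per (team, feature) from (team, feature, usd) records."""
    totals = defaultdict(float)
    for team, feature, usd in events:
        if feature not in FEATURES:
            raise ValueError(f"unknown feature: {feature}")
        totals[(team, feature)] += usd
    return dict(totals)


def teams_over_cap(events, caps):
    """Return {team: spend} for teams whose month-to-date spend exceeds their cap."""
    spend = defaultdict(float)
    for team, _feature, usd in events:
        spend[team] += usd
    # Teams without a configured cap are treated as uncapped (an assumption).
    return {t: s for t, s in spend.items() if s > caps.get(t, float("inf"))}


def variance_band(monthly_spend, k=2.0):
    """Mean +/- k population standard deviations over past monthly totals."""
    mu = mean(monthly_spend)
    sigma = pstdev(monthly_spend)
    return (mu - k * sigma, mu + k * sigma)
```

Feeding `teams_over_cap` the month-to-date records on a schedule gives the FinOps function an early signal before the end-of-cycle invoice, and `variance_band` over prior months gives a rough range to budget against.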

The per-model routing decision moves from a developer-experience preference to a unit-cost optimization. Under metered billing, the developer's choice between "route this prompt through the small fast model" and "route this prompt through the frontier model" is no longer a quality preference; it is a per-prompt unit-cost decision. The team that writes its per-task model-routing policy explicitly (small model for autocomplete, mid model for inline edits, frontier model for long-horizon agent runs, with the routing matrix maintained as a code-review-ready artifact) buys back the productivity-per-dollar curve the metered substrate exposes.
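As one way to keep that routing matrix reviewable, it can live in the repository as a small piece of code. The sketch below encodes the three-tier mapping described above. The task-type strings, the `Tier` enum, and the fallback to the cheapest tier are assumptions for illustration and do not correspond to any vendor's API.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"
    MID = "mid"
    FRONTIER = "frontier"


# Routing matrix from the paragraph above. Task names are illustrative.
ROUTING_POLICY = {
    "autocomplete": Tier.SMALL,
    "inline_edit": Tier.MID,
    "long_horizon_agent": Tier.FRONTIER,
}

# Assumption: unknown task types fall back to the cheapest tier, so a new
# task type never reaches the frontier model without a reviewed policy change.
DEFAULT_TIER = Tier.SMALL


def route(task_type: str) -> Tier:
    """Look up the model tier for a task type, defaulting to the cheapest."""
    return ROUTING_POLICY.get(task_type, DEFAULT_TIER)
```

Because the policy is a plain dictionary, any change to which tasks reach the frontier tier shows up as a one-line diff in code review.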

The dual-vendor portability commitment becomes the standing-contract anchor. A four-vendor map (Cursor under SpaceX, Claude Code under Anthropic, Codex under OpenAI, Copilot under Microsoft) in which every vendor now bills by usage is exactly the shape where the standing contract should anchor on two vendors and second-source the rest. The portability lever is the per-team workflow contract (the per-prompt policy, the per-agent verification harness, the per-repo bootstrap configuration), written portably enough that a quarter's notice on a per-vendor pricing slip lets the team route the standing workload to the second-source vendor without rewriting the workflow.

The senior-engineering-attention budget becomes the load-bearing throttle. The metered substrate exposes the cost of every keystroke that loops back into a frontier-model invocation. The team that confuses the cheapened per-keystroke cost for the cheapened per-decision cost ships the first metered-quarter post-mortem on the agent loop that ran $40,000 of frontier-model tokens against an ambiguous spec the team hadn't tightened before dispatching the run. The FinOps function alone can't carry that throttle — the throttle is the team's senior-engineering-attention budget against the per-dispatch decision, with the per-dispatch verification contract as the standing artifact.
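One possible shape for a per-dispatch verification contract is a small gate that runs before any agent is launched. The sketch below is an illustration only: the `DispatchRequest` fields, the $50 approval threshold, and the rule that a spec must carry acceptance criteria are all assumptions, not a described tool or vendor feature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DispatchRequest:
    description: str
    estimated_usd: float           # caller's own estimate of the run's token spend
    has_acceptance_criteria: bool  # spec has been tightened into checkable criteria
    approved_by: Optional[str] = None


def may_dispatch(req: DispatchRequest, approval_threshold_usd: float = 50.0):
    """Return (allowed, reason) for a proposed agent run.

    Hypothetical rules: an ambiguous spec is always refused, and runs whose
    estimate exceeds the threshold need a named approver.
    """
    if not req.has_acceptance_criteria:
        return (False, "spec lacks acceptance criteria")
    if req.estimated_usd > approval_threshold_usd and not req.approved_by:
        return (False, "estimated spend needs senior approval")
    return (True, "ok")
```

The point of the gate is the ordering: the spec check comes before the spend check, so an ambiguous spec is refused even when the estimated run is cheap.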

Where this hits the AI-integrated product team hardest

The product team that ships AI features against a frontier-model API has been on usage-based billing since day one — the new news is that the coding-assistant substrate the team uses to build those features is now metered too. The compounding read is real: the per-feature inference budget on the production side and the per-team coding-assistant budget on the development side both run against the same FinOps spreadsheet, both swing on the same long-running-agent generation, both reward the same per-prompt routing discipline.

The teams that already have the per-feature inference budget under control will adapt to the Copilot reboot inside one quarter. The teams that have been treating their frontier-model bill as a fixed cost-of-doing-business will find both budgets misbehaving at the same time, and the FY27 plan that did not budget for either swing will be the FY27 plan that gets re-cut in the August forecast cycle.

The procurement question is no longer "which seat tier does the team buy?" It is which two of the four metered vendors anchor the standing contract, which per-team token-budget cap the FinOps function underwrites, which per-task model-routing policy the engineering function maintains as a code-review-ready artifact, and which senior-engineering-attention budget throttles the per-dispatch decision the metered substrate now puts a price on. The teams that ask the right question this quarter buy back the productivity gain the metered substrate exposes; the teams that ask the wrong one buy themselves an August invoice for a dispatch decision nobody on the team owned.


At SONNET CODE we run our own engineering team on the same metered substrate, and the per-task model-routing policy plus the per-feature inference budget are the two artifacts we hand back to clients on every AI-integration engagement. If your team is rebuilding the FY27 procurement plan around the new meter, schedule a call — we'll walk you through the per-task routing policy we run against the four-vendor map.