Sonnet Code
← Back to all articles
AI DevelopmentJune 22, 2026·10 min read

xAI Landed Grok 4.3 on Amazon Bedrock on June 15 at $1.25 Per Million Input Tokens and $2.50 Per Million Output Tokens — the Cheapest US-Lab Frontier Reasoning SKU on Bedrock With a 1-Million-Token Context Window, the Lowest Hallucination Rate Among Frontier Models, the #1 Slot on the Artificial Analysis Omniscience Benchmark, and Configurable Reasoning Effort From None to High — the Bedrock Routing Matrix Just Acquired a Third US-Lab Frontier Reasoning Candidate at a Price Point That Resets the AWS-Native Floor Below the Comparable Claude and OpenAI Options, and Every Team Routing Reasoning Work Through Bedrock Has a Re-Grade Job the Procurement Spreadsheet Doesn't Yet Reflect.

What landed on Bedrock on June 15 and the pricing commitment that lands with it

On June 15, 2026, xAI brought Grok 4.3 to the Amazon Bedrock catalog at the following on-demand pricing:

  • $1.25 per million input tokens.
  • $2.50 per million output tokens.
  • $0.20 per million cached input tokens.

AWS Bedrock's product team confirmed in the launch post that the listing makes Grok 4.3 the cheapest US-lab frontier reasoning SKU on Bedrock. The model brings the operational profile that has carried the Grok 4.3 launch through the spring:

  • 1-million-token context window — wide enough to hold the team's largest single repository in a single call, with the standard Bedrock context-window-billing primitive applied.
  • Lowest hallucination rate among frontier models — currently #1 on the Artificial Analysis Omniscience benchmark, against the Claude and OpenAI frontier SKUs the buyer's Bedrock routing matrix has been grading against.
  • Configurable reasoning effort across four bandsnone, low, medium, high — controllable per-call against the Bedrock invocation primitive. The per-call effort band lets the team's routing matrix grade cost-per-successful-task per workload class against the effort the workload actually needs, rather than against a single fixed reasoning posture per SKU.
  • Native on Bedrock — no parallel contract, no parallel residency story, no parallel data-handling posture; the SKU lands under the buyer's existing Bedrock contract, the buyer's existing Bedrock IAM policy surface, the buyer's existing Bedrock VPC-and-PrivateLink primitive, and the buyer's existing Bedrock observability surface.

The operationally important pieces:

  • The price-per-token floor on the Bedrock frontier-reasoning shelf just moved. Until June 15, the Bedrock-resident US-lab frontier reasoning options priced against an implicit floor set by the Claude and OpenAI SKUs the catalog already carried. Grok 4.3's listing prices below that floor by a meaningful margin on both input and output, with a deeper cached-input discount than the comparable cohort. The buyer whose Bedrock-routing-matrix default-cost-per-call was set six months ago is operating against a floor the catalog has structurally moved past.
  • The Bedrock contract-and-posture surface is the same surface the SKU lands on. The procurement primitive that Bedrock is engineered to deliver — one AWS contract, one IAM surface, one residency story, one observability surface across every model in the catalog — applies to Grok 4.3 on the same terms it applies to the Claude and OpenAI SKUs. The procurement-leverage point the buyer holds on AWS is not diluted by the third frontier-vendor entry; it is strengthened — the buyer now has three candidates the contract grades against, with the existing contract terms.
  • The configurable-reasoning-effort primitive lets the team's routing matrix grade against the right effort band per workload class rather than against a fixed-effort SKU. The cost-per-successful-task per workload class is not the cost-per-token times the average-token-count; it is the cost-per-token times the workload-appropriate effort band's token-count. The configurable-effort primitive makes that distinction operationally cheap; the team that grades cost-per-class with effort-band routing wired in gets a finer attribution than the team whose routing matrix treats the SKU as a fixed-effort node.
  • The hallucination-rate-leader signal is a per-workload-class procurement column, not a marketing bullet. Lowest hallucination rate among frontier models is the column the regulated-industry buyer, the customer-facing-agent operator, and the safety-critical-workflow team have been grading every routing decision against. The Bedrock-native option that leads the column at the cheapest price point on the catalog is a procurement-decision data point the buyer's routing-matrix has to grade against, on every workload class where the hallucination-rate column is the deciding column.

The structural read isn't AWS added another model. It's that the Bedrock-resident US-lab frontier reasoning catalog now has three credible candidates — Claude, GPT, and Grok — at a price-quality-latency profile spread that turns the buyer's Bedrock-routing-matrix decision from pick the default to route per workload class against the priced floor and the per-class quality measurement, and the buyer whose Bedrock contract is still defaulting to one SKU has a re-grade job for Q3 the procurement spreadsheet doesn't yet have a column for.

What the Bedrock catalog landing restructures about AWS-native AI routing

Four concrete shifts that follow when the Bedrock catalog acquires a third credible US-lab frontier reasoning candidate at the cheapest price-per-token in the cohort.

The Bedrock-routing-matrix decision becomes a per-workload-class decision, not a single-default-SKU decision. Twelve months ago, the buyer's Bedrock-resident AI default was route everything through the one model contract we already grade against, because the other Bedrock-resident options didn't beat the default at a price point worth the routing-matrix complexity. The Grok 4.3 listing reframes the decision: the catalog now has three frontier-reasoning candidates whose per-workload-class cost-quality-latency profile is materially different, and the buyer's routing matrix has to grade per-class rather than per-default. The Q3 work the procurement spreadsheet doesn't yet have a column for is the per-workload-class re-grading the new candidate triggers.

The cost-per-successful-task budget per workload class becomes a priced-against-three-floors number, not a priced-against-one-floor number. With one Bedrock-frontier-reasoning candidate as the default, the cost-per-successful-task budget per class is set against that candidate's per-token price. With three candidates at materially different per-token prices, the cost-per-class budget is priced against the candidate that wins the per-class measurement at the lowest per-token cost — and the candidate that wins per-class varies across the workload-class distribution. The FinOps line item the team has been carrying as a single-vendor cost-per-class is now a three-candidate cost-per-class with the routing matrix as the attribution primitive.

The hallucination-rate-leader column becomes the per-class decision-weight the routing matrix lands harder on per workload class than the cost-per-token column. For the regulated-industry buyer, the customer-facing-agent operator, the safety-critical-workflow team — which model has the lowest hallucination rate on the workload class the team's gold set grades against is the procurement column the routing decision weights heaviest. Grok 4.3's #1-on-Omniscience signal is the data point the team's routing matrix should re-weight against on those workload classes specifically, before re-weighting against the cost-per-token column on the lower-stakes classes.

The configurable-reasoning-effort primitive becomes the per-call cost-attribution lever the team's FinOps surface has been waiting for. With a fixed-effort SKU, the team's per-call cost is the per-token cost times the SKU's fixed effort token-count. With a configurable-effort SKU, the team's per-call cost is the per-token cost times the workload-appropriate effort band's token-count. The team that wires effort-band routing into the per-class routing matrix gets a finer cost-attribution per class than the team whose routing matrix treats every call to the SKU as the same call.

Where the listing is signal and where it is noise

Four honest reads on what the June 15 Bedrock landing actually tells the buyer.

Signal: the cheapest US-lab frontier reasoning SKU on Bedrock claim is the priced-floor-just-moved signal. AWS confirming the cheapest-in-catalog positioning is the catalog operator's read on what the cohort looks like with Grok 4.3 in it. The buyer who treats the claim as marketing language is missing the structural commitment AWS is making: the catalog operator is signaling the priced floor on the frontier-reasoning shelf has moved, and the buyer's existing contract grades against the new floor whether the buyer re-grades or not. The contract that's still defaulting to the old floor is paying the old floor's cost-per-class.

Signal: the Bedrock-native, same-contract-and-posture listing is the zero-procurement-friction-to-add-the-candidate signal underneath the launch. A Bedrock-resident listing is not a parallel contract, parallel residency, parallel observability. The buyer's existing AWS contract grades the new candidate on the same terms it grades the existing cohort. The friction-to-add is the routing-matrix update, not the contract renegotiation; the contract is the substrate the routing decision lands inside.

Noise: the headline benchmark position is not the per-buyer routing decision. #1 on Artificial Analysis Omniscience is an aggregate-benchmark signal. The per-buyer routing decision is what does the per-class gold set say about Grok 4.3 on the buyer's specific workload distribution, against the buyer's specific framework mix, against the buyer's specific failure-mode tail. The aggregate benchmark is the should-we-pilot signal; the per-buyer measurement is the should-we-route decision.

Noise: the cheapest-on-Bedrock positioning does not eliminate the per-class quality measurement. A cheaper per-token cost does not buy a routing decision; the routing decision is bought by the per-class success rate against the buyer's gold set at the new candidate's per-token cost. The team that routes-to-the-cheapest without measuring the per-class success-rate-delta is the team that ships the per-class regression six weeks later and reads the routing-decision retrospective at the next architecture review. The per-class measurement is the team's work; the cheapest-on-Bedrock signal is the data the measurement grades against.

What the team should do inside the next quarter

Four concrete actions that close the gap between the June 15 Bedrock landing and the routing-matrix discipline the new candidate requires.

Audit the team's Bedrock-routing-matrix default against the three-candidate cohort. For the team whose Bedrock contract is still defaulting to a single frontier-reasoning SKU, the Q3 work is audit the default against the three-candidate cohort, per workload class, on the team's existing gold set. The audit answers: which workload classes win on cost-per-class with Grok 4.3, which workload classes win on hallucination-rate-leader with Grok 4.3, which workload classes still win with the existing default, which workload classes are within the noise floor and should be routed against price. The audit is the data the routing-matrix update should grade against.

Pilot Grok 4.3 against one well-defined Bedrock-routed workload class, not against the whole routing matrix. The right pilot is not re-route every Bedrock call to Grok 4.3; it is pick one workload class the configurable-effort primitive or the hallucination-rate-leader signal makes a strong candidate for — long-context-window reasoning, low-hallucination-tolerance customer-facing answering, configurable-effort cost-per-class optimization — and grade the per-class cost-per-successful-task, latency, and quality-against-gold-set for 30 to 60 days. The pilot is the data the per-class routing decision should grade against.

Wire effort-band routing into the per-class routing matrix as a first-class engineering artifact. The configurable-reasoning-effort primitive only delivers the cost-attribution win if the routing matrix grades against the workload-appropriate effort band per class, not against a fixed-effort default. The Q3 engineering work is the per-class effort-band selection wired into the routing matrix, with the per-class measurement of the cost-per-class delta across effort bands as the artifact the next routing-matrix review reads.

Update the team's FinOps surface to attribute cost-per-successful-task per workload class across the three-candidate Bedrock cohort. The cost-per-class line item the team's FinOps surface has been carrying as a single-vendor number is now a three-candidate number with the routing matrix as the attribution primitive. The FinOps surface needs the per-candidate, per-class cost-per-successful-task as a first-class metric, with the per-class success-rate and per-class fallback-rate as the joint surfaces the cost-per-class grades against.

What this does not change

Three honest caveats.

It does not eliminate the per-class eval-rubric authoring. A new candidate on the routing matrix per workload class still requires the per-class gold set, the per-class rubric, and the per-class senior-review queue calibration. The model release is the substrate; the eval rubric is the team's. Grok 4.3 lands the new candidate; the per-class measurement is the team's measurement.

It does not eliminate the per-vendor commercial diligence on the SKU. A Bedrock-native listing is the contract-and-posture substrate; it does not broker the per-vendor commercial-and-roadmap relationship. The team that wires Grok 4.3 into the routing matrix as a critical-path node still owes the diligence on xAI's per-SKU roadmap, per-SKU SLA, per-SKU deprecation posture, and per-SKU regulatory posture. The Bedrock substrate is the engineering primitive; the per-vendor diligence is the team's work.

It does not eliminate the per-class senior-judgment workload behind every routing decision. Each routing decision per workload class has a per-class failure-mode tail — the confidently-wrong output the team's gold set didn't catch, the technically-correct response that misses the dispositive context, the clean-looking action with subtle downstream consequences. The senior-review queue calibrated per workload class against the new candidate's per-class failure-mode tail is the human-judgment workload the new candidate imposes on the team — the same workload the team owed against every prior candidate.

Where Sonnet Code fits

The June 15 Bedrock landing is the architectural commitment that turns the single-frontier-reasoning-default the Bedrock contract has been carrying into a three-candidate-routed-per-workload-class matrix the team has to grade. The Bedrock-routing-matrix audit, the per-class pilot, the effort-band routing engineering work, the FinOps-surface attribution update, and the per-class senior-judgment rubric calibration are the engineering-and-human-judgment work the new candidate imposes on the buyer.

AI development at Sonnet Code is the engineering half: auditing the team's Bedrock-routing-matrix default against the three-candidate cohort on the team's existing gold set; piloting Grok 4.3 against a well-defined Bedrock-routed workload class with the per-class cost-per-successful-task, latency, and quality-against-gold-set measurement; wiring effort-band routing into the per-class routing matrix as a first-class engineering artifact; updating the team's FinOps surface to attribute cost-per-successful-task per workload class across the three-candidate cohort; and integrating the new candidate into the team's existing Bedrock-native observability and incident-response surface so the per-class failure modes are tractable.

AI training at Sonnet Code is the human-judgment half: senior engineers and domain experts who author the per-workload-class gold sets that grade Grok 4.3 honestly against the team's specific workload distribution; design the per-class senior-judgment rubrics that calibrate the senior-review queue for the per-SKU failure-mode tail xAI's training distribution does not pre-empt for the buyer's specific industry; refresh the gold sets and rubrics quarterly so the routing decisions do not silently drift as xAI ships the next Grok release and as the rest of the Bedrock cohort ships their own next releases; and serve as the senior-judge pool whose calibrated decisions feed the routing-matrix updates the next release cycle resolves against.

The Bedrock-resident US-lab frontier reasoning catalog now has three credible candidates at a meaningfully different price-quality-latency spread per workload class. The teams that walk into Q3 with the Bedrock-routing-matrix audit run against the three-candidate cohort, the per-class pilot run against the team's own gold set, the effort-band routing wired into the routing matrix as a first-class artifact, and the per-class senior-judgment rubric calibrated against the new candidate's per-class failure-mode tail are the teams that turn the June 15 Bedrock landing into a compounding cost-and-quality advantage on the AWS-native AI stack. The teams that read the listing as a model-vendor reshuffling and stop there will discover the routing-matrix gap, the per-class eval-rubric debt, and the senior-review queue calibration the new candidate does not deliver — six months after the buyer down the road figured out how to grade the three-candidate cohort honestly.