Sonnet Code
← Volver a todos los artículos
AI Training23 de junio de 2026·12 min read

Surge AI Bootstrapped Past Scale AI on Revenue at $1.2B vs $870M — by Charging Up to 10x Premium for Frontier-Lab RLHF and Paying Domain-Expert Annotators 30 to 40 Cents per Working Minute ($18-$24/hr) Instead of Crowd-Worker Rates, While Scale Defends Market Share With Bundled and Subsidized RLHF Pricing — Top Labs Already Spend on the Order of $1B per Year Each on Human-Provided Training Data, RLHF-Trained Models Produce 40% Fewer Toxic Outputs Than Synthetic-Only Models, and the AI-Training-Data Slot of the FY27 Procurement Plan Just Had Its Incumbent Dethroned — The Diligence Sprint to Grade the Premium-Domain-Expert Pool vs. the Commodity-Bundled-Bench vs. the In-House-Headcount Alternative Has to Run This Quarter.

What the Surge-past-Scale revenue inversion signals for the AI-training slot

The AI-training-data-procurement slot of the buyer's FY27 plan just had its incumbent dethroned, and the procurement-spreadsheet now reads the inversion the analyst-coverage envelope has been telegraphing for two quarters: Surge AI bootstrapped to $1.2 billion in revenue in 2024, surpassing Scale AI's $870 million — without venture rounds, without a public bench of crowd-worker labelers, without the bundled-and-subsidized RLHF pricing the previous incumbent rode to category leadership. The headline mechanic is Surge charging up to 10x the commodity-bench rate for complex, high-value frontier-lab RLHF work, paying its approved annotators 30 to 40 cents per working minute — $18 to $24 per hour — far above the typical crowd-worker rate, and packaging the senior-domain-expert pool against the regulated-industry-vertical failure tail the commodity bundled bench cannot grade.

The complement signals the procurement-spreadsheet now reads alongside the revenue inversion:

  • Top AI labs — Google, OpenAI, Meta, Anthropic — each spend on the order of $1 billion per year on human-provided training data, a category-spend number that puts the buyer's per-vertical AI-training line item against a procurement substrate that is larger than the model-licensing line item it sits next to in the FY27 plan.
  • Enterprise Surge contracts start at $50,000 to $60,000 annually and scale higher — the senior-domain-expert-pool managed-service contract shape, not the per-task-API-rate metered-billing-line-item shape the commodity bench priced against.
  • Scale's paid tiers range from $49 to $1,999 per month with enterprise options that include custom infrastructure, advanced integrations, and white-glove support, and the public commentary describes Scale using bundled offerings and subsidized RLHF pricing to maintain market share — the commodity-substrate pricing-power defense the incumbent runs when the premium-domain-expert pool is taking the regulated-industry top of the wedge.
  • RLHF-trained models produce 40% fewer toxic outputs than models trained solely on synthetic data — the outcome-grade signal the buyer's safety-and-compliance committee now reads on top of the per-vertical accuracy signal, and the line item the FY27 plan can no longer defer to the model vendor will handle alignment for us.

The procurement-question for every buyer whose FY27 plan has an AI-training line item is no longer do we run the per-vertical RLHF data pipeline; it is which substrate — premium domain-expert pool or commodity bundled bench — fills which per-vertical slot of the buyer's training-data substrate, and the diligence sprint that grades the slot composition has to run this quarter.

Why the premium-domain-expert pool wins the regulated-industry tail

The commodity-bundled-bench pricing the previous incumbent rode is the crowd-worker-substrate shape: a large pool of annotators with general-grade training, paid at a low per-task rate, graded against a per-task quality envelope the platform substrate's quality-control layer enforces. The substrate is efficient at scale on tasks the per-task quality envelope can grade — image-classification labeling, bounding-box drawing, transcription, generic conversational-response ranking. The substrate does not survive contact with the per-vertical regulated-industry failure tail because the failure tail is, by construction, the tail the per-task quality envelope cannot grade: the silent-wrong-completion where the model's response is grammatically perfect, plausibly correct, internally consistent, and wrong in a way only the senior domain expert can catch.

The premium-domain-expert pool is the senior-judgment-overlay substrate: a smaller pool of annotators with domain-specific expertise, paid at a premium per-working-minute rate, graded against the per-vertical failure-tail composition the senior-judgment overlay encodes. The substrate is inefficient at the commodity-bench scale on tasks the per-task quality envelope can grade — the unit cost per accepted annotation is higher because the per-task time is longer and the per-minute rate is multiples of the crowd-worker rate — but the substrate wins the per-vertical regulated-industry tail because the failure tail is exactly the substrate the senior-judgment overlay was designed to grade. The buyer that runs the per-vertical RLHF data pipeline against the commodity-bundled-bench substrate on the regulated-industry tail is the buyer that pays the cheaper per-task rate and ships the silent-wrong-completion to the per-vertical failure tail the buyer's compliance committee will eventually find — and pays the cost of the post-deployment retraining cycle, the per-vertical accuracy regression, and the procurement-credibility loss against the buyer down the road that paid the senior-domain-expert pool premium up front and shipped the per-vertical accuracy gain the FY27 plan was supposed to encode.

The revenue inversion is the market-pricing signal that the premium-domain-expert pool already won the regulated-industry tail at the FY26 budget cycle. The buyer that does not encode the inversion into the FY27 plan is the buyer that defers the per-vertical accuracy gain by one full release cycle while the buyer down the road compounds against it.

What the $1B-per-frontier-lab line item tells the procurement spreadsheet

The $1 billion per year per frontier lab on human-provided training data number is the substrate-market-size anchor the procurement-spreadsheet line item grades against. Four frontier labs at $1B each is a $4 billion floor for the AI-training-data substrate — and the floor does not include the enterprise-buyer-side per-vertical RLHF spend the FY27 plan is now encoding against the model-vendor-side substrate. The category-spend number tells the buyer the substrate is not a fixed-cost amortization the buyer can avoid by paying the model vendor for aligned-out-of-the-box model access; it is a per-vertical operating-expense line item the buyer's FY27 plan has to grade against the buyer's own per-vertical failure tail, because:

  • The model vendor's alignment substrate is graded against the model vendor's gold set — the cross-vertical, cross-industry, cross-failure-mode envelope the model vendor's safety classifier was trained against. The model vendor's alignment substrate is necessary but not sufficient for the buyer's per-vertical failure tail: the buyer's per-vertical failure mode is, by construction, the failure mode the model vendor's gold set could not grade because the failure mode is per-vertical-specific and the model vendor's gold set is per-vertical-agnostic.
  • The buyer's per-vertical senior-judgment overlay is the substrate that grades the per-vertical failure tail, and the substrate has to be built on top of the per-vertical domain-expert pool the buyer maintains — either internally (the buyer hires the senior-domain-expert pool in-house, the FY27 plan encodes the headcount line item, the platform team owns the overlay-calibration cadence) or externally (the buyer contracts the senior-domain-expert pool from a premium-RLHF-substrate vendor like Surge, the FY27 plan encodes the managed-service-contract line item, the platform team owns the overlay-calibration cadence and the vendor owns the per-minute-rate annotator-pool maintenance).
  • The annotator-pool-maintenance cost the premium-RLHF-substrate vendor amortizes across the buyer-side line item is the cost the buyer's internal headcount line item would otherwise have to absorb — recruiting senior-domain-experts at $18 to $24 per hour, training them on the per-vertical failure-mode rubric, calibrating them against the buyer's per-vertical gold set, retaining them against the salary-and-benefits envelope a competitive senior-domain-expert pool now commands. The premium-RLHF-substrate vendor's per-minute-rate is the make-versus-buy pricing signal the buyer's FY27 procurement function reads against the in-house headcount alternative.

The diligence sprint to run this quarter on the AI-training slot

The buyer that walks into Q3 with the AI-training-slot diligence sprint already run is the buyer whose FY27 plan encodes the per-vertical AI-training line item against the substrate the buyer's per-vertical failure tail actually requires — not against the substrate the previous procurement contract defaulted to because the FY26 line item did not yet have to grade the per-vertical accuracy gain honestly.

The sprint has four components the buyer's central platform team can no longer defer:

  1. Map the per-vertical failure tail composition — for each vertical the buyer's FY27 plan ships an AI-integrated product against, the platform team has to document the per-vertical failure-mode shape the senior-judgment overlay needs to grade: silent-wrong-completion rate, plausible-but-incorrect-reasoning rate, instruction-drift rate, regulated-domain-specific-violation rate, tool-call-correctness rate. The per-vertical failure-tail map is the asset the substrate-selection diligence sprint grades against; the buyer that runs the sprint against an undocumented failure tail is the buyer that picks the substrate against the easy-to-grade portion of the workload and ships the substrate-mismatch loss to the hard-to-grade tail.
  2. Grade the per-vertical RLHF-data substrate candidates against the per-vertical failure tail — premium-domain-expert pool (Surge-shape: senior-annotator-pool, per-minute-rate, managed-service-contract), commodity-bundled-bench (Scale-shape: large-annotator-pool, per-task-rate, bundled-tier pricing), and the buyer-in-house alternative (FTE-headcount, internal-pool-managed, per-vertical-overlay-calibration owned). Each candidate has a different per-vertical pass-rate-per-dollar profile against the per-vertical failure tail; the diligence sprint's conclusion is the per-vertical substrate-allocation the FY27 plan should encode, not a single-substrate procurement decision for all verticals.
  3. Calibrate the per-vertical senior-judgment overlay against the chosen substrate — the overlay's calibration is the substrate-specific asset that decides whether the per-vertical RLHF data pipeline produces the per-vertical accuracy gain or the per-vertical accuracy regression. The overlay needs per-vertical gold sets, per-vertical failure-mode rubrics, per-vertical inter-annotator-agreement targets, and a refresh cadence that tracks the model-vendor's release cycle. The buyer that runs the overlay against a generic rubric on a substrate that grades against the per-vertical failure tail is the buyer that loses the per-vertical accuracy gain the substrate was designed to deliver.
  4. Plumb the per-vertical RLHF-data pipeline into the routing-matrix substrate — the per-vertical RLHF-data substrate is upstream of the per-vertical routing-matrix decision the buyer's FY27 plan encodes (Gemini 2.5 Pro Deep Think vs. Opus 4.7 vs. GPT-5.5 vs. open-weights-frontier). The per-vertical RLHF data the buyer collects has to be portable across the model-vendor-substrate boundary, because the export-suspension volatility the FY27 plan now grades against (Fable 5 / Mythos 5 on June 12, the next model-vendor-availability shock the procurement function has to defend against) means the per-vertical training-data substrate cannot be locked to a single model vendor's fine-tuning API. The substrate-portability filter is the routing-matrix-portability filter the agent-framework slot already encoded; the AI-training slot now has to encode it too.

What the make-versus-buy line item looks like for the FY27 plan

For the buyer whose per-vertical failure tail is dense in regulated-industry domain expertise the commodity bench cannot grade, the make-versus-buy line item resolves against the premium-domain-expert pool: the per-minute rate is multiples of the commodity-bench rate but the substrate-pricing-power matches the per-vertical failure-tail-grading capacity the buyer's senior-judgment overlay requires, and the managed-service-contract shape lets the buyer scale the per-vertical RLHF data pipeline against the FY27 plan's vertical-product-launch cadence without absorbing the senior-domain-expert recruiting-and-retention envelope into the in-house headcount plan.

For the buyer whose per-vertical workload is dense in commodity labeling tasks the per-task quality envelope can grade, the make-versus-buy line item resolves against the commodity-bundled-bench — the per-task rate is the right pricing-substrate, the per-vertical failure tail is shallow enough that the per-task quality envelope catches the failure mode, and the bundled-tier pricing maps to the FY27 plan's volume-line-item shape.

For the buyer whose per-vertical workload sits across both — most regulated-industry buyers in financial services, healthcare, public sector, insurance, life sciences, regulated-manufacturing, and regulated-utilities — the make-versus-buy resolves against a substrate-composition the FY27 plan encodes: commodity-bundled-bench for the per-vertical-easy-tail (the volume tier the per-task quality envelope grades), plus premium-domain-expert pool for the per-vertical-hard-tail (the senior-judgment-overlay tier the per-minute-rate-annotator-pool grades), plus in-house senior-domain-expert headcount for the routing-matrix-overlay-calibration the buyer cannot outsource because the calibration cadence is the per-vertical-procurement-credibility asset the buyer's FY27 plan owns end-to-end.

What the substrate-composition pull-forward means for the buyer's pipeline

The revenue inversion is the market-pricing signal that the premium-domain-expert pool already won the regulated-industry tail. The four-component diligence sprint is the work that translates the signal into the buyer's per-vertical FY27 line item. The integrator pipeline that has the per-vertical failure-tail mapping capacity, the per-vertical RLHF-substrate-grading capacity, the per-vertical senior-judgment-overlay calibration capacity, and the routing-matrix-substrate-portability plumbing capacity at the same time is the integrator whose FY27 calendar is already booked against the buyer's downstream pull-forward — and the buyer that has not retained the integrator with the diligence-sprint capacity at scale is the buyer whose FY27 plan ships the substrate-mismatch loss to the per-vertical hard tail while the buyer down the road compounds against the substrate-composition advantage.

The AI-training-data slot just got its incumbent dethroned at the FY26 budget cycle. The buyer's FY27 plan has to encode the substrate-composition the per-vertical failure tail actually requires — not the substrate the FY26 line item defaulted to. The diligence sprint is the work that closes the gap; the buyer that runs it honestly wins the per-vertical accuracy curve, and the buyer that defers it loses the per-vertical accuracy gain to a substrate-mismatch the senior-judgment overlay was never given the gold set to catch.