Google Antigravity 2.0 and Managed Agents Land at I/O 2026 — Agent Execution Is Now an API Primitive, and the Build-vs-Buy Line Just Moved

The release, in one paragraph

At Google I/O 2026 on May 19, Google shipped Antigravity 2.0 as a standalone, agent-first desktop application, separate from the existing Antigravity IDE and built entirely around agent orchestration. It arrived alongside an Antigravity CLI for terminal users and an Antigravity SDK that lets developers define and host custom agents on their own infrastructure. The piece that matters most for production teams is Managed Agents in the Gemini API: with a single API call, a developer can spin up an agent that reasons, calls tools, and executes code inside an isolated Linux environment. It is powered by the Antigravity agent harness built on Gemini 3.5 Flash and exposed through the Interactions API and Google AI Studio. Gemini 3.5 Flash, the engine underneath it, reportedly outperforms last quarter's Gemini 3.1 Pro across most benchmarks while running roughly four times faster, which is the property that makes a per-call managed agent viable on both cost and latency. Enterprise customers get Antigravity wired into Google Cloud through the Gemini Enterprise Agent Platform.

The surprising line isn't "Google goes agentic." Every frontier vendor has been shipping agent surfaces for a year. The surprising line is which layer Google chose to commoditize: not the model, not the editor, but the execution substrate, meaning the sandbox, the harness, the tool loop, and the isolation boundary that every team building an internal agent platform spent the back half of 2025 constructing by hand. Managed Agents turns that substrate into a primitive you rent by the API call. That collapses months of platform engineering into a single request, and it quietly redraws the build-vs-buy line under every "agent platform" initiative a team kicked off last year. The interesting question for 2026 stops being "can we run an agent in a safe sandbox." It becomes "if the runtime is rented, what do we still need to own, and where does renting it create a dependency we'll regret."

Why a managed agent runtime is the build-vs-buy decision, not a convenience API

For the last eighteen months, "we're building an agent platform" has meant, in practice, building the unglamorous substrate: a sandbox to execute model-generated code without it touching the host, a harness that runs the reason-act-observe loop, a tool-invocation layer, resource limits, teardown, and the isolation guarantees the security team needs before any of it can touch real data. Managed Agents offers that entire substrate as a single call. That's not a convenience wrapper. It's a direct challenge to the decision a hundred platform teams made in 2025 to build it themselves.

The sandbox was the hard part, and it just got commoditized. Executing untrusted, model-generated code safely — isolated filesystem, network egress control, resource caps, clean teardown, no lateral movement — is genuinely difficult infrastructure, and it's where most internal agent platforms sank their first two quarters. A managed runtime that ships that isolation as a default removes the single biggest reason to build rather than buy. Teams sitting on a half-built sandbox should be asking, honestly, whether the remaining work is differentiated or whether they're now reinventing a rented primitive.

"Single API call" changes the unit of the platform. When standing up an isolated, tool-using, code-executing agent is one request, the agent runtime stops being a system you operate and becomes a call you make. That's the same transition serverless made for compute — and it has the same consequence: the value migrates out of running the substrate and into everything wrapped around it. The orchestration, the tool catalog, the eval and supervision layer, the domain logic — that's what you still own. The runtime is no longer where you add value; it's where you stop having to.

The SDK-on-your-own-infra option is the tell that this is a real platform play. Google didn't only ship the hosted managed runtime; it shipped an SDK to host custom agents on your own infrastructure. That two-pronged shape — rent the runtime or self-host with the same harness — is what platforms do when they're trying to become the default substrate rather than just another hosted product. It also gives architects a genuine choice instead of a lock-in: prototype on Managed Agents, move latency- or compliance-sensitive workloads to the self-hosted SDK, keep one mental model across both.

Why Gemini 3.5 Flash being the engine is the part that makes it work

A managed per-call agent runtime is only viable if the model underneath is fast and cheap enough that spinning one up per task isn't prohibitive. That's the role Gemini 3.5 Flash plays here, and it's why the model choice matters more than the benchmark headlines suggest.

Speed is what makes per-call agents economical. An agent that runs a multi-step reason-act loop multiplies the model's per-step latency by the number of steps. A model that's four times faster turns a loop that would have felt sluggish into one that feels interactive, and turns a per-call cost that would have been prohibitive into one a team will actually run thousands of times a day. The harness being built on a fast model isn't a footnote — it's the enabling condition for the whole "single API call" pitch.
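
To make the multiplication concrete, here is a back-of-envelope sketch in Python. The step count and the baseline per-step latency are illustrative assumptions, not measured Gemini figures; only the 4x ratio comes from the reported claim above, and the model ignores tool and network time.

```python
def loop_latency_s(steps: int, per_step_latency_s: float) -> float:
    """Model-only wall-clock time for a sequential reason-act loop."""
    return steps * per_step_latency_s


STEPS = 12             # hypothetical number of steps per task
BASELINE_STEP_S = 4.0  # hypothetical per-step latency of the slower model
SPEEDUP = 4.0          # "roughly four times faster", as reported

baseline = loop_latency_s(STEPS, BASELINE_STEP_S)
fast = loop_latency_s(STEPS, BASELINE_STEP_S / SPEEDUP)
print(f"baseline loop: {baseline:.0f}s, fast loop: {fast:.0f}s")
```

Under these assumed numbers the same task drops from 48 seconds to 12. The absolute values will differ for any real workload; the point is that the saving scales with the number of steps in the loop.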

"Beats last quarter's Pro across most benchmarks while running 4x faster" is the commoditization story in one line. When the fast, cheap model from this quarter outscores the flagship model from last quarter, the model layer is converging — and the differentiator moves up the stack to the harness, the tooling, and the eval suite. Antigravity is Google's bet that the action in 2026 is in the agent substrate and the developer experience around it, not in another point of benchmark lead. That bet looks correct, and it's the same bet the rest of the frontier is quietly making.

A fast model in a loop hides the eval problem until it bites. The flip side: a fast agent that runs many steps cheaply will take many steps — and a fast wrong answer is still wrong, just produced sooner and at scale. The speed that makes per-call agents viable also makes it cheaper to run a flawed agent thousands of times before anyone notices the trajectory was bad. The runtime got faster; the eval-and-supervision burden didn't get smaller.

What this actually changes for production teams

Prototyping an agent collapses from a project to an afternoon. The first viable version of an internal agent — isolated execution, tool use, code running safely — used to be a platform-team milestone. With Managed Agents it's a single call and an afternoon. That's a real acceleration for the exploration phase, and teams should use it: prove the workflow with the managed runtime before committing to any infrastructure.

The differentiated work moves up the stack. When the runtime is rented, your moat is the tool catalog you expose, the orchestration logic that sequences the agent's work, the eval rubrics that grade its trajectories, the supervision UI your reviewers use, and the domain knowledge encoded in all of it. None of that ships with the managed runtime. The teams that win treat the rented substrate as a foundation and pour their engineering into the layers above it.

Build-vs-buy needs a re-decision, not a default. A team mid-flight on a hand-built agent platform should run the honest comparison now: what does our sandbox do that the managed runtime doesn't, what would migrating cost, and what's the dependency we take on if we rent. For some workloads — compliance, data residency, extreme latency, deep custom isolation — self-build or the self-hosted SDK still wins. For most exploratory and internal workloads, the managed runtime is the faster path. The point isn't that buy always wins; it's that the decision changed this week and deserves a fresh look.

The harness choice becomes a portability question. Building on the Antigravity harness — hosted or via the SDK — means adopting Google's agent abstractions. That's fine, and the SDK softens the lock-in by letting you self-host the same harness. But architects should keep the orchestration and tool-definition layer thin enough that the harness underneath could be swapped, because the multi-vendor reality of enterprise AI means you'll likely run more than one.
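
One way to keep that layer thin is to have the orchestration code depend on a small interface you own, with each vendor harness wrapped in an adapter behind it. The sketch below is a minimal illustration in Python. Every name in it (ToolSpec, AgentResult, AgentHarness, EchoHarness, orchestrate) is hypothetical, and none of it reflects the actual Antigravity SDK or Gemini API surface.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass(frozen=True)
class ToolSpec:
    """Vendor-neutral tool definition owned by your team (hypothetical shape)."""
    name: str
    description: str
    handler: Callable[[dict], str]


@dataclass
class AgentResult:
    output: str
    steps: list[str] = field(default_factory=list)


class AgentHarness(Protocol):
    """The only surface the orchestration layer is allowed to depend on."""

    def run(self, task: str, tools: list[ToolSpec]) -> AgentResult: ...


class EchoHarness:
    """Stand-in harness for local tests; a real adapter would wrap a vendor SDK here."""

    def run(self, task: str, tools: list[ToolSpec]) -> AgentResult:
        steps = [f"call:{tool.name}" for tool in tools]
        outputs = [tool.handler({"task": task}) for tool in tools]
        return AgentResult(output=" | ".join(outputs), steps=steps)


def orchestrate(harness: AgentHarness, task: str, tools: list[ToolSpec]) -> AgentResult:
    """Orchestration sees only AgentHarness, so the harness underneath can be swapped."""
    return harness.run(task, tools)


lookup = ToolSpec(
    name="lookup",
    description="Hypothetical lookup tool",
    handler=lambda args: f"looked up {args['task']}",
)
result = orchestrate(EchoHarness(), "invoice 42", [lookup])
print(result.output)
```

Swapping vendors then means writing one new adapter class that satisfies AgentHarness, while the tool definitions and the orchestrate call stay unchanged.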

What it doesn't change

Renting the runtime doesn't rent you the eval. The managed runtime executes the agent safely; it does not tell you whether the agent did the right thing. The eval rubric, the trajectory grading, the human-in-the-loop review for low-confidence runs — all of that is still yours to build, and it's still the part that decides whether the agent is production-safe.
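
As a rough illustration of what that owned layer might look like, the Python sketch below scores a trajectory against a weighted rubric and routes mid-range scores to human review. The rubric items, weights, and thresholds are all assumptions made for the example; a real rubric would come from your domain experts.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    HUMAN_REVIEW = "human_review"
    REJECT = "reject"


@dataclass(frozen=True)
class RubricItem:
    name: str
    weight: float
    passed: bool


def rubric_score(items: list[RubricItem]) -> float:
    """Weighted fraction of rubric items the trajectory satisfied (0.0 to 1.0)."""
    total = sum(item.weight for item in items)
    if total == 0:
        raise ValueError("rubric has no weight")
    return sum(item.weight for item in items if item.passed) / total


def triage(score: float, accept_at: float = 0.9, reject_below: float = 0.5) -> Verdict:
    """Accept confident runs, reject clear failures, send the rest to a reviewer."""
    if score >= accept_at:
        return Verdict.ACCEPT
    if score < reject_below:
        return Verdict.REJECT
    return Verdict.HUMAN_REVIEW


# Hypothetical graded trajectory: two checks pass, one lower-weight check fails.
run = [
    RubricItem("used only approved tools", 2.0, True),
    RubricItem("final answer matches reference", 2.0, True),
    RubricItem("no redundant steps", 1.0, False),
]
score = rubric_score(run)
print(triage(score).value)
```

With these assumed weights the run scores 4 out of 5, which falls between the two thresholds and is routed to human review.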

Isolation is a runtime property, not a permission policy. Managed Agents isolates execution — the code can't escape the sandbox. It does not decide what the agent is allowed to reach: which tools, which data, which production systems. That permission scoping is your security team's call, made before the agent touches anything real, regardless of how good the sandbox is.
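
The distinction can be sketched as a deny-by-default allowlist that your own code checks before any tool call is dispatched. This is a minimal Python illustration under assumed names (AgentPolicy, authorize, and the tool and scope strings are hypothetical); it does not describe any permission mechanism in Managed Agents itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPolicy:
    allowed_tools: frozenset[str]
    allowed_data_scopes: frozenset[str]


class PermissionDenied(Exception):
    pass


def authorize(policy: AgentPolicy, tool: str, data_scope: str) -> None:
    """Deny by default: raise unless both the tool and the data scope are allowlisted."""
    if tool not in policy.allowed_tools:
        raise PermissionDenied(f"tool not allowed: {tool}")
    if data_scope not in policy.allowed_data_scopes:
        raise PermissionDenied(f"data scope not allowed: {data_scope}")


# Hypothetical policy: the agent may search docs and run tests, on staging only.
policy = AgentPolicy(
    allowed_tools=frozenset({"search_docs", "run_tests"}),
    allowed_data_scopes=frozenset({"staging"}),
)

authorize(policy, "run_tests", "staging")  # permitted, returns None
try:
    authorize(policy, "run_tests", "production")
except PermissionDenied as exc:
    print(exc)
```

The sandbox answers whether code can get out; a policy like this answers what the agent was allowed to reach in the first place.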

A faster, cheaper agent is easier to misuse at scale. The economics that make per-call agents attractive also make it trivial to deploy one broadly before it's been properly evaluated. The runtime removes the infrastructure friction that used to act as an accidental brake on premature rollout. Replace that accidental brake with a deliberate one — an eval gate — or the speed becomes a liability.
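
A deliberate brake can be as simple as a release check over an evaluation suite. The Python sketch below assumes each case has already been graded pass or fail; the function name, the 95 percent pass rate, and the minimum case count are placeholder assumptions to tune for your own risk tolerance.

```python
def passes_eval_gate(
    outcomes: list[bool],
    min_pass_rate: float = 0.95,
    min_cases: int = 20,
) -> bool:
    """Allow rollout only if enough cases ran and enough of them passed."""
    if len(outcomes) < min_cases:
        return False  # too little evidence is treated as a failure, not a pass
    return sum(outcomes) / len(outcomes) >= min_pass_rate


strong_suite = [True] * 24 + [False]      # 25 cases, 96% pass rate
weak_suite = [True] * 18 + [False] * 2    # 20 cases, 90% pass rate
tiny_suite = [True] * 5                   # all pass, but too few cases

print(passes_eval_gate(strong_suite))
print(passes_eval_gate(weak_suite))
print(passes_eval_gate(tiny_suite))
```

Treating a too-small suite as a failure is a design choice: it keeps a handful of lucky runs from unlocking a broad rollout.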

The managed runtime is a dependency like any other. Renting your agent execution substrate from a single vendor is a real dependency with a real failure surface: pricing changes, deprecations, regional availability, the vendor's roadmap. The SDK's self-host option mitigates it, but the team that builds deeply on the hosted runtime without a fallback plan is taking a concentration risk it should name out loud.

Where we'd push back on the framing

"Single API call" describes the happy path, not the production path. Spinning up an agent in one call is real and impressive. Running that agent reliably in production — against your tools, your data, your latency budget, your compliance constraints, with observability and rollback — is the same multi-quarter discipline it always was. The demo collapses to one call; the production system does not. Budget for the gap.

"Agent-first desktop app" is a workflow bet that won't fit every team. A standalone agent-orchestration desktop app is a strong opinion about how developers should work — orchestrating agents rather than editing in an IDE that happens to have an agent. Some teams will find that natural; others will find it a context-switch away from the editor where their actual work lives. Don't adopt the workflow because the platform is good; adopt it if the workflow fits how your team actually builds.

The Flash-beats-Pro story is real and also a moving target. "This quarter's Flash beats last quarter's Pro" is true, and it's also the kind of claim that now comes true every quarter. It's a reason to build on the converging model layer rather than chase leads, not a reason to assume Google's harness is permanently ahead. The harness you build on this year is a bet on this year's substrate; keep the orchestration layer portable.

"Host custom agents on your own infrastructure" still routes through Google's harness. The self-hosted SDK reduces data-residency and latency lock-in, which is genuine. It does not make you vendor-neutral — you're still building on Google's agent abstractions, just running them on your own metal. Read the SDK as "lower lock-in," not "no lock-in," and architect accordingly.

What we'd build differently this week

Run a one-afternoon spike on Managed Agents. Pick the internal agent your team has been meaning to build and stand up a prototype on the managed runtime. The point isn't to ship it — it's to calibrate how much of your planned platform work just became a rented primitive.
Re-open the build-vs-buy decision for any in-flight agent platform. Honestly compare your hand-built sandbox to the managed runtime: what's differentiated, what's reinvention, what would migration cost, what dependency would renting create. Make the call deliberately now that the option set changed.
Keep your orchestration and tool layer harness-agnostic. Whatever you build on top — tool definitions, sequencing, eval hooks — design it so the harness underneath (Antigravity hosted, Antigravity SDK, or a competitor) could be swapped. Multi-vendor is the enterprise reality; portability is cheap insurance.
Build the eval gate the runtime doesn't ship. The managed runtime executes safely; it doesn't grade correctness. Stand up the trajectory eval and the human-review surface for low-confidence runs before you let a fast, cheap agent run at scale.
Scope agent permissions before the first real run. Isolation handles "can't escape the sandbox." You still have to decide which tools, data, and systems the agent may reach. Write that policy and enforce it where the agent is configured, not after the first incident.

Sonnet Code's take

Antigravity 2.0 and Managed Agents are the moment agent execution stopped being infrastructure you build and became a primitive you rent. That's genuinely good news — the hardest, least differentiated part of building an agent platform just got commoditized, and the prototyping loop collapsed from a quarter to an afternoon. The trap is reading "the runtime is solved" as "the agent is solved." The runtime was never the part that decided whether your agent was safe, correct, or worth shipping. That was always the eval rubric, the tool catalog, the supervision layer, and the domain judgment wrapped around the runtime — and all of that is exactly what the managed primitive does not hand you.

That's where our work lives. AI development at Sonnet Code is the engineering above the rented runtime — the orchestration logic, the tool catalog, the eval gate wired into your pipeline, the supervision and observability surfaces, the harness-agnostic abstraction that keeps you from getting locked to one vendor's substrate. AI training is the senior-practitioner side: the engineers and domain experts who author the trajectory rubrics, the failure-mode catalogs, and the correctness criteria that decide whether the agent did the right thing — the judgment the managed runtime executes against but can't supply. If your team watched I/O this week and started wondering whether your half-built agent platform is now reinventing a rented primitive, the next conversation isn't about whether to adopt Managed Agents. It's about what you still own when the runtime is rented, and who builds the eval and orchestration layer that turns a one-call agent into a production system you can defend.