Google I/O 2026 Shipped Antigravity 2.0, Gemini 3.5 Flash, and a Managed Agents Tier in the Same Week. The Agent Runtime Just Stopped Being a Build-Your-Own Decision.

What shipped in one week

At Google I/O 2026 on May 19, Google released Gemini 3.5 Flash as generally available, with 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas, 84.2% on CharXiv Reasoning, and 4× the output speed per token of any other frontier model on agentic benchmarks. In the same keynote window, Google also shipped Antigravity 2.0. What was a single AI-powered IDE last year is now a platform: a standalone desktop application; the Antigravity CLI, rewritten in Go, with the existing Gemini CLI being deprecated on June 18; an SDK that exposes the same agent harness powering Google's own products; and a Managed Agents tier in the Gemini API, in public preview. That last piece is the one most teams will underestimate now and later have to rip out their own infrastructure to adopt.

Managed Agents lets a developer hand the Gemini API a multi-step task and have it executed inside a Google-hosted, sandboxed Linux container with file system, shell, browser, and computer-use tools attached. The container is stateful for the duration of the run, observable through a streamed log, and billed per agent-minute. Inside that sandbox, the new general-purpose antigravity-preview-05-2026 agent autonomously plans, reasons, writes and executes code, manages files, and browses the web. None of these primitives is new on its own (open-source projects have built versions of each), but Google now ships them as a single managed product behind one API, with a frontier model wired in by default.

The agent runtime just stopped being a build-your-own decision

For the eighteen months between mid-2024 and the first half of 2026, the dominant engineering pattern for serious AI features looked roughly the same across every shop large enough to have an AI platform team: assemble the runtime yourself. A sandboxed VM service (Firecracker, Fly Machines, E2B, sometimes a home-grown Kubernetes pattern). A tool harness on top of it (your own MCP servers, your own scheduling logic, your own retry and timeout policy). A logging and audit pipeline glued to both. And finally a model API call into Claude or GPT or Gemini that orchestrated the whole thing through structured prompting and tool calls.
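To make those layers concrete, here is a minimal sketch of how they typically compose: a sandbox, a tool harness that turns model turns into tool calls and feeds the results back, an audit log, and a model call driving the loop. Everything here is hypothetical and in-memory. `InMemorySandbox`, `AuditLog`, `run_agent`, and `scripted_model` are illustrative names, and the scripted model stands in for a real Claude, GPT, or Gemini API call, so the loop runs without any vendor SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class ModelTurn:
    # A model turn carries either a tool call to execute or a final answer.
    tool_call: Optional[ToolCall] = None
    final: Optional[str] = None


class InMemorySandbox:
    """Hypothetical stand-in for the sandbox layer (Firecracker, E2B, ...).

    A real sandbox isolates a file system and a shell; this one keeps
    files in a dict so the loop is runnable anywhere.
    """

    def __init__(self) -> None:
        self.files: dict = {}

    def write_file(self, path: str, content: str) -> str:
        self.files[path] = content
        return f"wrote {len(content)} bytes to {path}"

    def read_file(self, path: str) -> str:
        return self.files.get(path, f"error: {path} not found")


@dataclass
class AuditLog:
    # Stand-in for the logging and audit pipeline.
    events: list = field(default_factory=list)

    def record(self, kind: str, **data) -> None:
        self.events.append({"kind": kind, **data})


def run_agent(model: Callable[[list], ModelTurn], sandbox: InMemorySandbox,
              log: AuditLog, task: str, max_steps: int = 10) -> str:
    """Tool harness: feed each tool result back to the model until it answers."""
    tools = {"write_file": sandbox.write_file, "read_file": sandbox.read_file}
    transcript = [task]
    for step in range(max_steps):
        turn = model(transcript)
        if turn.final is not None:
            log.record("final", step=step, answer=turn.final)
            return turn.final
        call = turn.tool_call
        if call is not None and call.name in tools:
            observation = tools[call.name](**call.args)
        else:
            observation = "error: missing or unknown tool"
        log.record("tool", step=step, tool=call.name if call else None, result=observation)
        transcript.append(observation)
    raise RuntimeError(f"no final answer after {max_steps} steps")


def scripted_model(transcript: list) -> ModelTurn:
    # Hypothetical stand-in for the model API call: it picks its next move
    # from the transcript length instead of calling a real LLM.
    if len(transcript) == 1:
        return ModelTurn(tool_call=ToolCall("write_file", {"path": "notes.txt", "content": "hello"}))
    if len(transcript) == 2:
        return ModelTurn(tool_call=ToolCall("read_file", {"path": "notes.txt"}))
    return ModelTurn(final=f"file says: {transcript[-1]}")


if __name__ == "__main__":
    audit = AuditLog()
    print(run_agent(scripted_model, InMemorySandbox(), audit, "save and read a note"))
```

Run as a script, this prints `file says: hello`. Each of the four classes or functions is a piece a platform team ended up owning, even though none of it is specific to the product.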

It worked. It also produced an enormous amount of load-bearing custom infrastructure — code that didn't do anything specific to the company's product, but that the team owned because there was nowhere to buy it from.

What Antigravity 2.0 and Managed Agents ship is exactly that infrastructure, sold as a vendor primitive. The sandbox isn't yours anymore unless you actively choose for it to be. The agent harness isn't yours. The scheduling, the parallelism caps, the file system isolation, the browser session: Google now ships all of it behind a single API endpoint, with a frontier model already wired in, billed as a managed service. In 2024, the build-vs-buy question was settled by the absence of a credible vendor offering. It is open again now that a credible offering exists.

This is the second platform absorption of the quarter — Anthropic's Claude Opus 4.8 Dynamic Workflows did the same thing for the orchestration layer two weeks ago. The pattern is the one every previous platform shift has followed: the primitive gets absorbed by the platform vendor; the differentiation moves one layer up.

What this changes for build-vs-buy

Any team running its own agent infrastructure in mid-2026 faces three concrete decisions, and each should be made explicitly, not by default.

Is the sandbox part of your product, or part of your stack? If your team is using sandboxed code execution as a feature you sell to customers — a notebook environment, a code-review surface, a sandboxed playground in your own UI — the sandbox is part of your product, and you should keep owning it because the user-facing behavior is yours to differentiate on. If you're using it internally as a substrate for agents that ship code into your repo, run migrations, or browse the web on a developer's behalf, the sandbox is part of your stack, and the case for renting Google's (or Anthropic's, or whatever comes next) just got much stronger.

Is your orchestration logic worth the maintenance cost? Most homegrown agent orchestration code falls into one of two categories. The first is thoughtful, domain-specific scheduling that encodes your team's actual judgment about how to fan work out, and it is worth keeping. The second is undifferentiated glue code that reinvents retry, parallelism, and timeout handling, and it now competes with a managed product that ships those built in. The audit doesn't take long. Done honestly, it will turn up five to ten thousand lines of code in most platform teams' agent layer that are now maintenance liabilities rather than moats.
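For a sense of what "undifferentiated glue" looks like, here is a minimal sketch of the retry, timeout, and parallelism-cap layer many teams hand-roll. It uses only Python's standard `asyncio`. The names `with_retry` and `fan_out`, the default limits, and the choice of which exceptions count as transient are illustrative assumptions and do not reflect any vendor's API.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, Optional, TypeVar

T = TypeVar("T")


async def with_retry(make_call: Callable[[], Awaitable[T]], attempts: int = 3,
                     timeout_s: float = 5.0, base_delay_s: float = 0.1) -> T:
    """Retry a coroutine factory with a per-attempt timeout and jittered
    exponential backoff. Only timeouts and ConnectionError count as transient."""
    last_exc: Optional[BaseException] = None
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_exc = exc
            if attempt < attempts - 1:
                delay = base_delay_s * (2 ** attempt)
                await asyncio.sleep(delay + random.uniform(0, delay))
    raise RuntimeError(f"gave up after {attempts} attempts") from last_exc


async def fan_out(make_calls: List[Callable[[], Awaitable[T]]],
                  max_parallel: int = 4, **retry_kwargs) -> List[T]:
    """Run subtasks concurrently under a parallelism cap; results keep input order."""
    sem = asyncio.Semaphore(max_parallel)

    async def guarded(make_call: Callable[[], Awaitable[T]]) -> T:
        async with sem:
            return await with_retry(make_call, **retry_kwargs)

    return list(await asyncio.gather(*(guarded(m) for m in make_calls)))


async def main() -> None:
    async def subtask(i: int) -> int:
        await asyncio.sleep(0.01)  # stands in for one agent subtask
        return i * i

    print(await fan_out([lambda i=i: subtask(i) for i in range(5)], max_parallel=2))


if __name__ == "__main__":
    asyncio.run(main())
```

Run as a script, this prints `[0, 1, 4, 9, 16]`. None of this encodes judgment about how to split the work, which is the test for whether a given piece of orchestration code is worth keeping.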

How portable is your agent layer between runtimes? The risk in adopting Managed Agents isn't the price per minute. The risk is that the harness is Google's, the sandbox semantics are Google's, and the tool-call format optimizations are Google's, so a year from now your agent code reads as if Gemini is the only model that could ever run it. Portability between Managed Agents, Claude's runtime, and OpenAI's Codex agent surface is a design choice you have to make explicitly. The design that preserves it is to write your agents against MCP and put a thin runtime adapter layer between your agent code and each vendor's runtime. Teams that skip the adapter will get the fastest time-to-market, and then the highest switching cost when the capability rankings flip.
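The following is a minimal sketch of that adapter layer. Agent code depends only on a small `AgentRuntime` protocol and on tool definitions shaped like MCP tool definitions, which carry a name, a description, and an input schema. Each vendor runtime sits behind its own adapter. `ToolSpec`, `AgentRuntime`, `EchoRuntime`, and `RuntimeRouter` are hypothetical names. A real adapter would translate `ToolSpec` into one vendor's request format and call that vendor's SDK. This stub deliberately does not attempt that, because inventing vendor calls here would be guesswork.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


@dataclass
class ToolSpec:
    # Same fields as an MCP tool definition: name, description, inputSchema.
    name: str
    description: str
    input_schema: dict = field(default_factory=dict)


@dataclass
class RunResult:
    runtime: str
    output: str


class AgentRuntime(Protocol):
    """The only runtime surface agent code is allowed to depend on."""
    name: str

    def run(self, task: str, tools: List[ToolSpec]) -> RunResult: ...


@dataclass
class EchoRuntime:
    """Hypothetical adapter stub. A real adapter would translate ToolSpec into
    one vendor's request format and call that vendor's SDK; this one echoes."""
    name: str

    def run(self, task: str, tools: List[ToolSpec]) -> RunResult:
        names = ", ".join(t.name for t in tools)
        return RunResult(self.name, f"{task} [tools: {names}]")


class RuntimeRouter:
    """Picks a runtime by name, so switching vendors is a config change."""

    def __init__(self, runtimes: List[AgentRuntime], default: str) -> None:
        self._by_name: Dict[str, AgentRuntime] = {r.name: r for r in runtimes}
        if default not in self._by_name:
            raise ValueError(f"unknown default runtime: {default!r}")
        self.default = default

    def run(self, task: str, tools: List[ToolSpec],
            runtime: Optional[str] = None) -> RunResult:
        return self._by_name[runtime or self.default].run(task, tools)


if __name__ == "__main__":
    tools = [ToolSpec("read_file", "Read a file from the workspace",
                      {"type": "object", "properties": {"path": {"type": "string"}}})]
    router = RuntimeRouter([EchoRuntime("managed-agents"), EchoRuntime("self-hosted")],
                           default="managed-agents")
    print(router.run("fix the flaky test", tools))
    print(router.run("fix the flaky test", tools, runtime="self-hosted"))
```

With this design, moving a workload to another runtime means changing the router's default or passing a different `runtime` argument. The agent code itself does not change. All vendor-specific translation is confined to one adapter per runtime.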

Where the lock-in lives now

The interesting thing about Antigravity 2.0 as a platform is that the lock-in surface moved. It used to live in the API — call signatures, tool schemas, response formats. With MCP standardized and adopted by every major vendor, the API surface is largely portable now. The new lock-in lives above the API, in three places.

The IDE. Antigravity 2.0's desktop app and CLI are designed so the agent has rich context about the developer's editor, recent edits, terminal output, and active branches. The richness of that integration is the moat. A developer who has spent six months building muscle memory around Antigravity 2.0's UI doesn't switch IDEs over a 3% benchmark gap in a competing model.

The agent harness. Managed Agents' sandboxed environment, file system, browser, and computer-use tooling are a specific implementation. Code written to use those exact tool semantics works in Managed Agents and works less well, or not at all, in a competing runtime. The harness is not standardized the way the model API is.

The observability and audit surface. Google's pitch to enterprise teams will increasingly be: run your agents on Managed Agents, get an enterprise-grade audit trail, compliance certifications, and incident review tooling that ships with the product. That bundle is hard to replicate with a custom runtime — and the more your security and compliance teams come to rely on the vendor's audit surface, the harder it is to swap.
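One way to keep that dependency in check is to mirror each runtime's log stream into an internal audit schema you own, so that incident review does not depend on any single vendor's console. The sketch below rests on stated assumptions. The two input record shapes (`vendor_a`, `vendor_b`) are invented for illustration and do not describe Google's or any other vendor's actual log format. `AuditEvent`, `normalize`, and `to_jsonl` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class AuditEvent:
    run_id: str
    runtime: str
    ts: str  # ISO 8601 timestamp, as reported by the runtime
    action: str
    detail: str


def normalize(runtime: str, raw: dict) -> AuditEvent:
    """Map a runtime-specific log record onto one internal schema.

    Both input shapes below are HYPOTHETICAL; real field names must come
    from each vendor's documentation.
    """
    if runtime == "vendor_a":
        return AuditEvent(raw["runId"], runtime, raw["time"], raw["type"],
                          raw.get("message", ""))
    if runtime == "vendor_b":
        event = raw["event"]
        return AuditEvent(raw["session"], runtime, raw["timestamp"], event["kind"],
                          event.get("summary", ""))
    raise ValueError(f"no mapping for runtime {runtime!r}")


def to_jsonl(events: List[AuditEvent]) -> str:
    # One JSON object per line, stable key order, ready for your own log store.
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)
```

Keeping a schema like this also means that, when you switch runtimes, only the `normalize` mapping changes. The compliance team's queries and dashboards can stay as they are.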

This is not a reason to avoid Antigravity 2.0. It is a reason to adopt it with eyes open about which parts of your stack you're now renting and what it costs to leave.

Where Sonnet Code fits

A managed agent runtime from a frontier-model vendor is the easy half of the story. The hard half is the engineering above and around it that turns a vendor primitive into a portable, evaluatable, governable production capability. AI development at Sonnet Code is that engineering. It covers three kinds of work. The first is auditing your existing agent infrastructure to separate what is now a maintenance liability from what is actually a moat. The second is designing the thin adapter layer that lets your team adopt Managed Agents without becoming a Gemini-only shop. The third is designing the observability and rollback that turn 1,000-subagent runs into operations you can defend in a postmortem. AI training is the human-judgment half: senior engineers and domain experts who design the evaluation harnesses that make the new agent runtime measurable in your domain, run the adversarial review that catches failure modes Google's general-purpose evals would never see, and stand up a senior-reviewer queue that scales with agent throughput instead of becoming the bottleneck.

The agent runtime is now a vendor product. The layer above it — where portability, evaluation, and trust live — is still yours to build. That's where the next twelve months of leverage are.