Sonnet Code
AI Development · June 3, 2026 · 10 min read

Anthropic Shipped Self-Hosted Sandboxes and MCP Tunnels for Claude Managed Agents on May 19. The Enterprise Hybrid-Orchestration Pattern Just Became a Real Product — and the "Our Data Can't Leave the VPC" Blocker That Killed Three of Your Agent Pilots Last Year No Longer Applies.

What shipped on May 19

On May 19, 2026 — the second day of Anthropic's Code with Claude London event, the company's first developer conference outside the United States — Anthropic shipped two infrastructure features for Claude Managed Agents that fundamentally reshape the enterprise deployment surface for agentic AI.

Self-Hosted Sandboxes (public beta). Tool execution — the part of an agent's work where it runs code, hits internal APIs, queries databases, modifies files — moves to an environment the customer configures: the customer's own VPC, an enterprise account on a managed sandbox provider (Cloudflare, Daytona, Modal, or Vercel were the launch partners), or a custom runtime. The agent orchestration loop — the model calls, the context management, the planning, the error recovery, the trajectory tracking — stays on Anthropic's infrastructure. The two planes communicate through a structured protocol; the customer's data and tool surface live on the customer's side of the line.

MCP Tunnels (research preview). Claude agents can now reach MCP servers inside the customer's private network without those servers being exposed to the public internet. The mechanism: the customer deploys a lightweight gateway inside their network; that gateway makes a single outbound connection to Anthropic's plane and waits for tool calls; no inbound firewall rules, no public endpoints, and traffic encrypted end to end. Internal databases, private APIs, knowledge bases, ticketing systems, code repositories — anything that previously could not be exposed to a cloud-hosted agent without enterprise-security pushback — becomes a tool the agent can call, without leaving the customer's network perimeter.
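The outbound-only dispatch pattern described above can be sketched in a few lines. This is a minimal in-process simulation, not Anthropic's gateway or wire protocol: `ToolCall`, `OutboundChannel`, and `serve` are hypothetical names, and two queues stand in for the single connection the gateway dials out on. It illustrates only the shape of the idea: the gateway pulls calls, runs registered tools locally, and refuses anything it was never told about.

```python
import queue
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical envelope for one tool call sent down the tunnel."""
    trace_id: str
    tool: str
    arguments: Dict[str, Any] = field(default_factory=dict)


class OutboundChannel:
    """Stand-in for the one connection the gateway opens outward.

    Both directions ride on that connection, so nothing on the
    customer side ever listens for inbound traffic.
    """

    def __init__(self) -> None:
        self.calls: "queue.Queue[ToolCall]" = queue.Queue()          # vendor -> gateway
        self.results: "queue.Queue[Dict[str, Any]]" = queue.Queue()  # gateway -> vendor


def serve(channel: OutboundChannel,
          tools: Dict[str, Callable[..., Any]],
          max_calls: int) -> None:
    """Handle up to max_calls tool calls inside the perimeter, then return."""
    for _ in range(max_calls):
        call = channel.calls.get(timeout=1.0)
        handler = tools.get(call.tool)
        if handler is None:
            # Tools that were never registered are refused, not guessed at.
            reply = {"trace_id": call.trace_id, "error": "unknown tool: " + call.tool}
        else:
            reply = {"trace_id": call.trace_id, "result": handler(**call.arguments)}
        channel.results.put(reply)


# Illustrative run: one registered tool, one call the gateway should refuse.
channel = OutboundChannel()
channel.calls.put(ToolCall("t-1", "ticket_status", {"ticket_id": "OPS-7"}))
channel.calls.put(ToolCall("t-2", "drop_table"))
serve(channel,
      {"ticket_status": lambda ticket_id: {"id": ticket_id, "status": "open"}},
      max_calls=2)
first = channel.results.get_nowait()
second = channel.results.get_nowait()
```

A real gateway would replace the queues with an authenticated, encrypted connection and would run until shut down; the point of the sketch is that the tool registry, not the network, decides what the agent can reach.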

These two features ship together because they are the same architecture viewed from two angles. The pattern is hybrid orchestration: the vendor runs the agent loop, the customer owns the tool execution and data surface, and the boundary between the two is a structured protocol that the customer's security team can reason about.

The release also rounds out the May Code with Claude announcement series. Dreaming (the scheduled memory-curation process), Outcomes (measurable success conditions for cloud-hosted agents), and Multiagent Orchestration (a lead agent plus parallel subagents with a shared filesystem and persistent events) shipped on May 6. The May 19 follow-on is the enterprise infrastructure layer on top of that runtime, and it is the one that unblocks the deployments the May 6 features alone couldn't reach.

The deployment gap this closes

To understand why this matters, it helps to be honest about what killed the first wave of enterprise agentic-AI pilots through 2024 and most of 2025. The conversation in the executive briefing was almost always the same: "We see the capability. We see the productivity case. We need the agent to read from this database, that ticketing system, that internal code search. We can't expose any of those to a cloud-hosted agent. We can't run the model ourselves either: we don't have the GPUs, and we don't have the eval discipline to operate a frontier model in production. So we'll wait until the deployment shape exists."

That "deployment shape doesn't exist" problem took three concrete forms:

The data-residency block. A regulated enterprise — financial services, healthcare, public sector, defense — has a contractual or regulatory obligation that customer data cannot transit a cloud provider's plane. A cloud-hosted agent that reads the customer database into its context window violates that obligation by default. The mitigation — route the read through an enterprise gateway that proxies the data into the agent context — is half a solution; the agent still has the data in its working memory on the vendor's plane.

The private-network block. An enterprise's most useful internal systems — Jira, GitHub Enterprise, internal Confluence, Snowflake, the corp SSO — are deliberately not internet-reachable. Exposing them to a cloud-hosted agent requires a firewall change, a security review, and the documented risk of an inbound API surface to a critical system. Most security teams say no for good reasons.

The audit-and-attribution block. An agent running on a vendor's plane that takes actions on customer systems generates audit-log entries on the vendor's side, the customer's side, or both. Reconciling those entries for an incident review six months later is hard. The "who actually did this action?" question, which is exactly the question the EU AI Act and every analogous regulatory regime is asking, gets harder, not easier, when the action flows through a third-party orchestrator.

Self-Hosted Sandboxes plus MCP Tunnels closes all three blocks in the same architecture. Tool execution and data access happen inside the customer's perimeter. The MCP tunnel means the private network surface stays private — no inbound exposure, no firewall changes, no public endpoint. The audit log on the customer's side is the canonical record of what the agent did because the actions happened on the customer's plane; the audit log on Anthropic's side is the record of what the agent thought and decided, useful for trajectory review but not the system of record for the action itself. The two records compose into a defensible end-to-end account of agent activity that satisfies both the customer's security model and the regulator's expectation.

Why the timing is the story

Two structural pressures arrived in the same six-week window, and they are pushing in the same direction.

The EU AI Act high-risk deployer obligations go live on August 2, 2026, about sixty days after this article's publication date. The high-risk obligations include human oversight that meets a functional standard (the qualified human must actually be able to intervene), automated event logging retained for the lifetime of the system, and serious-incident reporting on a 15-day clock. An agentic deployment whose action plane lives on the vendor's infrastructure makes every one of those obligations harder to satisfy because the customer-side audit trail is incomplete. An agentic deployment whose action plane lives on the customer's own infrastructure makes the same obligations satisfiable with the audit infrastructure the customer already runs for the rest of their stack.

The pricing pressure on agent inference is rising. GitHub Copilot moved to usage-based billing on June 1; Anthropic and OpenAI are running a coding-tool price war; the customer that pays per token is the customer that has the strongest interest in placing the tool-execution and context-assembly work where it can be optimized, observed, and rate-limited inside their own perimeter. The Self-Hosted Sandbox pattern doesn't lower the per-token cost of the model itself — those tokens still bill on Anthropic's plane — but it does give the customer architectural control over what gets pulled into the context window in the first place, which is the largest single lever on the per-task cost.

The combination produces a deployment surface that, six months ago, the procurement conversation had to invent from scratch and stitch together with a long professional-services engagement. Anthropic just shipped it as a product. Every other agent platform vendor — OpenAI, Google, the open-weights orchestrators — is now under direct pressure to ship the same pattern within the next two quarters, because the customers their sales teams are talking to have been told, on a vendor stage, that the pattern exists.

What to put in front of the platform team this quarter

Five concrete pieces of work, in roughly the order they pay back.

Inventory the agent pilots that were killed for security or data-residency reasons. Every organization has a list. Most lists are not written down. The work this quarter is to write down the list, with the specific blocker for each pilot ("the agent needed access to system X, which can't be exposed"), and grade each one against the new deployment surface. The pilots that died because of the private-network block are now buildable. The pilots that died because of the data-residency block are now buildable. The list of now-buildable pilots is the platform team's prioritization input for the next quarter.

Stand up an MCP server discipline before the tunnel surface goes live. An MCP tunnel into your private network exposes whatever your MCP servers expose to the agent on the other end. If your MCP servers are scoped loosely ("expose everything in the database, the LLM is smart enough to figure out what it needs"), the tunnel is a sharper version of the same risk that killed the previous pilots. The discipline is deliberately narrow MCP servers per use case, with scoped permissions, structured audit identity on every call, and per-agent rate limits. That discipline is the prerequisite for the tunnel actually being a security upgrade over the alternative, not a faster path to the same exposure.
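The three controls named above (scoped permissions, audit identity on every call, per-agent rate limits) can be sketched as a thin wrapper around tool handlers. This is plain Python under stated assumptions, not the MCP SDK: `ScopedToolServer`, `TokenBucket`, the agent ID, and the tool names are all hypothetical, and a production server would persist the audit log rather than hold it in memory.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet, List


@dataclass
class TokenBucket:
    """Simple token bucket: at most `capacity` calls in a burst."""
    capacity: int
    refill_per_second: float
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = float(self.capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        refill = (now - self.last) * self.refill_per_second
        self.tokens = min(float(self.capacity), self.tokens + refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ScopedToolServer:
    """Hypothetical narrow tool server: explicit grants, audit on every call."""

    def __init__(self,
                 tools: Dict[str, Callable[..., Any]],
                 grants: Dict[str, FrozenSet[str]],
                 buckets: Dict[str, TokenBucket]) -> None:
        self.tools = tools
        self.grants = grants      # agent_id -> tool names it may call
        self.buckets = buckets    # agent_id -> its rate limiter
        self.audit_log: List[Dict[str, Any]] = []

    def call(self, agent_id: str, trace_id: str, tool: str, **arguments: Any) -> Any:
        if tool not in self.tools or tool not in self.grants.get(agent_id, frozenset()):
            outcome = "denied"
        elif agent_id not in self.buckets or not self.buckets[agent_id].allow():
            outcome = "rate_limited"
        else:
            outcome = "ok"
        # Every attempt is recorded, including the refused ones.
        self.audit_log.append({"agent_id": agent_id, "trace_id": trace_id,
                               "tool": tool, "arguments": arguments,
                               "outcome": outcome})
        if outcome != "ok":
            raise PermissionError(f"{tool}: {outcome}")
        return self.tools[tool](**arguments)


# Illustrative setup: one agent, one granted tool, a budget of one call.
server = ScopedToolServer(
    tools={"ticket_status": lambda ticket_id: "open"},
    grants={"triage-agent": frozenset({"ticket_status"})},
    buckets={"triage-agent": TokenBucket(capacity=1, refill_per_second=0.0)},
)
first = server.call("triage-agent", "t-1", "ticket_status", ticket_id="OPS-7")
for trace_id, tool in (("t-2", "ticket_status"), ("t-3", "run_sql")):
    try:
        server.call("triage-agent", trace_id, tool)
    except PermissionError:
        pass  # refused calls still land in the audit log
outcomes = [entry["outcome"] for entry in server.audit_log]
```

The design choice worth noting is deny-by-default: a tool is callable only if it is both registered and granted to that agent, so adding a tool to the server does not silently widen what every agent can reach.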

Wire the customer-side audit log to the Anthropic-side trajectory log on a stable trace ID. Hybrid orchestration produces two audit streams: one on the customer's plane (what the tools did, with what arguments, against what data) and one on Anthropic's plane (what the agent decided, with what context, against what objective). The reconciliation is trace ID propagation: every tool call carries a stable identifier that ties it back to the agent trajectory that produced it, and the two logs are joinable in your existing SIEM. Standing up that join is a one-week engineering job done in advance; it is a two-month forensics nightmare done after the first incident.
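The join itself is simple once both streams carry the same identifier. The sketch below assumes each log entry is a dictionary with a `trace_id` key; the field names and sample events are hypothetical, not an actual log schema from either side. In practice this would be a query in your SIEM, but the logic is the same.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def join_on_trace_id(trajectory_events: Iterable[Dict[str, Any]],
                     tool_events: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, List[Dict[str, Any]]]]:
    """Group vendor-side decisions and customer-side actions by trace ID."""
    joined: Dict[str, Dict[str, List[Dict[str, Any]]]] = defaultdict(
        lambda: {"decisions": [], "actions": []})
    for event in trajectory_events:
        joined[event["trace_id"]]["decisions"].append(event)
    for event in tool_events:
        joined[event["trace_id"]]["actions"].append(event)
    return dict(joined)


def unmatched(joined: Dict[str, Dict[str, List[Dict[str, Any]]]]) -> List[str]:
    """Trace IDs that appear in only one of the two logs."""
    return sorted(trace_id for trace_id, record in joined.items()
                  if not record["decisions"] or not record["actions"])


# Hypothetical sample events from each plane.
trajectory_log = [
    {"trace_id": "t-1", "decision": "look up ticket OPS-7"},
    {"trace_id": "t-2", "decision": "close ticket OPS-7"},
]
tool_log = [
    {"trace_id": "t-1", "tool": "ticket_status", "outcome": "ok"},
    {"trace_id": "t-9", "tool": "ticket_close", "outcome": "ok"},
]
joined = join_on_trace_id(trajectory_log, tool_log)
orphans = unmatched(joined)
```

The `unmatched` check is the part that earns its keep during an incident review: an action with no recorded decision, or a decision with no recorded action, is exactly the gap an investigator will ask about.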

Refresh the prompt-injection and tool-call adversarial review for the tunnel surface. A misbehaving or compromised MCP server on the customer's side of the tunnel is a place where the agent can be tricked into actions the customer didn't intend, and the tunnel surface concentrates risk on the customer's perimeter rather than diluting it across a public API. The defense is real adversarial review of every MCP server in the tunnel chain (what can an attacker who controls one upstream input cause the agent to do?) and the senior-review queue that catches the cases where the answer is "more than we'd like."

Negotiate the procurement contract with the new deployment surface explicitly priced in. The enterprise contract for cloud-hosted agentic AI that your procurement team signed six months ago does not assume Self-Hosted Sandboxes and MCP Tunnels. The renewal in twelve months will. The team that walks into the renegotiation with a written hybrid-orchestration architecture and a procurement ask that prices the orchestration plane separately from the inference plane will get a meaningfully different contract from the team that walks in with the same SKUs as last year.

What hybrid orchestration does not solve

Three honest caveats.

It does not solve the trust gap with the orchestration vendor. Self-Hosted Sandboxes mean tool execution happens on the customer's plane; the agent loop still happens on Anthropic's. The data that flows into the model's context window is the data the customer-side tool execution sent there, and the discipline of which data flows into the context window is the customer's discipline to write. A pattern that exfiltrates data through context payloads to the vendor's plane defeats the architectural point. The eval question is "which fields of which records does the agent actually need in context, and what's the minimum subset that satisfies the workflow?", and that question is harder to answer than the architectural choice that enabled the deployment in the first place.
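One way to make that minimum-subset decision explicit is a per-workflow field allowlist applied on the customer side before any tool result is returned to the agent. The sketch below is illustrative only: the workflow name, the field names, and `project_for_context` are hypothetical, and the allowlist contents are precisely the part each team has to work out for its own workload.

```python
from typing import Any, Dict, Mapping, Tuple

# Hypothetical allowlists: the only fields each workflow may send into context.
CONTEXT_FIELDS: Dict[str, Tuple[str, ...]] = {
    "refund_triage": ("ticket_id", "order_total", "status"),
}


def project_for_context(workflow: str, record: Mapping[str, Any]) -> Dict[str, Any]:
    """Return only the allowlisted fields of a record for the given workflow."""
    try:
        allowed = CONTEXT_FIELDS[workflow]
    except KeyError:
        # No allowlist means nothing is sent, rather than everything.
        raise ValueError(f"no context allowlist defined for workflow {workflow!r}") from None
    return {name: record[name] for name in allowed if name in record}


# Illustrative record with fields the agent does not need to see.
record = {
    "ticket_id": "OPS-7",
    "order_total": 42.5,
    "status": "open",
    "customer_email": "a@example.com",
    "card_last4": "4242",
}
context_payload = project_for_context("refund_triage", record)
```

Failing closed on an unknown workflow keeps a new use case from inheriting a broad default; someone has to write down what that workflow needs before any data flows.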

It does not eliminate the operational burden of running the customer-side plane. The sandbox runs on the customer's infrastructure; the customer's platform team owns its uptime, scaling, and security. The MCP tunnel runs on a customer-deployed gateway; the customer's platform team owns its uptime, scaling, and security. The hybrid-orchestration pattern moves work to the customer, not away from it. That trade is the right one for regulated and security-sensitive workloads; it is not free.

It does not replace the eval discipline at the boundary. The model on Anthropic's plane is still the same Claude model — Sonnet, Opus, soon Mythos — and the eval discipline that grades whether it's actually right on your workload still applies. Hybrid orchestration changes where the data sits; it does not change whether the model's outputs are correct, well-calibrated, or safe at your workload's tail. The harness, the gold sets, the senior-review queue all stay load-bearing.

Where Sonnet Code fits

Hybrid orchestration shipping as a real product is the easy half of the story. The hard half is the engineering above the architecture: the MCP server design, the scoped tool surface, the trace-ID propagation between vendor-side and customer-side audit logs, the adversarial review of the tunnel chain, and the eval harness that grades the agent on the workload that actually got built. That engineering is what turns "the architecture is possible" into "the deployment is defensible" under a regulator inquiry, a security review, and a customer audit.

AI development at Sonnet Code is that engineering: designing the MCP servers that expose the right scoped surface of your internal systems, deploying the tunnel gateway and the sandbox runtime inside your perimeter on the providers you already trust, wiring trace-ID propagation into your existing observability stack, and standing up the per-agent rate-limiting and structured audit identity that satisfy the AI Act's lifetime-retention requirement.

AI training is the human-judgment half: senior security engineers, domain experts, and regulatory specialists who run the adversarial review on the tunnel chain, design the rubrics that decide which actions are autonomous and which escalate to human review, calibrate the senior-review queue for the failure modes a private-network agent surface introduces, and stand up the gold sets that grade the agent honestly against the customer's workload, not the public benchmark.

The hybrid-orchestration era is open. The agent pilots that died in 2024 and 2025 because the deployment shape didn't exist are now buildable. The procurement conversation that starts next month is the one where the platform team that put in the work this quarter walks in with an architecture, a roadmap, and a defensible compliance story, while the platform team that didn't walks in saying "we'll figure it out in Q4." That gap compounds. The work to close it starts now.