Essays and field notes on AI, software engineering, design, and the craft of building product teams that ship. Written by the engineers who do the work. Posts are in English.

Deloitte's State of AI in the Enterprise puts agent deployment at 97% of organizations, but only 23% report significant ROI from those agents and just one in five has a mature governance model for autonomous workflows. The gap isn't model capability — it's the integration, eval, and governance scaffolding that nobody buys and everyone needs.
Read the article
As of May 6, the Claude Code marketplace lists 4,200+ skills, 770+ MCP servers, 2,500+ marketplaces, and 9,000+ plugins in total. Prismatic Skills shipped on May 4 explicitly to make integration work (auth, multi-tenant deploy, webhooks, connectors) feel native inside Claude Code. The ecosystem has tipped: the IDE is the model's runtime, but the integration layer is now where the differentiation lives, and most of it ships outside Anthropic.

On May 5 Anthropic shipped ten reference agent templates for financial services and insurance (Pitch builder, Earnings reviewer, Month-end closer, KYC screener, and six more), bundled with Excel/PowerPoint/Word/Outlook context-sharing and a Moody's data partnership. Read alongside Opus 4.7's 64.4% score on Vals AI's Finance Agent benchmark, the move is less about ten more agents and more about making the template (skills, connectors, and subagents bundled together) the unit of vertical procurement.

At Google Cloud Next '26, Google folded Vertex AI into the Gemini Enterprise Agent Platform: one surface for 200+ models, including direct access to Anthropic's Claude Opus, Sonnet, and Haiku. Multi-model is now the platform default. The integration question, which 46% of orgs still cite as their #1 deployment blocker, doesn't go away; it just moves up the stack to routing and governance.

Xcode 26.3 added native integration for Anthropic's Claude Agent and OpenAI's Codex inside the IDE, and exposed Xcode's own capabilities as an open Model Context Protocol server. Apple is the most conservative IDE vendor in the industry. When the conservative vendor ships agentic coding as a default surface, the category isn't experimental anymore; it's platform infrastructure.

Cursor 3 replaced the Composer pane with a full-screen Agents Window built for parallel execution across local, worktree, SSH, and cloud sessions. Cursor 3.2 added /multitask sub-agents two weeks later. The IDE is no longer where a developer types — it's the runtime where a fleet of agents executes. The teams that win this cycle are the ones who already have a scaffold, a routing layer, and an eval suite to point at it.

A new analysis of 12 major agentic benchmarks found a 37% gap between lab scores and production deployment performance, with up to 50x cost variation for similar accuracy and validity issues affecting 7 of 10 widely-cited evals. The work that closes that gap is not bigger benchmarks — it's task-specific eval suites grounded in domain expertise.