The Agent-Native Landscape — Abhishek Shankar's Blog

This is a working map of what is actually being built, regulated, financed, and contested in the agent-native landscape — organized into seven families, fifty-six dimensions, and eleven meta-patterns that cut across them.

It is not a survey. It is a position. Every dimension here is one I think matters; the absence of something common ("prompt engineering," say) is deliberate. The map is versioned because the landscape moves quickly and being honest about what changed is part of the discipline. Current version: v2.0.

Use it as a reference. Quote it. Disagree with it. Send corrections.

Eleven meta-patterns

These are the threads that cut across families. If you only read one section of this page, read this one.

Agent OS Convergence. Persistent machines, background agents, and durable execution are converging into something that looks like an operating system for agents — not a library, not a framework, a runtime.

The Definitional Gap. Regulators are governing agentic workflows with paradigms designed for static, single-turn models. Every major framework — SR 11-7, EU AI Act Article 6, Colorado AI Act — is being patched in real time to acknowledge that an agent is not a model.

Downside-Routing Fabric. Action class → confidence threshold → liability route → insurance tier → human fallback. The plumbing that decides what an agent is allowed to do alone, and what happens when it gets it wrong, is becoming its own layer.
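To make the chain concrete, here is a minimal sketch of a routing table that walks action class → confidence threshold → liability route → insurance tier → human fallback. Every name and number in it (`Route`, `ROUTING_TABLE`, `route_action`, the thresholds, the tier labels) is a hypothetical illustration, not taken from any real policy, product, or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One row of a hypothetical downside-routing table."""
    min_confidence: float  # threshold the agent must meet to act alone
    liability_route: str   # who carries the loss if the action is wrong
    insurance_tier: str    # coverage tier attached to this action class
    human_fallback: str    # who takes over below the threshold


# Illustrative values only; assumptions made for this sketch.
ROUTING_TABLE = {
    "read_only": Route(0.50, "operator", "none", "async_review"),
    "reversible_write": Route(0.80, "operator", "standard", "on_call_reviewer"),
    "payment": Route(0.95, "deployer", "affirmative_ai", "approver_queue"),
}


def route_action(action_class: str, confidence: float) -> dict:
    """Decide whether the agent may act alone and where the downside goes."""
    route = ROUTING_TABLE.get(action_class)
    if route is None:
        # Fail closed: unknown action classes never run autonomously.
        return {
            "autonomous": False,
            "handoff": "approver_queue",
            "liability_route": None,
            "insurance_tier": None,
        }
    autonomous = confidence >= route.min_confidence
    return {
        "autonomous": autonomous,
        "handoff": None if autonomous else route.human_fallback,
        "liability_route": route.liability_route,
        "insurance_tier": route.insurance_tier,
    }
```

Calling `route_action("payment", 0.90)` hands off to the approver queue because 0.90 is below the 0.95 threshold assumed for payments. The design choice worth noticing is that the table, not the agent, owns the threshold.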

Compliance as Differentiation. Durable execution, identity, provenance, autonomy levels, and insurance are being bundled into a regulatory moat. The platforms that make this legible to procurement will win regulated markets.

Verification Renaissance. SMT solvers, zkML, and mech-interp probes are being deployed as oversight backends. Verifiability is the new differentiator — not benchmarks.

Model Convergence Pressure. As frontier models converge in raw capability, gains shift to topology, rubrics, memory, and scaffolding. The model is no longer the moat.

Benchmark Contamination. SWE-bench, GAIA, and WebArena are saturated and contaminated. Evaluation is shifting to mutated, dynamic, professional-task benchmarks, and to live trajectory monitoring in production.

Agent Identity Crisis. Bearer tokens are cracking under agentic load. Cross-domain agent trust — NHI, AAuth, WIMSE, ERC-8004 — is the #1 protocol-level frontier of 2026.

Reasoning as Billing Axis. Test-time compute is a UX knob, a billing tier, and a research objective at the same time. "How hard should the agent think" is now a product surface.

Orchestration to Runtime. Libraries are being replaced by runtimes — with checkpointing, pause/resume, durable state, cost meters. Temporal at $5B is the bellwether.

Payments Racing Regulation. AP2, x402, MPP — the payments stack is shipping agent standards faster than any regulator can keep up. Payments will pull governance forward, not the other way around.

Family 1 — Runtime & Execution

How agents actually run. The infrastructure layer.

Durable Execution. Graph runtimes with native checkpoint and replay. Temporal, LangGraph, Restate, DBOS. Long-running agents need to survive process death; orchestration libraries don't give them that, runtimes do.
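A toy sketch of what "checkpoint and replay" means in practice: journal each step's result to disk, and on restart skip anything already journaled. This is not how Temporal, LangGraph, Restate, or DBOS implement it. The function names, the JSON journal, and the simulated crash are all assumptions made for illustration.

```python
import json
import os
import tempfile


def run_durable(steps, journal_path):
    """Run (name, fn) steps in order, journaling results so a restart skips finished work."""
    journal = {}
    if os.path.exists(journal_path):
        with open(journal_path) as f:
            journal = json.load(f)
    for name, fn in steps:
        if name in journal:
            continue  # replay: reuse the recorded result instead of re-running
        journal[name] = fn()
        # checkpoint: write to a temp file, then swap it in atomically
        tmp_path = journal_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(journal, f)
        os.replace(tmp_path, journal_path)
    return journal


def demo():
    """Simulate a process that dies mid-workflow and is then restarted."""
    calls = []
    attempts = {"summarize": 0}

    def fetch():
        calls.append("fetch")
        return "raw data"

    def summarize():
        attempts["summarize"] += 1
        if attempts["summarize"] == 1:
            raise RuntimeError("simulated process death")
        return "summary"

    steps = [("fetch", fetch), ("summarize", summarize)]
    journal_path = os.path.join(tempfile.mkdtemp(), "journal.json")
    try:
        run_durable(steps, journal_path)  # first run dies in the second step
    except RuntimeError:
        pass
    result = run_durable(steps, journal_path)  # restart resumes from the journal
    return calls, result
```

In `demo()`, `fetch` runs once even though the workflow is started twice, because the second run finds its result in the journal. Real runtimes add determinism constraints, timers, and retries on top of this idea.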

Persistent Agent Machines. Long-lived BYO machines that retain filesystem, memory, and credentials across sessions. Factory's Droid Computers and Cursor's self-hosted cloud agents are the leading shape.

Sandbox & Code Execution. Multi-vendor sandbox economies under standardized manifests. E2B, Modal, Daytona, Cloudflare, Vercel. The OpenAI Agents SDK Manifest is the de facto interop spec.

Confidential Compute. TEEs across CPU, GPU, and NVLink, with attestation-bound agent execution. Blind inference is becoming production-viable. The privacy primitive that makes regulated-industry agents possible.

Inference Economics. Speculative decoding, prefix caching, KV cache offload, custom silicon. The economics of serving agents at scale; the silent driver behind every reasoning-tier price change.

Family 2 — Cognition & Capability

How agents think. The capability layer.

Reasoning & Test-Time Compute. Reasoning effort levels, task budgets, parallel test-time compute. The user-facing knob that determines how hard the agent thinks before acting.

Multi-Agent Inference. Multi-agent councils baked inside the model on shared weights, with learned topology and debate. xAI's Grok architecture is the leading example. The orchestration is moving inside the model.

World Models. Scalable cross-domain world models trained on millions of trajectories. Genie, Waymo, WebWorld. The substrate for agents that need to plan in environments more complex than text.

Memory & Continual Learning. Active context budgeting, graph + temporal + tiered memory, skill libraries, continual learning. Letta, Mem0, Zep, Cognee are the contenders. Memory is the difference between an assistant and a collaborator.

Computer-Use & Multimodal. Pixel-level GUI control, voice agents, robotics action spaces. Multimodal action as a first-class capability rather than a bolted-on feature.

Spec-Driven Development. Agents required to draft and validate a spec before writing any code. The spec becomes the audit artifact — the answer to "why did the agent do this."

Family 3 — Trust, Safety & Verification

The deepest family. Twelve dimensions because this is where the actual differentiation will happen over the next five years.

Verifiable Inference (zkML). Mathematical proof that an agent ran the specified policy on the specified input. Jolt Atlas, EZKL. Currently expensive, getting cheaper fast.

Mechanistic Interpretability. Circuit probes deployed as runtime safety controls. SAEs, attribution graphs, defection probes. Interpretability is moving from research artifact to operational primitive.

Activation Steering & Control. Gradient-defined steering vectors, conditional steering, RL-selected SAE features. Behavioral control without retraining.

Constitutional AI & Model Specs. Reason-based constitutions, deliberative alignment, model specs cited at inference time. Anthropic and OpenAI converging on the same shape from different starting points.

Scalable Oversight. Debate protocols, weak-to-strong generalization, sandwich-protocol RL, monitorability. The research agenda for supervising systems smarter than their supervisors.

Adversarial Red-Teaming. Agentic, multi-turn, automated attack discovery as a service. Continuous red-team in CI, not pre-launch theater.

Supply-Chain Security. MCP RCE, skill poisoning, AGENT-BOM, OWASP AST10. The new attack surface; the 2025 incidents are not edge cases.

Agent Identity & NHI. Distinct identity per agent. Scope-attenuated, cryptographically bound credentials. AAuth, WIMSE, ERC-8004, NIST NHI. The #1 protocol-level frontier.
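One way to picture "scope-attenuated, cryptographically bound" is a macaroon-style HMAC chain: whoever holds a credential can add a restriction, but nobody can remove one without the root key. The sketch below is that generic technique, not AAuth, WIMSE, ERC-8004, or any NIST specification. The function names and the `scope=` caveat format are assumptions, and it uses a shared symmetric key, so the verifier must hold the root key.

```python
import hashlib
import hmac


def _mac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()


def mint(root_key: bytes, agent_id: str) -> dict:
    """Issue an unrestricted credential bound to one agent identity."""
    return {"agent_id": agent_id, "caveats": [], "sig": _mac(root_key, agent_id)}


def attenuate(cred: dict, caveat: str) -> dict:
    """Derive a narrower credential; the new signature chains off the old one."""
    return {
        "agent_id": cred["agent_id"],
        "caveats": cred["caveats"] + [caveat],
        "sig": _mac(cred["sig"], caveat),
    }


def verify(root_key: bytes, cred: dict, requested_scope: str) -> bool:
    """Recompute the chain from the root key, then check every caveat allows the scope."""
    sig = _mac(root_key, cred["agent_id"])
    for caveat in cred["caveats"]:
        sig = _mac(sig, caveat)
    if not hmac.compare_digest(sig, cred["sig"]):
        return False  # caveats were added, dropped, or altered without the chain
    for caveat in cred["caveats"]:
        key, _, value = caveat.partition("=")
        # Fail closed on caveat types this sketch does not understand.
        if key != "scope" or requested_scope not in value.split(","):
            return False
    return True
```

A parent agent holding `scope=calendar.read,mail.read` can hand a sub-agent a credential narrowed to `scope=calendar.read`. The sub-agent cannot strip that caveat, because doing so would require a signature it has no way to compute.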

Privacy-Preserving Agents. FHE, KV-cache-aware encryption, federated self-evolution, trajectory-level differential privacy. The path to agents in healthcare, finance, and defense.

Hallucination & Calibration. Behaviorally calibrated RL, agentic uncertainty, confidence as a training objective rather than an inference-time hack.

Agent Drift. Semantic, coordination, and behavioral drift in long-horizon deployments. The thing nobody is monitoring well in production.

Agentic Observability. Trajectory monitoring, agent fleet inventory, cryptographic execution traces. The missing runtime organ; the thing your insurer is starting to ask about.

Family 4 — Protocols & Interop

How agents talk to each other and to the world.

MCP Ecosystem. The USB-C of AI tool integration. Linux Foundation stewardship, capability attestation, managed gateways.

A2A Protocol. Cross-vendor agent-to-agent communication with cryptographic identity. Signed Agent Cards. Salesforce and ServiceNow already in production.

Agentic Commerce. AP2, x402, MPP, UCP, Visa Intelligent Commerce. Payments racing ahead of governance. The fastest-moving standards layer in the entire map.

Capability Discovery. AGENTS.md, NLWeb, Signed Agent Cards, AGNTCY OASF. How agents find each other and figure out what each can do.

Standards Bodies. NIST, W3C, ISO/IEC 42119, OWASP, MITRE ATLAS. Racing to catch up; mostly succeeding in agent identity, mostly behind in agent commerce.

Family 5 — Economic, Legal & Regulatory

What it costs, who is liable, and who decides.

Sectoral Compliance. SR 26-2, EU AI Act Article 6, FDA PCCPs, Colorado AI Act, public-sector continuous oversight. Vertical regulation moves faster than horizontal.

AI Insurance & Risk Transfer. Affirmative AI-failure coverage as a distinct line. Lloyd's, Munich Re, the $750M ATA facility. Insurance is becoming soft regulation.

Liability & Tort. Nippon Life v. OpenAI, Heppner, agent-specific tort cases shaping precedent. The case law is still being written; every published opinion shifts the perimeter.

IP & Copyright. Thaler, NYT v. OpenAI, iterative-human-input doctrine, output-side liability. The training side is settling; the output side is just beginning.

Pricing & Monetization. Outcome-based, per-resolution, per-task pricing. Tokens-as-headcount. Salesforce Agentforce per-conversation is the leading shape.

Capital Markets. Hyperconcentrated mega-rounds, IPO float, M&A reshaping the agent stack. Cognition's Windsurf acquisition and ServiceNow's Moveworks deal as bellwethers.

Sovereign Compute. Multi-gigawatt compute treaties at lab level. Compute supply as a competitive axis. The game only six companies on Earth get to play.

Family 6 — Organizational & Conceptual

How organizations and people relate to agents.

Autonomy Gradation. Multi-level autonomy taxonomies plus autonomy certificates as procurement primitives. "What level of independence is this agent allowed?" becomes a contractual question.

Workforce & Human-Agent Teaming. Agent roles in orgs (intern → coworker → senior), agent OKRs, productivity redesign, review-debt. The org chart is being redrawn quietly.

Job Displacement. The AI layoff trap, demand-externality automation, Pigouvian automation tax debate. The macro story that everyone has an opinion on and nobody has good data for.

Agent Ethics & Welfare. Model welfare assessments, moral status, retirement interviews, opt-out from distressing interactions. Anthropic's formalization of this is the leading edge; expect more labs to follow.

Decision Provenance. Why an agent did what it did — and why it didn't do what it almost did. Counterfactual provenance is the answer to the regulator's hardest question.

Open vs Closed Weights. Bifurcation along monetization lines. OSS as commodity layer, closed weights for premium reasoning. Qwen's shift is the canary.

Family 7 — Human-System Surfaces

The newest family in v2.0. How humans actually relate to agents under load — not in demos.

Intervention Geometry. Latency, granularity, authority, reversibility. The four axes of intervening in an agent's trajectory. "Stop the agent" is not one button; it is a design problem with at least four dimensions.
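As a sketch of why "stop the agent" is a selection problem, the four axes can be modeled as fields on a control, with a picker that returns the narrowest control satisfying the constraints. The control names, latencies, roles, and the "prefer the narrowest granularity" rule are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Granularity(IntEnum):
    STEP = 1
    TASK = 2
    SESSION = 3
    FLEET = 4


@dataclass(frozen=True)
class Control:
    name: str
    latency_s: float          # latency: time until the intervention takes effect
    granularity: Granularity  # granularity: how much of the trajectory it touches
    authority: frozenset      # authority: roles allowed to trigger it
    reversible: bool          # reversibility: can the agent resume afterwards


# Hypothetical controls with made-up numbers.
CONTROLS = [
    Control("veto_next_tool_call", 0.5, Granularity.STEP, frozenset({"operator", "admin"}), True),
    Control("pause_task", 2.0, Granularity.TASK, frozenset({"operator", "admin"}), True),
    Control("revoke_credentials", 30.0, Granularity.SESSION, frozenset({"admin"}), False),
    Control("halt_fleet", 5.0, Granularity.FLEET, frozenset({"admin"}), False),
]


def pick_control(role, max_latency_s, require_reversible, controls=CONTROLS) -> Optional[Control]:
    """Return the narrowest control this role may use within the latency budget."""
    candidates = [
        c for c in controls
        if role in c.authority
        and c.latency_s <= max_latency_s
        and (c.reversible or not require_reversible)
    ]
    return min(candidates, key=lambda c: c.granularity, default=None)
```

With these assumed values, an operator who needs a reversible intervention within one second gets `veto_next_tool_call`, and gets nothing at all if the budget is 0.1 seconds. That empty result is the design problem.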

Interface Compression. Instruction compression, state visibility, attention cost per decision. Supervision overhead is becoming a published metric — "how many seconds of attention does this agent require per hour of work."
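The metric in that last sentence is simple arithmetic. A minimal sketch, assuming you log how many seconds of human attention each supervised decision consumed and how long the agent worked; the function name and the exact definition are my assumptions, not a published standard.

```python
def attention_seconds_per_work_hour(decision_attention_s, agent_work_s):
    """Seconds of human attention spent per hour of agent work.

    decision_attention_s: seconds of attention consumed by each human decision.
    agent_work_s: total seconds the agent spent working in the same window.
    """
    if agent_work_s <= 0:
        raise ValueError("agent_work_s must be positive")
    return sum(decision_attention_s) / (agent_work_s / 3600)
```

For example, three approvals costing 30, 45, and 15 seconds over a two-hour run give `attention_seconds_per_work_hour([30, 45, 15], 7200)`, which is 90 / 2 = 45.0 seconds of attention per hour of work.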

How to use this map

Each dimension on this page is a category or sub-category in the editorial taxonomy of this site. Posts are filed against exactly one category and tagged with the entities and meta-patterns they touch. If you came here from a post, the section above tells you where that post sits in the larger picture. If you came here cold, pick a family that interests you and the post archive will show you what's been written under it.

Versioned because the landscape moves. Current version: v2.0. Previous versions are kept in revision history. Material additions and removals get a changelog note at the top of the page when they happen.