Spark and the end of the chat-first era

On May 14, 2026, an APK teardown of Google app beta v17.23 surfaced a feature with a name nobody had heard before: Gemini Spark. Within thirty-six hours, hands-on screenshots from TestingCatalog, Andrew Curran, and BuildFastWithAI showed the actual onboarding screen. The Gemini app had grown a second tab. Not a setting. Not a labs toggle. A second top-level tab inside the main consumer surface, labeled "Agent," sitting beside the existing tab labeled "Chat."

That tab is the story. The leaks are about model upgrades and codenames and warm browser sessions, but the structural news is the tab itself. Google has put two AI modalities — synchronous and asynchronous, conversation and delegation, the foreground and the background — on equal footing inside its flagship consumer surface, and the second one is the one with the longer roadmap. The chat is now half the building.

I think the next decade of consumer computing is going to be defined by the shift the Spark tab makes visible, and I think the canonical framing — "AI is moving from a chatbot to an agent" — undersells what is actually happening. The shift is not a feature upgrade. It is the third major reorganization of where consumer work gets done, and it has all the structural fingerprints of the previous two.

The three eras, named for where the work happens

Computing UX is defined by what holds the user's attention, and by where the work happens while it does.

Search-first ran from roughly 1998 to 2022 — call it from the launch of Google through the launch of ChatGPT. The user's job was to formulate a query. The system's job was to navigate them somewhere. The work itself — reading, comparing, deciding, booking, transacting — happened after the click, on destination sites, by the user. Google was a switchboard with an opinion about which connections were good. The economics followed: tax the merchant for attention, give the user free navigation. The trust model was thin and durable — trust the link if the source looked credible. The identity model was anonymous queries against a non-personal index.

Chat-first ran from late 2022 through about now. The user's job was to describe what they wanted. The system's job was to reason and return an answer. The work shifted: cognition moved to the model side, but execution stayed on the user. The user still had to copy the answer somewhere, click the link, file the email, complete the booking. The chat window was a better-than-search interface for problems whose hard part was thinking — drafting, summarizing, analyzing, explaining — and a worse-than-search interface for problems whose hard part was doing. The economics inverted: tax the user via subscription, give the merchant nothing. The trust model thickened — trust the answer, trust the model's reasoning. The identity model became an authenticated session against a personal account.

Agent-first is what Spark is putting a tab on. The user's job is to delegate. The system's job is to do. The work moves all the way to the agent side: the agent reads the inbox, books the flight, fills the form, completes the purchase, and reports back. The user shifts from operator to supervisor. The economics flip again — and I'll come back to this — because the action itself becomes the taxable surface, not the attention or the cognition. The trust model thickens further: trust the actor on your behalf, with credentials, with payment authority, with cross-domain reach. The identity model breaks open — you are no longer using your own account; the agent is, while pretending to be you.

The three eras are not just UX modes. They are three different time-shapes, three different loci of work, three different trust contracts, three different economics, and three different identity stories. The Spark tab is the first time a major lab has shipped a front-door product that admits the difference.

Spark was always going to come through the front door

The first thing to notice about Spark is what it isn't. It isn't a separate app. It isn't a labs experiment. It isn't a developer surface. It isn't gated to Workspace customers. It is a tab inside the consumer Gemini app, and it has been rebranded from something Google had already shipped.

The 9to5Google teardown of v17.23 was specific: the strings that previously said "Gemini Agent" had been mass-renamed to "Gemini Spark." The internal codename, per TestingCatalog and corroborated across leaks, is "Remy." This is not a new product. It is the existing agent infrastructure, repackaged under a brand that suggests something more than experimental, with a UI elevation from buried setting to top-level tab. Spark is what happens when Google decides the agent is ready to be a verb.

That choice — front door, not lab — tells you everything about Google's competitive read of the moment.

Search-first was Google's home court. The company built it, kept it for twenty-five years, and still owns it commercially even as the queries leak elsewhere. Chat-first was not Google's home court. OpenAI defined it in November 2022. Anthropic built a parallel surface inside it. Google played catch-up across two model generations and never set the defaults. By any honest measure, Google lost the first era of consumer AI, the one where "AI" meant "chat with the model," even as Gemini's underlying model improved.

Agent-first is up for grabs. The home court is wherever the user's data graph lives, and Google's data graph is the deepest of the three. Gmail, Calendar, Drive, Docs, Maps, Photos, Chrome, Android — these are the substrates an agent needs to actually do things in someone's life, and Google has all of them as first-party services, not OAuth approximations. If Google can fuse those into the action substrate before OpenAI or Anthropic builds a credible parallel, agent-first becomes Google's home court, and the era of search-first decline reverses into the era of agent-first dominance.

This is why Spark is a top-level tab in the consumer Gemini app and not tucked behind a separate "Google Labs" badge. The point of putting it at the front door is to make the data-graph integration impossible to substitute. A third-party agent has to ask for OAuth scopes to read your Gmail; Spark just reads it. A third-party agent has to scrape your calendar; Spark queries it natively. The structural advantage is not the model. It is the proximity to the data.

And the timing is not subtle. The v17.23 build that contained the Spark strings shipped four days before Google I/O 2026, which opens today. The leak path — APK teardown, onboarding screenshots, viral TestingCatalog thread — is the path Google uses when it wants the conversation framed before the keynote. The "leak" is, more often than not, the press release.

The two-tab design is architecture, not UX

The thing the Spark tab admits, that almost no other agent product has been brave enough to admit, is that synchronous and asynchronous AI are different modalities — not the same product with a different model setting.

In a chat-first product, you send a prompt and you wait. The interaction is dialog. The unit of work is a turn. The user is present and watching. The model's response arrives in seconds, or — for reasoning models — minutes, but the user is still mostly in the loop, supervising in real time.

In an agent-first product, you describe a task and you walk away. The interaction is delegation. The unit of work is a task, sometimes a recurring task. The user is absent during the work and present at the report. The agent's output arrives in minutes, hours, or days, and the user has to trust the actor across that absence.

These are different cognitive contracts. They are also different system architectures. A chat needs an inference endpoint and a context window. An agent needs an inference endpoint, a planner, a tool router, a worker pool, durable state, a scheduler, a queue, an audit log, and a status surface for the user to check what the agent is doing while they aren't watching.

Most agent products have hidden this difference by stuffing both into one chat window. OpenAI's Operator started as a separate surface, and its agent behavior was later folded back into the main ChatGPT thread. Anthropic's Claude for Chrome is a browser extension that the user invokes from within a conversation. Manus runs in its own app. The various wrapper startups present an agent UX that looks like a chat that takes a long time to reply. The two modalities are forced through the same socket.

Google's two-tab design is the first major-lab front-door product to refuse this conflation. The Chat tab is for synchronous dialog. The Agent tab is for asynchronous tasks. The Agent tab, per the leaks, opens to a view that lists active tasks and scheduled tasks — a jobs dashboard, not a message thread. The interaction primitive in the Agent tab is "create a task," not "send a message." The visible state is "running / completed / scheduled," not "user typing / model thinking."
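A minimal sketch of what "create a task" looks like as a primitive, compared with "send a message." Everything here is hypothetical: the names Task, TaskBoard, and Status are invented for illustration and are not taken from the leaks or from any Google API. The only details borrowed from the text are the three visible states and the idea that the Agent tab lists jobs rather than turns.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count


class Status(Enum):
    # The three visible states named in the text.
    SCHEDULED = "scheduled"
    RUNNING = "running"
    COMPLETED = "completed"


@dataclass
class Task:
    task_id: int
    description: str
    status: Status = Status.SCHEDULED
    audit_log: list = field(default_factory=list)  # per-task record of state changes


class TaskBoard:
    """Hypothetical jobs dashboard: the primitive is create_task, not send_message."""

    def __init__(self):
        self._ids = count(1)
        self.tasks = {}

    def create_task(self, description):
        task = Task(next(self._ids), description)
        task.audit_log.append("created")
        self.tasks[task.task_id] = task
        return task

    def transition(self, task_id, new_status):
        task = self.tasks[task_id]
        task.audit_log.append(f"{task.status.value} -> {new_status.value}")
        task.status = new_status

    def dashboard(self):
        """Group task descriptions by status, the way a jobs view would."""
        view = {status.value: [] for status in Status}
        for task in self.tasks.values():
            view[task.status.value].append(task.description)
        return view


board = TaskBoard()
inbox = board.create_task("Clean up inbox weekly")
flight = board.create_task("Book flight to Lisbon")
board.transition(flight.task_id, Status.RUNNING)
print(board.dashboard())
```

The user-visible object is a task with a lifecycle and a log that outlives any single exchange, which is why it needs durable state behind it where a chat turn needs only a context window.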

This is what the Agent-Native Map calls Orchestration to Runtime: the move from libraries-of-helpers to runtimes-with-durable-state, with checkpointing, pause/resume, cost meters, and a real notion of jobs. The Spark UI is the consumer face of that runtime. The "Chat" tab is the chat-first product Google already shipped. The "Agent" tab is the agent-first product Google is now shipping. They sit beside each other because they are not the same thing, and Google's product team appears to have decided to stop pretending they are.

The journalists framing Spark as "Gemini's new agent mode" are missing the structural admission. Spark is not a mode of Gemini. Spark is the surface where Google's agent runtime becomes visible to the consumer, with its own time-shape, its own primitives, and its own dashboard. The chat is, increasingly, the vestigial half.

What "remote browser data" gives away

The richest detail in the leaked onboarding text is eleven words: "remote browser data, like login details and remote code execution data." Read that twice. It is the architecture spec.

"Remote browser data" tells you Spark is not running on the user's device. It is running in a server-side environment, with browser sessions kept somewhere Google controls. "Login details" tells you those browser sessions are being kept authenticated — the agent does not re-login to your bank every time it goes to check a balance; it reuses the session it established the first time. "Remote code execution data" tells you those sessions also retain execution state — Python kernels, scratchpad files, intermediate artifacts — between invocations of the same task.

What that adds up to is a per-user worker pool: persistent, authenticated, stateful, server-side. This is the first pattern of the Agent-Native Map — Agent OS Convergence — made consumer-facing. Background agents, persistent machines, durable execution converging into agent runtimes. Spark is what it looks like when one of those runtimes gets a top-level tab.
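To make "persistent, authenticated, stateful, server-side" concrete, here is a toy sketch of a per-user worker pool. It is an assumption-laden illustration, not a description of Google's implementation: WorkerSession, WorkerPool, and the bank.example site are all invented, and a plain dictionary stands in for a real headless browser profile.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerSession:
    """Hypothetical per-user, server-side session that survives between task runs."""
    user_id: str
    cookies: dict = field(default_factory=dict)   # stands in for retained "login details"
    scratch: dict = field(default_factory=dict)   # stands in for "remote code execution data"
    logins_performed: int = 0

    def ensure_login(self, site):
        # Log in only if no authenticated session is already held for this site.
        if site not in self.cookies:
            self.cookies[site] = f"session-cookie-for-{site}"
            self.logins_performed += 1
        return self.cookies[site]


class WorkerPool:
    """Maps each user to one warm session instead of creating a fresh one per task."""

    def __init__(self):
        self._sessions = {}

    def session_for(self, user_id):
        if user_id not in self._sessions:
            self._sessions[user_id] = WorkerSession(user_id)
        return self._sessions[user_id]


pool = WorkerPool()

# First task run: the worker logs in and leaves an intermediate artifact behind.
first = pool.session_for("alice")
first.ensure_login("bank.example")
first.scratch["last_balance_check"] = "run-1"

# Later run of the same task: same session, no second login, state still there.
second = pool.session_for("alice")
second.ensure_login("bank.example")
print(second is first, second.logins_performed, second.scratch)
```

The property that matters is that the second run never re-authenticates. That is the convenience, and it is also the reason the session has to be kept alive somewhere between runs.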

The cost structure of this is its own admission. A chat session is effectively free to keep around — a context window in memory, evicted within minutes of inactivity. A warm-browser session is not. A headless Chromium instance with retained cookies, local storage, and execution state costs real money to keep idle, and Google would not be paying that cost per user unless the projected lifetime value of an agent-mediated user is higher than the projected lifetime value of a chat-only user by a meaningful multiple. The infrastructure choice is a revenue prediction. Somewhere in a Google planning document, the multiple is written down, and it is large enough to justify a fleet of warm per-user browser sessions running on Borg or its successor twenty-four hours a day. That number is the agent-first thesis quantified.

But the architectural detail is not the interesting part. The interesting part is what Google has chosen to do with the authentication.

The same onboarding screen says, in the language Google's lawyers presumably blessed: "While it is designed to ask for your permission before taking sensitive actions, it may do things like share your info or make purchases without asking." Read in the context of the worker-pool design, that sentence is doing something specific. It is telling you that Spark, holding your authenticated sessions inside its server-side environment, will act as you across third-party sites — submitting forms, completing checkouts, transferring data — using credentials you logged into once and never had to re-supply.

This is the bearer-token impersonation pattern, shipped as a feature. I wrote a few months ago that OAuth's bearer-token model assumes a human-paced caller and breaks under agentic iteration speed; that the agent identity question is unresolved at the protocol layer; that the right answer requires cryptographically bound, scope-attenuated, end-to-end auditable credentials per agent. None of that infrastructure exists yet at consumer scale. Spark does not wait for it. Spark ships the broken-but-functional version — your existing logged-in session, reused by a server-side worker, with a warning text where the protocol fix ought to be.
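For contrast with a reused logged-in session, here is a rough sketch of what a scope-attenuated, auditable grant could look like. It is not AP2, x402, or any real protocol. The field names, the demo key, and the functions issue_grant and authorize are invented for illustration. It also leaves out the "cryptographically bound" part: binding a grant to one specific agent needs proof-of-possession with asymmetric keys, which this standard-library sketch does not attempt.

```python
import hashlib
import hmac
import json

ISSUER_KEY = b"demo-issuer-key"  # hypothetical; a real issuer would use managed keys


def sign(payload):
    """HMAC over a canonical JSON encoding, so any edit to the payload breaks the signature."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(ISSUER_KEY, body, hashlib.sha256).hexdigest()


def issue_grant(agent_id, scopes, max_amount_cents, expires_at):
    """Issue a grant limited to named scopes, a spending cap, and an expiry time."""
    payload = {
        "agent_id": agent_id,
        "scopes": sorted(scopes),
        "max_amount_cents": max_amount_cents,
        "expires_at": expires_at,
    }
    return {"payload": payload, "sig": sign(payload)}


def authorize(grant, scope, amount_cents, now, audit_log):
    """Allow one action only if the grant is intact, unexpired, and covers it; log either way."""
    payload = grant["payload"]
    allowed = (
        hmac.compare_digest(grant["sig"], sign(payload))
        and now < payload["expires_at"]
        and scope in payload["scopes"]
        and amount_cents <= payload["max_amount_cents"]
    )
    audit_log.append({
        "agent_id": payload["agent_id"],
        "scope": scope,
        "amount_cents": amount_cents,
        "allowed": allowed,
    })
    return allowed


log = []
grant = issue_grant("worker-1", ["calendar.read", "purchase"], 5_000, expires_at=1_000)

print(authorize(grant, "purchase", 2_500, now=500, audit_log=log))   # within scope and cap
print(authorize(grant, "purchase", 9_900, now=500, audit_log=log))   # over the spending cap
print(authorize(grant, "gmail.send", 0, now=500, audit_log=log))     # scope never granted
print(len(log), "audited decisions")
```

A warm browser session has none of these properties: it carries whatever the logged-in user could do, with no per-action scope, cap, or record.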

This is not a security oversight. This is a deliberate trade. Google has decided to take the capability now and pay the bill later. The bill comes in three forms, none of which Google's product team is unaware of.

The first is the viral incident. An agent that may "make purchases without asking" will, within six months of launch, complete a purchase someone really wishes it hadn't. The screenshot will go viral. The refund will be issued. The product will absorb the hit. Google has the absorption capacity (Account, Activity, kill switch, Google Pay reversal pathways), and the engineering team has clearly calculated that the absorption is cheaper than the delay.

The second is the regulatory bill. The EU AI Act's transparency and human-oversight obligations begin applying on August 2, 2026. Article 50 obligations for consumer-facing AI systems (disclosure, audit trails, the right to a meaningful explanation) apply on the same schedule. Spark in its leaked form fails several of those tests. Third-party data sharing is opaque in the onboarding text; the lack of pre-action confirmation contradicts the spirit of human oversight; the warm-session model makes per-action auditability nontrivial. Google's history suggests Spark will arrive in the EU later than in the US, with a more constrained feature set, and that the team has costed that lag into the launch.

The third bill is the one the Map calls Regulation by Actuarial Table. AI insurance underwriters (Munich Re's AI cover, the ATA syndicate at Lloyd's, Allianz Trade's AI errors-and-omissions products) are starting to price agent identity hygiene as a coverage variable. The actuarial signal is more immediate than the regulatory one because it shows up in next year's premiums, not next decade's case law. The first insurance market that prices Spark-style impersonation as uninsurable, or insurable only at premium rates that make enterprise adoption uneconomic, will force the retrofit faster than any EU regulator. I have argued in earlier writing that insurance pressure will kill the bearer token sooner than any law will, and Spark is the most expensive test of that thesis yet.

The point is that Google is not unaware of any of this. Google is making a wager. The wager is that the home court of the agent-first era is worth the eventual cost of retrofitting identity properly, and that being six months earlier with capability is worth being six months later with hygiene. AP2 — the Agent Payments Protocol that Google itself helped originate — and x402 — Coinbase's payment-rail standard for agents — are both being designed exactly so that agents can transact without bearer-token impersonation. Google is shipping the impersonation version first, with the proper-protocol version implicitly on the roadmap. The bet is that consumer behavior will lock in first, and the underlying protocol will improve under the same UX. It is the iPhone-without-app-store bet, structurally speaking. Capability first. Hygiene later.

The economics nobody has named out loud

Each era of computing UX has had a defining economic move, and the move maps to where the work happens.

Search-first taxed the merchant. The economic move was advertising against intent. The user typed a query, the system showed results, and the merchant paid for placement against the query's commercial inflection. The user got free navigation as a side effect of the merchant's willingness to pay for attention. This was a $300 billion annual business at its peak, and it built Google.

Chat-first taxed the user. The economic move was subscription against cognition. The user paid a flat monthly fee for access to the model's reasoning, and the merchant got nothing — there was no obvious surface to advertise against, no inflection point to monetize. Twenty dollars a month for ChatGPT Plus, twenty for Claude Pro, two hundred for Claude Max. This is a real business, but it is structurally a fraction of search-first's, because the buyer surface is hundreds of millions of consumers rather than tens of millions of merchants, and the willingness-to-pay is bounded by what a personal-productivity tool can extract.

Agent-first taxes the action itself. This is the move nobody is naming clearly yet, because it has not fully arrived, but Spark's "purchases without asking" is the first beachhead.

When an agent completes a transaction on behalf of a user, three parties touch the transaction: the user (who delegated), the agent operator (who executed), and the merchant (who fulfilled). Search-first inserted the agent operator between the user and the merchant for the matching step only; the transaction itself happened on the merchant's site, and the merchant paid for placement. Agent-first inserts the agent operator into the transaction itself. The agent fills the form. The agent enters the credentials. The agent confirms the purchase. The agent reports back.

That intermediary position has commercial weight. The merchant will pay the agent operator for the same reason merchants paid Google for ads: the operator delivers a high-intent, payment-ready, identity-verified buyer. But the take-rate moves from cost-per-click (a few percent of the eventual transaction, in aggregate) to take-of-transaction (a few percent of every transaction the agent mediates). At scale, this can be bigger than search-first ads, not smaller. The base of consumer transactions Google can plausibly mediate is in the trillions of dollars per year, and the revenue line is that base times Google's share times its take-rate. As an illustration, a 5% take on a 5% share of a $20 trillion base is $50 billion a year; the same take on a 30% share is $300 billion. The multi-hundred-billion-dollar line depends on the share climbing well past single digits.
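The take-of-transaction arithmetic is simple enough to write down: revenue is the transaction base times the share the agent mediates times the take-rate. The sketch below uses purely illustrative inputs. The $20 trillion base and the three share and take-rate pairs are assumptions chosen to show sensitivity, not figures from the leaks or from any Google disclosure. Integer basis points keep the results exact.

```python
BPS = 10_000  # basis points per 100%


def mediated_revenue_usd(base_usd, share_bps, take_bps):
    """Annual operator revenue = transaction base x mediated share x take-rate."""
    return base_usd * share_bps * take_bps // (BPS * BPS)


BASE_USD = 20_000_000_000_000  # assumed $20 trillion base; illustrative only

# Each scenario is (share of the base mediated, take-rate), both in basis points.
scenarios = {
    "5% share, 5% take": (500, 500),
    "9% share, 9% take": (900, 900),
    "30% share, 5% take": (3000, 500),
}

for label, (share_bps, take_bps) in scenarios.items():
    revenue = mediated_revenue_usd(BASE_USD, share_bps, take_bps)
    print(f"{label}: ${revenue / 1e9:,.0f}B per year")
```

Because the three factors multiply, the size of the revenue line is far more sensitive to how much of the base the agent ends up mediating than to a point or two of take-rate.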

This is why AP2 and x402 are not the technical curiosities they look like. AP2 is Google's effort, with payment networks attached, to standardize agent-mediated payment mandates with cryptographic delegation. x402 is Coinbase's HTTP-native equivalent, designed to make every web resource priceable in stablecoin per-call. Both are payment-rail standards. Both are also, not coincidentally, the standards that resolve the bearer-token problem first: the payments people had the cryptographic primitives because they had to, and the agent identity work is being smuggled in under the payment-protocol cover. This is the Agent-Native Map's Payments Racing Regulation pattern. The payments stack is shipping agent-identity-grade primitives faster than any regulator can keep up, because the financial incentive to settle agent-mediated transactions is enormous and the regulatory clock for everything else is slow.

The Spark warning text, "may do things like share your info or make purchases without asking," reads in this light as a soft launch of the agent-mediated transaction layer. The first hundred million Spark users who complete a purchase through the agent are the training set for the merchant-side product. When a merchant notices, six months in, that 4% of their checkout traffic now comes from Spark sessions and starts asking how to optimize for that channel, the second revenue line opens. The take-rate gets negotiated. The category exclusivity gets negotiated. The "Spark Verified" merchant badge gets sold. The era's economics get built.

For Google, the prize is not the subscription line. Spark's subscription tier, by the leak's pricing inference, sits around $19.99/month. That's a real business but it's not the era-defining business. The era-defining business is the merchant-side take-rate, which will be sold under whatever name AP2 ends up with at I/O. Search-first was a $300B ads business built on the merchant side of free consumer navigation. Agent-first could be a $300B transactions business built on the merchant side of paid consumer delegation, with structurally higher margins because the operator's leverage is greater.

The merchants, by the way, will pay. They have to. The alternative is that their products become invisible to a generation of consumers whose purchase intent is mediated entirely through agents. The merchant who refuses to integrate with the dominant agent operator's transaction layer is the merchant who refused to appear in search results in 2003. The pricing power tilts entirely toward the operator.

The case against Spark, and why Google shipped it anyway

The case against Spark, made well, is real. It is the case that, in a more cautious world, the product would not ship in this form.

The first half of the case is that Spark is contained. Spark operates inside Google's ecosystem and inside Chrome. It does not — at least in the leaked form — have OS-level reach across Windows, macOS, or arbitrary desktop applications. OpenAI's Windows-integrated agent and Anthropic's Computer Use tooling both reach further into the user's environment. A reader who wants an agent that drives Photoshop or VS Code or a non-Chrome browser is not the target user for Spark. This is real, and Google's roadmap presumably has a desktop-side answer in 2026 — the leaked desktop Gemini app references a "Spark mode" with local file and window awareness — but the May 14 leak is mostly a phone-and-web product.

The second half is the regulatory exposure. The EU AI Act's Article 50 obligations for consumer-facing AI begin on August 2, 2026. Spark's onboarding text — sharing info with third parties without confirmation, executing purchases without confirmation, opaque audit surface — is essentially the consumer-facing AI system the AI Office had in mind when it drafted those obligations. The likely outcome is that Spark either ships with a constrained European feature set or ships late in Europe, and that within twelve months the European Commission's Article 50 guidance includes language specifically describing Spark-class behavior. This is the Definitional Gap pattern from the Map: regulators governing agentic workflows with paradigms designed for a different shape of system, and the system shipping anyway because the gap leaves enough air to operate.

The third half — and yes, I know I have three halves — is the predictable horror story. There will be an incident. A Spark user will wake up to a purchase confirmation for something they did not authorize, or a third party will receive their personal information through a chain of inferences they did not anticipate, or an inbox-cleanup task will archive a legal notice. Google will absorb the incident — refund, account remediation, public apology, additional safeguards in v17.31 — because Google has the absorption capacity. But the incident will land, the screenshots will circulate, and a meaningful slice of the addressable market will pull back from agent delegation for six to twelve months as a result.

All three of these are true, and Google's product team is not blind to any of them. The strategic logic of shipping anyway is what the editorial frame I have been arguing predicts: agent-first home court is up for grabs, the home court is worth more than the absorption costs, and being late is more expensive than being early. The Compliance as Differentiation pattern from the Map gets to play in two directions — the cautious vendor positions on hygiene and trades capability for trust; the aggressive vendor positions on capability and trades trust for adoption. Google has chosen capability. The bet is that, by the time the compliance bill arrives, the agent-first behavior is locked in, and the regulators are codifying what users already expect.

I think this bet is mostly correct, with one caveat. The caveat is that the regulatory cycle is faster than it used to be. The EU AI Act moved from proposal to applicability in five years. The US state-level patchwork is moving faster. The Korean and Japanese consumer-protection regulators are matching the EU's pace. The window in which Google can ship Spark-as-leaked without compliance retrofits is narrower than it would have been for the equivalent move in 2018. Six months of front-door dominance, then a forced retrofit, is still a winning trade. Two months and a forced retrofit might not be.

What to watch over the next eighteen months

I/O is today, and the keynote will tell us which parts of the leak Google chose to lead with and which it left in the build. The pieces of Spark that matter, regardless of what the marketing emphasizes, are the ones that determine whether the agent-first era's home court actually consolidates around Google.

The first thing to watch is the Skills API. The leaks reference "skills" as a first-class concept inside the Spark architecture: modular templates, reusable runbooks, parameterized tasks. If skills become a developer surface, with third parties registering their SaaS as agent-callable tools the way developers once registered Actions for Google Assistant, that is the Plugin moment of the agent era. It is also the moment the platform economics of agent-first solidify, because every registered skill is a tax point and every category leader's incentive to register is the same as the early App Store calculus. The right signal is a public Skills SDK before the end of 2026.

The second thing to watch is the merchant side of AP2 and x402 adoption. The technical specs are interesting but the commercial test is which large travel, retail, banking, and subscription companies announce agent-payment compatibility. Watch United Airlines, Marriott, Stripe, Shopify, DoorDash, and the major media subscription services. If five of those announce AP2 or x402 integration before Q4 2026, the merchant-side take-rate business has a real adoption curve. If they sit it out for another year, Google has a Spark product without a Spark monetization engine, and the era's economics get postponed.

The third is the insurance signal. Watch Munich Re's AI Underwriting Facility, the ATA syndicate at Lloyd's, and Allianz Trade's AI E&O products. The first published rate sheet that prices "agent identity hygiene" as a coverage variable is the inflection. The first public denial of a claim on the grounds that an agent operated under reused bearer credentials is the inflection after that. The insurance market is the regulator nobody talks about, and it operates on a one-year cycle rather than a five-year one.

The fourth is the regulatory move. Watch the European Commission's published guidance on consumer-facing agents under Article 50 of the AI Act, due over the summer. Spark in its leaked form is the test case the AI Office has been waiting for. The guidance will either accommodate the warm-session impersonation model, in which case Google's wager is fully vindicated, or it will explicitly require per-action confirmation and rule out the warm-session shortcut. The second outcome is the one that forces the AP2 / x402 retrofit on the regulator's timeline rather than Google's preferred timeline.

The fifth, almost ironically, is the model. The leaked "Cappuccino" checkpoint, internally tracked as Gemini 3.5 Pro, is meant to ship alongside or shortly after Spark. The model matters less than the chatter suggests, because model capability is converging across labs faster than any of them want to admit. The Map's sixth pattern — Model Convergence Pressure — predicts that gains move to topology, rubrics, memory, and scaffolding rather than to raw capability. Spark is exactly the scaffolding move. The model is necessary; the agent is decisive.

The chatbox is the search bar of the 2020s

Twenty-seven years ago, Larry Page and Sergey Brin built a company whose business was sending you somewhere. The link was the unit. The user navigated, and Google's revenue came from owning the surface where navigation happened. The brand became a verb.

Twenty-four years later, OpenAI built a company whose business was answering you in place. The chat turn was the unit. The user dialogued, and OpenAI's revenue came from owning the surface where the cognition happened. The chatbox briefly looked like the next search bar.

But the chatbox was always a way station. A chat that answers you in place still leaves the doing on your side. The user has to take the answer, open another tab, log into the merchant, fill the form, confirm the purchase, file the receipt. The cognition moved; the action did not. The chat-first era was the era of better-thinking-without-better-doing, and the productivity ceiling of better-thinking-without-better-doing is much lower than the ceiling of moving the doing too.

Spark is Google's bet — and, more importantly, Google's first front-door product — on the doing moving. The Agent tab is what it looks like when a major lab decides the chatbox is no longer the destination, just one of two tabs, and the more interesting tab is the one where you describe a task and walk away. The two-tab design is the visible admission that the chat is the past and the agent is the next decade.

The economics will follow the action. Search-first taxed merchants for attention. Chat-first taxed users for cognition. Agent-first taxes the action itself — and the operator that mediates the action gets the take-rate. Google's data graph, its payment infrastructure, its identity stack, and now its top-level Agent tab are all positioned to make Google that operator at consumer scale. The home court is winnable.

The risks are real: the bearer-token retrofit is coming, the regulatory bill will land, and the viral horror story is approximately six months out. None of that changes the structural move. For twenty-seven years, Google's job was to send you somewhere. With the second tab inside the Gemini app, Google's job is now to go for you. The chat-first era ended in a tab nobody had heard of four days ago, and the agent-first era is the one with the longer runway.

The bet is not that Spark v1 is the product that wins the era. The bet is that the era is now defined by who ships the front-door tab first, and Google has just shipped it.