2026 Is the Year of the Tool

Gustavo Principio, Galileo’s obscure, under-credited, almost certainly fictional tea servant, has just come from the study, where he spent the last hour pouring tea while Galileo rambled about the craters and mountains he’d been mapping on the lunar surface through his telescope. Gustavo nodded politely through all of it. Now he’s standing outside, alone, squinting up at the Moon with his own eyes.
He mutters: “Someday people will walk there.”
A bold prediction. Also useless for about four hundred years. Nobody builds a rocket because a tea servant had a feeling. The feeling might be right. It might even be profound. But it doesn’t tell you what to do on Monday.
I keep thinking about Gustavo, because the AI discourse is full of him right now.
But I have a hunch of my own. Not a benchmark hunch. Not a “new SOTA just dropped” hunch. A different kind.
I think 2026 is not going to be the “wow, such model” year, the way 2025 felt. I think 2026 is going to be the “wow, such tool” year.
Models will keep improving. But the center of gravity shifts. 2025 was model shock. Every few months, wait, it can do what now? 2026 looks more like the year people take what we already have, or mostly already have, and build machinery around it that actually changes how work gets done.
And yes, AI is coming for a lot of work. I don’t know about jobs though. Not in the clean, headline-ready way people keep pretending they do. I don’t trust anyone who claims they can map that from here. But it is coming for a lot of the work inside jobs: the searching, the switching, the stitching, reformatting, checking, reminding, and “let me just do this manually because the tools don’t talk to each other” sludge that eats half the week.
That part feels less speculative every month.
The Prophet Problem
The funniest thing about the AI discourse is how many people have been saying, with full chest, for years now: “AI is going to transform work.”
Well. Yes. Probably.
But some of them say it with the confidence of a prophet and the precision of a weather app that only knows one sentence: “It will rain tomorrow.” Every day.
My dad used to call these people rain predictors. Someone who says “it will rain” every morning isn’t a forecaster. They’re just patient. The prediction costs nothing, and when the sky finally opens, they get to feel like a genius for free.
That’s not the same thing as actually seeing what’s coming.
There is a difference between being directionally right about a giant trend, being strategically right early enough to build something, and being surgically right in a way that cashes out almost immediately. That difference matters now, because we’re past the point of merely predicting “AI will matter.” We’re starting to get clues about how it will matter.
My bet: tooling. Delegation. Control. Verifiable execution. And honestly, I think this is more exciting than the model improvements, not less. Because right now, something funny is happening. People are furiously building tools that will improve their performance, and then… not quite building anything with that improved performance yet. Everyone is sharpening the axe. Almost nobody has started swinging.
I get it. It’s a natural step. There is so much uncharted territory, so much ungrabbed land, that just mapping the tooling layer feels like a full-time job. But that’s exactly what makes this moment electric. The tools are arriving. The things people will build with those tools haven’t been imagined yet. That gap is where the real story of 2026 lives.
Three Kinds of Right
Back to Gustavo. He’s the first kind.
Gustavo right is the prophecy that doesn’t help you build anything. “One day we’ll cure this.” “One day we’ll fly.” It may be true. May even be profound. But it’s cheap, because it costs nothing to say and earns nothing until someone else does the engineering.
A lot of AI commentary is still Gustavo-coded. “Agents will be big.” Probably. “Interfaces will change.” Sure. “A lot of work will change.” Yes, fine, thank you.
Then there is PC right. “A computer on every desk and in every home.” That’s not a sci-fi vibe. That’s a direction, a market thesis, a product strategy, a stack of constraints, and a decades-long building plan. It was right in a way that shaped actual execution, but it still took years of hardware improvements, software platforms, standards, distribution, and relentless productization. PC right is expensive. It requires taste and patience.
Then there is the rarest kind: Neptune right. Not “something big is out there somewhere.” More like: there is a thing, it should be there, and if you point a telescope at those coordinates next year, you’ll find it. Neptune was predicted mathematically before anyone observed it. Astronomers found it in 1846 by aiming where the calculations said to aim.
Neptune right doesn’t just point at the future. It gives you coordinates. The rain predictor says “it will rain.” Neptune right tells you the date, the hour, and which street will flood.
So the question that actually matters: are we getting any Neptune-ish clues about where AI value is headed?
I think we are.
The Real Shift
While people were still arguing about benchmarks, the hard problem changed shape.
It’s no longer “can the model do the task?” It’s whether the system around the model can decompose work, assign it, monitor it, adapt when something breaks, and verify results. All without surprising anyone with consequences they didn’t approve of. A recent DeepMind paper on AI delegation lays this out cleanly: the bottleneck isn’t intelligence, it’s the entire scaffolding of authority, accountability, trust, and control that makes delegation survivable in real environments.
And it comes with a warning worth taking seriously. Scale delegation without meaningful control and you get opacity, responsibility diffusion, and what the authors call the “moral crumple zone”: the grim arrangement where humans still carry accountability but have lost real control over what the system did and why. Anyone building agent tools should read that less as a warning and more as a spec.
The winning tools in 2026 won’t just be powerful. They’ll need to be delegation-grade.
Why the Tooling Layer Flips Now
Here’s the shortest version I can give:
2026 is the year organizations realize that agents are not “chatbots with ambition.” They are software with side effects.
Reads, writes, API calls, code changes, payments, task handoffs, messages, all the messy consequences of touching real systems. Once you see agents that way, you immediately need everything normal software systems need and then some: isolation, access control, logging, audit trails, verification, policy, secrets handling, and protection against prompt injection. The agent is just a new kind of software. One that runs in a world that already has rules.
Agents are entering the workflow, not sitting beside it
A year ago, most agent experiences lived in a sidebar. A chat window next to the thing you were actually doing. The interaction pattern was: do your work, occasionally ask the chatbot something, paste the answer back in.
That’s collapsing. Agent capability is moving inside PR flow, issue flow, repo flow, review structures. Once it’s embedded there, it stops being “extra chat” and becomes part of the team’s control system. Claude Code, Cursor, GitHub Copilot CLI. These are converging fast, and the gap between them isn’t really about who has the smartest model anymore. It’s about loop design, review ergonomics, and how naturally the agent fits where work already happens.
Organizations adopt workflow-compatible tools, not abstract potential. That’s always been true. It’s just that now the tools are finally reaching the workflows. And once they’re inside, the question stops being “should we use AI?” and becomes “how do we govern what it’s already doing?”
The context window isn’t free, so tooling has to get smarter
Long context windows sound like they solve everything until you run the numbers. A million tokens of context is not the same as a million tokens of reliable context, and even when it works, the costs become absurd fast. If tool definitions and prior results eat most of your context before the model even sees the user’s actual request, you haven’t built an agent; you’ve built an expensive parrot with amnesia.
That pressure is pushing teams toward better tool registries, smarter loading, execution outside the prompt, artifact-based workflows, and verification pipelines. The practical advantage shifts to whoever builds the best machinery around the model.
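That machinery is easier to picture with a sketch. Here is a toy tool registry that only loads the tool definitions relevant to a request and stops at a token budget. Everything in it, the class names, the word-overlap scoring, the token accounting, is invented for illustration; real systems use embeddings and proper schemas, but the shape is the same:

```python
# Toy sketch: load only the tool definitions relevant to the request,
# within a fixed context budget. All names here are illustrative.

from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str
    schema_tokens: int  # rough cost of putting this tool's definition in context

class ToolRegistry:
    def __init__(self, budget_tokens: int):
        self.budget = budget_tokens
        self.tools: list[Tool] = []

    def register(self, tool: Tool) -> None:
        self.tools.append(tool)

    def select(self, request: str) -> list[Tool]:
        """Rank tools by naive word overlap with the request,
        then greedily pick until the token budget is spent."""
        words = set(request.lower().split())
        ranked = sorted(
            self.tools,
            key=lambda t: -len(words & set(t.description.lower().split())),
        )
        picked, spent = [], 0
        for tool in ranked:
            if spent + tool.schema_tokens > self.budget:
                continue  # definition too expensive for what's left
            if words & set(tool.description.lower().split()):
                picked.append(tool)
                spent += tool.schema_tokens
        return picked
```

The interesting design choice is that the budget is a first-class input: the registry is forced to rank and refuse, instead of dumping every tool definition into the prompt and hoping.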
Governance stops being an enterprise extra
The moment an agent can read or modify code, data, or business systems, the organization has to answer a very boring, very adult question: what did the agent do, and why? Tracing, auditability, observability: these are becoming standard requirements, not features you sell to compliance-heavy verticals. Side effects plus stakes equals audit trails. Every time.
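A minimal sketch of what “what did the agent do, and why” looks like as code: an append-only audit log that wraps every side-effecting call. The class and its fields are hypothetical, not any vendor’s schema; a real system would sign entries and ship them to tamper-resistant storage:

```python
# Sketch of an append-only audit trail for agent side effects:
# every call records who asked, what ran, why, and what happened.
import json
import time

class AuditLog:
    def __init__(self):
        self.entries = []  # in a real system: signed, append-only storage

    def run(self, actor, tool, reason, fn, *args, **kwargs):
        """Execute fn on the agent's behalf, recording the outcome
        whether it succeeds or raises."""
        entry = {"ts": time.time(), "actor": actor,
                 "tool": tool, "reason": reason}
        try:
            result = fn(*args, **kwargs)
            entry["status"] = "ok"
            return result
        except Exception as exc:
            entry["status"] = f"error: {exc}"
            raise
        finally:
            self.entries.append(entry)  # logged on every path

    def dump(self):
        return "\n".join(json.dumps(e) for e in self.entries)
```

The point of the `finally` block is the whole point of auditability: the record exists even when the action blew up, which is exactly when someone will ask what happened.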
The Loop That Makes It Real
If I had to compress the “year of the tool” into one mechanism, it would be this:
Plan → Execute → Verify.
Policy checks before action. Sandboxed execution during, where needed. Artifact collection after: diffs, logs, findings, specs. Verification against expected outcomes. Human sign-off before merge or deploy.
That loop is carrying more weight than it looks. It turns a model from an answer machine into a participant in a system you can actually audit. And it sneaks in the thing demos always skip: recovery. If verification fails, you return to plan. If policy denies, you stop and ask. If execution produces garbage, you inspect the artifacts and try again.
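The loop can be sketched in a few lines. Everything here (plan_fn, policy_ok, and so on) is a stand-in for real components; the point is the shape of the control flow, including the recovery paths that demos skip:

```python
# Toy sketch of plan -> execute -> verify, with recovery:
# re-plan on failed verification, stop on policy denial.
# All callables are illustrative stand-ins for real components.

def run_loop(task, plan_fn, policy_ok, execute, verify, max_attempts=3):
    for attempt in range(max_attempts):
        plan = plan_fn(task, attempt)
        if not policy_ok(plan):
            # Policy denies: stop and ask, don't improvise around it.
            return {"status": "denied", "plan": plan}
        artifacts = execute(plan)          # the sandboxed step
        if verify(artifacts):
            # Verified work, with its evidence attached.
            return {"status": "verified", "artifacts": artifacts}
        # Verification failed: loop back and plan again with what we learned.
    return {"status": "gave_up", "attempts": max_attempts}
```

Note what the return values carry: never a bare answer, always a status plus the artifacts or the plan that was refused. That is the difference between an answer machine and a participant you can audit.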
A demo is “look, it did a thing.” A work system is “look, it did the thing, showed its work, failed safely, and tried again.” That difference is a tooling victory, even if the model in the middle is exactly the same one we had six months ago.
Standards, Protocols, and the Infrastructure That Wins
Tooling momentum also compounds when people stop rebuilding the same connective tissue from scratch.
MCP, A2A, ACP, AGENTS.md files, tracing layers: these reduce integration friction and make the ecosystem less like a pile of bespoke adapters. ACP is a good example of how specific this is getting: a protocol that standardizes how coding agents talk to the environments they work in (editors, CLIs, remote hosts), the same way LSP standardized language servers. Plumbing with a spec, not a pitch deck. These standards are incomplete, especially around policy, liability, and deep verification. But incomplete standards that people actually adopt beat perfect frameworks that live in a whitepaper.
This is why “wrapper” isn’t the right insult anymore. Some wrappers are still wrappers. But the interesting stuff has become infrastructure: permission layers, trace layers, policy-as-code, agent gateways, signed tool catalogs, runtime isolation, review surfaces, eval and monitoring systems.
The plausible next bets are specific enough to be useful: policy-as-code as a mandatory agent layer, supply-chain security for agent tools and plugins, agent gateways that combine API management with secrets brokering and observability. This is already starting to trend, and it should, because it’s high-leverage. That’s exactly the kind of prediction I’d rather stake money on than “one model to rule them all.”
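Policy-as-code, in its smallest possible form, is just rules as data, checked before any side effect. This sketch invents its own rule shape (real systems use engines like OPA), but the deny-by-default posture is the part worth copying:

```python
# Toy sketch of policy-as-code for agent actions: rules are data,
# evaluated before any side effect runs. Rule shape is invented.

RULES = [
    {"tool": "deploy",   "allow": False, "why": "deploys need human sign-off"},
    {"tool": "git_push", "allow": True,  "max_files": 20},
]

def check(action):
    """Return (allowed, reason) for a proposed agent action."""
    for rule in RULES:
        if rule["tool"] != action["tool"]:
            continue
        if not rule["allow"]:
            return False, rule["why"]
        if "max_files" in rule and action.get("files", 0) > rule["max_files"]:
            return False, "touches too many files"
        return True, "allowed by policy"
    # No rule matched: the safe default is refusal, not permission.
    return False, "no rule matches: deny by default"
```

Because the rules are data, they can be versioned, reviewed in a PR, and audited like anything else in the repo, which is exactly the property that makes this layer mandatory rather than nice-to-have.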
Useful Tools Can Still Build Bad Systems
Saying “2026 is the year of the tool” is not the same as saying “2026 solves agent reliability” or “2026 is safe by default.” It probably won’t be.
The next year or two will be defined by an awkward tension: teams want speed, vendors want adoption, users want magic, and reality keeps demanding controls, friction, and accountability.
The bad version of 2026 is easy to imagine. Pretty mission-control dashboards around brittle pipelines. Fake audit trails that log everything and explain nothing. Over-delegation with unclear ownership. Humans reduced to rubber stamps on work they no longer understand well enough to evaluate. Juniors missing the reps they used to get from the tasks everyone called boring.
If humans retain liability but lose meaningful control, that’s not augmentation. That’s administrative theater.
The tools that win long-term will be the ones that understand this early. Not as an ethics footnote. As a product requirement.
Where I Place the Bet
2026 will not mainly be remembered for one giant model leap that made everything obvious overnight. It will be remembered as the year a lot of people realized the real leverage was in the tooling layer: systems that make delegation explicit, workflows that make agent work verifiable, protocols that make tool ecosystems composable, and controls that make all of this survivable in real organizations.
2025 gave us a lot of “look what the model can do.” 2026 will increasingly be: “look what this tool can get done, safely enough, cheaply enough, and inside the way we actually work.”
Less theatrical. More useful. Much harder to tweet (ehrm, post on X, whatever we call it now), and much more likely to change your Tuesday.
And yes, some people predicting “AI will transform work” will get to take their victory lap when it rains. Good for them.
But I’m less interested in being Gustavo right. I’m trying to be at least PC right. And if we’re lucky, in a few specific places, maybe even a little Neptune right.
Here’s the thing about Neptune, though. When astronomers finally pointed their telescopes at the right coordinates in 1846, they found a planet that Galileo himself had seen two centuries earlier and mistaken for a fixed star.
Gustavo’s boss was looking right at it. He just didn’t have the framework to know what he was seeing.