Beginnings Never Look Like Beginnings

alt text

A couple of days ago, someone on X said OpenClaw isn’t useful.

I’ve been thinking about it since. Not because the take was interesting, it wasn’t, really, it was six words and a period (yeah sure, and some text bellow that, so probably a hook, I didnt bite on more than the first part), but because of how it was delivered. The confidence and clean verdict. No hedging, no “yet,” no acknowledgment that the thing has existed for about five minutes in technology years. A dismissal, performed with the certainty of a closing argument, on a platform that rewards exactly that. And I’ve noticed so many of those lately.

Look, they might be right. OpenClaw might amount to nothing. A lot of “beginnings” are just beginnings of nothing. The graveyard of technologies that were supposed to change everything is gi-normous, and it mostly sits there and probably not changing anything. Or?

But the thing that’s been gnawing at me isn’t the conclusion, but the confidence. Because the same move gets made in the other direction too. The “this changes everything” crowd commits the exact same sin, just with better vibes. Some new thing shows up, and within forty-eight hours the internet has split into people who are certain it matters and people who are certain it doesn’t, and nobody is saying the only honest thing: I haven’t got a fucking clue yet.

I’ve been using OpenClaw. I like it. I’m also not sure what it is yet. And I think that’s the only honest position available right now.

A Friday Afternoon That Almost Didn’t Happen

What I actually use it for atm:

My kids go to a school that publishes next week’s plan on their homepage every Friday. The plan includes everything, lessons, topics, outdoor activities, skiing trips, barbecues in the woods, what to pack, when to show up early. Every Friday I’d forget to check, or I’d check and forget to transfer the details into our family calendar, or I’d transfer half of it and miss the part about bringing sausages for the forest day, and then Monday morning would arrive with its usual small chaos.

So I told OpenClaw to handle it. Every Friday, it goes to the school homepage, fetches the plan for the coming week, parses what’s actually happening each day, and fills in our Google Calendar with the relevant entries. Skiing trip Wednesday, nature walk Thursday, bring rain gear. Done. I don’t open the school site. I don’t copy-paste into Calendar. I don’t forget. It just appears, correct and quiet, like it was always there.

Not a revolution, but it’s a chore I no longer do. And that’s exactly what makes it interesting. It’s not just a demo, but a signal. Because the thing that separates OpenClaw from a chatbot isn’t intelligence. It’s that it has access. It runs on my machine. It can reach a website on a schedule, touch my actual calendar, and operate across systems I use without me being in the loop for every step. It’s not answering questions. It’s actually completing tasks. Small ones, boring ones, but very real ones, with real side effects in real systems.

That distinction between software that talks and software that acts matters more than it sounds like it should. A chatbot can tell you what’s on the school website. An agent can put it in your calendar while you’re making dinner. That’s a different thing entirely, and most of the hard questions start there.

And also, perhaps, some dangerous ones.

The Confident Dismissal, and Its Mirror Image

There’s a specific genre of internet commentary that treats certainty as a personality trait. The dismissal post. The “I tried it for twenty minutes and here’s my final verdict” thread. You pick a side fast, you state it clean and you move on. The algorithm rewards it. Some dudes validate it. And the result is a discourse where everybody has an opinion and almost nobody is saying the only honest thing: I don’t know, and neither do you, and the history of this kind of call should make us both very nervous.

Thirty-five years ago, email wasn’t useful for my dad. And he wasn’t wrong, not exactly, not at that moment. He communicated fine with paper and phone calls. The world hadn’t reorganized yet. Email was, for him, in his context, a solution to a problem he didn’t have. The mistake wasn’t the observation. It was treating a larval-stage observation as a permanent verdict.

Ray Tomlinson sent the first network email in 1971 and described it as a neat idea. Nobody knew. Nobody could have known. And then, centimeter by centimeter, the world quietly rearranged itself, bank, school, employer, receipt, confirmation, calendar, my dad, until not having email became the weird choice.

Sure, boosters love the email story. The “everyone laughed, and then it changed the world” arc. Clean, triumphant, easy to tweet. What they never do is name the technologies that failed. Because naming failures would complicate the narrative, and complicated narratives don’t get engagement.

So let’s try to name some. And then let’s watch what happens when we do.

The Segway was going to reshape cities. Steve Jobs reportedly said it was more important than the PC. Dean Kamen unveiled it in 2001 with the kind of fanfare usually reserved for moon landings. It was going to replace walking. It flopped. The product became a punchline, mall cops and tourist groups. Definitive failure, right? Case closed.

Except, walk through any Norwegian city today, or maybe any European city, and you’ll trip over electric scooters. Rental ones, app-enabled, parked on every corner, more numerous than seagulls in some places. The Segway failed. But the underlying idea of small, electric, personal transport you don’t own turned out to be exactly right. It just needed a different form factor, a different business model, and about fifteen years.

Or take Google Glass, mocked into oblivion in 2013. “Glasshole” became an insult. The product died what looked like a definitive death. And now, a decade later, smart glasses are quietly re-emerging. Meta shipping real units with Ray-Ban. The AR display ecosystem maturing. I wrote in a previous essay about how lightweight eyewear might become the natural interface for an agent-driven world, not for immersion, but because glanceable supervision is cheaper than pulling out a phone. Glass wasn’t wrong about the future. It was wrong about the decade.

Or take Quibi, a name that might not mean much outside the US (I had never heard about it, found it during research), so a quick introduction. Quibi was a streaming platform that launched in April 2020, founded by Jeffrey Katzenberg and Meg Whitman, backed by nearly two billion dollars in funding. Its premise was premium short-form video for mobile, Hollywood-quality shows in episodes of ten minutes or less, designed to be watched on your phone. It shut down six months later. Gone.

And then, almost immediately, the world did exactly what Quibi said it would. Musical.ly morphed into TikTok and detonated. Instagram launched Reels. YouTube launched Shorts. It turns out people desperately wanted short content on their phones, just not premium, not ten minutes, and not behind a paywall. More like thirty seconds to two minutes, made by anyone, algorithmically surfaced. We are literally losing our kids to the thing Quibi was reaching for. They were off by a few centimeters and a business model, and those few centimeters were the difference between a two-billion-dollar crater and the most dominant media format of the decade.

Here’s what I find unsettling about all three of these. I went looking for clean failures. Technologies I could point to and say “see, sometimes the confident dismissal is just correct.” And I couldn’t find them. Not really. Every “failure” I picked turned out to be directionally right in a form nobody predicted. That’s not proof that failed products are secretly early winners. I chose these examples, and someone else could choose differently. But it does mess with the clean sorting. A product can fail completely while the instinct behind it turns out to be right, and that makes the whole game of confident early verdicts harder than anyone wants it to be. The Segway is an electric scooter. Glass is a Ray-Ban. Quibi is TikTok.

Which means the sorting game is even harder than I thought. It’s not just that you can’t tell, in the early days, whether something will succeed or fail. It’s that “succeed” and “fail” might not even be the right categories. The underlying instinct might be correct while the specific product is wrong. The need might be real while the form is off. And you often can’t distinguish between “this is nothing” and “this is something, but not this” until years after the verdict was confidently delivered.

If we can’t cleanly sort things into “right” and “wrong” even with a decade of hindsight, what business do we have sorting them on day thirty?

Where the Skeptic Draws Blood (And They Do)

Here’s the part I want to be careful about, because I actually think the “not useful” camp has real ammunition. And I don’t want to do the booster thing where I pretend the objections are weak just because I like the product.

Viral is not the same as reliable. Well over a hundred thousand GitHub stars in weeks is extraordinary. It’s also a novelty metric. Stars prove demand and curiosity. They don’t prove durability. The gap between “impressive in a demo” and “I would trust this with my Monday morning” is real and, right now, large.

The thing that makes agents powerful is exactly what makes them dangerous. Access plus action is the entire value proposition. But also the entire attack surface. This isn’t hypothetical. Early 2026 has already delivered concrete evidence: malicious extensions in plugin ecosystems disguised as legitimate integrations. Infostealer malware extracting API tokens from agent configurations. Prompt injection attacks, where untrusted content hidden in an email or web page tries to hijack the agent’s next action, that security researchers describe as potentially worse than SQL injection, because you can’t simply patch language out of a language model. My school-calendar thing is cute. The same access pattern pointed at a bank account is a nightmare.

The benchmarks are not flattering. OSWorld, which tests agents on real operating system tasks, reports human performance above 72% and best-evaluated model performance around 12%. Twelve percent. “Digital junior operator” is still more aspiration than job description for general, open-ended task sequences.

The skeptic is right about all three of these. The question is what you conclude from them. “The category is wrong” is one conclusion. “The category is early” is another. And I want to be honest.I lean toward “early,” but “early, not wrong” is itself a prediction, and I’ve just spent several paragraphs arguing against premature predictions, so I don’t get to exempt my own.

It might be early. It might be wrong and early. It might just be wrong. I don’t think so, but I don’t get to be certain about that either.

The Factory Floor Problem

There’s a reason the “not useful” verdict feels so compelling at the start of every shift. It’s not stupidity. It’s timing.

Paul David, an economic historian, studied what happened when factories got electric motors. The answer: almost nothing, for thirty years. Not because electricity didn’t work. Because nobody redesigned the factory. They just bolted electric motors onto machines that had been arranged for steam power. The whole floor laid out around a central steam shaft, belts running to every station. Electric motors made the steam shaft unnecessary, but the factory floor didn’t change, because changing it meant rethinking everything. The building. The workflow. The training. The organizational assumptions.

The real gains from electrification arrived a generation later, when new factories were built from scratch around the capabilities electricity actually offered, individual motors at each station, flexible floor plans, no central shaft. The technology had been “available” for decades. The transformation required everything else to catch up.

Agent software, right now, is the dynamo bolted onto a steam-era factory. We’re connecting agents to tools and workflows that were designed for humans clicking through interfaces. The agent can click faster and more reliably, sure, but the workflows themselves, the approval chains, the permission models, the way information moves between systems, were shaped around the assumption that a person would be in the middle of every step.

This is something I keep circling back to. I wrote previously about what I called the dot interface, the idea that as agents mature, the interface itself wants to shrink, because navigation stops being the dominant activity. When the system can reliably convert intent into outcomes, you don’t need menus, app stores, and home screens competing for your attention. You need a way to state what you want, approve what will happen, and confirm what changed. That’s the redesigned factory floor. Not a better arrangement of the old machines, but a fundamentally different assumption about where human attention belongs in the loop.

We haven’t built that yet. What we have, OpenClaw included, is the dynamo. Real, genuinely promising, and bolted onto a world that was designed for steam. The gains that matter, the ones the skeptics are right to say haven’t arrived, require the floor to change. And floors change slowly, because they’re not just technology. They’re habits, org charts, trust, and institutional muscle memory.

What Would Actually Change My Confidence

If I’m resisting confident predictions, I should at least name the signals that would move me in either direction.

Security that’s structural, not aspirational. The containerization moment for agents isn’t a better model. It’s when tool access gets hard versioning, attestation, and security scanning as defaults. The shipping container didn’t transform global trade because it was a better crate. It transformed trade because it was a standard, and standards unlock ecosystems. The malware events in plugin marketplaces right now are the equivalent of pre-standardization port chaos: real, dangerous, and not necessarily fatal to the category. But if this doesn’t get solved. Genuinely solved. Not papered over with disclaimers. The whole thing stalls. You can’t build an agent economy on a foundation where every plugin is a potential supply-chain attack.

Benchmark numbers that actually move. Not from 12% to 14%. From 12% to 35% on realistic, open-ended task sequences. That’s a different category of tool. That jump, if and when it happens, will matter more than any GitHub star count or marketing claim.

Boring institutional adoption. Not a case study on a vendor’s blog. The real signal is when an ops team somewhere quietly stops doing their Monday morning copy-paste workflow because the agent handles it reliably enough that nobody even made a decision to stop, they just stopped. I described that exact Monday morning ritual in the dot interface essay, the four project boards, the status updates copied into slides, the budget cross-references, the summary email to leadership. Every organization has some version of it. It’s pure glue work, and everyone knows it, and everyone still does it manually. The day that stops, not with a or a transformation initiative but with a shrug, will be the signal. That’s how email won. Not through evangelism. Through boring institutional gravity, the slow accumulation of “well, we just do it this way now” until the old way feels strange.

None of these signals have fully arrived. Some are forming. That’s exactly what early looks like. Or what a plateau looks like. I genuinely cannot tell you which.

The Part About My Dad

My dad eventually got email. Not because someone convinced him. Because the world reorganized around it until not having it cost more than having it. One day the school sent a notice by email, and then the bank, and then his brother on the other side of the contry, and then it was just how things worked. He didn’t adopt email, just stopped resisting the tide.

Nobody needed him to predict that outcome. Nobody needed him to be confident about the future of electronic communication. He just needed to not treat “useless right now” as a permanent truth about a thing that was still becoming whatever it was going to be.

The first SMS was sent on December 3rd, 1992. It said: Merry Christmas.

Not exactly a paradigm shift. Not exactly nothing, either. Just a beginning that didn’t look like one yet. The way beginnings almost never do.