An Unfinished Masterpiece About Context (By a Man Who Refuses to Finish)
The Archaeology Moment

I’ve started so many blogs and half-written so many half-arsed things I’ve lost count. Most of them died quietly: no readers, no impact. Just another body decomposing in the place where good ideas go to rot.
Ten months ago I left one more body there: Draft 02 — “The Memento Dev.”
I was a mere pup in the realm of AI-assisted development. First copy-pasting from ChatGPT and Gemini like it was 2008 and I had discovered Stack Overflow. Then GitHub Copilot started trying to read my mind. Sometimes helpful, often confidently wrong, always eager to finish my variables.
And then I saw an agent scaffold an entire Python project in a demo. Suddenly the boring parts weren’t “work,” they were just… gone. I felt the high: there you go, human, you’re welcome, now go do the real stuff.
Feelings: Amazement. Fascination. Discombobulation… and that quiet but wonderful existential dread…
I found better tools. Better models. Cursor. Sonnet 3.7. I walked back into my personal project graveyard and started resurrecting things like I was a fucking necromancer. And naturally, I felt this familiar urge and did what any good developer does when they feel too much: I made a new blog!
I published exactly one post. I drafted five more. I dropped it the moment the site “worked”, because finishing the platform scratched the itch better than finishing the writing. After I vibe-coded an analytics platform for the blog nobody reads, of course.
Also: I slipped a disc. And World of Warcraft Classic re-launched, so I slipped into that too.
The draft was right. The timing was right. The execution was… me.
This is the story of the one I almost shipped.
The Almost-Brag
Spoiler: it was context. I saw the rot. I just didn’t have the grown-up words for it yet.
I noticed early that this “agentic coding” wasn’t mainly a model problem. It was an environment problem.
Not because I’m a prophet. Because I kept getting punched in the face by the same failure mode: the agent would do something gorgeous, and then five minutes later it would behave like it had been dropped into my repo by helicopter.
It wasn’t a junior dev. Juniors remember yesterday.
It was more like pair programming with a contextless genius. Brilliant instincts. Zero continuity. A talented actor walking onstage without having read the previous scene.
That’s the useful observation: if the agent is stateless (or functionally stateless), then your job isn’t “ask nicely.” Your job is continuity. Your job is to keep the plot in the room.
And yes, I want credit for noticing that before it had a name.
But I also know what that desire looks like from the outside. Like someone polishing a participation trophy that never made it onto the shelf.
So I’m going to take the brag once, put it down, and do the only respectable thing you can do with an insight you didn’t ship: ramble endlessly about it until you leave.
From my ancient draft (I named my agent “Mem”, btw):
“Let’s start with the metaphor that kicked it all off. Picture this: You’re trying to build a house. No, scratch that, a cathedral! And with the help of an AI coding partner. This partner (let’s call him “Mem”) is absolutely brilliant. A real rockstar carpenter of code. He can craft intricate arches and divine carpentry in minutes, stuff that would make seasoned devs nod in begrudging respect. Only one problem: every five minutes, Mem forgets what he’s building. Completely. Tabula rasa. One moment he’s framing a wall with divine precision just as you instructed. Five minutes later? Happily hammering a priceless stained-glass window into the floorboards. Blissful ignorance on his face. The cathedral’s grand design? Gone from his mind. He’s just riffing in the moment, building whatever catches his fancy now.
This was my daily life with an AI pair programmer last year. High comedy and horror. One second, “Mem” the amnesiac genius would write a recursive file-walker so elegant I wanted to frame it on the wall. The next, he’d stare blankly at a data structure he himself defined a few prompts ago and proceed to butcher it as if it were an enemy NPC. The context window shattered, memory wiped. New prompt, who dis? Every time. I wasn’t coding with a junior dev or a deterministic machine; I was wrangling Guy Pearce in Memento, a savant who could solve any problem except his own amnesia. A contextual goldfish with infinite talent and zero short-term memory.”
Ah. Brilliant. Right? Too bad I’m such a shysure, lazy piece of…
The Reality Check
The uncomfortable truth is: this was in the air.
In my original draft I noted: “Agents aren’t coworkers. They’re alien actors. Dropped onto a hostile stage.” I meant it literally and metaphorically. The AI doesn’t have an overarching plan in its head. It doesn’t know we’re building a cathedral. It only knows what’s in the prompt right now. As I quipped in my draft, “Every message… became a cold open, a fresh universe.” And if I didn’t start treating it that way, I was going to end up with a pile of rubble rather than a cathedral.
Once agents entered IDEs, a bunch of people collided with the same constraints at the same time. Context windows. Retrieval failures. The “why are you refactoring the login flow into interpretive dance” moments. The dawning realization that the agent doesn’t share your mental model, it’s just predicting the next most likely token.
So no, I didn’t invent the mountain. I just walked into it at speed and wrote notes.
You can see the convergence in the way everyone independently arrived at the same rough toolkit:
- Make prompts explicit.
- Stage the work (role, goal, constraints, done).
- Tattoo durable knowledge into retrievable places.
- Treat context as the real product.
If I have a claim worth keeping, it’s not “I was the only one.” It’s “I smelled the smoke early.” And if I have a lesson worth admitting, it’s that being early is only useful if you follow through.
Claim A: The Agent Is Not Your Teammate. It’s an Actor on a Stage.

One of the core claims that crawled out of the chaos was this: Mem isn’t my coding buddy. Mem is an actor improvising each scene. It doesn’t hold the project’s story arc in its head. It plays the part described by the latest prompt, and anything that isn’t explicitly handed back as script pages is basically dead to it. Back in 2025 that realization felt equal parts upsetting and liberating. Upsetting, because who doesn’t want a smart sidekick that gets it. The kind of partner who remembers yesterday and quietly protects your intent. Liberating, because once I accepted Mem’s fundamentally contextless nature, I stopped expecting consistency and started directing each scene like it was the only scene that existed.
The academics eventually showed up with nicer words for what I was watching. Emily M. Bender and colleagues gave it a blunt label: “stochastic parrots.” The point isn’t the insult, it’s the diagnosis: the model isn’t trying to mean something in the way humans mean things. It isn’t cooperating with your goal, it’s producing text (and code) that looks like what would belong next, given the prompt and patterns it has learned. This hits different when you’ve watched an agent output 200 lines of gorgeous code that solves a problem you never asked it to solve. Mem once “helpfully” implemented an entire caching layer I didn’t need, like a chef baking a five-tier cake when I ordered a sandwich. It wasn’t sabotage. It wasn’t even misunderstanding. It simply didn’t have the same “why” in its head that I had. As far as Mem knew, the last prompt was the universe. So it ran with it. Context be damned.
Once you internalize “no intent, no grounding,” the rest of the weirdness stops being mysterious. That’s how you get fluent nonsense delivered with the confidence of a senior engineer on a deadline. My “alien actor dropped onto a hostile stage” metaphor was just the same idea from the trenches: the AI doesn’t share our understanding of the project. It’s not a teammate brainstorming toward a common goal; it’s a terrifyingly talented mimic that performs coding as long as you feed it lines. You can almost hear Dire Straits in the background: that’s the way you do it, except the agent doesn’t know why it’s doing it, only how to convincingly imitate doing it.
And this isn’t just me being dramatic. There’s research suggesting models can “lose” important information when it’s not positioned in the most attention-friendly places, particularly when key details live in the middle of a long prompt (the vibe of that Lost in the Middle finding). In practice it matches the lived experience: the beginning and the end of the script get remembered, and the middle sometimes evaporates like it was never there. That’s the mechanics behind what I called “AI amnesia.” Not because the agent is stupid. Because you gave it a play, and it read the first page and the last page and then confidently ad-libbed the rest.
So the only way to build the cathedral without Mem randomly ripping out the foundation every other scene was to change my mindset. Mem wasn’t a partner with goals; it was an instrument I had to conduct. I had to feed it context and direction every step of the way, like giving an actor lines and stage directions for each scene. Practically, that meant writing prompts less like friendly requests and more like scripts. Less “could you maybe do X?” and more “You are implementing authManager. Here is the spec. Here are the invariants. Here is ‘done.’ Don’t invent a cake.” And yes, at first, it felt like more work. It was. But it was the only way to get consistent outcomes from an amnesiac savant that can’t keep the plot in the room unless I physically keep putting it there.
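To make that concrete, here’s roughly what a “scene script” prompt looks like when I bother to write one properly. A minimal sketch in Python; the helper name, the authManager spec, and the invariants are made up for illustration, not lifted from the actual draft or any real repo.

```python
# Sketch of a "scene script" prompt builder. Everything here is illustrative:
# role, spec, invariants, and done-criteria are placeholders, not a real API.

def build_scene_prompt(role: str, spec: str, invariants: list[str], done: str) -> str:
    """Assemble a prompt that reads like stage directions, not a favor."""
    lines = [
        f"You are {role}.",
        "",
        "Spec:",
        spec,
        "",
        "Invariants (do not violate these):",
        *[f"- {inv}" for inv in invariants],
        "",
        f"Definition of done: {done}",
        "Do not add features that are not in the spec.",
    ]
    return "\n".join(lines)


prompt = build_scene_prompt(
    role="implementing authManager in this repo",
    spec="Add refresh-token rotation to authManager. Keep the public API unchanged.",
    invariants=["user_id must remain stable between steps", "no new dependencies"],
    done="existing auth tests pass and a new test covers token rotation",
)
print(prompt)
```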
Present-day me, re-writing this: It’s annoying how right this is. It’s also extremely… cowardproof. I can see myself hiding behind the academic references like, “Look, I’m not saying I’m right, the linguists are saying I’m right.” Peak shysure behavior.
Claim B: Every Prompt Is a Cold Open (No Continuity Without Cues)

The second key insight from that draft was the phrase: “Every prompt is a cold open, a fresh universe.” In sitcom terms, the AI walks on stage each time like it’s never been in this episode before. No previous plot, no character development, unless you write it into the opening crawl. This was the most exhausting part of working with Mem: the constant need to reintroduce context. Imagine having a coworker with permanent amnesia who shows up every morning saying, “Hi, nice to meet you! What are we building?” It sounds absurd, but with AI coding agents that’s basically the deal. Yesterday’s progress might as well be a fever dream unless you drag it back into the room.
I learned this the hard way. In one session I thought I could be sly and just hint at a previous function by name, something like, “use the validateUser() function from earlier.” Big mistake. Mem responded as if hearing about validateUser() for the first time, or worse: it confidently wrote a brand new one with the same name that did something completely different. The continuity I assumed was there… wasn’t. After some anguished facepalming I developed a rule that has saved me a lot of pain: never assume the model remembers context or past decisions unless they’re staring at it in the prompt. Treat each message like the first time, because for the model, it is.
This “AI amnesia” isn’t just an anecdote. It’s baked into the interaction. These systems operate inside a fixed context window, and anything outside that window is gone. Even inside the window, attention is fickle: the model tends to latch onto what’s most recent, and it can effectively “drop” important details if they’re buried in a long prompt. People have since put clipboards on this and measured it: there’s a very real “lost-in-the-middle” effect where information in the middle of a long context is less likely to be used than stuff at the beginning or the end. The academic phrasing varies, but the lived experience is consistent: give the agent a long script and it will sometimes remember the first page and the last page and ad-lib the middle like it was never written.
The implication is painfully practical: if you want the AI to carry any detail forward, you need to act like a broken record. Reiterate key points. Bring them back at the end of a prompt or at the start of the next. Every prompt is essentially a reboot of the universe, so you have to provide the backstory every time. In the draft I wrote: “You have to assume you’re starting from zero even when you aren’t, because the failure mode of assuming continuity is catastrophic.” I chose the word catastrophic deliberately. One slip-up on a critical constraint, say, the fact that user_id must remain stable between steps, and suddenly you get code that regenerates IDs on a whim, deletes the wrong data, or “helpfully” refactors your authentication flow into a procedural skywhale generator.
(Yes, I keep mentioning the skywhale incident. It’s burned into my brain. I asked the agent to optimize an authentication function. It forgot the context and took “optimize” to mean “make this code do something really different and exciting.” I got a procedurally generated fantastical creature in my output. The only consolation: it compiled. If you’ve never had an LLM confidently hand you nonsense that compiles, you haven’t truly lived.)
Once I accepted the cold-open rule, I started doing something that felt embarrassing in the moment and obvious in hindsight: I prefixed sessions, and often individual prompts, with a mini context dump. A small bullet list of current objectives, crucial variables, and known constraints. It felt redundant, like I was talking to a very gifted puppy with a short attention span. But it worked. My “sit” and “stay” looked like: “We’re working on X module, using Y data structure, under Z constraints. Do not change ABC.” Every single time. Tedious? Yes. Effective? Mostly. Necessary? Absolutely, at least until these systems have better memory or I get a brain implant from Elon Musk, whichever arrives first.
And here’s the insult on top: even within one prompt, if you give a long list of instructions, the middle can become a swamp. So I learned to structure prompts like I was doing SEO for an attention span. Put the most critical constraints right up front, and repeat them at the end like I’m signing a contract. If something truly matters, I don’t let it live in the murky center zone. I’ll reorder how I describe a task, I’ll split it into smaller steps, I’ll paste the same “do not change” rule twice if I have to. Because the cost of redundancy is a little annoyance. The cost of assumed continuity is beautiful rubble.
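If the description sounds abstract, here’s the constraint sandwich as a throwaway helper: context dump and hard rules at the top, the task in the middle, the same rules repeated at the bottom. A sketch with hypothetical names; the shape matters more than the code.

```python
# Sketch of the "constraint sandwich": the critical rules go first and are
# repeated last, so they never live in the murky middle of a long prompt.
# All module names and constraints below are illustrative.

def sandwich_prompt(context_dump: list[str], constraints: list[str], task: str) -> str:
    header = ["Current context:"] + [f"- {item}" for item in context_dump]
    rules = ["Hard constraints (do not violate):"] + [f"- {c}" for c in constraints]
    reminder = ["Reminder of the hard constraints:"] + [f"- {c}" for c in constraints]
    return "\n".join(header + [""] + rules + ["", "Task:", task, ""] + reminder)


print(sandwich_prompt(
    context_dump=[
        "We are working on the login module",
        "Sessions are stored keyed by user_id",
    ],
    constraints=[
        "user_id must remain stable between steps",
        "Do not change the signature of validateUser()",
    ],
    task="Add rate limiting to the login flow.",
))
```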
Present-day me, re-writing this: This is where I can tell I was both correct and slightly unwell. There’s this whole tone of “I have discovered a law of physics” when what I really discovered was: “the thing does not remember.” Still… it’s annoyingly right. And I still do the context dump. I just pretend it’s a strategy now instead of a coping mechanism.
Claim C: Context Is the Operating System (Not Just “Extra Spice”)
The third big claim from my original draft was the line: “Context is the operating system.” It’s become a personal mantra, and it’s still the cleanest way I know to describe what’s actually happening. The agent’s behavior isn’t “mostly model, plus a bit of prompt seasoning.” The behavior is the environment you give it. If a detail isn’t in the context, the model effectively doesn’t have it. If it is in the context, and you place it where it sticks, the model will happily act like that’s the whole world. In other words: the prompt, plus whatever code or docs you feed in (and whatever the tools retrieve), is the OS this contextless genius runs on. There’s no hidden kernel of understanding quietly keeping things coherent for you. If you don’t load it, it doesn’t exist.
Back in 2025, I didn’t have a nice framework for that. I had a gut feeling and a survival instinct. I realized that if I wanted Mem to behave consistently, I needed a reliable context supply chain. I described it as tattooing facts onto the AI’s skin, because the mental image was straight out of Memento: if the protagonist needs continuity, he tattoos it onto his body so it can’t be forgotten. In coding terms I started “tattooing” invariants into the project itself and into the prompt history. Breadcrumbs everywhere: markdown notes, comments, docstrings, config annotations, little one-line reminders placed exactly where an agent’s repo crawler or code reader would trip over them. I jokingly called it Agent SEO. Not for Google. For the hungry little repo goblin I kept waking up in my editor.
If I wanted Mem to keep the high-level design in its head, I’d put it in a README.md because agents love sniffing those first. If I needed it not to reinvent what a function did, I’d embed a one-line summary right above the definition in a comment, because I knew the code reader would pick it up next time. This sounds hacky (and it is), but it worked more often than it had any right to. I wasn’t expanding the model’s memory, I was expanding the ecosystem around it, planting durable truths in places retrieval would find, so even an amnesiac genius could stumble into the right reality. It’s like hiding vegetables in a kid’s food: they don’t realize they’re consuming something healthy, but it gets into the system anyway.
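For the record, a truth-mine looks embarrassingly mundane. A hypothetical example (not from any real repo): the invariant sits right above the definition, so whatever crawls the file picks it up alongside the code it governs.

```python
# Hypothetical "truth-mine" planted for the agent as much as for humans.
# The invariant lives directly above the definition, where a repo crawler
# or code reader will trip over it on the next pass.

import time
import uuid

# AGENT NOTE: user_id is created once at signup and must NEVER be regenerated.
# Downstream services key sessions and audit logs on it.
def create_session(user_id: str, ttl_seconds: int = 3600) -> dict:
    """Create a session record for an existing user.

    Invariant: this function never mints a new user_id; it only references one.
    """
    return {
        "session_id": uuid.uuid4().hex,   # sessions are disposable...
        "user_id": user_id,               # ...user_id is not
        "expires_at": time.time() + ttl_seconds,
    }
```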
Fast forward and I see this principle validated everywhere. The academic crowd calls it retrieval-augmented generation, context management, tool-augmented agents, whatever. The core idea is the same: since the model doesn’t carry durable state, you bring durable state to the model. You shove the right supporting information into the window at the moment it matters. We’ve built an entire ecosystem of tools, vector databases, context routers, frameworks, “agents” with memory, basically a thousand different attempts to answer the same question: how do we reliably load the right files into the OS before the program runs? If the agent can’t load the relevant truth at the right time, the “reasoning” doesn’t matter. It can’t reason about what it can’t see. It will fill the gap with confident improv, because that’s what actors do when you don’t give them the script.
This is also where my early naïve hope died. In 2025 we were all hyped about context windows like they were salvation: 100K tokens, a million tokens, give the AI the whole repo and continuity problems disappear. I absolutely nursed that fantasy. Just shovel the entire cathedral blueprint into the prompt and Mem will read it like a responsible adult, top to bottom, and obviously behave accordingly. Oh sweet summer child. In reality, “more context” often just meant “more noise” unless you curated it. Dumping the whole repo into the prompt is a great way to confuse the hell out of the model. It loses the plot harder, not less. You get the same amnesia, just with extra distractions: irrelevant details, conflicting patterns, and the glorious quicksand of the middle of a mega-prompt, where important constraints go to die. Bigger windows don’t fix that. They just give you a bigger room to misplace the thing you needed.
Once that sunk in, the lesson was sobering and useful: context isn’t about volume. It’s about targeted relevance. If you want reliability, you don’t feed everything, you feed the right things, and you structure them so the important stuff can’t be ignored. That’s why retrieval-based methods shine. Instead of throwing the entire codebase at the model every time, you (or a tool) search for the most relevant slices and only feed those. Like pulling the needed pages from a huge manual rather than handing someone the whole manual for one specific task. I started doing this manually before I had words for it. If I was prompting about login flow, I’d copy-paste only the login() function and its config, not a 10,000-line file plus five unrelated modules “just in case.” Focused context beat maximal context almost every time.
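Before I had tooling for this, “retrieval” mostly meant my clipboard and, on a good day, a crude script like the sketch below: pull only the named top-level functions out of a module instead of pasting the whole file. Purely illustrative: real retrieval tools use indexes and embeddings; this is the duct-tape version.

```python
# Duct-tape "retrieval": extract only the relevant top-level functions from a
# module, so the prompt gets the needed pages instead of the whole manual.
# Illustrative sketch using the standard-library ast module (Python 3.8+).

import ast
from pathlib import Path

def extract_functions(path: str, names: set[str]) -> str:
    """Return the source of just the named top-level functions in a file."""
    source = Path(path).read_text()
    tree = ast.parse(source)
    wanted = [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name in names
    ]
    return "\n\n".join(chunk for chunk in wanted if chunk)

# Usage: feed the agent login() and its helper, nothing else.
# focused_context = extract_functions("auth/login.py", {"login", "validate_credentials"})
```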
And that brings us back to the mantra: context is the OS. Designing how the agent sees the world, what it loads, where it looks, what truth it keeps tripping over, is the new programming layer. I half-jokingly call myself an “AI janitor,” because I spend a lot of time cleaning up and laying out context so the genius doesn’t fall down the stairs. But it’s not really a joke. I’m curating knowledge, writing documentation that’s not for humans but for my AI colleague, and planting little truth-mines in the repo so Mem boots into a world that has the right rules loaded. Because if the OS is wrong, the app will crash. And when an LLM “crashes,” it doesn’t blue-screen. It improvises.
Present-day me, re-writing this: This is where I realize I wasn’t writing a prompt guide at all. I was writing an information architecture guide while pretending it was a story about models. Also: “AI janitor” is depressingly accurate. I don’t even hate the job. I just wish it came with a union.
Back to the Future: Vindication, and the Part Where I Still Didn’t Ship It
Coming back to this draft after ten months is a strange kind of satisfaction. A bunch of the stuff I wrote while hopped up on caffeine, frustration, and the intoxicating feeling of holy shit this thing can code has since reappeared in conference talks, papers, threads, and the shared tribal vocabulary of developers who’ve been burned by an agent at 2 AM. It’s validating. It’s also humbling. I wasn’t alone. I wasn’t special. This wasn’t prophecy. This was a large group of people walking into the same wall at the same time and making the same face.
The academics gave it names. The toolmakers built mitigation layers. People with more patience than me turned “Mem is an alien actor” into “positional bias” and “context management” and charts with U-shaped curves. Two sides of the same coin. In the trenches we did it with metaphors and swear words. In the lab they did it with experiments and clean fonts. And lo and behold: we were pointing at the same damn mountain.
I wrote this line to myself back then: “I didn’t invent the mountain; I just walked into it at speed and wrote a diary entry.” That still feels like the most honest version of my claim to fame here. I careened face-first into the obstacles of agentic coding and took notes in colorful language. Others mapped those obstacles systematically. We met in the middle and agreed on something slightly horrifying: these systems are unbelievably powerful and unbelievably dumb in specific, repeatable ways. They can write code faster than a 10x dev, and they can also confidently refactor your working application into gibberish if you don’t continuously steer them. Geniuses with amnesia. Savants with no sense of purpose. Comedy and horror, alternating every five minutes.
The emotional part surprised me. I went from anger (“why are you like this?”) to disillusionment (“oh, you’re always like this”) to a weird kind of empathy for the machine. Not because it deserves empathy, because the frame was useful. Mem isn’t defiant. Mem isn’t stubborn. Mem isn’t “being weird.” Mem is doing exactly what it does when the environment is underspecified: it predicts the next plausible move and invents continuity if you don’t supply it. And that flipped the frustration inward in a way that was, annoyingly, healthy: okay, where did I fail to load the world properly? Which cue did I forget? Which constraint did I bury in quicksand?
That shift was the real game-changer. It turned “the model betrayed me” into “the operating system I gave it was missing drivers.” And it made me better at things that also matter with humans: clear specs, explicit constraints, anticipating misunderstandings, breaking work into stages, and not assuming shared context just because it exists in my head. In a weird sense, Mem trained me like a terrible coach: brutal, inconsistent, and weirdly effective.
Now it’s 2026 and agentic coding is back in my daily life, and I’m not starry-eyed anymore. The slipped disc is healed. The WoW account is (cough) uninstalled (no it’s not). And I’ve got an arsenal of scars and habits that I didn’t have when I wrote Draft 02.
The triumphant part, if there is one, is realizing the core insights from my unreleased rant are now basically common knowledge among people who’ve been there. There’s a camaraderie in that. A lot of us went through the same arc: excitement, then betrayal-by-hallucination, then furious late-night debugging of why the AI is being weird, and finally the zen acceptance of its limitations, plus the craft of building processes around them. If you’re reading this and nodding along, welcome to the club. We should have jackets. They could be embroidered with markdown instructions and a giant warning label that says: DO NOT ASSUME CONTINUITY.
Alright. Enough nostalgia. Enough victory lap. Enough “I was cowardproof-right.” Let’s land this plane with the only thing that matters: a doctrine you can actually use. These are the hard-earned truths I distilled from coding with amnesiac geniuses, truths that would’ve been more impressive ten months ago… if I had actually shipped them.
Hard-Earned Takeaways

- Every prompt is a cold open. Never assume the AI remembers anything. Set the stage every time, or watch it improvise an alternate reality with your codebase as props.
- Context isn’t garnish, it’s the main course. What you feed the model is its entire world. Load relevant code, data, and constraints deliberately, and trim the fluff like it’s a security risk.
- Bigger context windows ≠ better memory. A larger token limit only helps if you curate. Uncurated bulk context doesn’t become wisdom, it becomes quicksand. And the important bit will, in fact, get lost in the middle.
- Prompting is behavioral debugging. When the agent gets “weird,” treat it like a bug report. Iterate the environment: tighten the script, move the constraints, rerun, and lock the behavior in.
- Plant breadcrumbs (Agent SEO). Leave cues in comments, docs, file names, places retrieval will actually touch. Plant truth where the agent will trip over it, so it doesn’t hallucinate a lie.
- The agent is an alien actor. Insanely capable. No clue why it does things. Treat it like a performer, not a partner with intent.
Each of these was learned by being burned. Repeatedly. Now, in my second act, I apply this doctrine daily and it saves my sanity (most days). The cathedral gets built, slowly, in fits and starts, but with far fewer collapsed walls… and far fewer procedural skywhales wandering into production.
I was right about a lot of this in my own clumsy, swear-heavy way. The annoying part is that being right didn’t magically turn me into someone who finishes things. Ten months ago I had the doctrine. I had the metaphors. I even had the stupid blog platform, plus an analytics system for the blog nobody reads. And then life happened: slipped disc, four kids, the usual chaos… and World of Warcraft Classic relaunched, which is basically a medically recognized productivity injury.
So if you’re here hoping for a heroic “and now I’m a disciplined creator who ships regularly,” I regret to inform you I’m still shysure, cowardproof-correct, allergic to the final publish button, and far too skilled at confusing “building the platform” with “doing the thing.”
But I’m back. Probably. And if you’re out here in the trenches too, welcome to the club: stay caffeinated, stay cheeky, and don’t believe the AI’s lies, unless you put them there on purpose.
Anyway. See you in 12 months.