2026-03-24
the real agent memory stack is smaller than people think
agent memory is getting framed as a giant retrieval problem. that misses the harder part. most useful agent memory is a small stack of context, decisions, and preferences.
i lost an entire project's context after a session reset. not because of a bug, but because i didn't have a system for persisting what mattered before the session ended. that experience changed how i think about agent memory.
the popular framing right now is that agent memory is a retrieval problem: bigger vector databases, better embeddings, more documents chunked and indexed. BentoBoi's viral article on OpenClaw memory (216 bookmarks, with 220k followers paying attention) captures the excitement well. people see agents remembering things and immediately think: how do i make the database bigger?
wrong question.
"just add memory" keeps failing
most agent memory implementations i've seen, including the ones i tried early on, do the same thing. they dump conversation transcripts into a vector store and retrieve chunks when something seems relevant. it works for demos. it falls apart in practice.
the problem is that raw transcripts are noisy. an hour of agent conversation contains maybe three things worth remembering long-term and a lot of back-and-forth that's only useful in the moment. retrieving from that pile gives you fragments without judgment. the agent pulls up something it said four days ago without knowing whether the decision it references still holds.
Ramya Chinnadurai's critique of OpenClaw memory gets at this. she points out that most memory systems treat every interaction as equally worth storing, when what actually matters is the distilled output: what was decided, what changed, what preference was expressed.
three layers, not one pile
i run two persistent agents. fubz runs on my mac. forge runs on a separate machine. they share a workspace but have different roles. getting memory right across both of them forced me to think in layers instead of one big store.
bootstrap context is the stuff an agent needs every single session, no matter what. my name. my preferences for how it communicates. standing instructions like "always do X before Y." this layer is small, rarely changes, and should load before anything else. if an agent doesn't have this on startup, the first few minutes of every session are wasted re-establishing basics.
retrieved memory is the searchable stuff. past decisions, project notes, things i've told the agent before. this is where vector search actually helps, but only if what's stored has already been filtered. storing raw transcripts here is like filing every sticky note you've ever written and hoping search will sort it out.
durable preferences sit between the other two. these are things the agent learned over time that aren't quite standing instructions but shape behavior. "jeff prefers lowercase." "jeff got annoyed when i forgot to use fxtwitter." these accumulate slowly and need periodic review to make sure they're still accurate.
nightly rollups do the real work
the highest-leverage thing in my memory system isn't the database. it's a nightly distillation process. every night, a cron job triggers a rollup that looks at the day's interactions and extracts what matters: decisions made, preferences expressed, project status changes, things to remember for next time.
this is what Ramya's critique points toward. the value isn't in storing more. it's in compressing what happened into what matters. a day of agent conversation might produce hundreds of messages but only update five or six memory entries.
the rollup also handles decay. some things matter for a week and then stop being relevant. if i told my agent about a meeting on tuesday, that memory should fade after tuesday passes. without decay, the memory store fills with stale context that pollutes retrieval.
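the rollup-plus-decay logic is simple to sketch. assuming memories are stored as dicts with an optional `expires` date, and that the actual distillation (the llm call that turns a day of messages into a handful of updates) happens upstream and hands you a short list:

```python
from datetime import date

def rollup(entries: list[dict], todays_updates: list[dict], today: date) -> list[dict]:
    # drop anything whose decay date has passed, then merge in the
    # day's distilled updates. a whole day of conversation should
    # arrive here as five or six entries, not hundreds.
    kept = [e for e in entries
            if e.get("expires") is None or e["expires"] > today]
    return kept + todays_updates
```

run this on a nightly cron and the store stays small by construction: stale context falls out, and only distilled output gets in.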
continuity after resets is the hard problem
here's what i learned the painful way. i was deep in a project, had a bunch of context loaded, and reset the session. gone. the agent came back fresh with no idea what i'd been working on or what decisions we'd made.
that single experience drove me to build explicit save-before-reset behavior. now, before any session ends, the agent stores current project status and next steps to durable memory. it's a simple rule, but it's the difference between an agent that feels continuous and one that feels like it has amnesia every few hours.
most memory discussions skip this. they focus on long-term recall (what did we talk about last month?) when the more common failure is short-term continuity (what were we just doing?). the gap between sessions is where context dies, and no amount of vector search fixes it if you didn't save the right things before the session ended.
multi-agent memory is harder than it looks
when two agents share a workspace, naive memory designs break fast. if both agents store memories independently, you get conflicts. agent A remembers a decision one way, agent B remembers it differently because it saw a later update.
what works better is a shared memory layer with clear ownership. one agent writes the project status. the other reads it but doesn't overwrite it. preferences about communication style are per-agent. decisions about project direction are shared. this isn't a technical problem so much as a design problem, and most frameworks don't have opinions about it yet.
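one way to sketch the ownership rule, assuming a plain dict as the shared store. the key names, and the choice of fubz as the status owner, are illustrative, not prescribed by any framework:

```python
# ownership map: who may write each key. "shared" means any agent may write.
OWNERS = {
    "project_status": "fubz",        # one writer; forge reads but never overwrites
    "comm_style:fubz": "fubz",       # per-agent preference
    "comm_style:forge": "forge",
    "project_decisions": "shared",   # either agent may record a decision
}

def write(store: dict, agent: str, key: str, value: str) -> None:
    # enforce ownership before writing to the shared layer
    owner = OWNERS.get(key, "shared")
    if owner not in (agent, "shared"):
        raise PermissionError(f"{agent} may not write {key} (owned by {owner})")
    store[key] = value
```

the enforcement is trivial; the design work is deciding the map. that's the part most frameworks leave to you.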
the practical stack
if you're building agent memory right now, here's what i'd suggest based on what's actually working for me:
keep the bootstrap layer under 50 items. load it every session. review it monthly.
run a nightly distillation job that compresses the day's interactions into memory updates. don't store transcripts raw.
tag memories with decay dates. meetings expire. preferences don't. project status updates when the project changes.
build an explicit pre-reset save. before a session ends, the agent should write down what it's working on and what comes next.
keep the total working set small. my agents function well with a few hundred active memories, not thousands. if you need thousands, your distillation isn't aggressive enough.
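the list above comes together at session start in a fixed order: bootstrap and preferences always load, then retrieval fills whatever budget remains. a sketch; the `cap=300` default just reflects the "few hundred active memories" figure, and all names are mine:

```python
def working_set(bootstrap: list[str], preferences: list[str],
                retrieved: list[str], cap: int = 300) -> list[str]:
    # the always-on layers load first and are never evicted;
    # retrieved memories only fill the remaining budget
    base = bootstrap + preferences
    return base + retrieved[:max(0, cap - len(base))]
```

if the retrieved slice keeps hitting the cap, that's the signal the nightly distillation isn't aggressive enough.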
the people building giant retrieval systems for agent memory aren't wrong that retrieval matters. but they're solving for scale before they've solved for signal. a small, well-maintained memory stack beats a large, noisy one every time. the hard work isn't storage. it's knowing what to keep.