The one-line version
A Karpathy-style memory system is a small set of plain text or markdown files that your AI reads at the start of every session and updates as it works. The chat window forgets; the files don't. That's the whole idea.
Why the context window is the bottleneck
Every session with an AI starts from near-zero. You re-explain your project, your preferences, the mistake you corrected yesterday. The context window is finite and resets, so the same context gets re-loaded by hand, forever. People try to fix this by chasing a bigger or smarter model. But a bigger engine doesn't help if the car has no trunk.
The fix is to stop treating memory as something the model provides and start treating it as something you own. You write down what the AI should never have to be told twice, you keep it in files, and you make reading those files the first step of every session.
What the system actually contains
- An index file. The front door. It lists what exists and what to read for a given kind of work, so the model loads a few relevant pages, not the whole pile.
- Topic pages. One durable fact or area per file: how your infrastructure fits together, project state, conventions.
- A regressions file. Every mistake the AI makes that you correct, one line each, loaded every session. The cheapest behavior change per word there is.
- A running log. What changed and why, so the next session inherits the reasoning, not just the result.
The part that makes it stick: enforcement
Notes rot. A memory system that depends on you remembering to update it will drift out of date and become a liability. So the system assumes rot and fights it mechanically: a hook that flags changed files for re-documentation, a scheduled pass that hunts stale pages and dead links, and a rule that work isn't "done" until the docs reflect it. None of that needs intelligence. It needs plumbing.
Why file-based and model-agnostic
Because the tools change every week and the files don't. When a new model ships, you point it at the same memory and it picks up where the last one left off, with your context instead of a blank chat. The model is the engine, and the engine swaps. The memory is the car. Plain files also mean you can read, edit, version, and own the whole thing without a vendor in the loop.
Frequently asked
- Is this an official Karpathy project?
- No. It's a community pattern named after his framing of context as the scarce resource. The implementation here is independent.
- How is it different from RAG or vector memory?
- RAG retrieves chunks from an embedded store by similarity at query time. This is smaller, human-readable, and deterministic: a curated index and a few high-value pages, files you can open and edit by hand. They can coexist; the memory layer is the durable, owned core.
- Where does CLAUDE.md fit?
- CLAUDE.md is the front door: the file the assistant reads first, which points to the rest of the memory and tells the model how to bootstrap itself each session.
- Do I need to be technical?
- To start, no. One folder and one index file change the experience in a day. The enforcement layer is where it helps to have someone wire the plumbing.