What is the Karpathy memory system?

It is a pattern for giving an AI persistent, file-based memory: a small set of plain text or markdown files the model reads at the start of every session and updates as it works. It is inspired by Andrej Karpathy's framing of the context window as the scarce resource. The model is the processor, the context is RAM, and durable memory has to live in files outside the chat. It is not an official Karpathy product; it is a community pattern named after the idea.

How is it different from RAG or vector memory?

RAG retrieves chunks from an embedded store at query time. A Karpathy-style memory system is smaller, human-readable, and deterministic: you can open the files, read them, and edit them by hand. It favors a curated index and a few high-value pages loaded per session over similarity search across a large corpus. The two can coexist, but the memory layer is the durable, owned core.

What does CLAUDE.md have to do with it?

CLAUDE.md is the front door. It is the file an assistant like Claude reads first, and it points to the rest of the memory: the wiki index, the rules, the running log. CLAUDE.md is where you tell the model how to bootstrap its own memory at the start of every session.

How do I start?

Start with one folder of markdown, one index file as the front door, and a rule that reading it is the first step of every session. Add enforcement later: a hook that flags changed files, a scheduled pass that hunts stale pages. A full, free, forkable template is documented at eqctrl.io/karpathy+.
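As a rough illustration of that starting point, here is a minimal Python sketch that creates one folder, one index file, and a CLAUDE.md that tells the model to read them first. The folder name `memory/`, the file names `index.md`, `regressions.md`, and `log.md`, and the wording of the bootstrap instructions are all assumptions made for this example, not a layout prescribed by the template mentioned above.

```python
from pathlib import Path

# Hypothetical starter text; adjust the wording to your own workflow.
CLAUDE_MD = """# Memory bootstrap

Before doing anything else in a session:
1. Read memory/index.md.
2. Read memory/regressions.md.
3. Load only the topic pages the index lists for the current task.
"""

INDEX_MD = """# Memory index

List each kind of work and the pages to read for it.
"""


def scaffold(root: Path) -> list[Path]:
    """Create the starter files under root and return the ones that were new.

    Existing files are never overwritten, so the function is safe to re-run.
    """
    memory = root / "memory"
    memory.mkdir(parents=True, exist_ok=True)
    starters = {
        root / "CLAUDE.md": CLAUDE_MD,
        memory / "index.md": INDEX_MD,
        memory / "regressions.md": "# Regressions\n",
        memory / "log.md": "# Log\n",
    }
    created = []
    for path, text in starters.items():
        if not path.exists():
            path.write_text(text, encoding="utf-8")
            created.append(path)
    return created
```

Calling `scaffold(Path("."))` in a project directory would create the four files once; later calls leave hand-edited files alone, which matters because the files are meant to be owned and edited by you.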

The Karpathy Memory System: persistent memory for your AI, explained

Persistent, file-based memory that your AI reads at the start of every session: what it is, in plain language, and how to build one yourself.

The one-line version

A Karpathy-style memory system is a small set of plain text or markdown files that your AI reads at the start of every session and updates as it works. The chat window forgets; the files don't. That's the whole idea.

On the name. This pattern is named after Andrej Karpathy's framing of how to think about LLMs, not an official project of his. The mental model: the model is the processor, the context window is RAM, and anything you want to survive past a single session has to live in files, outside the chat. The system below is an independent take on that idea, built for daily use.

Why the context window is the bottleneck

Every session with an AI starts from near-zero. You re-explain your project, your preferences, the mistake you corrected yesterday. The context window is finite and resets, so the same context gets re-loaded by hand, forever. People try to fix this by chasing a bigger or smarter model. But a bigger engine doesn't help if the car has no trunk.

The fix is to stop treating memory as something the model provides and start treating it as something you own. You write down what the AI should never have to be told twice, you keep it in files, and you make reading those files the first step of every session.

What the system actually contains

An index file. The front door. It lists what exists and what to read for a given kind of work, so the model loads a few relevant pages, not the whole pile.
Topic pages. One durable fact or area per file: how your infrastructure fits together, project state, conventions.
A regressions file. Every mistake the AI makes that you correct, one line each, loaded every session. The cheapest behavior change per word there is.
A running log. What changed and why, so the next session inherits the reasoning, not just the result.
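One way to picture how these four pieces fit together is a small loader that reads the index, always includes the regressions file, and pulls in only the topic pages listed for the current kind of work. The sketch below is an illustration under stated assumptions: the index line format (`- kind: page.md, other.md`), the file names, and the function names `parse_index`, `build_context`, and `append_log` are all hypothetical, chosen for this example.

```python
from pathlib import Path


def parse_index(index_text: str) -> dict[str, list[str]]:
    """Parse lines like '- deploy: infra.md, conventions.md' into a mapping.

    The line format is an assumption of this sketch, not a standard.
    """
    routes: dict[str, list[str]] = {}
    for line in index_text.splitlines():
        line = line.strip()
        if not line.startswith("- ") or ":" not in line:
            continue  # skip headings, blank lines, and prose
        kind, _, pages = line[2:].partition(":")
        routes[kind.strip()] = [p.strip() for p in pages.split(",") if p.strip()]
    return routes


def build_context(memory_dir: Path, kind: str) -> str:
    """Assemble the text to load at the start of a session for one kind of work."""
    routes = parse_index((memory_dir / "index.md").read_text(encoding="utf-8"))
    # The regressions file is loaded every session; topic pages depend on the task.
    names = ["regressions.md"] + routes.get(kind, [])
    sections = []
    for name in names:
        path = memory_dir / name
        if path.exists():  # pages listed in the index but missing on disk are skipped
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)


def append_log(memory_dir: Path, what: str, why: str, day: str) -> None:
    """Add one line to the running log: what changed and why."""
    with (memory_dir / "log.md").open("a", encoding="utf-8") as f:
        f.write(f"- {day}: {what} (why: {why})\n")
```

The lookup is a plain dictionary read rather than a similarity search, so the same index and the same task name always load the same pages, and anyone can see why by opening `index.md`.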

The part that makes it stick: enforcement

Notes rot. A memory system that depends on you remembering to update it will drift out of date and become a liability. So the system assumes rot and fights it mechanically: a hook that flags changed files for re-documentation, a scheduled pass that hunts stale pages and dead links, and a rule that work isn't "done" until the docs reflect it. None of that needs intelligence. It needs plumbing.
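The scheduled pass can be very little code. The following sketch, which assumes the memory lives in a folder of `.md` files and that pages reference each other with standard markdown links, reports pages not modified within a chosen number of days and relative links whose target file is missing. The function names and the age threshold are hypothetical; how you schedule it (cron, CI, a git hook) is left open.

```python
import re
import time
from pathlib import Path
from typing import Optional

# Captures the target of a markdown link [text](target), without any #fragment.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")


def stale_pages(memory_dir: Path, max_age_days: float,
                now: Optional[float] = None) -> list[Path]:
    """Return markdown files whose last modification is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(p for p in memory_dir.rglob("*.md") if p.stat().st_mtime < cutoff)


def dead_links(memory_dir: Path) -> list[tuple[Path, str]]:
    """Return (page, target) pairs for relative links that point at missing files."""
    missing = []
    for page in sorted(memory_dir.rglob("*.md")):
        for target in LINK_RE.findall(page.read_text(encoding="utf-8")):
            if "://" in target or target.startswith("mailto:"):
                continue  # external links would need a network check; out of scope
            if not (page.parent / target).exists():
                missing.append((page, target))
    return missing
```

Modification time is only a proxy for staleness: a page can be old and still correct. The output is best treated as a review queue, not as a list of errors.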

Why file-based and model-agnostic

Because the tools change every week and the files don't. When a new model ships, you point it at the same memory and it picks up where the last one left off, with your context instead of a blank chat. The model is the engine, and the engine swaps. The memory is the car. Plain files also mean you can read, edit, version, and own the whole thing without a vendor in the loop.

Frequently asked

Is this an official Karpathy project? No. It's a community pattern named after his framing of context as the scarce resource. The implementation here is independent.
How is it different from RAG or vector memory? RAG retrieves chunks from an embedded store by similarity at query time. This is smaller, human-readable, and deterministic: a curated index and a few high-value pages, files you can open and edit by hand. They can coexist; the memory layer is the durable, owned core.
Where does CLAUDE.md fit? CLAUDE.md is the front door: the file the assistant reads first, which points to the rest of the memory and tells the model how to bootstrap itself each session.
Do I need to be technical? To start, no. One folder and one index file change the experience in a day. The enforcement layer is where it helps to have someone wire the plumbing.
