# Karpathy+
A system for giving LLMs structured, persistent memory that survives across sessions, surfaces, and projects. Built on Karpathy's wiki pattern, extended with the enforcement infrastructure that makes knowledge maintenance automatic instead of aspirational.
## 00 · TL;DR
The goal: a shared memory between you and your AI that survives every session, updates when things change, and doesn't quietly fail.
Three layers (sources, wiki, schema). Three operations (ingest, query, lint). One hard boundary (knowledge vs runtime). Enforcement by defense in depth: behavior rules, a session hook, scheduled lint, git history, and a constitutional completion gate. Progressive disclosure keeps the per-session cost low: schema plus index plus two to four pages, not the whole corpus.
Build: under an hour. Steady state: a few minutes of human review per week. Failure mode closed: silent drift.
## 01 · Why Most Systems Fail
LLM memory systems tend to die in the same three ways. Every design decision here is a response to one of them.
## 02 · Three Layers
Knowledge flows upward with increasing structure and decreasing volume.
- Sources (`wiki/sources/`) are raw inputs: clipped articles, meeting transcripts, tool-intel feeds, bookmarks. Messy. The LLM synthesizes upward.
- Wiki (`wiki/`) is the curated layer. Target: 15 to 30 pages. LLM-maintained.
- Schema (`CLAUDE.md`, ~50 lines) is behavior rules plus wiki conventions. No knowledge lives here.
## 03 · Boundary & Buckets
One rule prevents the three-copy drift problem before it starts:
Knowledge goes in `~/AI/`. Runtime goes in `~/.claude/`.
`~/AI/` (knowledge):

- `wiki/` (canonical)
- Project repos
- Career, personal, outputs
- Raw sources before curation

`~/.claude/` (runtime):

- Schema (`CLAUDE.md`)
- Hooks, skills, settings
- Sessions, history, telemetry
- Auto-memory
Inside `~/AI/`, no loose files at root. Every file lives in a bucket: `wiki/`, project repos, `career/`, `personal/`, `OUTPUTS/`, `CLIENT-OUTPUTS/`, `inbox/` (untriaged), `scratch/` (throwaway). Archive and backups live at `~/Archive/`, outside `~/AI/` entirely, because the wiki's MCP is scoped to `~/AI/`, and an archive inside it would pollute every search.
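The no-loose-files rule is mechanically checkable. A minimal sketch, assuming only that buckets and project repos are directories, dotfiles are tolerated, and any other file at the root is a violation:

```python
import tempfile
from pathlib import Path

def loose_files(root: Path) -> list[str]:
    """Files sitting directly at the root of the knowledge tree.
    Directories are fine (buckets and project repos are directories);
    dotfiles are tolerated; every other loose file violates the rule."""
    return sorted(
        p.name for p in root.iterdir()
        if p.is_file() and not p.name.startswith(".")
    )

# Throwaway demo tree: one bucket, one offending loose file.
root = Path(tempfile.mkdtemp())
(root / "wiki").mkdir()
(root / "stray-notes.md").touch()
violations = loose_files(root)   # ["stray-notes.md"]
```

A check this small can run as a lint step or a pre-commit guard without ever needing LLM reasoning.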
Tool scope is a design input, not a deployment detail. Any scoped tool (MCP server, search index, hook watchpath) creates a zone where content placement is no longer neutral. State it up front.
## 04 · Three Operations
### Ingest
Triggered after deploys, debugging, corrections, decisions. The LLM updates relevant wiki pages and appends to log.md with a Judgment: line explaining the reasoning behind a fix or tradeoff.
Judgment: lines are not decoration. They are the retrieval path that makes the system self-correcting. A future Claude reading the log doesn't just see what happened, it sees what constraints are still binding. Load-bearing reasoning in Judgment: lines means the system resists drift from the inside.
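Because `Judgment:` lines are load-bearing, lint can verify their presence mechanically. A sketch, assuming log entries start with `## ` headings; that convention is a guess at `log.md`'s format, not something the schema mandates:

```python
import re

def missing_judgment(log_text: str) -> list[str]:
    """Headings of log entries that lack a Judgment: line.
    Entries are assumed to start with '## ' headings (an illustrative
    convention, not one the article documents)."""
    flagged = []
    for entry in re.split(r"(?m)^## ", log_text)[1:]:
        heading = entry.splitlines()[0]
        if "Judgment:" not in entry:
            flagged.append(heading)
    return flagged

log = """## 2025-01-10 hotfix
Rolled back the cache layer.
Judgment: invalidation was racing the writer; keep TTLs until the queue lands.

## 2025-01-11 deploy
Shipped v2 of the ingest pipeline.
"""
gaps = missing_judgment(log)   # ["2025-01-11 deploy"]
```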
### Query
Triggered at the start of every session. The LLM reads INDEX.md and picks two to four relevant pages. Total injected context: about 50 lines of schema, 30 lines of index, the small set of relevant pages. The old failure mode (loading four hundred lines of rules before the human types) becomes structurally impossible.
Caveat for plan execution: when the task is executing a plan, the plan is not self-sufficient context. Load the project pages the plan references before taking its "next steps" as scope. Plans decay faster than the project state they depend on.
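A back-of-envelope check of the per-session cost described above. The synthetic schema, index, and pages below are sized to the article's targets; page selection itself stays with the LLM, and this only totals the result:

```python
def injected_lines(schema: str, index: str, pages: list[str]) -> int:
    """Lines a session actually loads: schema + index + the two to four
    pages picked from it. Under progressive disclosure this number, not
    corpus size, is the per-session cost."""
    return sum(len(doc.splitlines()) for doc in (schema, index, *pages))

schema = "\n".join(f"rule {i}" for i in range(50))       # ~50-line schema
index = "\n".join(f"- page {i}" for i in range(30))      # ~30-line index
picked = ["\n".join(["# deploys"] + ["..."] * 40)] * 2   # two 41-line pages
total = injected_lines(schema, index, picked)            # 162 lines in total
```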
### Lint
Runs weekly or on demand. A single slash command performs queue processing, freshness check, broken-link scan, task aggregation, log-gap check, source scan, log rotation, auto-memory drift check, runtime audit, and heartbeat stamp. Output: auto-fix pass plus triage report. Human responds do it / defer it / kill it.
Lint has a second life as a pattern detector. When it surfaces the same issue twice, that is the signal to build a hook or a guard. Rule: flag once via lint, build infrastructure only when the pattern fires twice. Prevents over-engineering on one-time problems.
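Two of the lint steps above (freshness check and broken-link scan) can be sketched in a few lines. The `last_updated:` frontmatter key and the 90-day staleness threshold are assumptions for illustration, not the article's actual conventions:

```python
import re
import tempfile
from datetime import date, timedelta
from pathlib import Path

STALE_AFTER = timedelta(days=90)   # illustrative threshold

def lint(wiki: Path, today: date) -> list[str]:
    """Freshness check plus broken-link scan over a flat wiki directory.
    The 'last_updated:' frontmatter key is an assumption about the
    page template."""
    findings = []
    pages = {p.name: p.read_text() for p in wiki.glob("*.md")}
    for name, text in pages.items():
        m = re.search(r"(?m)^last_updated:\s*(\d{4}-\d{2}-\d{2})", text)
        if m and today - date.fromisoformat(m.group(1)) > STALE_AFTER:
            findings.append(f"STALE {name}")
        for target in re.findall(r"\]\(([\w./-]+\.md)\)", text):
            if Path(target).name not in pages:
                findings.append(f"BROKEN LINK {name} -> {target}")
    return findings

wiki = Path(tempfile.mkdtemp())
(wiki / "INDEX.md").write_text("last_updated: 2024-01-01\nSee [old](gone.md).\n")
(wiki / "fresh.md").write_text("last_updated: 2025-06-01\nAll current.\n")
findings = lint(wiki, today=date(2025, 6, 10))
# Flags the stale index and its dangling link; fresh.md passes clean.
```

The remaining steps (queue processing, log rotation, heartbeat stamp) follow the same shape: read files, emit findings, auto-fix what is safe, triage the rest.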
## 05 · Defense in Depth
Discipline alone fails during fires. The answer is multiple independent safety nets, each cheap enough to run all the time.
| Layer | Role | Mode |
|---|---|---|
| Schema | Behavior instructions loaded every session | File state |
| Session hook | Writes changed paths to .update-queue, flags LOG GAP when edits skip log.md | Tripwire |
| Wiki lint | Weekly multi-step pass, triage surface | File state |
| Git repo | Wiki is its own repo; divergence visible in git status | File state |
| Heartbeat | "Last lint" timestamp on every session's first read | File state |
| Scheduled agents | Produce expected outputs; a verifier checks freshness | Output verification |
| Completion gate | Nothing is "done" until docs reflect it, smoke passes, deploy followed its checklist | Human contract |
Output verification is the enforcement mode that closes the runtime-state gap. Lint reads files and cannot see whether a cron or launchd agent is actually loaded and running. An output verifier checks "did the scheduled job produce today's file, on time?" Any scheduled agent should declare an expected output path.
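A minimal output verifier in the spirit of that paragraph. The expected path is the "declared output path" the text asks each scheduled agent to provide; the mtime-based freshness test is one possible implementation, not a prescribed one:

```python
import tempfile
from datetime import date, datetime
from pathlib import Path

def verify_output(expected: Path, today: date) -> str:
    """Did the scheduled job produce today's file? Reads only the
    filesystem, so it works even when the scheduler's own state
    (cron, launchd) is invisible to lint."""
    if not expected.exists():
        return "MISSING"
    produced = datetime.fromtimestamp(expected.stat().st_mtime).date()
    return "OK" if produced == today else f"STALE (last run {produced})"

outdir = Path(tempfile.mkdtemp())
(outdir / "daily-intel.md").touch()                               # job ran today
status = verify_output(outdir / "daily-intel.md", date.today())   # "OK"
missing = verify_output(outdir / "absent.md", date.today())       # "MISSING"
```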
## 06 · Propagation is a Snapshot
A small principle with wide reach: copies detach from their source at propagation time. Verification is required at consumption, not at creation.
- Auto-memory (`~/.claude/projects/*/memory/`) is a copy. If it and the wiki disagree, the wiki wins. A lint step diffs key facts between them.
- UI-surface memories (Cowork global instructions, IDE settings) are copies pasted at a point in time. Schema changes require active propagation to each surface.
- Forwarded session instructions are copies of a prior session's verification state. Treat a forwarded instruction as a claim, not a direction. Re-derive scope from current wiki before executing.
Where possible, remove duplicated facts from copies and point them to the wiki. A hardcoded page count will drift. A link cannot.
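The copy-versus-canonical drift check reduces to a dictionary diff once key facts are extracted from each side. The fact keys here (`page_count`, `tracker`) are invented for illustration:

```python
def drift(canonical: dict[str, str], copy: dict[str, str]) -> dict[str, tuple]:
    """Diff key facts between the wiki (canonical) and a copy such as
    auto-memory. Keys present only in the copy count as drift too: the
    copy then asserts something the wiki no longer backs."""
    return {
        key: (canonical.get(key), copy.get(key))
        for key in set(canonical) | set(copy)
        if canonical.get(key) != copy.get(key)
    }

wiki_facts = {"page_count": "22", "tracker": "wiki/tasks.md"}
memory_facts = {"page_count": "17", "tracker": "wiki/tasks.md"}
disagreements = drift(wiki_facts, memory_facts)   # {"page_count": ("22", "17")}
```

Every disagreement resolves the same way: the wiki wins, the copy gets rewritten or replaced with a link.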
## 07 · Hard Limits
| Limit | Mitigation |
|---|---|
| Hooks can't do LLM reasoning | Hooks are tripwires; lint plus behavior rules do the real enforcement |
| Enforcement is blind to runtime state | Output verifiers check that scheduled agents produced expected outputs |
| Plans decay faster than pages | last_reconciled: frontmatter plus a lint step that flags plans older than the projects they cite |
| Schema has a soft cap (~80 lines) | Stay well under; put knowledge in pages, not rules |
| Sessions are ephemeral | Schema points to INDEX.md every session, no state to lose |
| Auto-memory lives outside your control | Boundary rule plus drift check contains it |
## 08 · Build It Yourself
The core pattern is general. Anyone can adapt it.
Hand the build-guide .md file to your AI. It's written for Claude (or a similar model) to execute with you step by step: walk through setup together and custom-build your own version as you go. Directories, CLAUDE.md, first pages, lint, session hook, all in under an hour.
### Minimum viable wiki, five steps
1. Create `wiki/` with `INDEX.md` and `log.md`.
2. Write a lean `CLAUDE.md` (~50 lines): style rules, "read `INDEX.md` first," update conventions, the boundary rule.
3. Create three to five pages covering your most-repeated context.
4. Set up lint. Even a manual weekly review counts at the start.
5. Use it for two weeks before adding complexity.
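Steps 1 and 2 can be scripted. A sketch in which `SCHEMA` is a compressed stand-in for the real ~50-line `CLAUDE.md`, not the article's actual file:

```python
import tempfile
from pathlib import Path

SCHEMA = """\
# CLAUDE.md -- schema only, no knowledge
Read wiki/INDEX.md first.
Knowledge goes in ~/AI/. Runtime goes in ~/.claude/.
After deploys, fixes, decisions: update the relevant pages and append
to wiki/log.md with a Judgment: line.
"""

def bootstrap(root: Path) -> list[str]:
    """Steps 1 and 2 of the five: wiki/ with INDEX.md and log.md, plus
    a lean CLAUDE.md. SCHEMA above is a compressed illustration."""
    wiki = root / "wiki"
    wiki.mkdir(parents=True, exist_ok=True)
    (wiki / "INDEX.md").write_text("# Index\n\n(no pages yet)\n")
    (wiki / "log.md").write_text("# Log\n")
    (root / "CLAUDE.md").write_text(SCHEMA)
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())

created = bootstrap(Path(tempfile.mkdtemp()))
# ["CLAUDE.md", "wiki/INDEX.md", "wiki/log.md"]
```

Steps 3 through 5 are deliberately manual: the pages, the review habit, and the two-week shakedown are where the system earns its keep.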
### Keep these (universal)
Three layers, boundary rule, INDEX.md as entry point, log.md with Judgment: lines, lint as scheduled enforcement, page template for stable parse, defense in depth, completion gate.
### Adapt these (local)
Page categories to your domains, lint schedule and notification medium, task tracker of choice, surface routing (skip entirely if single-surface), domain isolation (only matters with multiple clients).
Pushable by default. The wiki is designed to live as a git repo that can be cloned. This is what enables cloud-scheduled agents (Routines, GitHub Actions, external CI) to read canonical state without a filesystem mount. Not a deployment detail, an architectural choice.
The system grows naturally. Sources accumulate. Pages get added when you notice yourself repeating context. Lint catches what you forget. Everything beyond the minimum is earned by need, not added by default.
## 09 · What's Next
Planned, not built.
- `last_reconciled:` frontmatter for plans. A plan carries both `last_updated` (any edit) and `last_reconciled` (references re-verified against current project state). A lint step flags any plan whose `last_reconciled` predates the `last_updated` of a project page it cites. Cost: one frontmatter key plus one lint step. Deferred until the next plan-execution session consumes it.
- Richer auto-memory audit. The current drift check compares page counts, paths, and project names. A full audit would diff every factual claim in auto-memory against the wiki.
- Local search over the wiki (qmd or equivalent) once page count crosses ~50. Grep holds below that.
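The planned `last_reconciled` lint step is small enough to sketch now. Frontmatter arrives here as already-parsed dicts, an assumption about the eventual implementation; in the wiki it would be YAML at the top of each page:

```python
from datetime import date

def stale_plans(plans: dict[str, dict], projects: dict[str, date]) -> list[str]:
    """Flag any plan whose last_reconciled predates the last_updated of
    a project page it cites. Frontmatter is modeled as parsed dicts;
    the real step would read YAML frontmatter from the pages."""
    flagged = []
    for name, fm in plans.items():
        if any(projects.get(c, date.min) > fm["last_reconciled"]
               for c in fm["cites"]):
            flagged.append(name)
    return flagged

plans = {"migration-plan.md": {"last_reconciled": date(2025, 5, 1),
                               "cites": ["project-api.md"]}}
projects = {"project-api.md": date(2025, 5, 20)}   # updated after reconciliation
flagged = stale_plans(plans, projects)   # ["migration-plan.md"]
```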
Everything else is earned by need, not added by default. The system is small on purpose.
The system is working well, and building it has been a genuinely valuable learning experience. That said — I'm actively looking forward to replacing most of it.
This page is the amalgamation of the best thinking I've found: Karpathy, Atlas Forge, arscontexta, a handful more. Assembled deliberately, they work. But every layer here is craft — built, maintained, disciplined into use. The wiki is good. It's also a lot of work.
I'm working with the team at HipAI to bring these context-graph capabilities to builders like me — people who want to use, experiment, and build on top of a proper graph without assembling the substrate themselves. An enterprise-grade graph, productized: same failure modes closed, far less setup, relationship traversal the flat wiki pattern can't match. This wiki will keep evolving because the craft is interesting; HipAI is largely what replaces it.
## 10 · Inspirations
The compounding principle (Atlas Forge) is the spine. The wiki pattern (Karpathy) is the skeleton. The boundary rule, defense in depth, heartbeat, and completion gate are the load-bearing response to failure modes each source acknowledged but did not fully solve.