The Context System · v1

Karpathy+

A system for giving LLMs structured, persistent memory that survives across sessions, surfaces, and projects. Built on Karpathy's wiki pattern, extended with the enforcement infrastructure that makes knowledge maintenance automatic instead of aspirational.

by entropy / eqctrl · v1 · 2026-04-20

Download the build guide (.md) and hand it to your AI to recreate the system.

00 · TL;DR

The goal: a shared memory between you and your AI that survives every session, updates when things change, and doesn't quietly fail.

Three layers (sources, wiki, schema). Three operations (ingest, query, lint). One hard boundary (knowledge vs runtime). Enforcement by defense in depth: behavior rules, a session hook, scheduled lint, git history, and a constitutional completion gate. Progressive disclosure keeps the per-session cost low: schema plus index plus two to four pages, not the whole corpus.

Numbers

Build: under an hour. Steady state: a few minutes of human review per week. Failure mode closed: silent drift.

01 · Why Most Systems Fail

Three ways LLM memory systems usually die. Every later decision is built to avoid one of them.

01 · LOAD
Too heavy to load, so it gets skipped
Session-start gates that require reading a stack of files get deprioritized as the context window fills. A system that only works on turn one is a system that doesn't work.
02 · DRIFT
Overlapping files the LLM itself creates
A rules file, a lessons file, a corrections file: three files, same knowledge, different structures, drifting independently. Two copies of anything will drift.
03 · SILENCE
Enforcement fails during fires
You fix the bug urgently, but context isn't captured because the session is focused on the fire. Weeks later the same class of problem recurs.

02 · Three Layers

Raw notes at the bottom. Curated pages in the middle. Behavior rules at the top. The AI reads only what fits the task.

Knowledge flows upward with increasing structure and decreasing volume.

  • Sources (wiki/sources/) are raw inputs: clipped articles, meeting transcripts, tool-intel feeds, bookmarks. Messy. The LLM synthesizes upward.
  • Wiki (wiki/) is the curated layer. Target: 15 to 30 pages. LLM-maintained.
  • Schema (CLAUDE.md, ~50 lines) is behavior rules plus wiki conventions. No knowledge lives here.
  • Session (the context-aware LLM) is live work. Ephemeral. Reads 2-4 pages per task.

03 · Boundary & Buckets

A hard wall between what you know and how the AI runs. Everything lives in one place only — never both.

One rule prevents the three-copy drift problem before it starts:

Knowledge goes in ~/AI/. Runtime goes in ~/.claude/.
~/AI/
Knowledge + work
  • wiki/ (canonical)
  • Project repos
  • Career, personal, outputs
  • Raw sources before curation
~/.claude/
Runtime only
  • Schema (CLAUDE.md)
  • Hooks, skills, settings
  • Sessions, history, telemetry
  • Auto-memory

Inside ~/AI/, no loose files at root. Every file lives in a bucket: wiki/, project repos, career/, personal/, OUTPUTS/, CLIENT-OUTPUTS/, inbox/ (untriaged), scratch/ (throwaway). Archive and backups live at ~/Archive/, outside ~/AI/ entirely, because the wiki's MCP is scoped to ~/AI/ and an archive inside it would pollute every search.
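The no-loose-files rule is mechanically checkable. A minimal sketch, assuming the bucket names above and treating any root-level directory containing .git/ as a project repo; the function and whitelist are illustrative, not the system's actual lint code:

```python
from pathlib import Path

# Buckets named in the text; project repos are also allowed at root.
BUCKETS = {"wiki", "career", "personal", "OUTPUTS", "CLIENT-OUTPUTS",
           "inbox", "scratch"}

def loose_files(root: Path) -> list[str]:
    """Flag root-level entries that break the no-loose-files rule.

    Files at root are always flagged; directories pass if they are a
    named bucket or look like a project repo (contain .git/).
    """
    flagged = []
    for entry in root.iterdir():
        if entry.is_file():
            flagged.append(entry.name)
        elif entry.name not in BUCKETS and not (entry / ".git").exists():
            flagged.append(entry.name + "/")
    return sorted(flagged)
```

Run against ~/AI/ during lint; anything it returns goes straight into the triage report.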

Principle

Tool scope is a design input, not a deployment detail. Any scoped tool (MCP server, search index, hook watchpath) creates a zone where content placement is no longer neutral. State it up front.

04 · Three Operations

Three things you keep doing: write after changes, read only what you need, check the whole system weekly.
01 · WRITE: Ingest. After deploys, debugging, corrections, decisions. Update pages + append to log.md with Judgment: lines.
02 · READ: Query. At session start. Read INDEX.md, pick 2-4 relevant pages, work. Progressive disclosure.
03 · CHECK: Lint. Weekly or on demand. Drift, links, staleness, triage report. Doubles as a pattern detector.

The cycle repeats each session.

Ingest

Triggered after deploys, debugging, corrections, decisions. The LLM updates relevant wiki pages and appends to log.md with a Judgment: line explaining the reasoning behind a fix or tradeoff.

Judgment: lines are not decoration. They are the retrieval path that makes the system self-correcting. A future Claude reading the log doesn't just see what happened, it sees what constraints are still binding. Load-bearing reasoning in Judgment: lines means the system resists drift from the inside.
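A hypothetical log.md entry (project name, numbers, and dates invented) showing the shape a Judgment: line takes:

```markdown
## 2026-04-18 · deploy: api-gateway

- Rolled back the connection-pool bump; p99 regressed under load.
- Judgment: pool size stays at 20 because the upstream DB caps
  connections per role; raising it shifts errors downstream instead
  of fixing latency. Revisit only if the DB tier is resized.
```

The first bullet records what happened; the Judgment: line records the constraint that is still binding.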

Query

Triggered at the start of every session. The LLM reads INDEX.md and picks two to four relevant pages. Total injected context: about 50 lines of schema, 30 lines of index, the small set of relevant pages. The old failure mode (loading four hundred lines of rules before the human types) becomes structurally impossible.

Caveat for plan execution: when the task is executing a plan, the plan is not self-sufficient context. Load the project pages the plan references before taking its "next steps" as scope. Plans decay faster than the project state they depend on.

Lint

Runs weekly or on demand. A single slash command performs queue processing, freshness check, broken-link scan, task aggregation, log-gap check, source scan, log rotation, auto-memory drift check, runtime audit, and heartbeat stamp. Output: auto-fix pass plus triage report. Human responds do it / defer it / kill it.

Lint has a second life as a pattern detector. When it surfaces the same issue twice, that is the signal to build a hook or a guard. Rule: flag once via lint, build infrastructure only when the pattern fires twice. Prevents over-engineering on one-time problems.
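One lint step, the broken-link scan, can be sketched in a few lines. Assumptions: pages are markdown, links use standard [text](target) syntax, and targets resolve relative to the linking page; the system's real lint pass may differ:

```python
import re
from pathlib import Path

# Path part of a markdown link target, anchor fragment stripped.
LINK = re.compile(r"\[[^\]]*\]\(([^)#]+)")

def broken_links(wiki: Path) -> list[tuple[str, str]]:
    """Return (page, target) pairs whose relative target does not resolve."""
    broken = []
    for page in sorted(wiki.rglob("*.md")):
        for target in LINK.findall(page.read_text()):
            if target.startswith(("http://", "https://")):
                continue  # external links need a different (network) check
            if not (page.parent / target).exists():
                broken.append((page.name, target))
    return broken
```

Each of the other lint steps (freshness, log gaps, heartbeat) reduces to a similarly small file check, which is why the whole pass stays cheap.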

05 · Defense in Depth

No single guardrail is enough. Stack a handful of small ones and the system only breaks when they all miss together.

Discipline alone fails during fires. Multiple independent safety nets, each cheap enough to run all the time.

| # | Layer | Role | Mode |
| --- | --- | --- | --- |
| 01 | Schema | Behavior instructions loaded every session | File state |
| 02 | Session hook | Writes changed paths to .update-queue; flags LOG GAP when edits skip log.md | Tripwire |
| 03 | Wiki lint | Weekly multi-step pass; triage surface | File state |
| 04 | Git repo | Wiki is its own repo; divergence visible in git status | File state |
| 05 | Heartbeat in index | "Last lint" timestamp on every session's first read | File state |
| 06 | Scheduled agents | Produce expected outputs; a verifier checks freshness | Output verification |
| 07 | Completion gate | Nothing is "done" until docs reflect it, smoke passes, deploy followed its checklist | Human contract |

Output verification is the enforcement mode that closes the runtime-state gap. Lint reads files and cannot see whether a cron or launchd agent is actually loaded and running. An output verifier checks "did the scheduled job produce today's file, on time?" Any scheduled agent should declare an expected output path.
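An output verifier of this kind reduces to an existence-plus-freshness check. A sketch, with the 26-hour window as an assumed slack for a daily job, not a value taken from the system:

```python
import datetime as dt
from pathlib import Path

def verify_output(expected: Path, max_age_hours: float = 26.0) -> bool:
    """True if a scheduled agent's declared output exists and is fresh.

    Freshness is judged by mtime; the 26-hour default leaves slack
    around a daily schedule.
    """
    if not expected.exists():
        return False
    mtime = dt.datetime.fromtimestamp(expected.stat().st_mtime)
    return dt.datetime.now() - mtime <= dt.timedelta(hours=max_age_hours)
```

Because it checks the output rather than the agent, it catches a cron or launchd job that silently unloaded, which no file-state check can see.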

06 · Propagation is a Snapshot

A copy stops updating the moment it leaves home. Always check the original before acting on auto-memory, UI state, or forwarded instructions.

A small principle with wide reach: copies detach from their source at propagation time. Verification is required at consumption, not at creation.

  • Auto-memory (~/.claude/projects/*/memory/) is a copy. If it and the wiki disagree, the wiki wins. Lint step diffs key facts between them.
  • UI-surface memories (Cowork global instructions, IDE settings) are copies pasted at a point in time. Schema changes require active propagation to each surface.
  • Forwarded session instructions are copies of a prior session's verification state. Treat a forwarded instruction as a claim, not a direction. Re-derive scope from current wiki before executing.

Where possible, remove duplicated facts from copies and point them to the wiki. A hardcoded page count will drift. A link cannot.
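The page-count diff mentioned above might look like this; the directory roles and the choice of page count as the key fact follow the text, everything else is illustrative:

```python
from pathlib import Path

def page_count_drift(wiki: Path, auto_memory: Path) -> dict[str, int]:
    """Diff one key fact, markdown page counts, between the canonical
    wiki and its auto-memory copy. The wiki wins; drift is a triage item.
    """
    wiki_n = sum(1 for _ in wiki.rglob("*.md"))
    copy_n = sum(1 for _ in auto_memory.rglob("*.md"))
    return {"wiki": wiki_n, "auto_memory": copy_n, "drift": copy_n - wiki_n}
```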

07 · Hard Limits

Things this system can't fully solve. Naming them honestly beats pretending it's airtight.
| Limit | Mitigation |
| --- | --- |
| Hooks can't do LLM reasoning | Hooks are tripwires; lint plus behavior rules do the real enforcement |
| Enforcement is blind to runtime state | Output verifiers check that scheduled agents produced expected outputs |
| Plans decay faster than pages | last_reconciled: frontmatter plus a lint step that flags plans older than the projects they cite |
| Schema has a soft cap (~80 lines) | Stay well under; put knowledge in pages, not rules |
| Sessions are ephemeral | Schema points to INDEX.md every session; no state to lose |
| Auto-memory lives outside your control | Boundary rule plus drift check contains it |

08 · Build It Yourself

The five-step starter kit. Build in under an hour. Live with it two weeks. Add complexity only when you actually miss it.

The core pattern is general. Anyone can adapt it.

Fastest Path

Hand the build guide .md to your AI. It's written for Claude (or a similar model) to execute with you step by step: walk through setup together and custom-build your own version along the way. Directories, CLAUDE.md, first pages, lint, session hook, all in under an hour.

Minimum viable wiki, five steps

  1. Create wiki/ with INDEX.md and log.md.
  2. Write a lean CLAUDE.md (~50 lines): style rules, "read INDEX.md first," update conventions, the boundary rule.
  3. Create three to five pages covering your most-repeated context.
  4. Set up lint. Even a manual weekly review counts at the start.
  5. Use it for two weeks before adding complexity.
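Steps 1 and 2 can be sketched as a bootstrap script; the seed file contents are placeholders, not the real schema:

```python
from pathlib import Path

def bootstrap(root: Path) -> None:
    """Create the minimum viable wiki: directories plus seed files.

    Covers steps 1 and 2; seed contents are placeholders only.
    """
    wiki = root / "wiki"
    (wiki / "sources").mkdir(parents=True, exist_ok=True)
    (wiki / "INDEX.md").write_text("# Index\n\nPages:\n")
    (wiki / "log.md").write_text("# Log\n")
    (root / "CLAUDE.md").write_text(
        "Read wiki/INDEX.md first. Knowledge lives in the wiki, not here.\n"
    )
```

Steps 3 through 5 are deliberately manual: the pages, the review habit, and the two-week soak are where the system earns its keep.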

Keep these (universal)

Three layers, boundary rule, INDEX.md as entry point, log.md with Judgment: lines, lint as scheduled enforcement, page template for stable parse, defense in depth, completion gate.

Adapt these (local)

Page categories to your domains, lint schedule and notification medium, task tracker of choice, surface routing (skip entirely if single-surface), domain isolation (only matters with multiple clients).

Architecture

Pushable by default. The wiki is designed to live as a git repo that can be cloned. This is what enables cloud-scheduled agents (Routines, GitHub Actions, external CI) to read canonical state without a filesystem mount. Not a deployment detail, an architectural choice.

The system grows naturally. Sources accumulate. Pages get added when you notice yourself repeating context. Lint catches what you forget. Everything beyond the minimum is earned by need, not added by default.

09 · What's Next

Features I haven't built yet — plus the productized system I'm looking forward to replacing most of this with.

Planned, not built.

  • last_reconciled: frontmatter for plans. A plan carries both last_updated (any edit) and last_reconciled (references re-verified against current project state). A lint step flags any plan whose last_reconciled predates the last_updated of a project page it cites. Cost: one frontmatter key plus one lint step. Deferred until the next plan-execution session consumes it.
  • Richer auto-memory audit. Current drift check compares page counts, paths, project names. A full audit would diff every factual claim in auto-memory against the wiki.
  • Local search over the wiki (qmd or equivalent) once page count crosses ~50. Grep holds below that.
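The last_reconciled check described above reduces to a date comparison per cited page. A sketch over plain dicts, assuming ISO-8601 date strings already extracted from frontmatter; the field names follow the bullet, but the data structure is invented:

```python
def stale_plans(plans: dict, pages: dict) -> list[str]:
    """Flag plans whose last_reconciled predates the last_updated of a
    project page they cite. Dates are ISO-8601 strings, so lexicographic
    comparison is also chronological comparison.
    """
    flagged = []
    for name, meta in plans.items():
        for cited in meta["cites"]:
            if meta["last_reconciled"] < pages[cited]:
                flagged.append(name)
                break  # one stale citation is enough to flag the plan
    return flagged
```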

Everything else is earned by need, not added by default. The system is small on purpose.

The system is working well, and building it has been a genuinely valuable learning experience. That said — I'm actively looking forward to replacing most of it.

The Drop-In Upgrade
HipAI — the context graph this wiki is doing the hard way

This page is the amalgamation of the best thinking I've found: Karpathy, Atlas Forge, arscontexta, a handful more. Assembled deliberately, they work. But every layer here is craft — built, maintained, disciplined into use. The wiki is good. It's also a lot of work.

I'm working with the team at HipAI to bring these context-graph capabilities to builders like me — people who want to use, experiment, and build on top of a proper graph without assembling the substrate themselves. An enterprise-grade graph, productized: same failure modes closed, far less setup, relationship traversal the flat wiki pattern can't match. This wiki will keep evolving because the craft is interesting; HipAI is largely what replaces it.

Join the HipAI waitlist

10 · Inspirations

Six sources I borrowed from. Each contributed one load-bearing idea.

The compounding principle (Atlas Forge) is the spine. The wiki pattern (Karpathy) is the skeleton. The boundary rule, defense in depth, heartbeat, and completion gate are the load-bearing response to failure modes each source acknowledged but did not fully solve.

See also: the Claude Guide. Start with chats and Projects, then ramp into surfaces, modes, and Opus 4.7 effort levels. One sheet.