A Practical Guide to Claude Code Memory
Memory
What is Claude Code memory? It’s the system by which you provide persistent instructions to the model across sessions. You write rules in files called CLAUDE.md at various levels of your project, and Claude Code assembles them into a block of text that gets included in every interaction. If you’ve spent any time configuring this system, you’ve probably noticed something odd. Claude follows your rules for a while, then quietly stops. You didn’t change anything. The rules are still in the file. But the behavior shifted.
This isn’t a bug. It’s an architectural property of how Claude Code assembles and delivers instructions to the model. I spent some time looking at the Claude Code binary (v2.1.81) to understand the mechanics, and it turns out the answer is more structural than “the model is unpredictable.”
Loading
Claude Code has an internal function responsible for assembling memory files before each interaction. In the minified bundle it’s exported as getMemoryFiles (minified to a3) alongside related functions like resetGetMemoryFilesCache (cc_), clearMemoryFileCaches (h0), getLargeMemoryFiles (ae), and getUltraClaudeMd (oe). You can find these yourself by running strings against the binary and searching for getMemoryFiles.
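If you'd rather not install binutils, a rough Python stand-in for the `strings` pass can pull printable-ASCII runs out of the bundle and filter them. This is a sketch; the bundle path varies by install and is left as a placeholder:

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    # Printable-ASCII runs of at least min_len bytes, like the strings tool
    return [m.group().decode("ascii")
            for m in re.finditer(rb"[ -~]{%d,}" % min_len, data)]

# Hypothetical usage against the local bundle (path varies by install):
# blob = open("path/to/claude-code/cli.js", "rb").read()
# print([s for s in extract_strings(blob) if "getMemoryFiles" in s])
```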
It loads files in a specific priority order. First it loads Anthropic’s managed policy rules. Then it loads project-level instructions (./CLAUDE.md or ./.claude/CLAUDE.md). After that come user-level instructions (~/.claude/CLAUDE.md), followed by .claude/rules/* files, then an upward directory walk checking each parent for CLAUDE.md variants. Finally it loads auto-memory from MEMORY.md. The official docs describe this ordering in detail, with more specific locations taking precedence over broader ones.
This upward directory walk is important because it means launching Claude Code from /Users/you/src/project versus /Users/you/src will produce a different instruction set, even though you’re working on the same project. The walk starts from wherever you happen to be.
The other important detail is where these instructions end up. They aren’t placed in the core system prompt. They’re injected as a <system-reminder> block within the user-facing context, where they sit alongside tool results, conversation history, and everything else in the context window. The string system-reminder appears 75 times in the v2.1.81 binary. The official docs confirm that CLAUDE.md content is delivered as a user message after the system prompt, not as part of the system prompt itself. They don’t get special treatment from the model. They compete for attention like any other content.
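In request terms, the docs' description amounts to something like the following shape. This is a hypothetical sketch of the structure, not the actual wire format:

```python
core_system_prompt = "You are Claude Code..."  # the fixed system prompt
memory = "# CLAUDE.md\n- always run tests\n"   # assembled memory content

messages = [
    # Memory does NOT live here:
    {"role": "system", "content": core_system_prompt},
    # It arrives as a user-side reminder, alongside everything else:
    {"role": "user",
     "content": f"<system-reminder>\n{memory}</system-reminder>"},
    # ...then tool results and conversation turns, at the same priority
]
```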
Budget
There’s a hard character budget governing how much memory content can be loaded. In the binary, these are defined as module-level constants:
```javascript
// from the minified bundle (v2.1.81)
var cA = "MEMORY.md";
var Iz = 200;    // MEMORY.md line cap
var fB = 40000;  // MAX_MEMORY_CHARACTER_COUNT
var i$_ = 3000;  // MAX_ULTRAMEMORY_CHARACTER_COUNT
```
The export map confirms the mapping: MAX_MEMORY_CHARACTER_COUNT: () => fB and MAX_ULTRAMEMORY_CHARACTER_COUNT: () => i$_. There’s also a getLargeMemoryFiles function (ae) that filters for files exceeding fB characters.
When the combined memory files exceed 40,000 characters, later-loaded files are silently truncated or dropped. There’s no warning in the UI. The official docs note that shorter files produce better adherence but don’t explicitly document the hard character limit. The rules simply stop being included.
Let me explain why this matters with a concrete example. A typical global ~/.claude/CLAUDE.md might be around 19,000 characters. A project-level .claude/CLAUDE.md adds another 18,000. A root-level CLAUDE.md might contribute another 10,000 to 25,000 on top of that. That’s somewhere between 47,000 and 62,000 characters of content against a 40,000 character budget. Everything past the limit gets cut. Since auto-memory loads last in the priority order, it’s the first thing to disappear.
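The arithmetic can be sketched directly. The constant comes from the bundle; the cut-everything-after behavior is a simplification of what the prose above describes, and the real loader may truncate the overflowing file rather than drop it whole:

```python
MAX_MEMORY_CHARACTER_COUNT = 40_000  # fB in the bundle

def assemble_memory(files):
    """files: (name, text) pairs in load-priority order. Everything past
    the point where the budget is exhausted gets cut (simplified sketch)."""
    kept, dropped, used, over = [], [], 0, False
    for name, text in files:
        if over or used + len(text) > MAX_MEMORY_CHARACTER_COUNT:
            over = True
            dropped.append(name)
        else:
            kept.append(text)
            used += len(text)
    return "".join(kept), dropped

# Hypothetical sizes matching the example above:
_, dropped = assemble_memory([
    ("~/.claude/CLAUDE.md", "g" * 19_000),
    (".claude/CLAUDE.md",   "p" * 18_000),
    ("CLAUDE.md",           "r" * 10_000),  # pushes the total to 47k
    ("MEMORY.md",           "m" * 2_000),   # loads last, cut first
])
```

Under this sketch the root `CLAUDE.md` and `MEMORY.md` are both cut, and only 37,000 of the 47,000-plus characters survive.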
Compaction
Claude Code has auto-compaction enabled by default. When a conversation’s token usage crosses a threshold, it triggers a lossy summarization of earlier messages. The model itself performs this summarization.
The compaction threshold isn’t simply “at 200K tokens.” The binary calculates an effective window by subtracting a reserved token buffer (S_R = 20000 tokens) from the model’s context window, then subtracts a further compaction buffer (oCq = 13000 tokens). The CLAUDE_AUTOCOMPACT_PCT_OVERRIDE environment variable lets you override this as a percentage. There’s also a CLAUDE_CODE_AUTO_COMPACT_WINDOW variable that caps the effective window size. Several community analyses have documented these thresholds in detail.
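Putting numbers to it, for a nominal 200K-token window the trigger point works out well short of 200K. A minimal sketch of the reported calculation, ignoring the percentage-override environment variables:

```python
RESERVED_TOKENS = 20_000    # S_R in the bundle
COMPACTION_BUFFER = 13_000  # oCq in the bundle

def autocompact_threshold(context_window: int) -> int:
    # Reserve a buffer off the top of the window, then trigger compaction
    # a further buffer before that effective edge.
    return context_window - RESERVED_TOKENS - COMPACTION_BUFFER

autocompact_threshold(200_000)  # 167_000 tokens, not 200_000
```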
This is where indeterminism enters directly. The summarization is non-deterministic. Different runs will produce different summaries of the same conversation. Your CLAUDE.md rules that were part of earlier context get replaced by whatever the model decided was important enough to keep. Some rules get judged less relevant and dropped.
The system-reminder tags attempt to re-inject critical instructions after compaction. Notably, the resetGetMemoryFilesCache function is called during compaction (the Wl function calls cc_("compact") which resets the cache), so memory files are re-read from disk. But the re-injected content is still competing with the compacted summary for the model’s attention. If the summary de-emphasized a rule, the model may continue to under-weight it even after re-injection.
Let me walk through what this looks like in practice.
- You start a session. Claude Code loads your CLAUDE.md with all your rules.
- You work through ten or fifteen interactions. Context grows.
- Auto-compaction triggers and the model summarizes the earlier part of the conversation.
- Your rules were in that earlier part. The summary keeps some of them and drops others.
- Claude now behaves as though certain rules don’t exist, because from its perspective the summary de-emphasized them.
- You correct Claude, it apologizes, follows the rule for a few more turns, and then another compaction happens.
This cycle is the most common source of the “Claude forgot my rules” experience. Does restarting the session fix it? Yes, because a fresh session reloads all the memory files from disk and starts with a clean context. But the cycle will repeat once context grows again.
Attention
Even when all your instructions fit within the budget and survive compaction, there’s another factor. The model’s attention to instructions isn’t uniform across the context window.
Content in the middle of a long context receives less attention than content at the beginning or end. This is a well-studied property of transformer-based models, documented in the paper “Lost in the Middle: How Language Models Use Long Contexts” (Liu et al., published in TACL 2024). The researchers found a U-shaped performance curve where models attend more strongly to the beginning and end of context. Three hundred lines of rules can’t all be attended to with equal strength. And recent conversation turns pull stronger attention than instructions that were loaded at session start and have been sitting in context for a while.
This means a rule can be present in context, within budget, not compacted away, and still be occasionally violated. The model attended to something else more strongly during that particular generation. It isn’t that the rule was lost. It’s that the model didn’t weight it highly enough in that moment.
Edge Cases
There are a few more mechanisms worth mentioning.
The upward directory walk is anchored to whatever your working directory was when the session started. If you work in git worktrees at sibling paths, the walk paths will differ from the main repo checkout. Same user, same project, different starting directory, different instructions loaded.
The getMemoryFiles function uses memoization with an explicit cache (a3.cache). The clearMemoryFileCaches function (h0) calls a3.cache?.clear?.() to invalidate it. This cache is cleared on compaction and session reset, but not on file edits. If you edit your CLAUDE.md during an active session, the changes won’t be picked up until the next compaction or a new session. You can force a re-read by referencing @CLAUDE.md in a prompt.
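The staleness is easy to reproduce in miniature. Here Python's `functools.cache` stands in for the memoized loader, and a dict stands in for the file on disk, purely as an illustration of the caching behavior:

```python
from functools import cache

disk_contents = {"CLAUDE.md": "v1 rules"}  # stand-in for the file on disk

@cache
def get_memory_files() -> str:
    # Stands in for the memoized loader (a3): the read happens once per
    # cache lifetime, not once per call.
    return disk_contents["CLAUDE.md"]

get_memory_files()                       # reads "v1 rules"
disk_contents["CLAUDE.md"] = "v2 rules"  # edit the file mid-session
get_memory_files()                       # still "v1 rules": stale cache
get_memory_files.cache_clear()           # roughly what cc_("compact") / h0 do
get_memory_files()                       # "v2 rules" after the clear
```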
When Claude Code spawns sub-agents via the Task tool, each one gets a completely fresh context with no conversation history from the parent. The official sub-agents docs confirm this: the only channel from parent to sub-agent is the Task tool’s prompt string. CLAUDE.md is re-loaded independently, but any reinforcement of rules that happened through conversation is lost. Let’s say you spent several turns correcting Claude’s behavior on a particular rule. Those corrections don’t carry over to sub-agents. They start from scratch.
Mitigations
The common instinct when Claude stops following a rule is to add more rules, repeat them in multiple places, make them louder. This is counterproductive. It consumes more of the 40,000 character budget and increases the volume of content that the model needs to attend to.
The effective approach works in the other direction. I got my global CLAUDE.md under 5,000 characters by moving workflow documentation, agent tables, and reference material into agent definitions or .claude/rules/* files. The global file should contain only hard behavioral constraints.
If your project .claude/CLAUDE.md repeats content from the global file, delete the duplicates. The global file is already loaded. Duplicating it wastes budget on identical content that the loader doesn’t deduplicate.
Rules files support a paths: frontmatter key that restricts loading to sessions where matching files are relevant:

```markdown
---
paths:
  - "**/terraform/**"
  - "**/infra-**"
---
# Infrastructure Security Rules
...
```
This keeps infrastructure rules out of the budget when you’re working on frontend code. The binary confirms this: the processConditionedMdRules function (dc_) filters rules files by matching their globs against the current file path using an ignore-pattern matcher.
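The filtering logic amounts to a glob-or test over the file paths in play. Here is an approximation using Python's `fnmatch`, which is looser than the ignore-style matcher the binary actually uses:

```python
from fnmatch import fnmatch

def rules_file_applies(path_globs, candidate_path):
    # Load the rules file only when a relevant path matches one of its
    # `paths:` globs. (Approximation: fnmatch semantics differ slightly
    # from ignore-pattern matching.)
    return any(fnmatch(candidate_path, g) for g in path_globs)

globs = ["**/terraform/**", "**/infra-**"]
rules_file_applies(globs, "src/terraform/main.tf")  # True
rules_file_applies(globs, "src/app/Button.tsx")     # False
```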
Put critical rules in the first 50 lines of any memory file, since attention is strongest at the beginning and end of context. Rules at line 250 of a 400-line file will receive less attention.
Rather than waiting for auto-compaction to trigger and make its own choices about what to keep, you can use /compact manually when context is growing. This gives you more control over what survives summarization. And MEMORY.md should be treated as a sparse index with pointers to topic-specific files, not a place for detailed content, since the 200-line limit is hard-coded.
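A sparse-index MEMORY.md might look like the following. The file names are hypothetical; the point is that each entry is a one-line pointer, not the content itself:

```markdown
# MEMORY.md — index only; details live in topic files
- Deploy process: see docs/memory/deploy.md
- API conventions: see docs/memory/api-conventions.md
- Known flaky tests: see docs/memory/flaky-tests.md
```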
After applying these changes to my own setup, the total memory footprint dropped from somewhere between 47,000 and 62,000 characters to roughly 6,500. That’s over 32,000 characters of headroom recovered for tool results, conversation context, and long sessions.
In other words the fix isn’t more rules. It’s fewer, better-placed rules that fit comfortably within the budget and survive compaction.
References
- Claude Code Official Memory Documentation — Anthropic’s official docs on CLAUDE.md loading, precedence, and best practices
- Claude Code Sub-agents Documentation — How sub-agents receive context
- Giuseppe Gurgone — Claude Code’s Experimental Memory System — Independent analysis of the minified bundle, confirming the MEMORY.md 200-line constant
- Liu et al. — “Lost in the Middle: How Language Models Use Long Contexts” (TACL 2024) — Research on U-shaped attention in long-context transformers
- Claude Code Context Buffer Management — Community documentation on compaction thresholds and token budgets
- GitHub Issue #4464 — Discussion of system-reminder content injection mechanism
- GitHub Issue #13099 — Discussion of character budget limits for skills/memory