AI Memory Systems: Giving AI Long-Term Memory

You have probably had this experience: you tell an AI "I prefer concise answers," and in the next conversation it gives you another wall of text. Or you tell it "this project uses pnpm, not npm," and later it suggests running npm install.

The AI is not being lazy — it genuinely has no memory.

Large language models are stateless by design. Each API request is independent and does not remember the previous one. This is not a big problem in casual chat, but in scenarios like development assistance, project management, and long-term collaboration, it is crippling.

This article addresses one question: how do you give AI long-term memory?

Why AI Needs a Memory System

Let us distinguish two types of "memory":

Context memory — content mentioned in the current conversation that the AI can remember. This is short-term memory within the model's context window. Once the conversation ends, it is gone.

Long-term memory — information that remains effective across sessions, days, and projects. Things like user preferences, project context, team decisions, and lessons learned from past mistakes.

Without long-term memory, AI starts every conversation from scratch. You tell it your tech stack — it forgets. You correct its mistakes — it makes them again. You explain the project background — it asks again.

When I use AI for development assistance, this is the most frustrating part. I explained the architecture decisions just three days ago, and today it suggests the opposite. Not because it does not understand — because it does not remember.

The core problem a memory system solves is: giving AI continuous understanding of users and projects across sessions.

This is not as simple as "saving conversations to files." A truly usable memory system needs to solve four problems:

What to store — which information is worth remembering, and which is not
How to store it — what structure enables efficient retrieval
How to find it — when a user asks a question, how to quickly locate relevant memories
How to update it — memories go stale; how to automatically organize and phase them out

Let us break these down one by one.

What to Store: Four Precise Memory Types

The first thing a memory system does is define boundaries — what should be saved, and what should not.

This boundary is critical. Save everything, and the memory system becomes a junk pile whose degraded retrieval accuracy can mislead the AI. Save nothing, and there is no point in having a memory system at all.

Claude Code's memory system defines four precise memory types, each with explicit rules for saving and use. This classification is based on one core principle: only save information that cannot be derived from the current project state.

Code patterns, architecture, Git history, file structure — these can be obtained by reading files or running git log. They do not need memory. The memory system saves information that cannot be directly read from code.

user (User Memory)

Records who the user is.

---
name: User Role
type: user
source: ccb
---

Content: User's role, goals, responsibilities, knowledge level
When to save: Any time you learn details about the user
How to use: Adjust answer depth and style based on user background
Examples:
- "User is a data scientist focused on observability"
- "User has 10 years of Go experience, first time working with React"

The value of user memory is that it determines how AI talks to you. The explanation depth, terminology, and example complexity should differ when talking to a senior engineer versus a beginner.

feedback (Feedback Memory)

Records what the user said — including both corrections and confirmations.

---
name: Testing Standards
type: feedback
source: ccb
---

Integration tests must use real databases, do not use mocks.

**Why:** Last quarter, mock tests passed but production migration failed, because the mock masked real migration issues.

**How to apply:** All database-related integration tests use real databases.

Content: Guidance and feedback from the user
When to save: When the user says "do not do it this way" or "this is good"
How to use: Ensure the same guidance does not need to be repeated

Feedback memory has a special requirement: you must record Why (the reason) and How to apply (how to apply it). Why matters immensely — rules without reasons are rote memorization, but with reasons, the system can make correct judgments in edge cases.

Another easily overlooked point: record both successes and failures. If you only record corrections, the AI becomes overly cautious and afraid to do anything. If you only record confirmations, it repeats the same mistakes.
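The Why / How to apply requirement can be enforced mechanically. The Python sketch below shows a feedback record that refuses to be created without both fields. The `FeedbackMemory` class, its `kind` field (which marks a correction versus a confirmation), and `to_markdown` are hypothetical illustrations, not Claude Code's actual code. The rendered layout simply mirrors the Testing Standards example file.

```python
from dataclasses import dataclass


@dataclass
class FeedbackMemory:
    """Hypothetical shape of a feedback memory (illustrative, not Claude Code's schema)."""

    name: str
    rule: str
    why: str
    how_to_apply: str
    kind: str = "correction"  # "correction" or "confirmation": record both

    def __post_init__(self) -> None:
        # A rule without its reason cannot guide edge cases, so reject it.
        if not self.why.strip():
            raise ValueError(f"feedback {self.name!r} is missing Why")
        if not self.how_to_apply.strip():
            raise ValueError(f"feedback {self.name!r} is missing How to apply")
        if self.kind not in ("correction", "confirmation"):
            raise ValueError(f"unknown feedback kind: {self.kind!r}")

    def to_markdown(self) -> str:
        # Frontmatter plus body, in the same layout as the example file.
        return (
            "---\n"
            f"name: {self.name}\n"
            "type: feedback\n"
            "source: ccb\n"
            "---\n\n"
            f"{self.rule}\n\n"
            f"**Why:** {self.why}\n\n"
            f"**How to apply:** {self.how_to_apply}\n"
        )


memo = FeedbackMemory(
    name="Testing Standards",
    rule="Integration tests must use real databases, do not use mocks.",
    why="Mock tests passed but a production migration failed.",
    how_to_apply="All database-related integration tests use real databases.",
)
print(memo.to_markdown())
```

Validating at construction time means a half-specified rule never reaches disk. A confirmation ("this is good") goes through the same path with `kind="confirmation"`.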

project (Project Memory)

Records what is happening in the project.

---
name: Merge Freeze
type: project
source: ccb
---

Merge freeze begins on 2026-03-05 for mobile release preparation.

**Why:** The mobile team is cutting a release branch and needs a stable main branch.

**How to apply:** Flag all non-critical PR work after 2026-03-05.

Content: Project context, goals, deadlines, decision rationale
When to save: When you learn "who is doing what, why, and when"
How to use: Understand the background and motivation behind user requests

Project memory has a common pitfall: relative dates must be converted to absolute dates. "Thursday," "next week," and "next month" become meaningless over time. Storing "2026-03-05" instead of "Thursday" keeps the memory valid long-term.
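Resolving a relative date at save time takes only a small function. The sketch below is a hypothetical helper, not part of any system described here. It handles only weekday names, "today", "tomorrow", and "next week". It assumes "next week" means exactly seven days later, and it treats a weekday name as the next occurrence on or after the reference date.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_relative_date(phrase: str, today: date) -> date:
    """Turn a relative phrase into an absolute date, relative to `today`."""
    p = phrase.strip().lower()
    if p == "today":
        return today
    if p == "tomorrow":
        return today + timedelta(days=1)
    if p == "next week":
        # Assumption: "next week" means seven days from now.
        return today + timedelta(weeks=1)
    if p in WEEKDAYS:
        # Days until the named weekday (0 if it is today); weekday() is 0 for Monday.
        delta = (WEEKDAYS.index(p) - today.weekday()) % 7
        return today + timedelta(days=delta)
    raise ValueError(f"cannot resolve relative date: {phrase!r}")


# Said on Monday 2026-03-02, "Thursday" becomes 2026-03-05.
print(resolve_relative_date("Thursday", date(2026, 3, 2)).isoformat())
```

The memory is then written with the ISO date, so it reads the same no matter when it is retrieved.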

reference (Reference Memory)

Records where external resources are.

---
name: Pipeline Bug Tracking
type: reference
source: ccb
---

Content: Pointers to external resources (Linear projects, Slack channels, Grafana dashboards, etc.)
When to save: When you learn the location and purpose of an external resource
How to use: Quickly locate external systems when the user references them

Reference memory stores only pointers, not content. The content lives in external systems; the memory system just remembers "where to look."

What Should Not Be Stored

The system explicitly defines content that should not be saved:

Code patterns, conventions, architecture, file paths — can be derived from project state
Git history, recent changes — git log / git blame are more authoritative
Debugging solutions — fixes are in the code; context is in the commit message
Content already in project documentation
Temporary task details: in-progress work, temporary status, current conversation context

Even if the user explicitly asks to save these, the system asks "what is special or non-obvious enough to be worth keeping?"

This design philosophy is clever: the memory system stores only incremental knowledge, not a copy of project knowledge, so every memory adds something the project does not already record.

How to Store: Dual-Engine Architecture

Once we have defined what to store, the next question is how to store it.

The memory system has two complementary engines, each with its own focus: CCB (Claude Code Base) and MAGMA.

CCB — File-Based Structured Storage

CCB is a structured memory system based on Markdown files. Each memory is an independent .md file with YAML frontmatter defining its metadata.

~/.claude/projects/<project-slug>/memory/
├── MEMORY.md              ← Index file (auto-generated)
├── user_role.md           ← User memory
├── user_preferences.md    ← User memory
├── feedback_testing.md    ← Feedback memory
├── feedback_naming.md     ← Feedback memory
├── project_merge-freeze.md ← Project memory
├── reference_linear.md    ← Reference memory
└── ...

Each memory file has a fixed structure: YAML frontmatter (metadata such as name, type, description, and source) + Markdown body.
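Reading such a file back means splitting the frontmatter from the body. The sketch below assumes flat `key: value` frontmatter like the examples in this article and deliberately handles only that subset. It is not a general YAML parser; a real implementation would more likely use a YAML library. `parse_memory` and `MemoryFile` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MemoryFile:
    meta: dict
    body: str


def parse_memory(text: str) -> MemoryFile:
    """Split a memory file into flat frontmatter fields and a Markdown body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' frontmatter delimiter")
    # Find the closing delimiter of the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("missing closing '---' frontmatter delimiter")
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return MemoryFile(meta=meta, body=body)


sample = """---
name: Merge Freeze
type: project
source: ccb
---

Merge freeze begins on 2026-03-05 for mobile release preparation.
"""
mem = parse_memory(sample)
print(mem.meta["type"], "|", mem.body.splitlines()[0])
```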

MEMORY.md Index Format:

# Memory Index

## User Memories
- [User Role](user_role.md) — Data scientist, focused on observability
- [Preferences](user_preferences.md) — Prefers concise answers, uses pnpm

## Feedback Memories
- [Testing Standards](feedback_testing.md) — Integration tests use real databases
- [Naming Style](feedback_naming.md) — Function names start with verbs

## Project Memories
- [Merge Freeze](project_merge-freeze.md) — Starting 2026-03-05

## Reference Memories
- [Linear Project](reference_linear.md) — INGEST project tracks pipeline bugs

CCB has several interesting design details:

MEMORY.md has a hard limit: 200 lines / 25KB. Exceeding it triggers truncation and a warning. This limit prevents the index file from growing uncontrollably.

The index is auto-generated: rather than being maintained by hand, it is regenerated by the Dream mechanism. This avoids the "forgot to update the index" problem.

Memories are organized by semantic topic, not chronological order. This mirrors human memory — you do not recall things in chronological order, but by topic association.
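The regeneration step can be sketched under some assumptions. This article does not show the Dream mechanism's code, so `build_index`, the tuple input, and the reading of "25KB" as 25 KiB are all illustrative. The sketch groups entries by memory type, which gives the topic sections of MEMORY.md. It then enforces both the 200-line limit and the byte limit, warning when it has to truncate. A plain hyphen separates each link from its summary.

```python
import warnings

SECTION_TITLES = {
    "user": "User Memories",
    "feedback": "Feedback Memories",
    "project": "Project Memories",
    "reference": "Reference Memories",
}
MAX_LINES = 200
MAX_BYTES = 25 * 1024  # assumption: "25KB" read as 25 KiB


def build_index(entries):
    """entries: (type, title, filename, summary) tuples. Returns (text, truncated)."""
    entries = list(entries)
    lines = ["# Memory Index"]
    for mem_type, heading in SECTION_TITLES.items():
        # Group by topic (memory type), not by creation time.
        group = sorted((e for e in entries if e[0] == mem_type), key=lambda e: e[1].lower())
        if not group:
            continue
        lines += ["", f"## {heading}"]
        lines += [f"- [{title}]({filename}) - {summary}" for _, title, filename, summary in group]

    truncated = len(lines) > MAX_LINES
    lines = lines[:MAX_LINES]
    # Drop trailing lines until the encoded index also fits the byte budget.
    while lines and len(("\n".join(lines) + "\n").encode("utf-8")) > MAX_BYTES:
        lines.pop()
        truncated = True
    if truncated:
        warnings.warn("MEMORY.md index truncated to fit its size limits")
    return "\n".join(lines) + "\n", truncated


text, cut = build_index([
    ("feedback", "Testing Standards", "feedback_testing.md", "Integration tests use real databases"),
    ("user", "User Role", "user_role.md", "Data scientist, focused on observability"),
])
print(text)
```

Because the index is rebuilt from the memory files every time, it cannot drift out of sync with them. The limits are checked on the generated text rather than trusted to whoever writes the entries.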

MAGMA — Vector-Based Unstructured Storage

MAGMA is an unstructured knowledge engine based on LanceDB + Obsidian.

If CCB is like a folder — precise, structured, editable — then MAGMA is like a brain — fuzzy, semantic, associative.

MAGMA uses a five-layer memory architecture:

Layer	Name	Storage	Purpose
L1	Ephemeral Memory	In-memory	Current session context
L2	Project Memory	LanceDB	Project-related knowledge and context
L3	User Memory	LanceDB	User preferences and habits
L4	Feedback Memory	LanceDB	User feedback and guidance
L5	Reference Memory	LanceDB + Obsidian	External resource references

L1 is pure in-memory, with the fastest access but smallest capacity. L2-L5 are stored in a vector database, with larger capacity but retrieval requiring computation.

MAGMA's core capability is semantic vector search. You do not need to remember exact keywords; a query that is close enough in meaning will find the memory. For example, searching for "code style preferences" can find the memory "function names start with verbs," even though the two phrases share no words.
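At its core, semantic search ranks stored vectors by their similarity to a query vector. In MAGMA, the vectors would come from an embedding model and the search would run inside LanceDB; neither is reproduced here. The sketch below uses hand-written three-dimensional placeholder vectors and brute-force cosine similarity. Think of the axes as roughly "code style", "testing", and "release schedule". This is enough to show why "code style preferences" can land on "function names start with verbs" without sharing a single word.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, memories, k=1):
    """Return the top-k (score, text) pairs by cosine similarity."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Placeholder embeddings; a real system would get these from an embedding model.
MEMORIES = [
    ("Function names start with verbs", [0.9, 0.1, 0.0]),
    ("Integration tests use real databases", [0.2, 0.9, 0.1]),
    ("Merge freeze starts 2026-03-05", [0.0, 0.1, 0.95]),
]
query = [0.85, 0.15, 0.05]  # placeholder embedding of "code style preferences"
print(semantic_search(query, MEMORIES))
```

A vector database replaces the linear scan with an approximate nearest-neighbor index, but the ranking idea is the same.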

Why Two Engines Are Needed

CCB and MAGMA solve different problems:

Dimension	CCB	MAGMA
Data type	Structured memories	Unstructured knowledge
Storage method	Markdown files	Vector database + notes
Retrieval method	Keyword matching	Semantic vector search
Advantages	Precise, explainable, easy to edit	Semantic understanding, fuzzy matching
Disadvantages	No semantic search	Requires additional infrastructure
Typical scenario	"User prefers pnpm"	"What is the architecture of this project"

CCB is suited for precise facts — "user's name is Alice," "project uses pnpm." MAGMA is suited for fuzzy concepts — "the overall architecture of this project," "the user's working style."

The two engines complement each other; neither is dispensable. With only CCB, the AI can only do exact matching and cannot find memories that are semantically related but phrased differently. With only MAGMA, the AI's memories lose precision and editability, and require additional vector database infrastructure.
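One way to let the two engines complement each other is to try the precise path first and fall back to semantic search only when it finds nothing. This article does not specify the routing policy, so the following is an assumed design. `keyword_search` is a naive all-terms substring match over CCB-style index entries. The semantic engine is passed in as a function, so any vector search can be plugged in.

```python
from typing import Callable, Dict, List


def keyword_search(query: str, index: Dict[str, str]) -> List[str]:
    """Titles whose title or summary contains every query term (case-insensitive)."""
    terms = query.lower().split()
    hits = []
    for title, summary in index.items():
        haystack = f"{title} {summary}".lower()
        if terms and all(term in haystack for term in terms):
            hits.append(title)
    return hits


def retrieve(query: str, index: Dict[str, str],
             semantic: Callable[[str], List[str]]) -> List[str]:
    """Precise CCB-style lookup first; semantic lookup only if it misses."""
    hits = keyword_search(query, index)
    return hits if hits else semantic(query)


INDEX = {
    "Preferences": "Prefers concise answers, uses pnpm",
    "Naming Style": "Function names start with verbs",
}


def fake_semantic(query: str) -> List[str]:
    # Stand-in for a vector search engine such as MAGMA.
    return ["Naming Style"]


print(retrieve("pnpm", INDEX, fake_semantic))
print(retrieve("code style preferences", INDEX, fake_semantic))
```

An alternative is to query both engines and merge the results. Falling back instead keeps the precise, explainable answer whenever one exists, and pays for the vector search only when it is needed.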

The Memory Lifecycle

Storing memories is not the end of the story. A complete memory system needs to manage the full lifecycle:

Create → User says something, the system decides "is this worth remembering?" → Yes → Save as one of the four types

Retrieve → User asks a question, the system analyzes intent → Search in CCB and/or MAGMA → Return relevant memories

Consolidate → The Dream mechanism runs periodically, merging duplicate memories and updating the index

Phase out → Outdated memories are cleaned up; effective memories are retained

These four stages are interlinked. A failure in any one of them breaks the chain. Storing without finding is pointless. Storing without organizing leads to memory bloat.
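The consolidate and phase-out stages can be shown in a deliberately simple sketch, under stated assumptions, since this article does not describe the real Dream mechanism in detail. This version treats memories with the same type and case-insensitive name as duplicates and keeps the most recently updated one. It then drops any memory whose hypothetical `expires` date has passed, such as a merge freeze that has ended. Merging memories that overlap in meaning but differ in name would need similarity search or a model, which this sketch does not attempt.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Memory:
    mem_type: str
    name: str
    body: str
    updated: date
    expires: Optional[date] = None  # hypothetical field, e.g. end of a merge freeze


def consolidate(memories: List[Memory], today: date) -> List[Memory]:
    """Deduplicate by (type, name), keep the newest copy, then phase out expired ones."""
    latest: Dict[Tuple[str, str], Memory] = {}
    for m in memories:
        key = (m.mem_type, m.name.strip().lower())
        if key not in latest or m.updated > latest[key].updated:
            latest[key] = m
    # Phase out: drop memories whose expiry date is already in the past.
    kept = [m for m in latest.values() if m.expires is None or m.expires >= today]
    return sorted(kept, key=lambda m: (m.mem_type, m.name.lower()))


result = consolidate(
    [
        Memory("feedback", "testing standards", "old wording", date(2026, 1, 1)),
        Memory("feedback", "Testing Standards", "new wording", date(2026, 2, 1)),
        Memory("project", "Merge Freeze", "freeze", date(2026, 3, 1), expires=date(2026, 3, 20)),
        Memory("user", "User Role", "data scientist", date(2026, 1, 5)),
    ],
    today=date(2026, 4, 1),
)
print([(m.mem_type, m.name, m.body) for m in result])
```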

Design Philosophy

Reviewing the entire memory system, several design philosophies stand out:

1. Only store non-derivable information. This is the cornerstone of the entire system. Information obtainable from code, documentation, or Git history does not need memory. The memory system stores only incremental knowledge.

2. Precise classification, not a catch-all bucket. Four memory types, each with explicit rules. This is not over-engineering — it avoids ambiguity. Ambiguous classification leads to ambiguous usage.

3. Dual-engine complementarity. Structured + unstructured, precise + semantic, file + vector. There is no silver bullet — only the right combination of tools.

4. Automatic consolidation. The Dream mechanism makes memory systems maintenance-free. Like human sleep — accumulate during the day, consolidate at night, wake up with clearer memories.

5. Hard constraints. MEMORY.md 200-line limit, token budget control, lock mechanisms for concurrency. Good systems do not rely on discipline — they rely on engineering constraints.
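This article names the lock mechanism without describing it, so what follows is an assumed minimal approach, not the system's actual one. It uses an exclusive lock file created with `O_CREAT | O_EXCL`, which the OS creates atomically. A second writer, say a concurrent consolidation run, fails fast instead of interleaving writes. `memory_lock` is a hypothetical helper, and it does not handle stale locks left behind by a crashed process.

```python
import os
from contextlib import contextmanager


@contextmanager
def memory_lock(lock_path: str):
    """Hold an exclusive lock file for the duration of the with-block."""
    try:
        # O_EXCL makes creation fail if the file already exists, atomically.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"memory directory is locked: {lock_path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A consolidation run would wrap its read-merge-write cycle in `with memory_lock(path):`, so two runs never rewrite MEMORY.md at the same time.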

This installment covered the "why" and "what" of memory systems. Next, we dive into the internal implementation of the dual engines — how CCB's reader and consolidator work, how MAGMA's vector search and knowledge graph are built, and how they collaborate.

Series:

Next: Dual-Engine Architecture: Files vs Vector Databases
