Self-evolution: Automatic organization and consolidation of AI memories

The first three installments covered the "why", "what to save", "how to save", and "how to find" of the memory system. This installment takes up a more interesting question: how does the memory system evolve on its own?

People forget, but the human brain automatically organizes memories during sleep, consolidating short-term memories into long-term ones and integrating scattered information into a knowledge network. Claude Code's memory system has a similar ability, called Dream.

1. Dream: memory consolidation during sleep

Dream is the memory system's automatic consolidation mechanism. It is inspired by the way human memory is consolidated during sleep:

Scattered memory fragments accumulate during the day, and at night the brain organizes them automatically: it merges duplicates, strengthens associations, and discards useless information. By the next morning, the memories are clearer.

1.1 Trigger conditions

Dream does not run all the time. It has two gating conditions:

Time gating:

  • At least 24 hours must pass between Dreams: consolidating once a day is enough, and running more often wastes resources
  • The maximum interval is 168 hours (7 days): if memories go unconsolidated for more than a week, they bloat and get out of hand

Session gating:

  • At least 5 sessions are required before Dream triggers: a single session produces too little information to be worth consolidating
  • A scan runs every 10 minutes to check whether the trigger conditions are met

Both conditions must hold at the same time. If enough time has passed but there are too few sessions, Dream does not trigger; if there are enough sessions but too little time has passed, it does not trigger either.

The design balances two costs. Consolidating too often wastes compute, while consolidating too rarely lets memory quality degrade. At most one run per day, after at least five sessions, is an empirically chosen balance point.
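The two gates can be sketched as a single pure function. This is a minimal illustration, not the library's code: the `DreamState` shape and the `shouldDream` name are assumptions, and the 168-hour maximum is left out because the text does not say how it interacts with the session gate.

```typescript
// Hypothetical shapes; the config field names mirror the autoDream settings.
interface DreamGateConfig {
  minHoursBetweenDream: number
  minSessionsForDream: number
}

interface DreamState {
  lastDreamAt: number        // epoch ms of the last completed Dream
  sessionsSinceDream: number // sessions recorded since then
}

const HOUR_MS = 60 * 60 * 1000

// Returns true only when both the time gate and the session gate are open.
function shouldDream(state: DreamState, cfg: DreamGateConfig, now: number): boolean {
  const hoursSince = (now - state.lastDreamAt) / HOUR_MS
  const timeGateOpen = hoursSince >= cfg.minHoursBetweenDream
  const sessionGateOpen = state.sessionsSinceDream >= cfg.minSessionsForDream
  return timeGateOpen && sessionGateOpen
}
```

A scanner that wakes every 10 minutes would call a function like this and start a Dream only when it returns true.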

Locking mechanism:

// The lock prevents concurrent Dream runs
const lock = await this.lockManager.acquire({
  key: 'dream-consolidation',
  ttl: 3600000, // lock expires after 1 hour
  retry: {
    attempts: 3,
    backoff: 1000 // wait 1 second between attempts
  }
})

if (!lock) {
  // Another Dream process is running; skip this run
  return
}

A file lock prevents multiple Dream processes from running at the same time. The lock expires after 1 hour, so if the Dream process crashes, the lock is released automatically instead of blocking forever.
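The expiry behavior can be illustrated with an in-memory stand-in. This is only a sketch of the TTL idea: the `TtlLock` class is hypothetical, and a real implementation would persist the holder in a lock file rather than in process memory.

```typescript
// In-memory stand-in for a lock with a time-to-live (hypothetical class).
class TtlLock {
  private holder: { owner: string; expiresAt: number } | null = null
  private readonly ttlMs: number

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs
  }

  // Returns true if the caller now holds the lock.
  tryAcquire(owner: string, now: number): boolean {
    const current = this.holder
    if (current !== null && current.expiresAt > now) {
      return false // still held and not yet expired
    }
    // Free, or the previous holder's TTL ran out (e.g. it crashed).
    this.holder = { owner, expiresAt: now + this.ttlMs }
    return true
  }

  // Only the current owner can release the lock.
  release(owner: string): void {
    if (this.holder !== null && this.holder.owner === owner) {
      this.holder = null
    }
  }
}
```

Passing `now` explicitly keeps the sketch deterministic; production code would read the clock itself.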

1.2 Dream process

Trigger Dream
    │
    ▼
Acquire the consolidation lock
    │
    ▼
Read all CCB memory files
    │
    ▼
Merge duplicate memories (based on content hashing)
    │
    ▼
De-duplicate (keep the latest version)
    │
    ▼
Incremental consolidation (only process changes)
    │
    ▼
Update the MEMORY.md index
    │
    ▼
Update the metadata file
    │
    ▼
Release the lock and record the consolidation time

The core of the process is six steps: acquire the lock → read the memories → merge duplicates → de-duplicate → consolidate incrementally → update the index. The metadata update and lock release then close out the run.

Every step is idempotent: if a run fails halfway, the next Dream starts over from the beginning without leaving the memories in an inconsistent state.
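The merge and de-duplication steps, and why they are idempotent, can be sketched as follows. The `MemoryFile` shape and both function names are assumptions for illustration, and the small FNV-1a style hash is only there to keep the example dependency-free; a real implementation would more likely use a cryptographic hash.

```typescript
interface MemoryFile {
  path: string
  content: string
  updatedAt: number // epoch ms
}

// FNV-1a style 32-bit hash over UTF-16 code units (illustrative only).
function contentHash(text: string): string {
  let h = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h.toString(16)
}

// Groups memories by the hash of their trimmed content and keeps the newest one per group.
function dedupe(memories: MemoryFile[]): MemoryFile[] {
  const newest = new Map<string, MemoryFile>()
  for (const m of memories) {
    const key = contentHash(m.content.trim())
    const seen = newest.get(key)
    if (seen === undefined || m.updatedAt > seen.updatedAt) {
      newest.set(key, m)
    }
  }
  return Array.from(newest.values())
}
```

Running `dedupe` on its own output returns the same list, which is the property that makes a retry after a failed run safe.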

1.3 Incremental consolidation

Dream does not rewrite everything on each run. The core logic of incremental consolidation:

// Assumes `import { existsSync } from 'node:fs'` at the top of the file.
async processIncremental(memories) {
  const lastConsolidatedAt = await this.getLastConsolidatedAt()

  let skipped = 0, updated = 0, added = 0

  for (const memory of memories) {
    // Check whether this memory has changed since the last consolidation
    const hasChanged = await this.hasMemoryChanged(memory, lastConsolidatedAt)

    if (!hasChanged) {
      skipped++ // unchanged: skip
      continue
    }

    // getMetadataPath is a hypothetical helper; the original snippet
    // uses metadataPath without showing where it is defined.
    const metadataPath = this.getMetadataPath(memory)
    if (existsSync(metadataPath)) {
      updated++ // consolidated before, but changed: update
    } else {
      added++   // new memory: create
    }

    await this.processMemory(memory)
  }

  console.log(`Dream completed: skipped=${skipped}, updated=${updated}, added=${added}`)
}

Incremental consolidation is efficient. A project may hold hundreds of memories, but any single Dream usually sees only a few new or changed ones. It skips what has not changed and spends compute on the memories that actually need consolidating.

This is the same idea as 'git commit': only changes are committed, and unchanged files are not committed again.
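The skip / update / add decision can be isolated as a small pure function. This sketch assumes change detection by modification time and a set of already-consolidated IDs; the article does not show how `hasMemoryChanged` actually works, so both are illustrative choices.

```typescript
type Outcome = 'skipped' | 'updated' | 'added'

interface TrackedMemory {
  id: string
  modifiedAt: number // epoch ms
}

// Mirrors the order of checks in processIncremental: unchanged memories are
// skipped first, then known memories count as updates and unknown ones as additions.
function classify(
  memory: TrackedMemory,
  lastConsolidatedAt: number,
  knownIds: Set<string>
): Outcome {
  const changed = memory.modifiedAt > lastConsolidatedAt
  if (!changed) return 'skipped'
  return knownIds.has(memory.id) ? 'updated' : 'added'
}
```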

2. Configuration system

All behavior of the memory system can be controlled through a configuration file.

2.1 Core configuration

{
  "cache": {
    "enabled": true,
    "ttlHours": 24,
    "maxSize": 10000,
    "semanticThreshold": 0.9
  },
  "obsidian": {
    "mode": "auto",
    "cli": { "enabled": true, "timeout": 5000 },
    "direct": { "atomicWrites": true },
    "fallback": { "enabled": true, "maxRetries": 2 }
  },
  "autoDream": {
    "enabled": true,
    "timeGate": {
      "minHoursBetweenDream": 24,
      "maxHoursBetweenDream": 168
    },
    "sessionGate": {
      "minSessionsForDream": 5,
      "scanIntervalMs": 600000
    }
  },
  "retrieval": {
    "defaultStrategy": "both-priority-ccb",
    "intentAnalysis": {
      "enableLLM": true,
      "enableRules": true,
      "llmConfidenceThreshold": 0.7,
      "ruleConfidenceThreshold": 0.5
    },
    "progressiveSearch": {
      "enabled": true,
      "stages": [
        { "name": "ccb-fast", "limit": 5, "timeout": 100 },
        { "name": "magma-extend", "limit": 10, "timeout": 300 },
        { "name": "full-fusion", "enableFusion": true, "timeout": 800 }
      ]
    }
  },
  "fusion": {
    "maxTokens": 4000,
    "enableSemanticDeduplication": true,
    "semanticSimilarityThreshold": 0.85,
    "tokenBudget": { "dynamic": true, "maxTokensPerMemory": 500 }
  }
}

The configuration has five sections:

  • cache: Cache settings for intent analysis results. TTL of 24 hours, up to 10000 cached items
  • obsidian: Obsidian integration settings. Supports CLI and direct modes, with retry and fallback
  • autoDream: Trigger conditions for automatic Dream consolidation. The time-gating and session-gating parameters live here
  • retrieval: Retrieval settings. Strategy parameters for intent analysis and progressive retrieval
  • fusion: Fusion settings. Token budget, deduplication threshold, ranking parameters

2.2 Dual LLM configuration

The memory system uses two different LLMs, each with its own responsibility:

Model                      | Use             | Characteristics
SiliconFlow (Qwen3-8B)     | FusionVerifier  | No concurrency limit, fast response (3-5s)
Zhipu AI (glm-4.7-flash)   | Intent analysis | Concurrency = 1, rate limiting required (2s interval)

Why two models? Intent analysis and fusion verification are different tasks and place different demands on the model:

  • Intent analysis needs accurate semantic understanding, so it uses the stronger model (Zhipu AI)
  • FusionVerifier needs to check results quickly, so it uses the faster model (SiliconFlow)

The two models also come with different limits: Zhipu AI allows only one concurrent request, while SiliconFlow has no concurrency limit. Matching the model to each task's call frequency is a typical cost-optimization approach.
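For the model that allows only one concurrent request and needs a 2-second interval, the rate-limiting protection could look like the sketch below. The `SerialRateLimiter` class is hypothetical, and it assumes the interval is measured between the start of one call and the start of the next; the article does not specify this.

```typescript
// Runs tasks one at a time and enforces a minimum gap between call starts.
class SerialRateLimiter {
  private tail: Promise<void> = Promise.resolve()
  private lastStart = Number.NEGATIVE_INFINITY
  private readonly minIntervalMs: number
  private readonly now: () => number
  private readonly sleep: (ms: number) => Promise<void>

  constructor(
    minIntervalMs: number,
    now: () => number = () => Date.now(),
    sleep: (ms: number) => Promise<void> = ms =>
      new Promise<void>(resolve => { setTimeout(resolve, ms) })
  ) {
    this.minIntervalMs = minIntervalMs
    this.now = now
    this.sleep = sleep
  }

  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(async () => {
      const wait = this.lastStart + this.minIntervalMs - this.now()
      if (wait > 0) await this.sleep(wait)
      this.lastStart = this.now()
      return task()
    })
    // Keep the queue moving even if a task rejects.
    this.tail = result.then(() => undefined, () => undefined)
    return result
  }
}
```

The clock and sleep functions are injectable so the limiter can be tested without real delays. Calls to the model without a concurrency limit would simply bypass it.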

2.3 Environment-specific configuration

{
  "environments": {
    "development": {
      "autoDream.timeGate.minHoursBetweenDream": 1,
      "retrieval.intentAnalysis.cacheEnabled": false,
      "retrieval.progressiveSearch.enabled": false
    },
    "production": {
      "autoDream.timeGate.minHoursBetweenDream": 48,
      "retrieval.intentAnalysis.cacheEnabled": true,
      "retrieval.progressiveSearch.enabled": true,
      "metrics.enabled": true
    }
  }
}

The development and production configurations differ considerably:

  • Dream frequency: a minimum interval of 1 hour in development (convenient for debugging) and 48 hours in production (to save cost)
  • Caching: disabled in development (so you always see the latest results), enabled in production
  • Progressive search: disabled in development (it goes straight to the slowest but most comprehensive path, which helps debugging), enabled in production
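The environment blocks use dotted keys such as `autoDream.timeGate.minHoursBetweenDream`. One way to apply them on top of the core configuration is sketched below; the `applyOverrides` function is an assumption for illustration, not the library's loader.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }
type JsonObject = { [key: string]: Json }

function isObject(v: Json | undefined): v is JsonObject {
  return typeof v === 'object' && v !== null && !Array.isArray(v)
}

// Returns a deep copy of `base` with each dotted-path override applied.
function applyOverrides(base: JsonObject, overrides: JsonObject): JsonObject {
  const result: JsonObject = JSON.parse(JSON.stringify(base))
  for (const [path, value] of Object.entries(overrides)) {
    const keys = path.split('.')
    const last = keys.pop()
    if (last === undefined) continue
    let node = result
    for (const key of keys) {
      const next = node[key]
      if (isObject(next)) {
        node = next
      } else {
        // Create intermediate objects for paths missing from the base config.
        const created: JsonObject = {}
        node[key] = created
        node = created
      }
    }
    node[last] = value
  }
  return result
}
```

Because the base object is copied first, the core configuration stays untouched and each environment gets its own resolved view.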

3. Shared Memory System: code in practice

'shared-memory-system' is a memory management library that is independent of the main Claude Code project and provides interoperability between CCB and MAGMA.

3.1 Project structure

shared-memory-system/
├── src/
│   ├── CCBReader.ts           # CCB Memory Reader
│   ├── MagmaAdapter.ts        # MAGMA Memory Adapter
│   ├── crossSystem.ts         # Cross-system interoperability
│   ├── index.ts               # Main export file
│   │
│   ├── agents/                # Agent system
│   │   ├── ExtractorAgent.ts  # Extractor Agent
│   │   └── RefinerAgent.ts    # Refiner Agent
│   │
│   ├── autoDream/             # Automatic memory consolidation
│   │   ├── CCBConsolidator.ts
│   │   ├── MAGMAConsolidator.ts
│   │   ├── CCBToMAGMARefiner.ts
│   │   └── ConsolidationLock.ts
│   │
│   ├── consistency/           # Consistency check
│   │   ├── ConsistencyChecker.ts
│   │   └── ConflictArbiter.ts
│   │
│   ├── routing/               # Intelligent routing
│   │   ├── IntelligentRouter.ts
│   │   └── IntentClassifier.ts
│   │
│   ├── retrieval/             # Retrieval and fusion
│   │   ├── CrossSystemRetriever.ts
│   │   └── MemoryFusion.ts
│   │
│   ├── versioning/            # Version control
│   │   ├── MemoryVersioning.ts
│   │   └── ExpirationHandler.ts
│   │
│   └── utils/                 # Utility functions
│       ├── debug.ts
│       ├── errors.ts
│       ├── frontmatterParser.ts
│       └── readFileInRange.ts

The project is organized into functional modules:

  • agents/: Agent system, responsible for extracting and refining memories
  • autoDream/: Automatic Dream consolidation, including separate consolidators for CCB and MAGMA
  • consistency/: Consistency checks that keep CCB and MAGMA memories from conflicting
  • routing/: Routing system, covering intent classification and routing decisions
  • retrieval/: Retrieval and fusion
  • versioning/: Version control and expiration handling

3.2 Core API

import { CCBReader, MagmaAdapter, CrossSystemInterop } from 'shared-memory-system'

// 1. Read CCB memory
const ccbReader = new CCBReader('path/to/ccb/memory')
const ccbDir = await ccbReader.readMemoryDirectory()
console.log(`Found ${ccbDir.memoryFiles.length} memory files`)

// 2. Read MAGMA memory
const magmaConfig = {
  lancedbPath: 'path/to/lancedb',
  obsidianPath: 'path/to/obsidian'
}
const magmaAdapter = new MagmaAdapter(magmaConfig)
await magmaAdapter.initialize()

// 3. Cross-system retrieval
// magmaConfig is assumed to be the same options object passed to MagmaAdapter
const interop = new CrossSystemInterop('path/to/ccb/memory', magmaConfig)
await interop.initialize()
const results = await interop.retrieve({
  query: 'user preferences',
  limit: 10,
  searchCCB: true,
  searchMAGMA: true
})

The three APIs correspond to three levels:

  1. CCBReader: reads CCB memory directly, suited to exact queries
  2. MagmaAdapter: operates directly on the MAGMA vector database, suited to semantic search
  3. CrossSystemInterop: cross-system interoperability that wraps the whole flow of intent analysis, routing, retrieval, and fusion

3.3 Consistency check

The CCB and MAGMA engines may each store a different version of the same memory. The consistency checker is responsible for finding and resolving these conflicts:

// Consistency check
async checkConsistency(): Promise<ConsistencyReport> {
  const ccbMemories = await this.ccbReader.readAll()
  const magmaMemories = await this.magmaAdapter.readAll()

  const conflicts: Conflict[] = []

  for (const ccbMemory of ccbMemories) {
    // Find semantically similar memories in MAGMA
    const similar = await this.magmaAdapter.findSimilar(ccbMemory)

    for (const magmaMemory of similar) {
      if (this.isConflicting(ccbMemory, magmaMemory)) {
        conflicts.push({
          ccb: ccbMemory,
          magma: magmaMemory,
          type: 'content-mismatch',
          severity: 'high'
        })
      }
    }
  }

  return { conflicts, resolved: [] }
}

// Conflict resolution
async resolveConflict(conflict: Conflict) {
  // Strategy 1: CCB wins (structured memory is more accurate)
  if (conflict.ccb.type === 'user' || conflict.ccb.type === 'feedback') {
    await this.magmaAdapter.update(conflict.ccb)
    return 'ccb-wins'
  }

  // Strategy 2: MAGMA wins (unstructured knowledge is more comprehensive)
  if (conflict.magma.layer === 'L5') {
    await this.ccbReader.update(conflict.magma)
    return 'magma-wins'
  }

  // Strategy 3: keep both and flag them for manual review
  return 'manual-review'
}

There are three strategies for resolving conflicts:

  • CCB first: user memory and feedback memory defer to CCB (more accurate and easier to explain)
  • MAGMA first: reference memory defers to MAGMA (more comprehensive and more up to date)
  • Manual review: conflicts that cannot be resolved automatically are flagged for manual handling

3.4 Version control

Each memory has version information:

interface MemoryVersion {
  id: string
  version: number            // version number, +1 on each update
  createdAt: Date
  updatedAt: Date
  previousVersions: string[] // IDs of historical versions
  changeLog: string          // description of this change
}

// Version rollback
async rollback(memoryId: string, targetVersion: number) {
  const history = await this.versionStore.getHistory(memoryId)
  const target = history.find(v => v.version === targetVersion)

  if (!target) {
    throw new Error(`Version ${targetVersion} does not exist`)
  }

  await this.ccbReader.update(target.content)
  await this.magmaAdapter.update(target.content)
}

Version control makes memories traceable. Each update records a version number and a change description, and any historical version can be restored.
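The bookkeeping behind this can be sketched with an in-memory history. The `VersionHistory` class and the choice to record a rollback as a new version (instead of truncating history) are assumptions made for the example; the article does not say which approach the library takes.

```typescript
interface VersionRecord {
  version: number
  content: string
  changeLog: string
}

class VersionHistory {
  private readonly records: VersionRecord[] = []

  // Appends a new version and returns its 1-based number.
  commit(content: string, changeLog: string): number {
    const version = this.records.length + 1
    this.records.push({ version, content, changeLog })
    return version
  }

  current(): VersionRecord | undefined {
    return this.records[this.records.length - 1]
  }

  // Restores old content by committing it again, so history is never rewritten.
  rollback(targetVersion: number): number {
    const target = this.records.find(r => r.version === targetVersion)
    if (target === undefined) {
      throw new Error(`Version ${targetVersion} does not exist`)
    }
    return this.commit(target.content, `rollback to v${targetVersion}`)
  }
}
```

Recording the rollback as its own version keeps the change log complete: the history shows both the bad edit and the decision to undo it.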

3.5 Expiration handling

Memories do not stay valid forever. Projects end, users change roles, tools get updated, and an outdated memory is more dangerous than no memory at all:

// Expiration check
async checkExpiration() {
  const allMemories = await this.readAll()

  for (const memory of allMemories) {
    // Expiration rules depend on the memory type
    switch (memory.type) {
      case 'project':
        // Project memory: check whether the project is still active
        if (await this.isProjectInactive(memory)) {
          await this.archive(memory, 'project-inactive')
        }
        break

      case 'reference':
        // Reference memory: check whether the external resource is still reachable
        if (!await this.isResourceReachable(memory)) {
          await this.markBroken(memory)
        }
        break

      case 'feedback':
        // Feedback memory: never expires, but its weight is reduced
        await this.decayPriority(memory, 0.95) // decay factor
        break
    }
  }
}

The expiration policy depends on the memory type:

  • Project memory: check whether the project is still active, and archive the memory if it is not
  • Reference memory: check whether the external resource is still reachable, and mark the memory as a broken link if it is not
  • Feedback memory: never expires (what the user said stays valuable), but its weight is reduced over time
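To get a feel for the 0.95 decay factor, the arithmetic can be worked through directly. This sketch assumes the factor is applied once per expiration pass and that weights multiply; how often passes run is not stated in the article, and both function names are illustrative.

```typescript
// Weight after `passes` decay steps: initial * factor^passes.
function decayedPriority(initial: number, factor: number, passes: number): number {
  return initial * Math.pow(factor, passes)
}

// Smallest number of passes after which a weight of 1 drops to `threshold` or below.
function passesUntil(threshold: number, factor: number): number {
  return Math.ceil(Math.log(threshold) / Math.log(factor))
}
```

With a factor of 0.95, `passesUntil(0.5, 0.95)` gives 14, so under these assumptions a feedback memory loses about half its weight after 14 passes without ever being deleted.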

4. Agent system

There are two Agents within the memory system responsible for extracting and refining memories:

4.1 ExtractorAgent

Responsible for extracting memorable information from the conversation:

class ExtractorAgent {
  async extract(conversation: Conversation): Promise<Memory[]> {
    const prompt = `
      Analyze the following conversations to extract information worth remembering.

      Rules:
      1. Extract only information that cannot be inferred from code/file
      2. Classified into four types: user / feedback / project / reference
      3. If there is no information in the conversation worth remembering, return an empty array

      Dialogue: ${conversation.text}
    `

    const result = await this.llm.chat(prompt)
    return result.memories
  }
}

The extraction rules are strict: only information that cannot be inferred is extracted. If the conversation says "The architecture of this project is microservices", that can be inferred from the code, so it is not extracted. If the conversation says "Don't use mocks for integration testing", that cannot be inferred, so it is extracted as feedback.
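Because the extractor relies on an LLM, its output is worth validating before anything is stored. The sketch below assumes the model replies with a JSON array of `{ type, content }` objects, which the article does not specify; `parseExtraction` is a hypothetical helper that enforces the four memory types and falls back to an empty array.

```typescript
const MEMORY_TYPES = ['user', 'feedback', 'project', 'reference'] as const
type MemoryType = typeof MEMORY_TYPES[number]

interface ExtractedMemory {
  type: MemoryType
  content: string
}

// Parses raw model output and keeps only well-formed entries of a known type.
function parseExtraction(raw: string): ExtractedMemory[] {
  let data: unknown
  try {
    data = JSON.parse(raw)
  } catch {
    return [] // not JSON: treat as "nothing worth remembering"
  }
  if (!Array.isArray(data)) return []

  const out: ExtractedMemory[] = []
  for (const item of data) {
    if (typeof item !== 'object' || item === null) continue
    const { type, content } = item as { type?: unknown; content?: unknown }
    if (typeof content !== 'string' || content.trim() === '') continue
    const known = MEMORY_TYPES.find(t => t === type)
    if (known !== undefined) out.push({ type: known, content })
  }
  return out
}
```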

4.2 RefinerAgent

Responsible for refining the extracted original memories into high-quality memory items:

class RefinerAgent {
  async refine(rawMemory: RawMemory): Promise<Memory> {
    const prompt = `
      Refine the following original information into high-quality memory entries.

      Requirements:
      1. name: Short description (no more than 50 words)
      2. description: Summary in one sentence
      3. content: Contains three parts: What, Why, and How to apply
      4. Convert relative dates to absolute dates
      5. Remove redundancy and colloquial expressions

      Original information: ${rawMemory.text}
    `

    return await this.llm.chat(prompt)
  }
}

The key refining actions are:

  • Structuring: reorganize colloquial conversation into the standard What / Why / How to apply format
  • Absolute dates: "Next week" → "2026-06-12"
  • De-redundancy: remove duplicate and irrelevant information
  • Distillation: keep the core information and drop the filler
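In the article the date conversion is done by the LLM prompt, but the idea can be shown deterministically. The sketch below handles only a few fixed phrases, treats "next week" as exactly 7 days after the reference date, and works in UTC; all of these are simplifying assumptions, and `resolveRelativeDate` is a hypothetical helper.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000

// Offsets in days for a handful of fixed phrases (illustrative, not exhaustive).
const FIXED_OFFSETS = new Map<string, number>([
  ['yesterday', -1],
  ['today', 0],
  ['tomorrow', 1],
  ['next week', 7]
])

// Returns an ISO date (YYYY-MM-DD, UTC) or null when the phrase is not recognized.
function resolveRelativeDate(phrase: string, reference: Date): string | null {
  const text = phrase.trim().toLowerCase()
  let offsetDays = FIXED_OFFSETS.get(text)

  const match = /^in (\d+) days?$/.exec(text)
  if (match !== null) offsetDays = Number(match[1])

  if (offsetDays === undefined) return null
  return new Date(reference.getTime() + offsetDays * DAY_MS).toISOString().slice(0, 10)
}
```

Storing the absolute date matters because a memory is read long after it was written, when "next week" no longer means what it did.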

5. Design philosophy

Looking back at the whole memory evolution system, several design principles are worth summarizing:

  1. Automatic rather than manual. Dream consolidates on its own and needs no manual trigger from the user. Good infrastructure is transparent: users should not need to know it exists.

  2. Incremental rather than full. Incremental consolidation, incremental updates, incremental checks. Each run handles only what changed and concentrates compute where it is actually needed.

  3. Layered rather than monolithic. A five-layer memory architecture, three-stage progressive retrieval, and two-layer deduplication. Each layer solves one specific problem, and the responsibilities between layers are clear.

  4. Constraints drive quality. The 200-line index limit, the token budget, the locking mechanism. A good system does not rely on users being careful; it relies on engineering constraints to guarantee quality.

  5. Observable and reversible. Version control, change logs, conflict arbitration. A memory system must be traceable: where a memory came from, how it changed, and why it was deleted.


This installment concludes the four-part series on memory systems. The full series:

  1. AI memory system: Let AI have long-term memory (why memory is needed, the four memory types, and an overview of the dual-engine architecture)
  2. Dual-engine architecture: file vs. vector database (internal implementation of CCB and MAGMA, knowledge mapping, and cross-system collaboration)
  3. Intention Driven: How AI understands what you are looking for (intent classification, routing decisions, progressive retrieval, deduplication and ranking)
  4. Self-evolution: Automatic organization and consolidation of AI memories (the Dream mechanism, the configuration system, and code in practice)

The memory system is the "brain" of the AI assistant. Without one, an AI is a partner with a goldfish's memory, no matter how smart it is. With one, it can become a true partner for long-term collaboration.