Dual-Engine Architecture: Files vs Vector Databases

The previous installment covered the "why" and "what" of memory systems — four precise memory types and why dual engines are needed.

This installment takes a deep dive into the internal implementation of both engines: how CCB stores memories as files, how MAGMA turns memories into vectors, and how they collaborate.

CCB: Storing Memories as Markdown Files

CCB (Claude Code Base) is the "precise memory" engine of the memory system. Its core idea is simple: each memory is a .md file.

Simple does not mean sloppy. CCB's design contains many carefully considered details.

File Structure

Each memory file consists of two parts:

YAML frontmatter — structured metadata:

---
name: Testing Standards
type: feedback
source: ccb
---

Markdown body — unstructured content:

Integration tests must use real databases, do not use mocks.

**Why:** Last quarter mock tests passed but production migration failed.

**How to apply:** All database-related integration tests use real databases.

This "frontmatter + body" structure is clever: frontmatter is easy for programs to parse and filter, while the body is easy for humans to read and maintain. The same file is readable by both machines and people.
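To show the machine-readable half in practice, here is a minimal sketch of splitting a memory file into frontmatter and body. It assumes flat key: value frontmatter with no nested YAML, lists, or quoting, which is enough for the example above. A real implementation would more likely use a YAML library. The names parseMemoryFile and ParsedMemory are illustrative, not CCB's actual API.

```typescript
// Result of splitting a memory file into metadata and body.
interface ParsedMemory {
  frontmatter: Record<string, string>
  body: string
}

// Minimal sketch: handles only flat "key: value" frontmatter between two "---" lines.
function parseMemoryFile(text: string): ParsedMemory {
  const lines = text.split(/\r?\n/)
  // No opening delimiter: treat the whole file as body.
  if (lines[0]?.trim() !== '---') {
    return { frontmatter: {}, body: text.trim() }
  }
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === '---')
  if (end === -1) {
    // Unterminated frontmatter: also treat everything as body.
    return { frontmatter: {}, body: text.trim() }
  }
  const frontmatter: Record<string, string> = {}
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(':')
    if (colon === -1) continue // skip lines that are not key: value pairs
    const key = line.slice(0, colon).trim()
    const value = line.slice(colon + 1).trim()
    if (key) frontmatter[key] = value
  }
  const body = lines.slice(end + 1).join('\n').trim()
  return { frontmatter, body }
}

const example = [
  '---',
  'name: Testing Standards',
  'type: feedback',
  'source: ccb',
  '---',
  '',
  'Integration tests must use real databases, do not use mocks.',
].join('\n')

const parsed = parseMemoryFile(example)
console.log(parsed.frontmatter.type) // feedback
console.log(parsed.body) // Integration tests must use real databases, do not use mocks.
```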

MEMORY.md Index

The CCB directory has a special MEMORY.md file that serves as the index for all memories:

# Memory Index

## User Memories
- [User Role](user_role.md) — Data scientist, focused on observability
- [Preferences](user_preferences.md) — Prefers concise answers, uses pnpm

## Feedback Memories
- [Testing Standards](feedback_testing.md) — Integration tests use real databases
- [Naming Style](feedback_naming.md) — Function names start with verbs

This index is auto-generated, not manually maintained. Each time Dream consolidates memories, the index is updated accordingly.

The index has two hard constraints:

200-line limit — exceeding it triggers truncation and a warning
Each entry no more than 150 characters — preventing any single index entry from growing too long

The 200-line limit is an interesting design. It prevents the memory system from growing indefinitely — if a project accumulates too many memories, it is a signal that consolidation is overdue. Hard constraints are more reliable than self-discipline.
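Both limits are easy to enforce at the moment the index is regenerated. The sketch below assumes entries are grouped by type as in the example, that an over-long entry is cut to 150 characters with an ellipsis, and that lines past 200 are dropped with a warning. The source does not say exactly how CCB truncates, so the function names (buildIndex, formatEntry) and the truncation policy are illustrative.

```typescript
interface IndexEntry {
  name: string
  file: string
  type: string // e.g. "user", "feedback"
  description: string
}

const MAX_LINES = 200
const MAX_ENTRY_CHARS = 150
const SEPARATOR = ' \u2014 ' // the dash separator used in the index format above

// Format one index line and cap it at MAX_ENTRY_CHARS characters.
function formatEntry(e: IndexEntry): string {
  const line = `- [${e.name}](${e.file})${SEPARATOR}${e.description}`
  // '\u2026' is a single-character ellipsis, so the result is exactly the cap.
  return line.length <= MAX_ENTRY_CHARS ? line : line.slice(0, MAX_ENTRY_CHARS - 1) + '\u2026'
}

function capitalize(s: string): string {
  return s.charAt(0).toUpperCase() + s.slice(1)
}

// Build MEMORY.md text: one section per type, then enforce the line limit.
function buildIndex(entries: IndexEntry[]): { text: string; truncated: boolean } {
  const lines: string[] = ['# Memory Index']
  // Group entries by type, preserving the order in which types first appear.
  const byType = new Map<string, IndexEntry[]>()
  for (const e of entries) {
    const group = byType.get(e.type) ?? []
    group.push(e)
    byType.set(e.type, group)
  }
  byType.forEach((group, type) => {
    lines.push('', `## ${capitalize(type)} Memories`)
    for (const e of group) lines.push(formatEntry(e))
  })
  const truncated = lines.length > MAX_LINES
  if (truncated) {
    console.warn(`MEMORY.md has ${lines.length} lines; keeping the first ${MAX_LINES}. Consider consolidating.`)
  }
  return { text: lines.slice(0, MAX_LINES).join('\n'), truncated }
}

console.log(buildIndex([
  { name: 'User Role', file: 'user_role.md', type: 'user', description: 'Data scientist, focused on observability' },
  { name: 'Testing Standards', file: 'feedback_testing.md', type: 'feedback', description: 'Integration tests use real databases' },
]).text)
```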

CCB Reader

The CCB reader is responsible for reading and parsing memories from the memory directory:

// Read the memory directory and cache the parsed files on the reader
async readMemoryDirectory(): Promise<CCBDirectory> {
  this.memoryFiles = await this.readAllMemoryFiles()
  const index = await this.readMemoryIndex()

  return {
    memoryFiles: this.memoryFiles,
    index,
    totalCount: this.memoryFiles.length,
    lastUpdated: index.lastUpdated
  }
}

// Filter by type (uses the files cached by readMemoryDirectory())
async readByType(type: MemoryType): Promise<Memory[]> {
  return this.memoryFiles.filter(m => m.type === type)
}

// Keyword search: case-insensitive substring match on name, description, and body
async search(keyword: string): Promise<Memory[]> {
  const needle = keyword.toLowerCase()
  return this.memoryFiles.filter(m =>
    m.name.toLowerCase().includes(needle) ||
    m.description.toLowerCase().includes(needle) ||
    m.content.toLowerCase().includes(needle)
  )
}

The reader's logic is straightforward: read files, parse frontmatter, return structured data. It supports filtering by type and keyword search.
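The loading step itself (readAllMemoryFiles()) is not shown above. A self-contained sketch using Node's fs module might look like this. It assumes memory files sit flat in one directory, skips the MEMORY.md index, and reads name, type, and description from flat frontmatter; the function and field names are illustrative rather than CCB's real ones.

```typescript
import { mkdtempSync, readdirSync, readFileSync, rmSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

interface Memory {
  file: string
  name: string
  type: string
  description: string
  content: string
}

// Split "---\n<key: value lines>\n---\n<body>"; flat keys only, no nested YAML.
function parseMemory(file: string, text: string): Memory {
  const match = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)([\s\S]*)$/.exec(text)
  const header = match?.[1] ?? ''
  const body = match?.[2] ?? text // no frontmatter: the whole file is body
  const meta: Record<string, string | undefined> = {}
  for (const line of header.split(/\r?\n/)) {
    const colon = line.indexOf(':')
    if (colon > 0) meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim()
  }
  return {
    file,
    name: meta['name'] ?? file.replace(/\.md$/, ''),
    type: meta['type'] ?? 'unknown',
    description: meta['description'] ?? '',
    content: body.trim(),
  }
}

// Load every *.md file except the MEMORY.md index, in file-name order.
function readAllMemoryFiles(dir: string): Memory[] {
  return readdirSync(dir)
    .filter(f => f.endsWith('.md') && f !== 'MEMORY.md')
    .sort()
    .map(f => parseMemory(f, readFileSync(join(dir, f), 'utf-8')))
}

// Case-insensitive substring search over name, description, and body.
function searchMemories(memories: Memory[], keyword: string): Memory[] {
  const needle = keyword.toLowerCase()
  return memories.filter(m =>
    [m.name, m.description, m.content].some(s => s.toLowerCase().includes(needle)))
}

// Usage: write two sample memories to a temporary directory, load them, clean up.
const dir = mkdtempSync(join(tmpdir(), 'ccb-'))
writeFileSync(join(dir, 'MEMORY.md'), '# Memory Index\n')
writeFileSync(
  join(dir, 'feedback_testing.md'),
  '---\nname: Testing Standards\ntype: feedback\n---\nIntegration tests must use real databases.\n',
)
writeFileSync(
  join(dir, 'user_preferences.md'),
  '---\nname: Preferences\ntype: user\n---\nPrefers concise answers, uses pnpm.\n',
)
const memories = readAllMemoryFiles(dir)
rmSync(dir, { recursive: true, force: true })
console.log(memories.map(m => m.name)) // [ 'Testing Standards', 'Preferences' ]
console.log(searchMemories(memories, 'PNPM').map(m => m.name)) // [ 'Preferences' ]
```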

CCB Consolidator

The CCB consolidator (CCBConsolidator) handles automatic memory organization:

// Incremental consolidation
async consolidate() {
  const lastConsolidatedAt = await this.getLastConsolidatedAt()

  for (const memory of this.memoryFiles) {
    const hasChanged = await this.hasMemoryChanged(memory, lastConsolidatedAt)

    if (!hasChanged) {
      continue  // Skip unchanged memories: this is the "incremental" part
    }

    // Merge duplicate memories (based on content hash)
    const duplicate = await this.findDuplicate(memory)
    if (duplicate) {
      await this.mergeMemories(duplicate, memory)
    }
  }

  // Deduplicate once after the loop (keep latest version),
  // instead of rescanning every memory for each changed file
  await this.deduplicate()

  // Update index
  await this.updateIndex()

  // Record when this run finished so the next run skips what was processed here
  // (helper name assumed; the original listing does not show this step)
  await this.setLastConsolidatedAt(new Date())
}

The key design principle is incremental consolidation: rather than rewriting everything on every run, the consolidator processes only the memories that changed since the last run. This is the same idea as an incremental build tool such as make, which rebuilds only the targets whose inputs have changed and leaves everything else untouched.
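One way hasMemoryChanged() could work is to compare each file's modification time with the last consolidation timestamp, using a content hash to ignore files that were touched but not actually edited. This is a hedged sketch: the source does not say how CCB detects changes, and TrackedMemory, contentHash, and changedSince are hypothetical names.

```typescript
import { createHash } from 'node:crypto'

interface TrackedMemory {
  id: string
  content: string
  mtimeMs: number // file modification time, e.g. from fs.statSync(path).mtimeMs
}

// SHA-256 of the trimmed content, so trailing whitespace edits do not count as changes.
function contentHash(text: string): string {
  return createHash('sha256').update(text.trim()).digest('hex')
}

// A memory needs reprocessing if its file was modified after the last run
// AND its content hash differs from the one recorded at that run.
function changedSince(
  memories: TrackedMemory[],
  lastConsolidatedAtMs: number,
  previousHashes: Map<string, string>,
): TrackedMemory[] {
  return memories.filter(m => {
    if (m.mtimeMs <= lastConsolidatedAtMs) return false // untouched since last run
    return previousHashes.get(m.id) !== contentHash(m.content) // touched: did the text change?
  })
}

const last = Date.parse('2024-01-01T00:00:00Z')
const prev = new Map([['a', contentHash('same text')], ['b', contentHash('old text')]])
const result = changedSince([
  { id: 'a', content: 'same text', mtimeMs: last + 1000 }, // touched but unchanged
  { id: 'b', content: 'new text', mtimeMs: last + 1000 },  // edited
  { id: 'c', content: 'brand new', mtimeMs: last + 1000 }, // new file, no previous hash
  { id: 'd', content: 'old', mtimeMs: last - 1000 },       // not touched since last run
], last, prev)
console.log(result.map(m => m.id)) // [ 'b', 'c' ]
```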

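The merge logic is based on content hashing: if two memories have the same content hash, they are merged into one, keeping the more complete version. An ordinary hash such as SHA-256 only catches exact duplicates, because changing a single character produces a completely different value. Treating nearly identical memories as duplicates requires either a locality-sensitive hash such as SimHash, whose values stay close when the input changes slightly, or a text-similarity comparison like the one described under Deduplication below.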
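Detecting nearly identical content through hash values requires a locality-sensitive hash. SimHash is one well-known option, swapped in here for illustration since the source does not name the hash CCB uses: every token's hash votes on each bit of the fingerprint, so texts that share most of their tokens end up with fingerprints differing in only a few bits. The sketch assumes whitespace tokenization and 32-bit FNV-1a token hashes; production systems more often use 64-bit fingerprints.

```typescript
// 32-bit FNV-1a hash of a string (over UTF-16 code units).
function fnv1a32(s: string): number {
  let h = 0x811c9dc5
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h >>> 0
}

// SimHash: each token votes +1/-1 on all 32 bits; bits with a positive total are set.
function simhash32(text: string): number {
  const votes = new Array<number>(32).fill(0)
  const tokens = text.toLowerCase().split(/\s+/).filter(t => t.length > 0)
  for (const token of tokens) {
    const h = fnv1a32(token)
    for (let bit = 0; bit < 32; bit++) {
      votes[bit] += ((h >>> bit) & 1) ? 1 : -1
    }
  }
  let fingerprint = 0
  for (let bit = 0; bit < 32; bit++) {
    if (votes[bit] > 0) fingerprint |= 1 << bit
  }
  return fingerprint >>> 0 // reinterpret as unsigned
}

// Number of bit positions in which two fingerprints differ.
function hammingDistance(a: number, b: number): number {
  let x = (a ^ b) >>> 0
  let count = 0
  while (x !== 0) {
    count += x & 1
    x >>>= 1
  }
  return count
}

const original = 'integration tests must use real databases do not use mocks'
const edited = 'integration tests must use real databases never use mocks'
console.log(hammingDistance(simhash32(original), simhash32(edited)))
```

Pairs whose fingerprints lie within a small Hamming distance would become merge candidates; the exact threshold is a tuning choice, not something the source specifies.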

MAGMA: Turning Memories into Vectors

MAGMA is the "semantic memory" engine of the memory system. If CCB is like a folder — precise and structured — then MAGMA is like a brain — fuzzy and associative.

Vector Storage

MAGMA's core storage is LanceDB, an open-source vector database. Memories are converted into high-dimensional vectors and stored in LanceDB:

// Memory embedding
async embedMemory(memory: Memory): Promise<Embedding> {
  const text = `${memory.name}\n${memory.description}\n${memory.content}`
  const embedding = await this.embeddingModel.embed(text)
  return embedding  // e.g., a 1536-dimensional float vector
}

// Store in LanceDB
async storeEmbedding(id: string, embedding: Embedding, metadata: MemoryMetadata) {
  await this.table.add([{
    id,
    vector: embedding,
    metadata: {
      type: metadata.type,
      name: metadata.name,
      description: metadata.description,
      createdAt: metadata.createdAt
    }
  }])
}

The process of converting text into vectors is called embedding. An embedding model maps semantically similar text to nearby locations in vector space. "Code style" and "naming conventions" are different words, but their vectors will be close.
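Closeness in vector space is usually measured with cosine similarity. The vectors below are tiny, made-up 3-dimensional stand-ins for real 1536-dimensional embeddings, chosen only to show the arithmetic; real values would come from the embedding model.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0 // undefined for zero vectors; treat as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Hypothetical toy embeddings (NOT real model output).
const codeStyle = [0.9, 0.4, 0.1]
const namingConventions = [0.8, 0.5, 0.2]
const databaseMigration = [0.1, 0.2, 0.95]

console.log(cosineSimilarity(codeStyle, namingConventions).toFixed(3)) // 0.985
console.log(cosineSimilarity(codeStyle, databaseMigration).toFixed(3)) // 0.274
```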

Hybrid Retrieval

MAGMA does not rely solely on vector search — it uses a hybrid of vector search + keyword search:

// Hybrid retrieval
async search(query: string, options: SearchOptions): Promise<SearchResult[]> {
  // 1. Vector search (semantic matching)
  const embedding = await this.embeddingModel.embed(query)
  const vectorResults = await this.table.search(embedding)
    .limit(options.limit)
    .execute()

  // 2. Keyword search (exact matching; assumes a full-text index exists on the table)
  const keywordResults = await this.table.search(query)
    .limit(options.limit)
    .execute()

  // 3. Merge results and deduplicate
  return this.mergeAndDeduplicate(vectorResults, keywordResults)
}

Why hybrid? Because each approach has blind spots:

Vector search excels at semantic matching but may miss exact keyword matches
Keyword search excels at exact matching but cannot find content that is semantically related but phrased differently

Using both covers each other's weaknesses.
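The source does not say how mergeAndDeduplicate() combines the two lists. A common choice for hybrid search is Reciprocal Rank Fusion (RRF), swapped in here as an illustration. It ignores raw scores, which for vector and keyword search live on different scales, and ranks each document by the sum of 1 / (k + rank) over the lists it appears in, with k = 60 as the conventional constant. Documents found by both searches rise to the top, and duplicates collapse because scores are keyed by id.

```typescript
interface Hit {
  id: string
}

// Reciprocal Rank Fusion over any number of ranked lists (best result first).
function reciprocalRankFusion(lists: Hit[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>()
  for (const list of lists) {
    list.forEach((hit, index) => {
      const rank = index + 1 // ranks are 1-based
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (k + rank))
    })
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
}

// Usage with hypothetical result ids.
const vectorHits = [{ id: 'naming-style' }, { id: 'testing-standards' }, { id: 'user-role' }]
const keywordHits = [{ id: 'pnpm-preference' }, { id: 'naming-style' }]
console.log(reciprocalRankFusion([vectorHits, keywordHits]).map(r => r.id))
// [ 'naming-style', 'pnpm-preference', 'testing-standards', 'user-role' ]
```

naming-style wins because it appears in both lists (1/61 + 1/62), even though it is only second in the keyword list.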

Five-Layer Memory Architecture

MAGMA's memory is organized into five layers, from ephemeral to persistent:

Layer	Name	Storage	Characteristics
L1	Ephemeral Memory	In-memory	Current session, fastest access, smallest capacity
L2	Project Memory	LanceDB	Project-related knowledge and context
L3	User Memory	LanceDB	User preferences and habits
L4	Feedback Memory	LanceDB	User feedback and guidance
L5	Reference Memory	LanceDB + Obsidian	External resource references, most persistent

L1 is a pure in-memory dictionary with nanosecond-level access, but its contents are lost on shutdown. L2-L5 live in a vector database with far larger capacity, but every retrieval costs computation: embedding the query and running a similarity search.

The five-layer architecture follows the tiered-storage principle: frequently accessed data goes in the fast tier, and less frequently accessed data goes in the capacity tier. CPU L1/L2/L3 caches work on the same principle.
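In code, the tiered idea reduces to a read-through cache: check the in-memory tier first, fall back to the slower store on a miss, and keep the result in memory for the rest of the session. The sketch models the slow tier as an async function; in MAGMA that role would presumably be played by a LanceDB query. TieredMemory and its methods are hypothetical names.

```typescript
// Read-through cache: an in-memory Map (L1) in front of a slower async store (L2-L5).
class TieredMemory<V> {
  private readonly l1 = new Map<string, V>()
  private readonly slowLookup: (key: string) => Promise<V | undefined>

  constructor(slowLookup: (key: string) => Promise<V | undefined>) {
    this.slowLookup = slowLookup
  }

  async get(key: string): Promise<V | undefined> {
    const cached = this.l1.get(key)
    if (cached !== undefined) return cached // fast path: no I/O, no embedding
    const value = await this.slowLookup(key) // slow path: e.g. a vector-store query
    if (value !== undefined) this.l1.set(key, value) // promote for the rest of the session
    return value
  }

  // L1 is volatile: clearing it loses nothing the persistent tiers still hold.
  clearSession(): void {
    this.l1.clear()
  }
}

// Usage: count how often the slow tier is consulted.
let slowCalls = 0
const store = new Map([['user_role', 'Data scientist, focused on observability']])
const memory = new TieredMemory<string>(async key => {
  slowCalls++
  return store.get(key)
})

async function demo(): Promise<void> {
  await memory.get('user_role') // miss: goes to the slow tier
  await memory.get('user_role') // hit: served from L1
  console.log(slowCalls) // 1
}
void demo()
```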

Obsidian Integration

A unique design choice in MAGMA is directly reading Obsidian notes as a knowledge source:

import { readFile } from 'node:fs/promises'
import { basename } from 'node:path'

// Read Obsidian notes
async readObsidianNotes(vaultPath: string): Promise<Note[]> {
  const notes: Note[] = []

  for (const filePath of await this.listMarkdownFiles(vaultPath)) {
    const content = await readFile(filePath, 'utf-8')
    const { frontmatter, body } = this.parseFrontmatter(content)

    notes.push({
      title: frontmatter.title || basename(filePath, '.md'),
      content: body,
      tags: frontmatter.tags || [],
      links: this.extractWikiLinks(body),  // [[wikilink]] references
      path: filePath
    })
  }

  return notes
}

Obsidian notes provide two unique values:

Knowledge graph — Obsidian's [[wikilink]] syntax naturally forms a knowledge network that MAGMA can automatically extract as a graph
User-generated knowledge — many people use Obsidian for notes and wikis; MAGMA reads this content directly without additional import steps
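extractWikiLinks() is called in the reader but not shown. Below is a minimal regex-based sketch covering Obsidian's common link forms [[Note]], [[Note|Alias]], and [[Note#Heading]] (embeds such as ![[Note]] match too, since they use the same brackets). The WikiLink shape with target and optional alias mirrors the fields extractGraph reads later; everything else here is an assumption.

```typescript
interface WikiLink {
  target: string // note name, without any #heading or ^block suffix
  alias?: string // display text after "|", if present
}

// Matches [[target]], [[target#heading]], [[target^block]], and [[target|alias]].
const WIKILINK = /\[\[([^\[\]|#^]+)(?:[#^][^\[\]|]*)?(?:\|([^\[\]]+))?\]\]/g

function extractWikiLinks(body: string): WikiLink[] {
  const links: WikiLink[] = []
  for (const match of Array.from(body.matchAll(WIKILINK))) {
    const target = match[1].trim()
    const alias = match[2]?.trim() // undefined when the link has no "|alias"
    links.push(alias ? { target, alias } : { target })
  }
  return links
}

const noteBody = 'See [[Testing Standards]] and [[Naming Style|naming rules]], plus [[Architecture#Storage]].'
console.log(extractWikiLinks(noteBody))
```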

MAGMA supports three modes for reading the Obsidian vault:

direct mode: Direct file operations — fast and silent
cli mode: Uses Obsidian CLI — supports advanced features
auto mode: Automatically selects based on operation type

Knowledge Graph: Auto-Extracted from Obsidian

MAGMA's knowledge graph is not pre-built — it is automatically extracted from Obsidian notes.

Graph Construction

Obsidian Notes (WikiLinks)
    │
    ▼  Auto-extract
┌─────────────────────────┐
│  [[target note]] → edge │
│  note title → node      │
│  note tags → node tags  │
└─────────────────────────┘
    │
    ▼
GraphData { nodes[], edges[] }

The extraction logic is straightforward:

// Extract graph from Obsidian notes
function extractGraph(notes: Note[]): GraphData {
  const nodes: GraphNode[] = []
  const edges: GraphEdge[] = []

  for (const note of notes) {
    // Note title → node
    nodes.push({
      id: note.title,
      label: note.title,
      tags: note.tags
    })

    // [[wikilink]] → edge
    for (const link of note.links) {
      edges.push({
        source: note.title,
        target: link.target,
        label: link.alias || link.target
      })
    }
  }

  return { nodes, edges }
}

No manual graph maintenance is needed — new notes are automatically incorporated. This is far more lightweight than traditional knowledge graph approaches (manual annotation, ETL pipelines).
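Once the notes form a GraphData, relationship-style questions become graph traversals. The sketch below builds an adjacency list from the edges and collects every note within a given number of hops using breadth-first search. Treating wikilinks as undirected and the neighborhood function itself are assumptions for illustration, not documented MAGMA behavior.

```typescript
interface GraphEdge {
  source: string
  target: string
  label: string
}

// Breadth-first search: ids of all nodes within maxHops of start (start excluded).
function neighborhood(edges: GraphEdge[], start: string, maxHops: number): string[] {
  // Undirected adjacency list: a wikilink relates both notes.
  const adjacency = new Map<string, string[]>()
  const connect = (a: string, b: string) => {
    const list = adjacency.get(a) ?? []
    list.push(b)
    adjacency.set(a, list)
  }
  for (const e of edges) {
    connect(e.source, e.target)
    connect(e.target, e.source)
  }

  const visited = new Set<string>([start])
  let frontier = [start]
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: string[] = []
    for (const node of frontier) {
      for (const neighbor of adjacency.get(node) ?? []) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor)
          next.push(neighbor)
        }
      }
    }
    frontier = next
  }
  visited.delete(start)
  return Array.from(visited)
}

// Hypothetical notes linked in a chain.
const edges: GraphEdge[] = [
  { source: 'Flaky Test Bug', target: 'Mock Database', label: 'Mock Database' },
  { source: 'Mock Database', target: 'Testing Standards', label: 'Testing Standards' },
  { source: 'Testing Standards', target: 'CI Pipeline', label: 'CI Pipeline' },
]
console.log(neighborhood(edges, 'Flaky Test Bug', 2)) // [ 'Mock Database', 'Testing Standards' ]
```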

Four Graph Query Types

The system defines four graph query categories that determine which graph dimensions to activate during intent routing:

Type	Meaning	Activation Scenarios
`concept`	Concept graph	Causal reasoning, hybrid queries
`entity`	Entity graph	Entity relationships, hybrid queries
`relationship`	Relationship graph	Causal reasoning, entity relationships, hybrid queries
`temporal`	Temporal graph	Timeline analysis

These four types are classification dimensions at query time, not layers of the graph itself. The underlying graph is a unified WikiLinks network, filtered and focused according to different dimensions of query intent.

For example, if a user asks "why does this bug keep appearing?" — this is causal reasoning, activating the concept and relationship dimensions to search for relevant concept definitions and causal relationship edges in the graph.
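The activation column of the table translates directly into a lookup from intent to graph dimensions. The intent identifiers below are placeholders paraphrasing the table's scenarios; how the router actually labels intents is the subject of the next installment.

```typescript
type GraphDimension = 'concept' | 'entity' | 'relationship' | 'temporal'
type Intent = 'causal_reasoning' | 'entity_relationship' | 'timeline_analysis' | 'hybrid'

// Transcribed from the table: which dimensions each intent activates.
const ACTIVATION: Record<Intent, GraphDimension[]> = {
  causal_reasoning: ['concept', 'relationship'],
  entity_relationship: ['entity', 'relationship'],
  timeline_analysis: ['temporal'],
  hybrid: ['concept', 'entity', 'relationship'],
}

// Dimensions to activate for one or more detected intents (deduplicated, table order).
function activeDimensions(intents: Intent[]): GraphDimension[] {
  const order: GraphDimension[] = ['concept', 'entity', 'relationship', 'temporal']
  const wanted = new Set(intents.flatMap(i => ACTIVATION[i]))
  return order.filter(d => wanted.has(d))
}

// "Why does this bug keep appearing?" is causal reasoning:
console.log(activeDimensions(['causal_reasoning'])) // [ 'concept', 'relationship' ]
```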

Cross-System Collaboration

CCB and MAGMA do not operate independently — they collaborate through a cross-system retriever.

Parallel Retrieval

When dual-engine retrieval is needed, both engines execute in parallel:

// Hybrid retrieval: parallel execution
async retrieveBoth(strategy, query) {
  // Per-engine options are assumed to come from the routing strategy
  const { ccbOptions, magmaOptions } = strategy
  const [ccbResult, magmaResult] = await Promise.all([
    this.retrieveCCB(query, ccbOptions),     // both calls start immediately,
    this.retrieveMAGMA(query, magmaOptions), // so neither waits for the other
  ])
  return this.mergeAndDeduplicate(ccbResult, magmaResult)
}

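Promise.all starts both retrievals at once, so total latency is roughly that of the slower engine rather than the sum of the two. It resolves only after both engines have returned, and the results are then merged; if either engine throws, the whole call rejects.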
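When one engine failing should not sink the whole query, Promise.allSettled is an alternative: it waits for both engines and reports each outcome separately, so the caller can keep whichever results arrived. This is a sketch of that variant with hypothetical engine functions, not the retriever's actual code; merging and deduplication are left out.

```typescript
type Retriever<T> = (query: string) => Promise<T[]>

// Run both engines concurrently; return results from whichever succeeded.
async function retrieveBothSettled<T>(
  query: string,
  ccb: Retriever<T>,
  magma: Retriever<T>,
): Promise<{ results: T[]; failed: string[] }> {
  const [ccbOutcome, magmaOutcome] = await Promise.allSettled([ccb(query), magma(query)])
  const results: T[] = []
  const failed: string[] = []
  if (ccbOutcome.status === 'fulfilled') results.push(...ccbOutcome.value)
  else failed.push('ccb')
  if (magmaOutcome.status === 'fulfilled') results.push(...magmaOutcome.value)
  else failed.push('magma')
  return { results, failed }
}

// Usage: MAGMA is down, CCB still answers.
const ccbEngine: Retriever<string> = async () => ['Testing Standards']
const magmaEngine: Retriever<string> = async () => {
  throw new Error('vector store unavailable')
}
retrieveBothSettled('testing rules', ccbEngine, magmaEngine).then(r => console.log(r))
// { results: [ 'Testing Standards' ], failed: [ 'magma' ] }
```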

Result Fusion

The two engines return results in different formats and need to be unified:

// Unify result format
function unifyResults(ccbResults, magmaResults) {
  return [
    ...ccbResults.map(r => ({
      ...r,
      source: 'ccb',
      baseScore: r.relevance
    })),
    ...magmaResults.map(r => ({
      ...r,
      source: 'magma',
      baseScore: r.similarity
    }))
  ]
}

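CCB results carry relevance scores (0-1); MAGMA results carry cosine similarity, which in general ranges from -1 to 1 but for text embeddings usually falls between 0 and 1 in practice. After unification, both enter the same sorting pipeline.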
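Even when both scores fall between 0 and 1, they are not directly comparable: a keyword relevance of 0.6 and an embedding similarity of 0.6 mean different things, and embedding similarities often cluster in a narrow band. One simple hedge is to min-max normalize each source's scores separately before sorting. The sketch assumes the unified { source, baseScore } shape produced by unifyResults(); whether the real pipeline normalizes this way is not stated in the source.

```typescript
interface UnifiedResult {
  id: string
  source: 'ccb' | 'magma'
  baseScore: number
}

// Min-max normalize scores within each source, then sort all results together.
function normalizeAndSort(results: UnifiedResult[]): (UnifiedResult & { score: number })[] {
  const ranges = new Map<string, { min: number; max: number }>()
  for (const r of results) {
    const range = ranges.get(r.source) ?? { min: Infinity, max: -Infinity }
    range.min = Math.min(range.min, r.baseScore)
    range.max = Math.max(range.max, r.baseScore)
    ranges.set(r.source, range)
  }
  return results
    .map(r => {
      const { min, max } = ranges.get(r.source)!
      // A source whose results all share one score maps to 1 (nothing to separate).
      const score = max > min ? (r.baseScore - min) / (max - min) : 1
      return { ...r, score }
    })
    .sort((a, b) => b.score - a.score)
}

const fused = normalizeAndSort([
  { id: 'naming-style', source: 'ccb', baseScore: 0.4 },
  { id: 'testing-standards', source: 'ccb', baseScore: 0.2 },
  { id: 'architecture', source: 'magma', baseScore: 0.86 },
  { id: 'user-role', source: 'magma', baseScore: 0.81 },
])
console.log(fused.map(r => `${r.id}=${r.score.toFixed(2)}`))
```

Min-max has its own quirk: each source's best hit becomes 1.0, so top results interleave by source regardless of absolute quality. Rank-based fusion is a common alternative when that matters.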

Deduplication

The two engines may return duplicate memories (CCB found it via keywords, MAGMA found it via semantics — possibly the same memory).

Deduplication happens in two layers:

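Exact deduplication, matching on either the normalized name or the content hash:

function exactDuplicate(a, b) {
  // Unicode-aware normalization: keep letters and digits in any script.
  // (/[^\w]/g would strip CJK names down to '', making them all "equal".)
  const normalize = (s) => s.toLowerCase().replace(/[^\p{L}\p{N}]/gu, '')
  const keyA = normalize(a.name)
  const keyB = normalize(b.name)
  if (keyA !== '' && keyA === keyB) return true

  const hashA = simpleHash(a.content)
  const hashB = simpleHash(b.content)
  return hashA === hashB
}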

Semantic deduplication — based on text similarity:

function semanticDuplicate(a, b) {
  const jaccard = jaccardSimilarity(a.content, b.content)  // Set intersection/union
  const lcs = lcsSimilarity(a.content, b.content)          // Longest common subsequence
  const score = jaccard * 0.4 + lcs * 0.6  // Hybrid weights
  return score >= 0.85  // Above 0.85 threshold considered duplicate
}

The two layers work together: exact deduplication quickly filters obviously duplicate content, while semantic deduplication handles synonymous expressions.
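The two similarity helpers used by semanticDuplicate() are not shown. A minimal sketch under stated assumptions: Jaccard is computed over lowercase word tokens, and LCS similarity is the length of the longest common character subsequence divided by the longer text's length. The 0.4/0.6 weights and 0.85 threshold come from the code above; the tokenization and normalization are assumptions. Character-level LCS takes O(n·m) time, which is fine for short memory entries but not for long documents.

```typescript
// Jaccard similarity over lowercase word tokens: |A ∩ B| / |A ∪ B|.
function jaccardSimilarity(a: string, b: string): number {
  const tokenize = (s: string) => new Set(s.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [])
  const setA = tokenize(a)
  const setB = tokenize(b)
  if (setA.size === 0 && setB.size === 0) return 1 // two empty texts count as identical
  let intersection = 0
  setA.forEach(t => {
    if (setB.has(t)) intersection++
  })
  return intersection / (setA.size + setB.size - intersection)
}

// LCS similarity: longest common subsequence length / length of the longer string.
function lcsSimilarity(a: string, b: string): number {
  if (a.length === 0 && b.length === 0) return 1
  // Two-row dynamic programming: O(|a|·|b|) time, O(|b|) memory.
  let prev = new Array<number>(b.length + 1).fill(0)
  for (let i = 1; i <= a.length; i++) {
    const curr = new Array<number>(b.length + 1).fill(0)
    for (let j = 1; j <= b.length; j++) {
      curr[j] = a[i - 1] === b[j - 1] ? prev[j - 1] + 1 : Math.max(prev[j], curr[j - 1])
    }
    prev = curr
  }
  return prev[b.length] / Math.max(a.length, b.length)
}

function semanticDuplicate(a: { content: string }, b: { content: string }): boolean {
  const score = jaccardSimilarity(a.content, b.content) * 0.4 + lcsSimilarity(a.content, b.content) * 0.6
  return score >= 0.85
}

console.log(semanticDuplicate(
  { content: 'Integration tests must use real databases, do not use mocks.' },
  { content: 'Integration tests must use real databases; do not use mocks!' },
)) // true
```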

Design Trade-offs

Reviewing the dual-engine design, several trade-offs are worth discussing:

1. Why not just use a vector database? Because precise memories (usernames, tool names, configuration values) are not well-suited to vectorization. The word "pnpm" could be confused with "npm" or "yarn" after vectorization, but as a precise fact, CCB's file storage is more reliable.

2. Why not use files for everything? Because plain files support keyword matching, not semantic search. For a question like "the architecture of this project", searching every memory file for the word "architecture" misses memories that describe the architecture in other words, and it cannot piece together a complete answer. Vector search matches on meaning.

3. Why incremental consolidation? Full consolidation is too expensive. A project might have hundreds of memories; rewriting everything each time Dream runs is both wasteful and error-prone. Incremental consolidation is more reliable.

4. Why does MEMORY.md need a hard limit? An index file without limits will grow uncontrollably. 200 lines roughly corresponds to 50-80 memories, which is sufficient for most projects. If memories exceed this count, it signals that consolidation is needed.

This installment took a deep dive into the internal implementation of CCB and MAGMA. Next, we cover the most sophisticated part of the memory system — intent-driven routing. When a user asks a question, how does the system decide whether to search CCB or MAGMA? How does it determine the depth and scope of retrieval?

Series:

Previous: AI Memory Systems: Giving AI Long-Term Memory
Next: Intent-Driven: How AI Understands What You're Looking For
