Intention-driven: How AI understands what you're looking for

The first two issues covered the "why", the "what to store", and the "how to store" of the memory system. This issue covers the most intricate part: how to find it.

When the user asks a question, how does the system decide whether to search CCB or MAGMA? How does it determine the depth and scope of the search? How does it pick the most relevant results from the two engines?

The answer lies in intent-driven routing.

1. Six intent categories

The system sorts user queries into six intent types. These types were not chosen arbitrarily; they were distilled from observing real usage.

Intent	Description	Example
`simple-fact`	Simple fact lookup	"What is X?", "How do I use Y?"
`preference-recall`	Recall of user preferences	"I said before...", "My preference is..."
`causal-reasoning`	Causal reasoning	"Why?", "What will X cause?"
`entity-relationship`	Entity relationships	"How are X and Y related?", "What's the difference between A and B?"
`temporal-analysis`	Timeline analysis	"How did X develop?", "What is the timeline?"
`hybrid`	Mixed question	Complex questions involving multiple intents

Each intent maps to a different retrieval strategy. For example, "the user prefers pnpm" is a simple fact that can be looked up in CCB without touching MAGMA. "Why does this bug keep recurring?" is causal reasoning, which needs MAGMA's graph data to trace the chain of causes.

2. Two-level classification mechanism

Intent classification does not rely on a single method but on a two-level mechanism that combines an LLM with rules:

user query
    │
    ▼
┌──────────────────────────────┐
│ 1. LLM intent classification │  ← primary
│    confidence ≥ 0.7          │
└──────────────┬───────────────┘
               │ LLM fails or confidence too low
               ▼
┌──────────────────────────────┐
│ 2. Rule matching             │  ← fallback
│    confidence ≥ 0.5          │
└──────────────┬───────────────┘
               │
               ▼
    Cache result (TTL 24h)

Level 1: LLM Classification

A large language model analyzes the user's intent:

// LLM intent classification (class method)
async classifyWithLLM(query: string): Promise<IntentResult | null> {
  const response = await this.llm.chat({
    messages: [{
      role: 'user',
      content: `Analyze the intent of the following query and return JSON:
        { category, confidence, reasoning, keywords }

        Query: ${query}

        Allowed categories: simple-fact, preference-recall,
        causal-reasoning, entity-relationship,
        temporal-analysis, hybrid`
    }]
  })

  try {
    // e.g. { category: 'causal-reasoning', confidence: 0.92, ... }
    return JSON.parse(response.content)
  } catch {
    return null // malformed output: let the caller fall back to rules
  }
}

The advantage of LLM classification is accuracy: it understands semantics and is not misled by surface wording. "What happened with this?" and "Why did this happen?" are worded differently, but the LLM recognizes both as causal reasoning.

The disadvantage is that it can fail: network timeouts, degraded model service, and malformed output are all possible.
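Since malformed output is a real failure mode, it helps to validate the model's JSON before trusting it. The sketch below is one way to do that; `parseIntentResult` and the exact shape of `IntentResult` (taken from the fields the prompt asks for) are assumptions, not the system's actual code.

```typescript
// Hypothetical validator for the LLM's JSON reply; field names follow the prompt above.
const INTENT_CATEGORIES = [
  'simple-fact', 'preference-recall', 'causal-reasoning',
  'entity-relationship', 'temporal-analysis', 'hybrid',
] as const

type IntentCategory = (typeof INTENT_CATEGORIES)[number]

interface IntentResult {
  category: IntentCategory
  confidence: number
  reasoning?: string
  keywords?: string[]
}

// Returns null for anything that is not a well-formed IntentResult,
// so the caller can fall back to rule matching instead of crashing.
function parseIntentResult(raw: string): IntentResult | null {
  let data: unknown
  try {
    data = JSON.parse(raw)
  } catch {
    return null // not JSON at all
  }
  if (typeof data !== 'object' || data === null) return null
  const { category, confidence } = data as Record<string, unknown>
  if (typeof category !== 'string' ||
      !(INTENT_CATEGORIES as readonly string[]).includes(category)) return null
  if (typeof confidence !== 'number' || confidence < 0 || confidence > 1) return null
  return data as IntentResult
}
```

A reply such as `Sure! Here is the JSON: ...` or an invented category then yields `null`, which the caller treats exactly like a timeout.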

Level 2: Rule matching

Regex-based pattern matching needs no external API and never fails:

// Rule-based classification: regex patterns per intent (case-insensitive)
const rules: Record<string, RegExp[]> = {
  'causal-reasoning': [
    /\bwhy\b/i, /\breason\b/i, /\bcause[sd]?\b/i, /\bhow come\b/i
  ],
  'preference-recall': [
    /\bI (previously|said|prefer|like|liked)\b/i, /\bmy preference/i, /\bremember\b/i
  ],
  'temporal-analysis': [
    /\bhistory\b/i, /\btimeline\b/i, /\bdevelopment\b/i, /\bevolution\b/i, /\bwhen\b/i
  ],
  'entity-relationship': [
    /\b(relationship|difference|differ)\b.*\band\b/i, /\bcompar(e|ison)\b/i
  ],
  'simple-fact': [
    /\bwhat is\b/i, /\bhow (do I|to) use\b/i
  ]
}

function classifyWithRules(query: string): IntentResult {
  let bestCategory = 'simple-fact' // default when nothing matches
  let bestScore = 0

  for (const [category, patterns] of Object.entries(rules)) {
    // Each matching pattern adds 0.3 to this category's score
    const score = patterns.filter(pattern => pattern.test(query)).length * 0.3
    if (score > bestScore) {
      bestScore = score
      bestCategory = category
    }
  }

  return { category: bestCategory, confidence: Math.min(bestScore, 1.0) }
}

The advantage of rule classification is that it never fails: it depends on no external service, consumes no tokens, and responds in milliseconds. The disadvantage is limited coverage: phrasings the patterns do not cover go unrecognized.

The value of the two-level mechanism is that the LLM handles semantic understanding while the rules guarantee a floor. When the LLM is healthy, its (accurate) answer is used; when it fails, the (reliable) rules take over. Beyond both levels there is a 24-hour cache, so the same query is never analyzed twice.
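Putting the two levels and the cache together, the control flow might look like the sketch below. The thresholds (0.7 for the LLM, 0.5 for rules) and the 24-hour TTL come from the diagram; the `IntentClassifier` class, the injected `llm`/`rules` functions, the in-memory `Map` cache, the key normalization, and the choice of `hybrid` as the last-resort default are illustrative assumptions.

```typescript
interface IntentResult { category: string; confidence: number }

// Injected classifiers keep the sketch independent of any particular LLM SDK.
type LlmClassifier = (query: string) => Promise<IntentResult | null> // null or throw = failure
type RuleClassifier = (query: string) => IntentResult               // e.g. classifyWithRules

const LLM_THRESHOLD = 0.7
const RULE_THRESHOLD = 0.5
const CACHE_TTL_MS = 24 * 60 * 60 * 1000 // 24 hours

class IntentClassifier {
  private cache = new Map<string, { intent: IntentResult; expiresAt: number }>()
  private llm: LlmClassifier
  private rules: RuleClassifier
  private now: () => number

  constructor(llm: LlmClassifier, rules: RuleClassifier, now: () => number = Date.now) {
    this.llm = llm
    this.rules = rules
    this.now = now
  }

  async classify(query: string): Promise<IntentResult> {
    const key = query.trim().toLowerCase() // assumption: light normalization of the cache key
    const hit = this.cache.get(key)
    if (hit && hit.expiresAt > this.now()) return hit.intent // same query is not re-analyzed

    const intent = await this.classifyUncached(query)
    this.cache.set(key, { intent, expiresAt: this.now() + CACHE_TTL_MS })
    return intent
  }

  private async classifyUncached(query: string): Promise<IntentResult> {
    // Level 1: the LLM, trusted only above its confidence threshold
    try {
      const viaLlm = await this.llm(query)
      if (viaLlm && viaLlm.confidence >= LLM_THRESHOLD) return viaLlm
    } catch {
      // timeout, network error, malformed output: fall through to the rules
    }
    // Level 2: rules, which never throw
    const viaRules = this.rules(query)
    if (viaRules.confidence >= RULE_THRESHOLD) return viaRules
    // Assumption: if neither level is confident, use the broadest intent
    return { category: 'hybrid', confidence: viaRules.confidence }
  }
}
```

Injecting `now` is only there so the TTL can be exercised without waiting a day.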

3. Routing decision matrix

Once the intent is known, the next step is the routing decision: different intents take different retrieval paths:

Intent	Strategy	Memory layers	Graphs	Estimated tokens
`simple-fact`	`ccb-only`	L4, L5	no	500
`preference-recall`	`both-priority-ccb`	L3	no	800
`causal-reasoning`	`both-priority-magma`	L2, L3, L4	concept, relationship	1200
`entity-relationship`	`both-priority-magma`	L2, L3	entity, relationship	1000
`temporal-analysis`	`both-priority-ccb`	L2, L4	temporal	1500
`hybrid`	`both-priority-ccb`	L2-L5	concept, entity, relationship	2000

This matrix is the core of the whole intent routing system. Each row answers three questions:

Where to look: ccb-only searches only CCB; both-priority-magma searches both engines with MAGMA as the primary one
Which layers to search: different intents focus on different memory layers; causal reasoning needs project context (L2), while preference recall needs only the user layer (L3)
How deep to search: the estimated token count ranges from 500 to 2000, so simple questions get a shallow search and complex ones a thorough, in-depth one

Design rationale:

Simple fact → CCB only, exact match; 500 tokens is enough
Preference recall → both engines, CCB first, because user memory lives mainly in CCB
Causal reasoning → both engines, MAGMA first, because tracing a causal chain needs graph data
Entity relationship → both engines, MAGMA first, because finding entity connections needs graph data
Timeline → both engines, CCB first, because project memory carries richer time information
Hybrid → all engines and all layers, full coverage with 2000 tokens

One detail I find interesting: token budgets vary widely. A simple question gets 500 tokens and a hybrid one 2000, a fourfold difference. The system controls retrieval cost deliberately: rather than going all out every time, it allocates resources according to the complexity of the intent.
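Because the matrix is pure data, it can live in a lookup table rather than in branching logic. A minimal sketch with values copied from the table; the `Route` shape, `route()`, `usesMagma()`, and the fallback to the `hybrid` route for unknown categories are assumptions.

```typescript
// Routing matrix as a lookup table; values copied from the table above.
type Strategy = 'ccb-only' | 'both-priority-ccb' | 'both-priority-magma'
type Graph = 'concept' | 'entity' | 'relationship' | 'temporal'

interface Route {
  strategy: Strategy
  layers: string[]   // memory layers to search
  graphs: Graph[]    // MAGMA graphs to consult
  tokenBudget: number
}

const HYBRID: Route = {
  strategy: 'both-priority-ccb',
  layers: ['L2', 'L3', 'L4', 'L5'],
  graphs: ['concept', 'entity', 'relationship'],
  tokenBudget: 2000,
}

const ROUTES: Record<string, Route> = {
  'simple-fact':         { strategy: 'ccb-only', layers: ['L4', 'L5'], graphs: [], tokenBudget: 500 },
  'preference-recall':   { strategy: 'both-priority-ccb', layers: ['L3'], graphs: [], tokenBudget: 800 },
  'causal-reasoning':    { strategy: 'both-priority-magma', layers: ['L2', 'L3', 'L4'], graphs: ['concept', 'relationship'], tokenBudget: 1200 },
  'entity-relationship': { strategy: 'both-priority-magma', layers: ['L2', 'L3'], graphs: ['entity', 'relationship'], tokenBudget: 1000 },
  'temporal-analysis':   { strategy: 'both-priority-ccb', layers: ['L2', 'L4'], graphs: ['temporal'], tokenBudget: 1500 },
  'hybrid': HYBRID,
}

// Hypothetical: unknown categories get the broadest route.
// hasOwnProperty avoids picking up inherited keys such as "toString".
function route(category: string): Route {
  return Object.prototype.hasOwnProperty.call(ROUTES, category) ? ROUTES[category] : HYBRID
}

// MAGMA is skipped entirely only for ccb-only routes.
function usesMagma(r: Route): boolean {
  return r.strategy !== 'ccb-only'
}
```

Keeping the matrix as data means tuning a budget or adding a layer is a one-line change that never touches control flow.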

4. Progressive search

Retrieval is not a single pass; it deepens progressively over three stages:

┌──────────────────────────────────────────┐
│ Stage 1: ccb-fast                        │
│   Engines: CCB                           │
│   Limit:   5 results                     │
│   Timeout: 100ms                         │
│   Goal:    quick match in CCB first      │
└────────────────────┬─────────────────────┘
                     │ not enough results?
                     ▼
┌──────────────────────────────────────────┐
│ Stage 2: magma-extend                    │
│   Engines: CCB + MAGMA                   │
│   Limit:   10 results                    │
│   Timeout: 300ms                         │
│   Goal:    add MAGMA semantic search     │
└────────────────────┬─────────────────────┘
                     │ still not enough?
                     ▼
┌──────────────────────────────────────────┐
│ Stage 3: full-fusion                     │
│   Engines: CCB + MAGMA + FusionVerifier  │
│   Limit:   all results                   │
│   Timeout: 800ms                         │
│   Goal:    comprehensive verified search │
└──────────────────────────────────────────┘

Why progressive? Because the stages differ widely in cost:

Stage 1 only reads CCB files; 100ms is enough, and the cost is very low
Stage 2 adds vector search; 300ms, moderate cost
Stage 3 enables the FusionVerifier (which verifies results with another LLM); 800ms, the highest cost

If every query went through Stage 3, even simple questions would take 800ms and consume many tokens. Progressive retrieval lets simple questions return quickly, and only genuinely complex queries go through the full process.

This design closely mirrors a search engine's strategy: fast recall from an inverted index, refinement with a semantic model, and final confirmation with a verification model. Each layer goes deeper, balancing cost against quality.
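The stage loop itself is short. In this sketch the stage names, limits, and timeouts would come from the diagram, while several details are labeled assumptions: "not enough" is modeled as a minimum hit count (`enough`), each later stage searches a superset of the earlier ones so its hits replace the previous list, and a stage that times out keeps the previous stage's hits. `Stage`, `withTimeout`, and `progressiveSearch` are hypothetical names.

```typescript
interface Hit { id: string; score: number }

interface Stage {
  name: string
  limit: number      // max results this stage may return
  timeoutMs: number
  run: (query: string, limit: number) => Promise<Hit[]>
}

// Rejects if `p` has not settled within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms)
    p.then(
      value => { clearTimeout(timer); resolve(value) },
      error => { clearTimeout(timer); reject(error) },
    )
  })
}

async function progressiveSearch(
  query: string,
  stages: Stage[],
  enough: number, // assumption: "enough" means at least this many hits
): Promise<{ hits: Hit[]; stagesRun: string[] }> {
  let hits: Hit[] = []
  const stagesRun: string[] = []
  for (const stage of stages) {
    stagesRun.push(stage.name)
    try {
      // Later stages search a superset of earlier ones, so their hits replace the previous list
      hits = await withTimeout(stage.run(query, stage.limit), stage.timeoutMs)
    } catch {
      // On timeout, keep the previous stage's hits and try the next stage
    }
    if (hits.length >= enough) break // good enough: skip the costlier stages
  }
  return { hits, stagesRun }
}
```

An easy query that gets enough hits from `ccb-fast` never pays for the 300ms and 800ms stages; only queries that stay short of `enough` reach `full-fusion`.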

5. Result de-duplication

Because the two engines retrieve in parallel, they often return duplicates: a memory CCB finds by keyword and one MAGMA finds by semantics may well be the same memory.

De-duplication works at two levels:

5.1 Exact de-duplication

Based on normalized key names plus a content hash:

function exactDuplicate(a, b) {
  // 1. Key name normalization: convert to lower case, remove punctuation
  const keyA = a.name.toLowerCase().replace(/[^\w]/g, '')
  const keyB = b.name.toLowerCase().replace(/[^\w]/g, '')
  if (keyA === keyB) return true

  // 2. Content hash comparison (simpleHash: any fast, non-cryptographic string hash)
  const hashA = simpleHash(a.content)
  const hashB = simpleHash(b.content)
  if (hashA === hashB) return true

  return false
}

Exact de-duplication is fast: comparing two hashes is O(1). But it only catches identical or near-identical content; it cannot recognize memories that say the same thing in different words.
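`simpleHash` is not shown, and comparing every pair with `exactDuplicate` is quadratic in the number of results. Below is a sketch that uses FNV-1a (over UTF-16 code units) as one possible `simpleHash` and keeps seen keys and hashes in `Set`s, so each check is O(1). `dedupeExact`, `normalizeKey`, and the `Memory` shape are hypothetical.

```typescript
interface Memory { name: string; content: string }

// FNV-1a, 32-bit: a fast non-cryptographic string hash (one possible simpleHash).
function simpleHash(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0 // multiply by the FNV prime, keep 32 bits
  }
  return hash
}

// Same normalization as exactDuplicate: lower-case, strip non-word characters.
const normalizeKey = (name: string): string => name.toLowerCase().replace(/[^\w]/g, '')

// Single pass with Set lookups instead of comparing every pair.
function dedupeExact(memories: Memory[]): Memory[] {
  const seenKeys = new Set<string>()
  const seenHashes = new Set<number>()
  const kept: Memory[] = []
  for (const m of memories) {
    const key = normalizeKey(m.name)
    const hash = simpleHash(m.content)
    if (seenKeys.has(key) || seenHashes.has(hash)) continue // first occurrence wins
    seenKeys.add(key)
    seenHashes.add(hash)
    kept.push(m)
  }
  return kept
}
```

A 32-bit hash can collide, which would drop a distinct memory; a stricter version could compare the full content whenever two hashes match.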

5.2 Semantic de-duplication

Based on text similarity, mixing Jaccard and LCS:

function semanticDuplicate(a, b) {
  const wordsA = new Set(a.content.split(/\s+/))
  const wordsB = new Set(b.content.split(/\s+/))

  // Jaccard similarity: |intersection| / |union| of the word sets
  const intersection = [...wordsA].filter(w => wordsB.has(w))
  const union = new Set([...wordsA, ...wordsB])
  const jaccard = intersection.length / union.size

  // Longest-common-subsequence similarity, normalized by the longer text
  // (named lcsScore so it does not shadow the lcs() helper)
  const lcsLength = lcs(a.content, b.content).length
  const lcsScore = lcsLength / Math.max(a.content.length, b.content.length)

  // Weighted mix: Jaccard 0.4 + LCS 0.6
  const score = jaccard * 0.4 + lcsScore * 0.6

  return score >= 0.85 // 0.85 or above counts as a duplicate
}

Why mix Jaccard and LCS?

Jaccard only looks at word sets: it is fast but ignores word order. "cat chases mouse" and "mouse chases cat" have a Jaccard similarity of 1.0, yet they mean opposite things
LCS takes order into account and can tell "cat chases mouse" from "mouse chases cat", but it is slower to compute

Used together, Jaccard handles the fast preliminary screening and LCS the precise judgment. LCS gets the larger weight (0.6) because word-order information matters more.
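The `lcs` helper that `semanticDuplicate` calls is not shown. Below is a sketch with a standard dynamic-programming LCS length over characters, plus the combined check. It also turns "Jaccard screens first" into an explicit early exit, which is my own derivation from the weights: the LCS term contributes at most 0.6, so a pair with Jaccard below (0.85 - 0.6) / 0.4 = 0.625 can never reach 0.85, and the quadratic LCS can be skipped. The helper names are hypothetical; the weights and threshold come from the code above.

```typescript
// Classic dynamic-programming LCS length over characters: O(n·m) time, O(m) memory.
function lcsLength(a: string, b: string): number {
  let prev: number[] = new Array(b.length + 1).fill(0)
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = new Array(b.length + 1).fill(0)
    for (let j = 1; j <= b.length; j++) {
      curr[j] = a[i - 1] === b[j - 1]
        ? prev[j - 1] + 1                  // extend the common subsequence
        : Math.max(prev[j], curr[j - 1])   // drop a character from one side
    }
    prev = curr
  }
  return prev[b.length]
}

const JACCARD_WEIGHT = 0.4
const LCS_WEIGHT = 0.6
const DUPLICATE_THRESHOLD = 0.85
// The LCS term adds at most 0.6, so reaching 0.85 needs jaccard >= (0.85 - 0.6) / 0.4 = 0.625.
const MIN_JACCARD = (DUPLICATE_THRESHOLD - LCS_WEIGHT) / JACCARD_WEIGHT

// Jaccard similarity of whitespace-separated word sets: |A ∩ B| / |A ∪ B|.
function jaccard(a: string, b: string): number {
  const wordsA = new Set(a.split(/\s+/))
  const wordsB = new Set(b.split(/\s+/))
  let shared = 0
  for (const w of wordsA) if (wordsB.has(w)) shared++
  return shared / (wordsA.size + wordsB.size - shared)
}

function isSemanticDuplicate(a: string, b: string): boolean {
  const j = jaccard(a, b)
  if (j < MIN_JACCARD) return false // cheap screen: cannot reach the threshold, skip LCS
  const lcsScore = lcsLength(a, b) / Math.max(a.length, b.length)
  return j * JACCARD_WEIGHT + lcsScore * LCS_WEIGHT >= DUPLICATE_THRESHOLD
}
```

"cat chases mouse" and "mouse chases cat" pass the Jaccard screen (similarity 1.0) but fail on LCS, which is exactly the case the mix is meant to catch.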

6. Ranking algorithm

After de-duplication, the results must be ranked. Rather than sorting by relevance alone, the ranking combines several factors:

finalScore = baseScore × priorityMultiplier + layerBonus + recencyBonus

Four factors:

baseScore: the raw relevance score (0-1) from CCB or MAGMA
priorityMultiplier: source priority multiplier, ×1.1 for CCB priority, ×0.9 for MAGMA priority, ×1.0 when both are equal. It encodes the routing strategy: for a CCB-priority query, CCB results are weighted 10% higher
layerBonus: a memory-layer bonus of up to 0.1; the more relevant the matching layer, the higher the bonus. For example, causal reasoning needs project memory (L2), so L2 results get a bonus
recencyBonus: a recency bonus of up to 0.05 that decays linearly over 90 days; the newer the memory, the more valuable it is

I particularly like this formula. Instead of ranking on a single dimension, it combines four orthogonal ones: relevance, source preference, memory layer, and recency. Each dimension's influence is bounded (the multiplier is at most 1.1 and the bonuses add at most 0.15), so no single factor can dominate the ranking.
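To make the formula concrete, here is a sketch. The caps (0.1 layer bonus, 0.05 recency bonus over 90 days) and the 1.1/0.9/1.0 multipliers come from the list above; two details are assumptions: the multiplier is read as ×1.1 for results from the prioritized engine and ×0.9 for results from the other engine, and the layer bonus is all-or-nothing on whether the result's layer is one the route targets. All names are hypothetical.

```typescript
// Sketch of finalScore = baseScore × priorityMultiplier + layerBonus + recencyBonus.
type Engine = 'ccb' | 'magma'
type Priority = Engine | 'equal'

interface Candidate {
  id: string
  baseScore: number // 0-1 relevance from CCB or MAGMA
  source: Engine
  layer: string     // e.g. 'L2'
  ageDays: number
}

const MAX_LAYER_BONUS = 0.1
const MAX_RECENCY_BONUS = 0.05
const RECENCY_WINDOW_DAYS = 90

// Assumption: ×1.1 from the prioritized engine, ×0.9 from the other one, ×1.0 if equal.
function priorityMultiplier(source: Engine, priority: Priority): number {
  if (priority === 'equal') return 1.0
  return source === priority ? 1.1 : 0.9
}

// Assumption: full bonus if the result's layer is one the intent targets, else none.
function layerBonus(layer: string, targetLayers: string[]): number {
  return targetLayers.includes(layer) ? MAX_LAYER_BONUS : 0
}

// Linear decay from 0.05 (brand new) to 0 (90 days or older), clamped to [0, 0.05].
function recencyBonus(ageDays: number): number {
  const fraction = Math.min(1, Math.max(0, 1 - ageDays / RECENCY_WINDOW_DAYS))
  return MAX_RECENCY_BONUS * fraction
}

function finalScore(c: Candidate, priority: Priority, targetLayers: string[]): number {
  return c.baseScore * priorityMultiplier(c.source, priority)
    + layerBonus(c.layer, targetLayers)
    + recencyBonus(c.ageDays)
}

// Highest final score first; the input array is left untouched.
function rank(candidates: Candidate[], priority: Priority, targetLayers: string[]): Candidate[] {
  return [...candidates].sort((x, y) =>
    finalScore(y, priority, targetLayers) - finalScore(x, priority, targetLayers))
}
```

For a causal-reasoning query (MAGMA priority, layers L2-L4), a fresh MAGMA result from L2 with base 0.8 scores 0.8 × 1.1 + 0.1 + 0.05 = 1.03, while a 45-day-old CCB result from L5 with base 0.85 scores 0.85 × 0.9 + 0 + 0.025 = 0.79, so the lower-base MAGMA result ranks first.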

7. Token budget control

After ranking, the last step is token budget control, because not every result fits into the context window:

function applyTokenBudget(results, maxTokens = 4000) {
  let accumulated = 0
  const selected = []
  // Dynamic allocation: no memory may use more than total budget / number of memories
  const perMemoryCap = Math.floor(maxTokens / results.length)

  for (const result of results) { // results arrive in ranked order
    const fullTokens = estimateTokens(result)
    const tokens = Math.min(fullTokens, perMemoryCap)

    if (accumulated + tokens <= maxTokens) {
      // Cut over-long memories to their share so the injected text matches the count
      selected.push(tokens < fullTokens ? truncateMemory(result, tokens) : result)
      accumulated += tokens
    } else {
      // Try to truncate the last memory and squeeze it in
      const remaining = maxTokens - accumulated
      if (remaining > 0 && result.content.length > 100) {
        selected.push(truncateMemory(result, remaining))
      }
      break
    }
  }
  return selected
}

A few things stand out about this algorithm:

Dynamic allocation. Instead of a fixed 500 tokens per memory, each memory gets total budget / number of memories: with the default 4000-token budget, 10 memories get 400 tokens each and 5 memories get 800 each.
Truncate rather than discard. When the budget runs out, the last memory is truncated instead of dropped: its beginning is kept and marked "(content truncated)".
Ranking order is preserved. Higher-ranked results are kept first, and lower-ranked ones are the first to be truncated or dropped, so the most relevant memories are never crowded out by less relevant ones.

Token budget control is the last gate of the memory system. No matter how many results earlier steps retrieve, the number of tokens finally injected into the context is capped. This limit is what forces ranking and de-duplication during retrieval: only the top few items will survive.
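`applyTokenBudget` relies on two helpers that are not shown. A minimal sketch under explicit assumptions: tokens are estimated at roughly four characters each (a common rule of thumb for English text, not a real tokenizer), and truncation keeps the beginning and appends the "(content truncated)" marker described above. `MemoryResult`, `estimateTokens`, and `truncateMemory` bodies are hypothetical.

```typescript
interface MemoryResult { id: string; content: string }

const CHARS_PER_TOKEN = 4 // rough heuristic; a real system would use the model's tokenizer
const TRUNCATION_MARK = '\n(content truncated)'

function estimateTokens(result: MemoryResult): number {
  return Math.ceil(result.content.length / CHARS_PER_TOKEN)
}

// Keep the beginning of the memory so that, marker included, it fits in maxTokens
// (for any budget larger than the marker itself). Returns a copy; the input is not mutated.
function truncateMemory(result: MemoryResult, maxTokens: number): MemoryResult {
  const maxChars = maxTokens * CHARS_PER_TOKEN
  if (result.content.length <= maxChars) return result // already fits
  const head = result.content.slice(0, Math.max(0, maxChars - TRUNCATION_MARK.length))
  return { ...result, content: head + TRUNCATION_MARK }
}
```

With these helpers, a 1000-character memory counts as 250 tokens, and truncating it to 100 tokens keeps its first 380 characters plus the 20-character marker.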

8. Complete process

Chaining all of the above together, a complete retrieval run looks like this:

user question
    │
    ▼
┌────────────────────────────┐
│ 1. Intent classification   │  LLM + rules, two levels
│    → causal-reasoning      │
└─────────────┬──────────────┘
              │
              ▼
┌────────────────────────────┐
│ 2. Routing decision        │  look up the routing matrix
│    → both-priority-magma   │
│      L2-L4                 │
│      concept, relationship │
└─────────────┬──────────────┘
              │
              ▼
┌────────────────────────────┐
│ 3. Progressive search      │  three stages, progressively deeper
│    ccb-fast → magma-extend │
│    → full-fusion           │
└─────────────┬──────────────┘
              │
              ▼
┌────────────────────────────┐
│ 4. Result fusion           │  unified format + de-duplication
│    exact + semantic dedup  │
└─────────────┬──────────────┘
              │
              ▼
┌────────────────────────────┐
│ 5. Comprehensive ranking   │  relevance + priority + layer + recency
└─────────────┬──────────────┘
              │
              ▼
┌────────────────────────────┐
│ 6. Token budget control    │  dynamic allocation + truncation
└─────────────┬──────────────┘
              │
              ▼
    Inject into Claude's context

Six steps, each solving one problem: intent classification decides what to look for, the routing decision decides where to look, progressive search decides how to look, result fusion merges what was found, comprehensive ranking decides what comes first, and token budget control decides how much fits.
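The six steps compose into a single function. In this skeleton each step is injected, so the data flow is visible without committing to any particular engine or model. The `PipelineSteps` shape is hypothetical, and passing the route's estimated token count as the injection budget is an assumption (the `applyTokenBudget` example above defaults to 4000).

```typescript
// Hypothetical skeleton of the six-step retrieval pipeline.
interface IntentResult { category: string; confidence: number }
interface Route { strategy: string; layers: string[]; tokenBudget: number }
interface Memory { id: string; content: string; score: number }

interface PipelineSteps {
  classify: (query: string) => Promise<IntentResult>            // 1. what to look for
  route: (intent: IntentResult) => Route                        // 2. where to look
  search: (query: string, route: Route) => Promise<Memory[][]>  // 3. how to look (one list per engine)
  fuse: (lists: Memory[][]) => Memory[]                         // 4. merge and de-duplicate
  rank: (memories: Memory[], intent: IntentResult, route: Route) => Memory[] // 5. what comes first
  budget: (memories: Memory[], maxTokens: number) => Memory[]   // 6. how much fits
}

async function retrieve(query: string, steps: PipelineSteps): Promise<Memory[]> {
  const intent = await steps.classify(query)
  const route = steps.route(intent)
  const lists = await steps.search(query, route)
  const fused = steps.fuse(lists)
  const ranked = steps.rank(fused, intent, route)
  // Assumption: the route's estimated token count doubles as the injection budget
  return steps.budget(ranked, route.tokenBudget)
}
```

Because every step only sees the previous step's output, any one of them (say, swapping rule-based classification for the LLM) can be replaced without touching the others.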

9. Design insights

Looking back at the whole intent-driven retrieval system, several design insights are worth distilling:

Intent determines everything. Against the same memory store, different intents take different paths. This is not over-engineering but an admission that there is no silver bullet: simple questions do not need complex retrieval, and complex questions deserve more than a quick, shallow answer.
Graceful degradation. LLM → rules → cache: three levels of fallback. Good system design assumes every link can fail and prepares for it.
Progressive investment. Spend 100ms on CCB first; if that is not enough, add MAGMA; if that is still not enough, bring in the FusionVerifier. Retrieval cost tracks query complexity, so nothing is wasted.
Multi-dimensional ranking. Results are ranked not by relevance alone but by relevance, source, layer, and recency together. Each dimension's influence is bounded, so no single factor dominates.
Hard constraints as the backstop. The token budget is a hard limit: no matter how many results earlier steps retrieve, the number of injected tokens is capped. That constraint forces quality in every earlier step.

This issue walked through the complete intent-driven retrieval process. The next issue covers the "self-evolution" of the memory system: the Dream automatic organization mechanism, the configuration system, and hands-on code for the Shared Memory System.

Series:

Intention-driven: How AI understands what you're looking for