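When an AI agent's conversation history outgrows the context window, how can the history be compressed without losing key information? OpenAI Codex CLI uses a context-compaction strategy called Memento: the model writes its own "handoff summary", and that summary then replaces the original history. This article dissects Codex's compaction mechanism at the source level, covering trigger conditions, the compaction flow, and the history-rebuilding strategy, and compares it in depth with Claude Code's implementation.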
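1. Background: why is context compaction needed?
An LLM's context window is finite. During a long task, an agent's conversation history (user messages, tool calls, tool outputs, model replies) keeps growing until it reaches the window limit. At that point there are three options:
- Truncation: drop the oldest messages outright. Simple and crude, but key context is lost
- Sliding window: keep only the most recent N messages. Slightly better, but early decisions can still be lost
- Compaction/summarization: have the model summarize the history and replace the original history with the summary. Meaning is preserved and space is saved
Codex takes the third option and calls it the Memento strategy (after the film Memento, hinting at "keeping the key memories among the fragments").
2. Compaction triggers
2.1 Auto compaction (Auto Compact)
Auto compaction runs after model sampling completes within a turn. The core check lives in auto_compact_token_status: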
```rust
async fn auto_compact_token_status(
    sess: &Session,
    turn_context: &TurnContext,
) -> AutoCompactTokenStatus {
    let active_context_tokens = sess.get_total_token_usage().await;
    let (auto_compact_scope_tokens, auto_compact_scope_limit, full_context_window_limit) =
        match turn_context.config.model_auto_compact_token_limit_scope {
            AutoCompactTokenLimitScope::Total => (
                active_context_tokens,
                turn_context.model_info.auto_compact_token_limit()
                    .unwrap_or(i64::MAX),
                None,
            ),
            AutoCompactTokenLimitScope::BodyAfterPrefix => {
                // Count only the part after the prefix (the cached prefix is excluded)
                let window = sess.auto_compact_window_snapshot().await;
                let baseline = window.prefill_input_tokens.unwrap_or(active_context_tokens);
                (
                    active_context_tokens.saturating_sub(baseline),
                    turn_context.config.model_auto_compact_token_limit
                        .or_else(|| turn_context.model_info.auto_compact_token_limit())
                        .unwrap_or(i64::MAX),
                    turn_context.model_context_window(),
                )
            }
        };
    // ...
}
```
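Codex supports two auto-compaction scopes:
| Scope | Description |
|---|---|
| Total | Decides on the total token count; simple and direct |
| BodyAfterPrefix | Counts tokens after excluding the cached prefix (prefill); more precise |
Auto compaction triggers when auto_compact_scope_tokens >= auto_compact_scope_limit.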
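The trigger rule can be sketched in isolation. The following is a minimal, self-contained illustration and not Codex's actual code: `Scope` and `auto_compact_decision` are hypothetical names, and the session and config lookups are reduced to plain arguments.

```rust
/// Hypothetical, simplified stand-in for AutoCompactTokenLimitScope.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Scope {
    Total,
    BodyAfterPrefix,
}

/// Returns (tokens counted against the limit, whether compaction triggers).
/// `prefill` is the cached-prefix size if known; when it is unknown the
/// baseline falls back to `active`, so nothing is counted.
/// A missing `limit` is treated as i64::MAX, which effectively never triggers.
fn auto_compact_decision(
    scope: Scope,
    active: i64,
    prefill: Option<i64>,
    limit: Option<i64>,
) -> (i64, bool) {
    let counted = match scope {
        Scope::Total => active,
        Scope::BodyAfterPrefix => active.saturating_sub(prefill.unwrap_or(active)),
    };
    (counted, counted >= limit.unwrap_or(i64::MAX))
}
```

With 120,000 active tokens, a 30,000-token prefix, and a 100,000-token limit, the Total scope counts 120,000 and triggers, while BodyAfterPrefix counts only 90,000 and does not.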
2.2 Manual compaction
Users can trigger compaction manually with the /compact command, in which case:
```rust
pub(crate) async fn run_compact_task(
    sess: Arc<Session>,
    turn_context: Arc<TurnContext>,
    input: Vec<UserInput>,
) -> CodexResult<()> {
    // Manual compaction uses the DoNotInject policy
    run_compact_task_inner(
        sess.clone(), turn_context, input,
        InitialContextInjection::DoNotInject, // the key difference
        CompactionTrigger::Manual,
        CompactionReason::UserRequested,
        CompactionPhase::StandaloneTurn,
    ).await
}
```
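2.3 Key differences between the two triggers
| Dimension | Auto compaction | Manual compaction |
|---|---|---|
| Trigger timing | After sampling completes within a turn | Explicit user request |
| InitialContextInjection | BeforeLastUserMessage | DoNotInject |
| CompactionPhase | MidTurn | StandaloneTurn |
| Behavior after compaction | Continues the current turn's model request | Ends the current turn |
| Compaction prompt | Uses the default SUMMARIZATION_PROMPT | May carry user-supplied instructions |
The InitialContextInjection setting is the core difference:
- BeforeLastUserMessage: after compaction, the initial context (tool definitions, environment info, and so on) is re-injected into the history, placed before the last user message. Auto compaction happens in the middle of a turn, so the model needs to see the full initial context to keep working.
- DoNotInject: the initial context is not injected. Manual compaction is a standalone turn, and the next regular turn re-injects the initial context automatically.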
3. The compaction prompt: letting the model write its own summary
3.1 The fixed compaction prompt
Codex uses a fixed compaction prompt, defined in core/templates/compact/prompt.md:
```
You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary
for another LLM that will resume the task.
Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences
- What remains to be done (clear next steps)
- Any critical data, examples, or references needed to continue
Be concise, structured, and focused on helping the next LLM seamlessly
continue the work.
```
The prompt is embedded at compile time with include_str!:
```rust
pub const SUMMARIZATION_PROMPT: &str = include_str!("../templates/compact/prompt.md");
```
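3.2 Custom instructions for manual compaction
During manual compaction the user can supply custom instructions (for example, /compact focus on keeping the test results). These instructions are passed in as UserInput and replace the default SUMMARIZATION_PROMPT:
```rust
// The compact_prompt method on TurnContext
pub(crate) fn compact_prompt(&self) -> &str {
    self.compact_prompt
        .as_deref()
        .unwrap_or(compact::SUMMARIZATION_PROMPT)
}
```
If the user supplied custom instructions, the compact_prompt field is set to that input; otherwise the default SUMMARIZATION_PROMPT is used.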
4. The compaction flow
4.1 Full flow diagram
```
Compaction triggered (auto / manual)
        │
        ▼
run_pre_compact_hooks()           ← hook check, may abort
        │
        ▼
run_compact_task_inner_impl()
        │
        ├── 1. Emit the ContextCompactionItem Started event
        ├── 2. Clone the current history
        ├── 3. Append the compaction prompt to the history
        │
        ▼
┌──────────────────────────────────────────┐
│ loop {                                   │
│   build Prompt (history + instructions)  │
│        │                                 │
│        ▼                                 │
│   ModelClient::stream()                  │ ← call the OpenAI API
│        │                                 │
│        ▼                                 │
│   drain_to_completed()                   │ ← receive the summary as a stream
│        │                                 │
│   ┌─ Ok → break                          │
│   ├─ ContextWindowExceeded               │
│   │   → history.remove_first_item()      │ ← drop the oldest items one by one
│   │   → continue                         │
│   └─ Other Error                         │
│       → retry with backoff               │
│ }                                        │
└──────────────────────────────────────────┘
        │
        ├── 4. Extract the model-generated summary
        ├── 5. Build SUMMARY_PREFIX + summary
        ├── 6. Collect user messages (up to 20,000 tokens)
        ├── 7. build_compacted_history()
        ├── 8. Inject the initial context if required
        ├── 9. replace_compacted_history()
        ├── 10. recompute_token_usage()
        │
        ▼
run_post_compact_hooks()          ← hook check
        │
        ▼
Emit Warning: repeated compaction may reduce accuracy
```
4.2 Core implementation: run_compact_task_inner_impl
```rust
async fn run_compact_task_inner_impl(
    sess: Arc<Session>,
    turn_context: Arc<TurnContext>,
    input: Vec<UserInput>,
    initial_context_injection: InitialContextInjection,
) -> CodexResult<String> {
    // 1. Mark the start of compaction
    let compaction_item = TurnItem::ContextCompaction(ContextCompactionItem::new());
    sess.emit_turn_item_started(&turn_context, &compaction_item).await;

    // 2. Append the compaction request to the history
    let initial_input_for_turn: ResponseInputItem = ResponseInputItem::from(input);
    let mut history = sess.clone_history().await;
    history.record_items(&[initial_input_for_turn.into()], ...);

    // 3. Ask the model for a summary (with retries)
    let mut client_session = sess.services.model_client.new_session();
    loop {
        let turn_input = history.clone().for_prompt(...);
        let prompt = Prompt {
            input: turn_input,
            base_instructions: sess.get_base_instructions().await,
            personality: turn_context.personality,
            ..Default::default()
        };
        match drain_to_completed(&sess, turn_context.as_ref(),
                                 &mut client_session, ..., &prompt).await {
            Ok(()) => break,
            Err(CodexErr::ContextWindowExceeded) => {
                // Over the limit even while compacting? Drop the oldest item and retry
                history.remove_first_item();
                continue;
            }
            Err(e) if retries < max_retries => {
                retries += 1;
                tokio::time::sleep(backoff(retries)).await;
                continue;
            }
            Err(e) => return Err(e),
        }
    }

    // 4. Extract the summary
    let history_snapshot = sess.clone_history().await;
    let summary_suffix = get_last_assistant_message_from_turn(history_snapshot.raw_items())
        .unwrap_or_default();
    let summary_text = format!("{SUMMARY_PREFIX}\n{summary_suffix}");

    // 5. Collect user messages and build the compacted history
    let user_messages = collect_user_messages(history_snapshot.raw_items());
    let mut new_history = build_compacted_history(Vec::new(), &user_messages, &summary_text);

    // 6. Inject the initial context if required
    if matches!(initial_context_injection, InitialContextInjection::BeforeLastUserMessage) {
        let initial_context = sess.build_initial_context(turn_context.as_ref()).await;
        new_history = insert_initial_context_before_last_real_user_or_summary(
            new_history, initial_context
        );
    }

    // 7. Replace the history
    sess.replace_compacted_history(new_history, reference_context_item, compacted_item).await;
    sess.recompute_token_usage(&turn_context).await;

    // 8. Warn the user
    sess.send_event(&turn_context, EventMsg::Warning(WarningEvent {
        message: "Heads up: Long threads and multiple compactions can cause the model \
                  to be less accurate. Start a new thread when possible.".to_string(),
    })).await;

    Ok(summary_suffix)
}
```
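The retry loop in step 3 is the part most worth isolating: a context-window error shrinks the input from the oldest end, while any other error is retried a bounded number of times. Below is a minimal synchronous sketch of that control flow. `SummarizeError`, `summarize_with_trimming`, and the closure standing in for the model call are all hypothetical, the backoff sleep is omitted, and the empty-history guard is an addition of this sketch rather than something shown in the excerpt.

```rust
/// Hypothetical error type standing in for the two failure classes above.
#[derive(Debug, PartialEq)]
enum SummarizeError {
    ContextWindowExceeded,
    Other(String),
}

/// Calls `summarize` on the history. On a context-window error it drops the
/// oldest item and tries again; on any other error it retries up to
/// `max_retries` times. Returns the summary and the number of items that
/// were still in the history when the call succeeded.
fn summarize_with_trimming<F>(
    mut history: Vec<String>,
    max_retries: u32,
    mut summarize: F,
) -> Result<(String, usize), SummarizeError>
where
    F: FnMut(&[String]) -> Result<String, SummarizeError>,
{
    let mut retries = 0;
    loop {
        match summarize(&history) {
            Ok(summary) => return Ok((summary, history.len())),
            Err(SummarizeError::ContextWindowExceeded) if !history.is_empty() => {
                history.remove(0); // drop the oldest item, then try again
            }
            Err(SummarizeError::ContextWindowExceeded) => {
                // Guard added in this sketch: nothing left to trim.
                return Err(SummarizeError::ContextWindowExceeded);
            }
            Err(SummarizeError::Other(_)) if retries < max_retries => {
                retries += 1; // real code would sleep with backoff here
            }
            Err(e) => return Err(e),
        }
    }
}
```

One consequence of this design is that the summary may be produced from a history that has already lost its oldest items, so the earliest context is the first to go when the compaction request itself does not fit.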
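5. The summary prefix: telling the model "this is a compaction summary"
The compacted summary is prefixed with SUMMARY_PREFIX, defined in core/templates/compact/summary_prefix.md:
```
Another language model started to solve this problem and produced a summary
of its thinking process. You also have access to the state of the tools that
were used by that language model. Use this to build on the work that has
already been done and avoid duplicating work. Here is the summary produced
by the other language model, use the information in this summary to assist
your own analysis:
```
The prefix does three things:
- Persona separation: it describes the summary as the output of "another language model", so the model does not treat those statements as things it said itself
- Tool-state inheritance: it tells the model explicitly that it has access to the state of the tools used by the previous model
- Avoiding duplication: it asks the model to build on the existing work and not repeat it
The format finally injected into the history:
```
[SUMMARY_PREFIX]
[model-generated summary body]
```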
6. Rebuilding the history after compaction
6.1 build_compacted_history
```rust
fn build_compacted_history_with_limit(
    mut history: Vec<ResponseItem>,
    user_messages: &[String],
    summary_text: &str,
    max_tokens: usize, // default 20,000
) -> Vec<ResponseItem> {
    // 1. Pick user messages from newest to oldest, up to max_tokens
    let mut selected_messages: Vec<String> = Vec::new();
    let mut remaining = max_tokens;
    for message in user_messages.iter().rev() {
        if remaining == 0 { break; }
        let tokens = approx_token_count(message);
        if tokens <= remaining {
            selected_messages.push(message.clone());
            remaining = remaining.saturating_sub(tokens);
        } else {
            let truncated = truncate_text(message, TruncationPolicy::Tokens(remaining));
            selected_messages.push(truncated);
            break;
        }
    }
    selected_messages.reverse();

    // 2. Add the user messages to the new history
    for message in &selected_messages {
        history.push(ResponseItem::Message {
            role: "user".to_string(),
            content: vec![ContentItem::InputText { text: message.clone() }],
            // ...
        });
    }

    // 3. Add the summary as the last user message
    history.push(ResponseItem::Message {
        role: "user".to_string(),
        content: vec![ContentItem::InputText {
            text: summary_text.to_string()
        }],
        // ...
    });
    history
}
```
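To see the budget logic run on concrete input, here is a self-contained sketch of the selection step alone. It rests on two assumptions that are not taken from the Codex source: tokens are estimated at roughly 4 bytes each, and truncation keeps the front of the message. `approx_tokens`, `truncate_to_tokens`, and `select_recent_messages` are hypothetical names.

```rust
/// Assumed heuristic: about 4 bytes per token, rounded up.
/// The real approx_token_count may differ.
fn approx_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Keeps at most `max_tokens` worth of text from the front of `text`
/// (an assumption of this sketch), cutting on a character boundary.
fn truncate_to_tokens(text: &str, max_tokens: usize) -> String {
    let mut end = text.len().min(max_tokens * 4);
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    text[..end].to_string()
}

/// Walks the messages newest-first, keeps whole messages while they fit,
/// truncates the first one that does not fit and stops there, then returns
/// the kept messages in their original (oldest-first) order.
fn select_recent_messages(messages: &[&str], max_tokens: usize) -> Vec<String> {
    let mut selected = Vec::new();
    let mut remaining = max_tokens;
    for message in messages.iter().rev() {
        if remaining == 0 {
            break;
        }
        let tokens = approx_tokens(message);
        if tokens <= remaining {
            selected.push(message.to_string());
            remaining -= tokens;
        } else {
            selected.push(truncate_to_tokens(message, remaining));
            break;
        }
    }
    selected.reverse();
    selected
}
```

With a 3-token budget and messages of 2, 3, and 1 tokens (oldest to newest), the newest message is kept whole, the middle one is cut to the remaining 2 tokens, and the oldest is dropped entirely.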
6.2 Structure of the compacted history
```
Before compaction:            After compaction:
┌──────────────────────┐      ┌───────────────────────────┐
│ System Instructions  │      │ (empty until next turn)   │
│ Initial Context      │      │                           │
│ User: "fix bug"      │      │ User: "fix bug"           │ ← retained
│ Assistant: "..."     │      │ User: "add tests"         │ ← user messages
│ Tool: shell output   │      │                           │   (≤20K tokens)
│ User: "add tests"    │      │ User: [SUMMARY_PREFIX     │
│ Assistant: "..."     │      │        + summary body]    │ ← summary
│ Tool: shell output   │      └───────────────────────────┘
│ ... (100+ items)     │
└──────────────────────┘
```
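Key design decisions:
- User messages are retained first: selection runs from newest to oldest, so the most recent messages are kept in full while older ones may be truncated or dropped
- The summary is a user message: it is wrapped in a message with role: "user", so the model treats it as context it needs to act on
- Initial context is not retained: standalone compaction clears the initial context (tool definitions, environment info, and so on) and leaves the next regular turn to re-inject it
- Summary marker: the SUMMARY_PREFIX prefix identifies summary messages, and is_summary_message() is used to detect them
6.3 Where the initial context is injected
For MidTurn compaction, the initial context has to be injected into the compacted history: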
```rust
pub(crate) fn insert_initial_context_before_last_real_user_or_summary(
    mut compacted_history: Vec<ResponseItem>,
    initial_context: Vec<ResponseItem>,
) -> Vec<ResponseItem> {
    // Preferred position: before the last real user message
    // Fallback position: before the last summary message
    // Last resort: append at the end
    let insertion_index = last_real_user_index
        .or(last_user_or_summary_index)
        .or(last_compaction_index);
    if let Some(insertion_index) = insertion_index {
        compacted_history.splice(insertion_index..insertion_index, initial_context);
    } else {
        compacted_history.extend(initial_context);
    }
    compacted_history
}
```
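The excerpt leaves out how the candidate indices are computed. The sketch below fills that in under stated assumptions: a summary is any user message that starts with the summary prefix, and only the first two fallbacks are modeled. `Item`, `is_summary`, `is_real_user`, and `insert_initial_context` are hypothetical names, and the prefix constant is shortened for readability.

```rust
/// Shortened, hypothetical stand-in for the real summary prefix text.
const SUMMARY_PREFIX: &str = "Another language model started to solve this problem";

#[derive(Debug, Clone, PartialEq)]
enum Item {
    User(String),
    Context(String), // stands in for an initial-context item
}

/// Assumption: a summary is a user message that starts with the prefix.
fn is_summary(item: &Item) -> bool {
    matches!(item, Item::User(text) if text.starts_with(SUMMARY_PREFIX))
}

fn is_real_user(item: &Item) -> bool {
    matches!(item, Item::User(_)) && !is_summary(item)
}

/// Inserts `initial_context` before the last real user message, else before
/// the last summary message, else appends it at the end.
fn insert_initial_context(mut history: Vec<Item>, initial_context: Vec<Item>) -> Vec<Item> {
    let last_real_user = history.iter().rposition(is_real_user);
    let last_summary = history.iter().rposition(is_summary);
    match last_real_user.or(last_summary) {
        Some(index) => {
            // An empty range makes splice a pure insertion at `index`.
            history.splice(index..index, initial_context);
        }
        None => history.extend(initial_context),
    }
    history
}
```

Placing the context before the last real user message keeps that message and the summary at the tail of the history, which is where a mid-turn continuation picks up.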
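7. Remote compaction
Codex also supports a remote compaction mode, in which the OpenAI server performs the compaction instead of the local client:
```rust
pub(crate) fn should_use_remote_compact_task(provider: &ModelProviderInfo) -> bool {
    provider.supports_remote_compaction()
}
```
Potential advantages of remote compaction:
- More efficient: the server can access the model's internal state directly, with no client-server round trip
- More precise: the server can count token usage more accurately
- More consistent: the compaction logic is managed centrally by OpenAI
Remote compaction comes in two versions (V1 and V2), selected by a feature flag: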
```rust
async fn run_auto_compact(...) -> CodexResult<bool> {
    if should_use_remote_compact_task(turn_context.provider.info()) {
        if turn_context.features.enabled(Feature::RemoteCompactionV2) {
            run_inline_remote_auto_compact_task_v2(...).await?;
            return Ok(false);
        }
        run_inline_remote_auto_compact_task(...).await?;
    } else {
        run_inline_auto_compact_task(...).await?;
    }
    // ...
}
```
8. The hook system: extension points before and after compaction
Codex provides hooks before and after compaction so that users can customize its behavior:
```rust
// Pre-compaction hook
let pre_compact_outcome = run_pre_compact_hooks(&sess, &turn_context, trigger).await;
match pre_compact_outcome {
    PreCompactHookOutcome::Continue => {} // carry on
    PreCompactHookOutcome::Stopped { reason } => {
        return Err(CodexErr::TurnAborted); // abort
    }
}

// Post-compaction hook
let post_compact_outcome = run_post_compact_hooks(&sess, &turn_context, trigger).await;
if let PostCompactHookOutcome::Stopped = post_compact_outcome {
    return Err(CodexErr::TurnAborted);
}
```
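9. In-depth comparison with Claude Code's context compaction
Claude Code's source became public through the npm source map leak of March 2025. The comparison below is based on the actual code in the leaked source (src/services/compact/), not on second-hand analysis.
9.1 Architecture
| Dimension | Codex (Rust) | Claude Code (TypeScript) |
|---|---|---|
| Language | Rust | TypeScript |
| Strategy name | Memento | Traditional compact + session memory compact + microcompact + reactive compact |
| Summary generation | Local model call / remote API | Forked Agent (shared prompt cache) / streaming fallback |
| History replacement | Wholesale (summary + user messages) | Wholesale (boundary + summary + attachments + retained messages) |
9.2 Trigger mechanisms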
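Codex:
- Computes the trigger precisely from token usage
- Supports two scopes, Total and BodyAfterPrefix
- Continues the current request automatically after MidTurn compaction
Claude Code:
- Threshold ≈ context window - 20,000 (reserved for the summary) - 13,000 (buffer)
- For a 200K window, triggers at about 83.5% usage
- Continues the current query after auto compaction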
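The 83.5% figure follows directly from the two reserves. A small worked example, written in Rust for consistency with the rest of the article even though Claude Code itself is TypeScript; the constant and function names are hypothetical and only the numbers come from the text above:

```rust
/// Figures quoted above; the names are hypothetical.
const SUMMARY_RESERVE_TOKENS: u64 = 20_000;
const BUFFER_TOKENS: u64 = 13_000;

/// Token count at which auto compaction would trigger for a given window.
fn auto_compact_threshold(context_window: u64) -> u64 {
    context_window.saturating_sub(SUMMARY_RESERVE_TOKENS + BUFFER_TOKENS)
}

/// The same threshold expressed as a percentage of the window.
fn threshold_percent(context_window: u64) -> f64 {
    auto_compact_threshold(context_window) as f64 * 100.0 / context_window as f64
}
```

For a 200,000-token window this gives 200,000 - 33,000 = 167,000 tokens, and 167,000 / 200,000 = 83.5%. Because the reserves are absolute, the percentage rises for larger windows and falls for smaller ones.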
9.3 Compaction paths
Codex's paths:
```
Compaction triggered → run_inline_auto_compact_task           (local)
                     → run_inline_remote_auto_compact_task    (remote V1)
                     → run_inline_remote_auto_compact_task_v2 (remote V2)
```
All three paths end up producing a compacted history with the same structure.
Claude Code's paths (based on the src/commands/compact/compact.ts source):
```
/compact command entry point
  │
  ├─ No custom instructions → trySessionMemoryCompaction()
  │    │ success → use session memory as the summary (fast path)
  │    │ failure ↓
  │
  ├─ Reactive-only mode → compactViaReactive()
  │
  └─ Traditional path → microcompactMessages() → compactConversation()
       │
       ├─ streamCompactSummary()
       │    ├─ Forked Agent (shares the prompt cache)
       │    └─ Streaming fallback (separate API call)
       │
       └─ Generate post-compact attachments
```
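Claude Code has three mechanisms that Codex lacks:
- Session Memory Compact: uses the session memory already maintained in the background as the summary, with no model call at compaction time
- Microcompact: runs a lightweight pass (removing redundant content) before traditional compaction
- Reactive Compact: reactive compaction, triggered by a Prompt-Too-Long error
9.4 Summary prompts compared (from the actual source)
The two prompts differ markedly in length and structure.
Codex's compaction prompt (core/templates/compact/prompt.md, about 60 words):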
```
You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary
for another LLM that will resume the task.
Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences
- What remains to be done (clear next steps)
- Any critical data, examples, or references needed to continue
Be concise, structured, and focused on helping the next LLM seamlessly
continue the work.
```
- Minimal design, about 60 words
- Framed as a handoff: the summary is written for the next LLM
- Supports custom instructions during manual compaction
Claude Code's compaction prompt (src/services/compact/prompt.ts, about 400 words):
```typescript
// NO_TOOLS_PREAMBLE from the source (keeps the model from calling tools)
const NO_TOOLS_PREAMBLE = `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.
- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn — you will
  fail the task.
- Your entire response must be plain text: an <analysis> block followed
  by a <summary> block.
`

// Main prompt body (BASE_COMPACT_PROMPT)
const BASE_COMPACT_PROMPT = `Your task is to create a detailed summary of the
conversation so far, paying close attention to the user's explicit requests
and your previous actions.
This summary should be thorough in capturing technical details, code patterns,
and architectural decisions that would be essential for continuing development
work without losing context.
Before providing your final summary, wrap your analysis in <analysis> tags
to organize your thoughts and ensure you've covered all necessary points.
In your analysis process:
1. Chronologically analyze each message and section of the conversation.
   For each section thoroughly identify:
   - The user's explicit requests and intents
   - Your approach to addressing the user's requests
   - Key decisions, technical concepts and code patterns
   - Specific details like: file names, full code snippets,
     function signatures, file edits
   - Errors that you ran into and how you fixed them
   - Pay special attention to specific user feedback that you received
2. Double-check for technical accuracy and completeness.
Your summary should include the following sections:
1. Primary Request and Intent
2. Key Technical Concepts
3. Files and Code Sections
4. Errors and fixes
5. Problem Solving
6. All user messages
7. Pending Tasks
8. Current Work
9. Optional Next Step
`
```
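Claude Code's prompt design has several notable features:
- Structured output: the model must first emit its reasoning in <analysis> and then the formal summary in <summary>. formatCompactSummary() strips the <analysis> part and keeps only <summary>
- 9 fixed sections: far more detailed than Codex's 4 bullet points, especially "Files and Code Sections" and "All user messages"
- Tool-call prevention: the prompt is wrapped front and back with NO_TOOLS_PREAMBLE / NO_TOOLS_TRAILER to keep the model from calling tools during compaction
- Custom instructions: appended through Additional Instructions
9.5 Summary prefixes compared
Codex's summary prefix (core/templates/compact/summary_prefix.md):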
```
Another language model started to solve this problem and produced a summary
of its thinking process. You also have access to the state of the tools that
were used by that language model. Use this to build on the work that has
already been done and avoid duplicating work. Here is the summary produced
by the other language model, use the information in this summary to assist
your own analysis:
```
- Separates personas by attributing the summary to "another language model"
- Stresses that tool state carries over and that work should not be duplicated
Claude Code's summary prefix (getCompactUserSummaryMessage in src/services/compact/prompt.ts):
```typescript
export function getCompactUserSummaryMessage(
  summary: string,
  suppressFollowUpQuestions?: boolean,
  transcriptPath?: string,
  recentMessagesPreserved?: boolean,
): string {
  const formattedSummary = formatCompactSummary(summary)
  let baseSummary = `This session is being continued from a previous conversation
that ran out of context. The summary below covers the earlier portion of
the conversation.
${formattedSummary}`
  if (transcriptPath) {
    baseSummary += `\n\nIf you need specific details from before compaction
(like exact code snippets, error messages, or content you generated),
read the full transcript at: ${transcriptPath}`
  }
  if (recentMessagesPreserved) {
    baseSummary += `\n\nRecent messages are preserved verbatim.`
  }
  if (suppressFollowUpQuestions) {
    baseSummary += `\nContinue the conversation from where it left off
without asking the user any further questions. Resume directly — do not
acknowledge the summary, do not recap what was happening, do not preface
with "I'll continue" or similar. Pick up the last task as if the break
never happened.`
  }
  return baseSummary
}
```
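Claude Code's summary prefix carries more information:
- Transcript path: tells the model where it can read the full pre-compaction conversation
- Recent-messages notice: if session memory compact preserved recent messages, the model is told so
- Auto-continue instruction: on auto compaction, the model is explicitly told not to ask the user anything, not to recap, and to continue directly
9.6 Context restoration after compaction
This is where the two diverge most.
Codex's restoration strategy:
```
compacted history = [initial context (optional)] + [retained user messages (≤20K tokens)] + [summary]
```
- User messages are retained from newest to oldest, up to 20,000 tokens
- The summary is injected as a user message, marked by SUMMARY_PREFIX
- The initial context (tool definitions and so on) is re-injected on the next regular turn, or inserted before the last user message when compaction happens mid-turn
- Skill, Plan, and MCP state are not restored explicitly
Claude Code's restoration strategy (based on the compactConversation() source):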
```
compacted history = [boundaryMarker] + [summaryMessages] + [messagesToKeep]
                    + [attachments] + [hookResults]
```
The post-compact attachments are generated in compactConversation() in the following order:
```typescript
// 1. File attachments (at most 5 files, 50K tokens in total)
const [fileAttachments, asyncAgentAttachments] = await Promise.all([
  createPostCompactFileAttachments(preCompactReadFileState, context, 5),
  createAsyncAgentAttachmentsIfNeeded(context),
])

// 2. Plan attachment
const planAttachment = createPlanAttachmentIfNeeded(context.agentId)

// 3. Plan Mode attachment
const planModeAttachment = await createPlanModeAttachmentIfNeeded(context)

// 4. Skill attachment (at most 5K tokens each, 25K tokens in total)
const skillAttachment = createSkillAttachmentIfNeeded(context.agentId)

// 5. Tool/MCP/Agent delta attachments (re-announce the full set)
for (const att of getDeferredToolsDeltaAttachment(...)) { ... }
for (const att of getAgentListingDeltaAttachment(...)) { ... }
for (const att of getMcpInstructionsDeltaAttachment(...)) { ... }
```
Key constants (defined directly in the source):
```typescript
export const POST_COMPACT_MAX_FILES_TO_RESTORE = 5
export const POST_COMPACT_TOKEN_BUDGET = 50_000
export const POST_COMPACT_MAX_TOKENS_PER_FILE = 5_000
export const POST_COMPACT_MAX_TOKENS_PER_SKILL = 5_000
export const POST_COMPACT_SKILLS_TOKEN_BUDGET = 25_000
```
9.7 Skill restoration at the source level
Codex: does not restore Skills explicitly and relies on the information in the summary.
Claude Code: restores Skills explicitly, with the following source:
```typescript
// src/services/compact/compact.ts:1494
export function createSkillAttachmentIfNeeded(
  agentId?: string,
): AttachmentMessage | null {
  const invokedSkills = getInvokedSkillsForAgent(agentId)
  if (invokedSkills.size === 0) return null
  // Sort by most recent invocation; under budget pressure the least recently used are dropped
  let usedTokens = 0
  const skills = Array.from(invokedSkills.values())
    .sort((a, b) => b.invokedAt - a.invokedAt)
    .map(skill => ({
      name: skill.skillName,
      path: skill.skillPath,
      content: truncateToTokens(skill.content, POST_COMPACT_MAX_TOKENS_PER_SKILL),
    }))
    .filter(skill => {
      const tokens = roughTokenCountEstimation(skill.content)
      if (usedTokens + tokens > POST_COMPACT_SKILLS_TOKEN_BUDGET) return false
      usedTokens += tokens
      return true
    })
  return createAttachmentMessage({ type: 'invoked_skills', skills })
}
```
The truncation strategy keeps the head of the Skill file (setup and usage instructions usually sit at the top):
```typescript
function truncateToTokens(content: string, maxTokens: number): string {
  if (roughTokenCountEstimation(content) <= maxTokens) return content
  const charBudget = maxTokens * 4 - SKILL_TRUNCATION_MARKER.length
  return content.slice(0, charBudget) + SKILL_TRUNCATION_MARKER
  // SKILL_TRUNCATION_MARKER = '[... skill content truncated for compaction;
  //   use Read on the skill path if you need the full text]'
}
```
9.8 Verifying Session Memory Compact against the source
Session Memory Compact is a fast path unique to Claude Code. The source confirms the following:
- It restores only the plan, not invoked_skills:
```typescript
// src/services/compact/sessionMemoryCompact.ts:484-485
const planAttachment = createPlanAttachmentIfNeeded(agentId)
const attachments = planAttachment ? [planAttachment] : []
// Note: there is no createSkillAttachmentIfNeeded() call
```
- It keeps recent messages (and does not replace everything):
```typescript
// src/services/compact/sessionMemoryCompact.ts:571-581
const startIndex = calculateMessagesToKeepIndex(messages, lastSummarizedIndex)
const messagesToKeep = messages
  .slice(startIndex)
  .filter(m => !isCompactBoundaryMessage(m))
```
- It checks the threshold (during auto compaction, it gives up if the result is still over the threshold):
```typescript
// src/services/compact/sessionMemoryCompact.ts:605-614
if (autoCompactThreshold !== undefined &&
    postCompactTokenCount >= autoCompactThreshold) {
  logEvent('tengu_sm_compact_threshold_exceeded', { ... })
  return null // give up and fall back to the traditional path
}
```
9.9 Prompt cache sharing
Claude Code has an optimization that Codex lacks: the Forked Agent shares the prompt cache.
```typescript
// src/services/compact/compact.ts:1179-1200
if (promptCacheSharingEnabled) {
  const result = await runForkedAgent({
    promptMessages: [summaryRequest],
    cacheSafeParams,
    canUseTool: createCompactCanUseTool(),
    querySource: 'compact',
    forkLabel: 'compact',
    maxTurns: 1,
    skipCacheWrite: true,
    overrides: { abortController: context.abortController },
  })
}
```
This lets the compaction request reuse the main conversation's prompt cache (system prompt + tools + context messages), which avoids building the cache a second time and lowers cost. A comment in the source states this explicitly:
DO NOT set maxOutputTokens here. The fork piggybacks on the main thread’s prompt cache by sending identical cache-key params. Setting maxOutputTokens would create a thinking config mismatch that invalidates the cache.
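9.10 Design philosophy summary
| Dimension | Codex | Claude Code |
|---|---|---|
| Core idea | The summary is the memory | Summary + structured attachments |
| Restoration granularity | Semantic (depends on summary quality) | Structural (explicit attachments) |
| Prompt style | Minimal (about 60 words) | Detailed (about 400 words, 9 sections) |
| Fast path | Remote compaction (runs on the server) | Session memory compact (background memory) |
| Cache optimization | None | Forked Agent shares the prompt cache |
| Extensibility | Hook system | Attachment system + hook system |
| Skill restoration | Not restored explicitly | Explicit invoked_skills attachment (5K each, 25K in total) |
| Risk | The summary may omit key information | Attachments cost many tokens |
Codex opts for "simple, but dependent on summary quality"; Claude Code opts for "redundant, but with structured restoration". Each strategy has its trade-offs: Codex's compaction is lighter, but information decays faster across repeated compactions; Claude Code's attachment system is more robust, but costs more tokens.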
10. Index of key source files
Codex source
| File | Responsibility |
|---|---|
| core/src/compact.rs | Core compaction logic: triggering, execution, history rebuilding |
| core/src/compact_remote.rs | Remote compaction V1 |
| core/src/compact_remote_v2.rs | Remote compaction V2 |
| core/templates/compact/prompt.md | Compaction prompt template |
| core/templates/compact/summary_prefix.md | Summary prefix template |
| core/src/session/turn.rs | Auto-compaction trigger check |
| core/src/session/mod.rs | replace_compacted_history() |
| core/src/hooks/src/events/compact.rs | Compaction hook events |
| codex-protocol/src/items.rs | ContextCompactionItem definition |
Claude Code source (leaked)
| File | Responsibility |
|---|---|
| src/services/compact/compact.ts | Traditional compaction core: compactConversation(), attachment generation, Skill restoration |
| src/services/compact/prompt.ts | Compaction prompt definitions: BASE_COMPACT_PROMPT, NO_TOOLS_PREAMBLE, summary prefix |
| src/services/compact/sessionMemoryCompact.ts | Session Memory fast path: trySessionMemoryCompaction() |
| src/services/compact/microCompact.ts | Lightweight compaction: removes redundant content |
| src/commands/compact/compact.ts | /compact command entry point: path routing |
References