v2.7.4: truncation safety + proxy auto-continue disabled by default + prompt comparison view in the log viewer

- Skip tool parsing when the response is truncated, preventing corrupted tool calls (e.g. half-written files)
- maxAutoContinue now defaults to 0, delegating continuation to Claude Code's native handling
- Strip identity statements from the system prompt (prevents prompt-injection refusals)
- Streaming warmup window raised from 96 to 300 chars (no text released before refusal detection)
- Log viewer "Prompt Comparison" view: original request vs. the converted Cursor request
- Conversion summary panel: tool count / message count / context size at a glance
- Improved title extraction: generic XML tag stripping + more lead-in phrase filters
This commit is contained in:
小海
2026-03-18 11:56:26 +08:00
parent e6f3a06416
commit 8a5117bbb1
14 changed files with 707 additions and 159 deletions

View File

@@ -1,5 +1,26 @@
# Changelog
## v2.7.4 (2026-03-18)
### 🛡️ Truncation Safety — Preventing Corrupted Tool Calls
- **Skip tool parsing on truncation**: when a response is truncated (`stop_reason=max_tokens`), incomplete `json action` blocks are no longer parsed, preventing corrupted tool calls (such as writing half a file)
- **Plain-text fallback**: incomplete tool blocks in a truncated response are stripped automatically; the remaining text is returned as plain text, and the client (Claude Code) continues it natively
- **Proxy auto-continue disabled by default**: `maxAutoContinue` now defaults to `0`, letting Claude Code handle continuation natively (better experience, visible progress); the config change is synced to `config.yaml`, `config.yaml.example`, and `docker-compose.yml`
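The stripping step described above can be sketched in a few lines; `stripIncompleteToolBlock` is a hypothetical name used for illustration, though the marker and `lastIndexOf` logic mirror this commit's handler change.

```typescript
// Illustrative sketch of the truncation guard (hypothetical helper name).
const TOOL_MARKER = "```json action";

function stripIncompleteToolBlock(response: string, stopReason: string): string {
  // Only intervene when the model ran out of tokens mid-generation.
  if (stopReason !== "max_tokens") return response;
  // Drop the trailing, unterminated tool block; keep the text before it.
  const idx = response.lastIndexOf(TOOL_MARKER);
  return idx >= 0 ? response.substring(0, idx).trimEnd() : response;
}
```

The surviving text is then returned with `stop_reason=max_tokens`, so Claude Code sees an ordinary truncated text response and continues it itself.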
### 🧹 Stronger Prompt-Injection Defense
- **Identity statement stripping**: Claude Code / Anthropic identity statements in the system prompt (`You are Claude Code`, `I'm Claude, made by Anthropic`, etc.) are stripped automatically, preventing the model from flagging them as prompt injection and refusing to respond
- **Larger streaming warmup window**: `warmupChars` in hybrid streaming mode raised from 96 to 300 characters, so no text is released to the client before refusal detection completes
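The identity scrub amounts to a few line-anchored regex replacements; the patterns below are the ones added in this commit's `convertToCursorRequest`, wrapped here in a hypothetical `scrubIdentityStatements` helper for illustration.

```typescript
// Hypothetical wrapper around the identity-statement regexes from this commit.
function scrubIdentityStatements(systemPrompt: string): string {
  return systemPrompt
    // Lines like "You are Claude Code ..." read as injected impersonation to the backing model.
    .replace(/^You are Claude Code[^\n]*$/gim, "")
    .replace(/^You are Claude,\s+Anthropic's[^\n]*$/gim, "")
    // Collapse the blank runs left behind by the removed lines.
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```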
### 📊 Log Viewer Enhancements
- **Prompt comparison view**: the 「💬 提示词」 tab is renamed 「💬 提示词对比」 (Prompt Comparison) and now shows the original request and the converted Cursor messages in separate sections
- **Conversion summary panel**: a new six-cell summary at the top (original tool count → Cursor tool count 0, characters consumed by tool instructions, message count change, total context size)
- **Tool destination hint**: when tools are present, a yellow notice explains that the Cursor API has no native tools parameter and that the N tool definitions were converted to text instructions embedded in the user #1 message
- **Improved title extraction**: generic XML tag stripping (covers all injected tags) + removal of the `Respond with the appropriate action` lead-in
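The title cleanup can be sketched as follows. `cleanTitle` is an illustrative name; the XML strip here uses a backreference so a closing tag must match its opening tag, a slightly stricter variant of the generic strip.

```typescript
// Illustrative sketch of the title-extraction cleanup (hypothetical helper name).
function cleanTitle(text: string): string {
  // Strip <system-reminder>...</system-reminder> and any other injected XML block.
  text = text.replace(/<([a-zA-Z_-]+)>[\s\S]*?<\/\1>/gi, "");
  // Drop Claude Code's trailing lead-in phrases.
  text = text.replace(/First,\s*think\s+step\s+by\s+step[\s\S]*$/i, "");
  text = text.replace(/Respond with the appropriate action[\s\S]*$/i, "");
  // Normalize whitespace and cap the title length.
  text = text.replace(/\s+/g, " ").trim();
  return text.length > 80 ? text.substring(0, 77) + "..." : text;
}
```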
---
## v2.7.2 (2026-03-17)
### 🖥️ Log Viewer Overhaul

View File

@@ -1,8 +1,8 @@
# Cursor2API v2.7.3
# Cursor2API v2.7.4
Proxies Cursor's free docs-page AI chat endpoint into the **Anthropic Messages API** and the **OpenAI Chat Completions API**, for use with **Claude Code** and **Cursor IDE**.
> ⚠️ **Release note**: v2.7.3 unifies the thinking-stripping logic, improves refusal-detection accuracy, and refines the Docker deployment config
> ⚠️ **Release note**: v2.7.4 adds truncation safety (prevents corrupted tool calls), disables proxy auto-continue by default (the client continues natively), and adds a prompt comparison view to the log viewer
## How It Works
@@ -80,6 +80,7 @@ cp config.yaml.example config.yaml
| `logging.file_enabled` | Persist logs to file | `false` |
| `logging.dir` | Log directory | `./logs` |
| `logging.max_days` | Log retention in days | `7` |
| `max_auto_continue` | Auto-continue attempts on truncation (`0` = disabled; the client continues) | `0` |
> 💡 See the comments in `config.yaml.example` for detailed configuration notes.
@@ -241,6 +242,7 @@ The AI outputs in this format → we parse it and convert it into a standard Anthropic `tool_use`
| `COMPRESSION_LEVEL` | Compression level (`1`/`2`/`3`) |
| `LOG_FILE_ENABLED` | Persist logs to file (`true`/`false`) |
| `LOG_DIR` | Log file directory |
| `MAX_AUTO_CONTINUE` | Auto-continue attempts on truncation (`0` = disabled) |
## Disclaimer

View File

@@ -29,16 +29,17 @@ cursor_model: "anthropic/claude-sonnet-4.6"
# ==================== Auto-Continue Settings ====================
# Maximum number of automatic continuation requests when model output is truncated
# Set to 0 to disable auto-continue entirely (the user continues manually in the conversation)
# Environment variable: MAX_AUTO_CONTINUE=3
max_auto_continue: 3
# Default 0 (disabled): the client (e.g. Claude Code) handles continuation itself for a better experience
# Set to 1~3 to enable in-proxy continuation (more complete stitching, but higher latency)
# Environment variable: MAX_AUTO_CONTINUE=0
max_auto_continue: 0
# ==================== Hard Limit on History Message Count ====================
# Upper bound on input message count; the oldest messages are dropped when exceeded (tool few-shot examples are kept)
# Prevents very long conversations (800+ messages) from bloating the request body and slowing responses
# Set to -1 for no limit
# Environment variable: MAX_HISTORY_MESSAGES=100
max_history_messages: 100
max_history_messages: -1
# ==================== Thinking Switch (highest priority) ====================
# Controls whether thinking requests are sent to Cursor; takes precedence over the client-supplied thinking parameter
@@ -46,8 +47,8 @@ max_history_messages: 100
# Set to false: force thinking off (even if the client requested thinking)
# When unset: follow the client request (Anthropic API reads the thinking parameter; OpenAI API reads the model name / reasoning_effort)
# Environment variable: THINKING_ENABLED=true|false
# thinking:
#   enabled: false
thinking:
  enabled: false
# ==================== History Message Compression ====================
# Automatically compresses early messages when the conversation grows long, freeing output space and preventing Cursor context overflow
@@ -55,40 +56,40 @@ max_history_messages: 100
compression:
# Enable compression (true/false); when off, all messages are kept verbatim
# Environment variable: COMPRESSION_ENABLED=true|false
enabled: true
enabled: false
# Compression level: 1=light, 2=medium (default), 3=aggressive
# Compression level: 1=light (default), 2=medium, 3=aggressive
# Environment variable: COMPRESSION_LEVEL=1|2|3
# Level details:
# 1 (light): keep the 10 most recent messages, cap early messages at 4000 chars; suited to short conversations
# 2 (medium): keep the 6 most recent messages, cap early messages at 2000 chars; recommended for daily use
# 1 (light): keep the 10 most recent messages, cap early messages at 4000 chars; suited to daily use (default)
# 2 (medium): keep the 6 most recent messages, cap early messages at 2000 chars; suited to medium-to-long conversations
# 3 (aggressive): keep the 4 most recent messages, cap early messages at 1000 chars; for very long conversations / large tool sets
level: 2
level: 1
# The advanced options below override the level presets when set
# Keep the most recent N messages uncompressed (larger = more context retained)
# keep_recent: 6
# keep_recent: 10
# Max characters per early message (longer messages are compressed intelligently)
# early_msg_max_chars: 2000
# early_msg_max_chars: 4000
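For reference, the level presets documented above map to concrete values like this (an illustrative TypeScript sketch; the field names follow `keep_recent` / `early_msg_max_chars`):

```typescript
// Preset table for the three compression levels (values taken from the comments above).
type CompressionLevel = 1 | 2 | 3;

const COMPRESSION_PRESETS: Record<CompressionLevel, { keepRecent: number; earlyMsgMaxChars: number }> = {
  1: { keepRecent: 10, earlyMsgMaxChars: 4000 }, // light (default)
  2: { keepRecent: 6,  earlyMsgMaxChars: 2000 }, // medium
  3: { keepRecent: 4,  earlyMsgMaxChars: 1000 }, // aggressive
};
```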
# ==================== Tool Handling Settings ====================
# Controls how tool definitions are passed to the model; affects context size and tool-call accuracy
tools:
# Schema presentation mode
# 'compact': [default, recommended] TypeScript-style compact signatures, smallest footprint (~15K chars for 90 tools)
# 'compact': TypeScript-style compact signatures, smallest footprint (~15K chars for 90 tools)
# Example: {file_path!: string, encoding?: utf-8|base64}
# 'full': complete JSON Schema, largest footprint (~135K chars for 90 tools), most precise tool calls
# suited to few tools (<20) with complex parameters
# 'full': [default] complete JSON Schema, most precise tool calls
# suited to few tools (<20) where maximum accuracy matters
# 'names_only': output only tool names and descriptions, no parameter schema
# maximum token savings; suited to tools the model has already "learned" (e.g. Claude Code built-in tools)
schema_mode: 'compact'
schema_mode: 'full'
# Tool description truncation length
# 50: [default, recommended] truncate to 50 characters to save context
# 0: no truncation, keep full descriptions (suited to setups with few tools)
# 0: [default] no truncation; full descriptions kept for the most accurate tool understanding
# 50: truncate to 50 characters to save context (suited to setups with many tools)
# 200: moderate truncation, keeps most of the useful information
description_max_length: 50
description_max_length: 0
# Tool allowlist — keep only the tools named here (unset = keep all)
# 💡 Useful when you only need the core tools and want to exclude many unneeded MCP tools
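A 'compact' signature might be derived from a JSON Schema roughly as below; this is a hedged sketch (`compactSignature` and its input types are invented for illustration), producing the `{file_path!: string, encoding?: utf-8|base64}` shape shown above.

```typescript
// Illustrative derivation of a compact signature from a JSON Schema fragment.
interface ParamSchema { type?: string; enum?: string[] }
interface ToolSchema { required?: string[]; properties: Record<string, ParamSchema> }

function compactSignature(schema: ToolSchema): string {
  const required = new Set(schema.required ?? []);
  const parts = Object.entries(schema.properties).map(([name, p]) => {
    const mark = required.has(name) ? "!" : "?";                 // "!" = required, "?" = optional
    const type = p.enum ? p.enum.join("|") : (p.type ?? "any");  // enums render as a|b
    return `${name}${mark}: ${type}`;
  });
  return `{${parts.join(", ")}}`;
}
```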

View File

@@ -33,12 +33,12 @@ services:
# - THINKING_ENABLED=true
# ── History compression ──
# - COMPRESSION_ENABLED=true
# - COMPRESSION_LEVEL=2
# - COMPRESSION_ENABLED=false
# - COMPRESSION_LEVEL=1
# ── Auto-continue & history limit ──
# - MAX_AUTO_CONTINUE=3 # auto-continue attempts after truncation; 0 = disabled
# - MAX_HISTORY_MESSAGES=100 # cap on history message count; -1 = unlimited
# - MAX_AUTO_CONTINUE=0 # auto-continue attempts after truncation; 0 = disabled (default)
# - MAX_HISTORY_MESSAGES=-1 # cap on history message count; -1 = unlimited
# ── Log persistence ──
# - LOG_FILE_ENABLED=true

View File

@@ -1,6 +1,6 @@
{
"name": "cursor2api",
"version": "2.7.3",
"version": "2.7.4",
"description": "Proxy Cursor docs AI to Anthropic Messages API for Claude Code",
"type": "module",
"scripts": {

View File

@@ -64,7 +64,7 @@
<div class="tabs" id="tabs" style="display:none">
<div class="tab a" data-tab="logs" onclick="setTab('logs',this)">📋 日志</div>
<div class="tab" data-tab="request" onclick="setTab('request',this)">📥 请求参数</div>
<div class="tab" data-tab="prompts" onclick="setTab('prompts',this)">💬 提示词</div>
<div class="tab" data-tab="prompts" onclick="setTab('prompts',this)">💬 提示词对比</div>
<div class="tab" data-tab="response" onclick="setTab('response',this)">📤 响应内容</div>
</div>
<div class="tab-content" id="tabContent">

View File

@@ -234,12 +234,40 @@ function renderRequestTab(tc){
function renderPromptsTab(tc){
if(!curPayload){tc.innerHTML='<div class="empty"><div class="ic">💬</div><p>暂无提示词数据</p></div>';return}
let h='';
const s=selId?rmap[selId]:null;
// ===== Conversion summary =====
if(s){
const origMsgCount=curPayload.messages?curPayload.messages.length:0;
const cursorMsgCount=curPayload.cursorMessages?curPayload.cursorMessages.length:0;
const origToolCount=s.toolCount||0;
const sysPLen=curPayload.systemPrompt?curPayload.systemPrompt.length:0;
const cursorTotalChars=curPayload.cursorRequest?.totalChars||0;
// chars consumed by tool instructions (first Cursor message minus the original first user message)
const firstCursorMsg=curPayload.cursorMessages?.[0];
const firstOrigUser=curPayload.messages?.find(m=>m.role==='user');
const toolInstructionChars=firstCursorMsg&&firstOrigUser?Math.max(0,firstCursorMsg.contentLength-(firstOrigUser?.contentLength||0)):0;
h+='<div class="content-section"><div class="cs-title">🔄 转换摘要</div>';
h+='<div class="sgrid" style="grid-template-columns:repeat(3,1fr);gap:8px;margin:8px 0">';
h+='<div class="si2"><span class="l">原始工具数</span><span class="v">'+origToolCount+'</span></div>';
h+='<div class="si2"><span class="l">Cursor 工具数</span><span class="v" style="color:var(--green)">0 <span style="font-size:10px;color:var(--t2)">(嵌入消息)</span></span></div>';
h+='<div class="si2"><span class="l">工具指令占用</span><span class="v">'+(toolInstructionChars>0?fmtN(toolInstructionChars)+' chars':origToolCount>0?'嵌入第1条消息':'N/A')+'</span></div>';
h+='<div class="si2"><span class="l">原始消息数</span><span class="v">'+origMsgCount+'</span></div>';
h+='<div class="si2"><span class="l">Cursor 消息数</span><span class="v" style="color:var(--green)">'+cursorMsgCount+'</span></div>';
h+='<div class="si2"><span class="l">总上下文大小</span><span class="v">'+(cursorTotalChars>0?fmtN(cursorTotalChars)+' chars':'—')+'</span></div>';
h+='</div>';
if(origToolCount>0){
h+='<div style="color:var(--yellow);font-size:12px;padding:6px 10px;background:rgba(234,179,8,0.1);border-radius:6px;margin-top:4px">⚠️ Cursor API 不支持原生 tools 参数。'+origToolCount+' 个工具定义已转换为文本指令,嵌入在 user #1 消息中'+(toolInstructionChars>0?'(约 '+fmtN(toolInstructionChars)+' chars':'')+'</div>';
}
h+='</div>';
}
// ===== Original request =====
h+='<div class="content-section"><div class="cs-title">📥 客户端原始请求</div></div>';
if(curPayload.systemPrompt){
h+='<div class="content-section"><div class="cs-title">🔒 System Prompt <span class="cnt">'+fmtN(curPayload.systemPrompt.length)+' chars</span></div>';
h+='<div class="resp-box" style="max-height:600px;overflow-y:auto">'+escH(curPayload.systemPrompt)+'<button class="copy-btn" onclick="copyText(curPayload.systemPrompt)">复制</button></div></div>';
h+='<div class="content-section"><div class="cs-title">🔒 原始 System Prompt <span class="cnt">'+fmtN(curPayload.systemPrompt.length)+' chars</span></div>';
h+='<div class="resp-box" style="max-height:400px;overflow-y:auto;border-color:var(--orange)">'+escH(curPayload.systemPrompt)+'<button class="copy-btn" onclick="copyText(curPayload.systemPrompt)">复制</button></div></div>';
}
if(curPayload.messages&&curPayload.messages.length){
h+='<div class="content-section"><div class="cs-title">💬 消息列表 <span class="cnt">'+curPayload.messages.length+' 条</span></div>';
h+='<div class="content-section"><div class="cs-title">💬 原始消息列表 <span class="cnt">'+curPayload.messages.length+' 条</span></div>';
curPayload.messages.forEach((m,i)=>{
const imgs=m.hasImages?' 🖼️':'';
const collapsed=m.contentPreview.length>500;
@@ -247,6 +275,19 @@ function renderPromptsTab(tc){
});
h+='</div>';
}
// ===== Converted Cursor request =====
if(curPayload.cursorMessages&&curPayload.cursorMessages.length){
h+='<div class="content-section" style="margin-top:24px;border-top:2px solid var(--green);padding-top:16px"><div class="cs-title">📤 Cursor 最终消息(转换后) <span class="cnt" style="background:var(--green);color:#fff">'+curPayload.cursorMessages.length+' 条</span></div>';
h+='<div style="color:var(--t2);font-size:12px;margin-bottom:8px">⬇️ 以下是清洗后实际发给 Cursor 模型的消息(已清除身份声明、注入工具指令、添加认知重构)</div>';
curPayload.cursorMessages.forEach((m,i)=>{
const collapsed=m.contentPreview.length>500;
h+='<div class="msg-item" style="border-left:3px solid var(--green)"><div class="msg-header" onclick="togMsg(this)"><span class="msg-role '+m.role+'">'+m.role+' #'+(i+1)+'</span><span class="msg-meta">'+fmtN(m.contentLength)+' chars '+(collapsed?'▶ 展开':'▼ 收起')+'</span></div><div class="msg-body" style="display:'+(collapsed?'none':'block')+';max-height:800px;overflow-y:auto">'+escH(m.contentPreview)+'</div></div>';
});
h+='</div>';
} else if(curPayload.cursorRequest) {
h+='<div class="content-section" style="margin-top:24px;border-top:2px solid var(--green);padding-top:16px"><div class="cs-title">📤 Cursor 最终请求(转换后)</div>';
h+='<div class="resp-box" style="border-color:var(--green)">'+syntaxHL(curPayload.cursorRequest)+'</div></div>';
}
tc.innerHTML=h||'<div class="empty"><div class="ic">💬</div><p>暂无提示词数据</p></div>';
}

View File

@@ -12,8 +12,8 @@ export function getConfig(): AppConfig {
port: 3010,
timeout: 120,
cursorModel: 'anthropic/claude-sonnet-4.6',
maxAutoContinue: 2,
maxHistoryMessages: 100,
maxAutoContinue: 0,
maxHistoryMessages: -1,
fingerprint: {
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/140.0.0.0 Safari/537.36',
},
@@ -54,9 +54,9 @@ export function getConfig(): AppConfig {
const c = yaml.compression;
config.compression = {
enabled: c.enabled !== false, // enabled by default
level: [1, 2, 3].includes(c.level) ? c.level : 2,
keepRecent: typeof c.keep_recent === 'number' ? c.keep_recent : 6,
earlyMsgMaxChars: typeof c.early_msg_max_chars === 'number' ? c.early_msg_max_chars : 2000,
level: [1, 2, 3].includes(c.level) ? c.level : 1,
keepRecent: typeof c.keep_recent === 'number' ? c.keep_recent : 10,
earlyMsgMaxChars: typeof c.early_msg_max_chars === 'number' ? c.early_msg_max_chars : 4000,
};
}
// ★ Thinking switch (highest priority)
@@ -78,8 +78,8 @@ export function getConfig(): AppConfig {
const t = yaml.tools;
const validModes = ['compact', 'full', 'names_only'];
config.tools = {
schemaMode: validModes.includes(t.schema_mode) ? t.schema_mode : 'compact',
descriptionMaxLength: typeof t.description_max_length === 'number' ? t.description_max_length : 50,
schemaMode: validModes.includes(t.schema_mode) ? t.schema_mode : 'full',
descriptionMaxLength: typeof t.description_max_length === 'number' ? t.description_max_length : 0,
includeOnly: Array.isArray(t.include_only) ? t.include_only.map(String) : undefined,
exclude: Array.isArray(t.exclude) ? t.exclude.map(String) : undefined,
};
@@ -101,11 +101,11 @@ export function getConfig(): AppConfig {
}
// compression env-var overrides
if (process.env.COMPRESSION_ENABLED !== undefined) {
if (!config.compression) config.compression = { enabled: true, level: 2, keepRecent: 6, earlyMsgMaxChars: 2000 };
if (!config.compression) config.compression = { enabled: false, level: 1, keepRecent: 10, earlyMsgMaxChars: 4000 };
config.compression.enabled = process.env.COMPRESSION_ENABLED !== 'false' && process.env.COMPRESSION_ENABLED !== '0';
}
if (process.env.COMPRESSION_LEVEL) {
if (!config.compression) config.compression = { enabled: true, level: 2, keepRecent: 6, earlyMsgMaxChars: 2000 };
if (!config.compression) config.compression = { enabled: false, level: 1, keepRecent: 10, earlyMsgMaxChars: 4000 };
const lvl = parseInt(process.env.COMPRESSION_LEVEL);
if (lvl >= 1 && lvl <= 3) config.compression.level = lvl as 1 | 2 | 3;
}

View File

@@ -219,6 +219,9 @@ export async function convertToCursorRequest(req: AnthropicRequest): Promise<Cur
// ★ Billing-header removal: x-anthropic-billing-header gets flagged by the model as a malicious forgery and triggers an injection warning
if (combinedSystem) {
combinedSystem = combinedSystem.replace(/^x-anthropic-billing-header[^\n]*$/gim, '');
// ★ Claude Code identity stripping: the model treats "You are Claude Code" as prompt injection
combinedSystem = combinedSystem.replace(/^You are Claude Code[^\n]*$/gim, '');
combinedSystem = combinedSystem.replace(/^You are Claude,\s+Anthropic's[^\n]*$/gim, '');
combinedSystem = combinedSystem.replace(/\n{3,}/g, '\n\n').trim();
}
// ★ Thinking hint injection: the injection point depends on whether tools are present
@@ -418,7 +421,7 @@ export async function convertToCursorRequest(req: AnthropicRequest): Promise<Cur
// - assistant messages containing json action blocks → replaced by a summary (prevents parse errors from truncated JSON)
// - tool-result messages → keep head and tail (error details often sit at the end)
// - plain text → truncated at natural boundaries
const compressionConfig = config.compression ?? { enabled: true, level: 2 as const, keepRecent: 6, earlyMsgMaxChars: 2000 };
const compressionConfig = config.compression ?? { enabled: false, level: 1 as const, keepRecent: 10, earlyMsgMaxChars: 4000 };
if (compressionConfig.enabled) {
// ★ Compression level parameter mapping:
// Level 1 (light): keep more messages and more characters

View File

@@ -696,7 +696,7 @@ export async function autoContinueCursorToolResponseStream(
const MAX_AUTO_CONTINUE = getConfig().maxAutoContinue;
let continueCount = 0;
let consecutiveSmallAdds = 0;
const originalMessages = [...cursorReq.messages];
while (MAX_AUTO_CONTINUE > 0 && shouldAutoContinueTruncatedToolResponse(fullResponse, hasTools) && continueCount < MAX_AUTO_CONTINUE) {
continueCount++;
@@ -718,7 +718,9 @@ Continue EXACTLY from where you stopped. DO NOT repeat any content already gener
const continuationReq: CursorChatRequest = {
...cursorReq,
messages: [
...originalMessages,
// ★ Continuation optimization: drop all tool definitions and history messages, keeping only the continuation context
// The model already knows what it is writing (inferable from assistantContext) and does not need the tool schemas
// This greatly shrinks the input, leaves more room for output, and makes continuation faster
{
parts: [{ type: 'text', text: assistantContext }],
id: uuidv4(),
@@ -767,7 +769,6 @@ export async function autoContinueCursorToolResponseFull(
const MAX_AUTO_CONTINUE = getConfig().maxAutoContinue;
let continueCount = 0;
let consecutiveSmallAdds = 0;
const originalMessages = [...cursorReq.messages];
while (MAX_AUTO_CONTINUE > 0 && shouldAutoContinueTruncatedToolResponse(fullText, hasTools) && continueCount < MAX_AUTO_CONTINUE) {
continueCount++;
@@ -789,7 +790,7 @@ Continue EXACTLY from where you stopped. DO NOT repeat any content already gener
const continuationReq: CursorChatRequest = {
...cursorReq,
messages: [
...originalMessages,
// ★ Continuation optimization: drop all tool definitions and history messages
{
parts: [{ type: 'text', text: assistantContext }],
id: uuidv4(),
@@ -1171,7 +1172,7 @@ async function handleStream(res: Response, cursorReq: CursorChatRequest, body: A
let activeCursorReq = cursorReq;
let retryCount = 0;
const executeStream = async (detectRefusalEarly = false): Promise<{ earlyAborted: boolean }> => {
const executeStream = async (detectRefusalEarly = false, onTextDelta?: (delta: string) => void): Promise<{ earlyAborted: boolean }> => {
fullResponse = '';
const apiStart = Date.now();
let firstChunk = true;
@@ -1186,6 +1187,7 @@ async function handleStream(res: Response, cursorReq: CursorChatRequest, body: A
if (event.type !== 'text-delta' || !event.delta) return;
if (firstChunk) { log.recordTTFT(); log.endPhase(); log.startPhase('response', '接收响应'); firstChunk = false; }
fullResponse += event.delta;
onTextDelta?.(event.delta);
// ★ Early refusal detection: the first 300 chars are enough to decide
if (detectRefusalEarly && !earlyAborted && fullResponse.length >= 200 && fullResponse.length < 600) {
@@ -1217,7 +1219,8 @@ async function handleStream(res: Response, cursorReq: CursorChatRequest, body: A
return;
}
// Tool mode: create keepalive (the no-tool path is handled inside handleDirectTextStream)
// Tool mode: hybrid streaming — incremental text pushes + tool-block buffering
// UX improvement: text before a tool call now streams immediately instead of waiting for the full generation
keepaliveInterval = setInterval(() => {
try {
res.write(': keepalive\n\n');
@@ -1226,7 +1229,127 @@ async function handleStream(res: Response, cursorReq: CursorChatRequest, body: A
} catch { /* connection already closed, ignore */ }
}, 15000);
await executeStream(true); // ★ enable early refusal detection, saving 2-5s per occurrence
// --- hybrid streaming state ---
const hybridStreamer = createIncrementalTextStreamer({
warmupChars: 300, // ★ aligned with the refusal-detection window: hold the first 300 chars until detection passes, then stream
transform: sanitizeResponse,
isBlockedPrefix: (text) => isRefusal(text.substring(0, 300)),
});
let toolMarkerDetected = false;
let pendingText = ''; // boundary-detection buffer
let hybridThinkingContent = '';
let hybridLeadingBuffer = '';
let hybridLeadingResolved = false;
const TOOL_MARKER = '```json action';
const MARKER_LOOKBACK = TOOL_MARKER.length + 2; // +2 for newline safety
let hybridTextSent = false; // whether any text has been sent to the client yet
const hybridState = { blockIndex, textBlockStarted, thinkingEmitted: thinkingBlockEmitted };
const pushToStreamer = (text: string): void => {
if (!text || toolMarkerDetected) return;
pendingText += text;
const idx = pendingText.indexOf(TOOL_MARKER);
if (idx >= 0) {
// tool marker found → flush the text before it, then switch to buffering mode
const before = pendingText.substring(0, idx);
if (before) {
const d = hybridStreamer.push(before);
if (d) {
if (clientRequestedThinking && hybridThinkingContent && !hybridState.thinkingEmitted) {
emitAnthropicThinkingBlock(res, hybridState, hybridThinkingContent);
}
writeAnthropicTextDelta(res, hybridState, d);
hybridTextSent = true;
}
}
toolMarkerDetected = true;
pendingText = '';
return;
}
// safe flush: keep the trailing MARKER_LOOKBACK chars so a marker split across deltas is not missed
const safeEnd = pendingText.length - MARKER_LOOKBACK;
if (safeEnd > 0) {
const safe = pendingText.substring(0, safeEnd);
pendingText = pendingText.substring(safeEnd);
const d = hybridStreamer.push(safe);
if (d) {
if (clientRequestedThinking && hybridThinkingContent && !hybridState.thinkingEmitted) {
emitAnthropicThinkingBlock(res, hybridState, hybridThinkingContent);
}
writeAnthropicTextDelta(res, hybridState, d);
hybridTextSent = true;
}
}
};
const processHybridDelta = (delta: string): void => {
// leading-thinking detection (identical to handleDirectTextStream)
if (!hybridLeadingResolved) {
hybridLeadingBuffer += delta;
const split = splitLeadingThinkingBlocks(hybridLeadingBuffer);
if (split.startedWithThinking) {
if (!split.complete) return;
hybridThinkingContent = split.thinkingContent;
hybridLeadingResolved = true;
hybridLeadingBuffer = '';
pushToStreamer(split.remainder);
return;
}
if (hybridLeadingBuffer.trimStart().length < THINKING_OPEN.length) return;
hybridLeadingResolved = true;
const buffered = hybridLeadingBuffer;
hybridLeadingBuffer = '';
pushToStreamer(buffered);
return;
}
pushToStreamer(delta);
};
// issue the first request (with the hybrid streaming callback)
await executeStream(true, processHybridDelta);
// stream finished: flush any remaining leading buffer
if (!hybridLeadingResolved && hybridLeadingBuffer) {
hybridLeadingResolved = true;
const split = splitLeadingThinkingBlocks(hybridLeadingBuffer);
if (split.startedWithThinking && split.complete) {
hybridThinkingContent = split.thinkingContent;
pushToStreamer(split.remainder);
} else {
pushToStreamer(hybridLeadingBuffer);
}
}
// flush leftover pendingText (no tool marker was detected)
if (pendingText && !toolMarkerDetected) {
const d = hybridStreamer.push(pendingText);
if (d) {
if (clientRequestedThinking && hybridThinkingContent && !hybridState.thinkingEmitted) {
emitAnthropicThinkingBlock(res, hybridState, hybridThinkingContent);
}
writeAnthropicTextDelta(res, hybridState, d);
hybridTextSent = true;
}
pendingText = '';
}
// finalize: flush any text still held by the streamer
const hybridRemaining = hybridStreamer.finish();
if (hybridRemaining) {
if (clientRequestedThinking && hybridThinkingContent && !hybridState.thinkingEmitted) {
emitAnthropicThinkingBlock(res, hybridState, hybridThinkingContent);
}
writeAnthropicTextDelta(res, hybridState, hybridRemaining);
hybridTextSent = true;
}
// sync the hybrid streaming state back into the main variables
blockIndex = hybridState.blockIndex;
textBlockStarted = hybridState.textBlockStarted;
thinkingBlockEmitted = hybridState.thinkingEmitted;
// ★ Hybrid streaming flag: records whether text was already delivered via the incremental stream
// the SSE output phase below uses it to skip text that has already been sent
const hybridAlreadySentText = hybridTextSent;
log.recordRawResponse(fullResponse);
log.info('Handler', 'response', `原始响应: ${fullResponse.length} chars`, {
@@ -1235,12 +1358,12 @@ async function handleStream(res: Response, cursorReq: CursorChatRequest, body: A
});
// ★ Thinking extraction (before refusal detection, so thinking content cannot trip isRefusal)
// always strip thinking tags to keep them from leaking into the final text
let thinkingContent = '';
// the hybrid streaming phase may already have extracted thinking; prefer that value
let thinkingContent = hybridThinkingContent || '';
if (fullResponse.includes('<thinking>')) {
const { thinkingContent: extracted, strippedText } = extractThinking(fullResponse);
if (extracted) {
thinkingContent = extracted;
if (!thinkingContent) thinkingContent = extracted;
fullResponse = strippedText;
log.recordThinking(thinkingContent);
log.updateSummary({ thinkingChars: thinkingContent.length });
@@ -1253,8 +1376,10 @@ async function handleStream(res: Response, cursorReq: CursorChatRequest, body: A
}
// refusal detection + auto-retry
// fullResponse already had its thinking tags stripped above and can be used directly
// ★ hybrid-stream guard: once text has been sent to the client, a retry would duplicate content
// the IncrementalTextStreamer's isBlockedPrefix mechanism guarantees refusals are caught before any text is sent
const shouldRetryRefusal = () => {
if (hybridTextSent) return false; // text already sent; retry is not possible
if (!isRefusal(fullResponse)) return false;
if (hasTools && hasToolCalls(fullResponse)) return false;
return true;
@@ -1266,7 +1391,7 @@ async function handleStream(res: Response, cursorReq: CursorChatRequest, body: A
log.updateSummary({ retryCount });
const retryBody = buildRetryRequest(body, retryCount - 1);
activeCursorReq = await convertToCursorRequest(retryBody);
await executeStream(true); // retries also enable early abort
await executeStream(true); // retries pass no callback (pure buffering mode)
// strip thinking tags after a retry as well
if (fullResponse.includes('<thinking>')) {
const { thinkingContent: retryThinking, strippedText: retryStripped } = extractThinking(fullResponse);
@@ -1309,12 +1434,10 @@ async function handleStream(res: Response, cursorReq: CursorChatRequest, body: A
// once the stream completes, process the full response
// ★ in-proxy truncation continuation: if overly long output was truncated (common when writing large files), the proxy continues in segments and stitches together a complete response
// this ensures a tool call (e.g. Write) never straddles two API responses and degrades into plain text
const MAX_AUTO_CONTINUE = getConfig().maxAutoContinue ?? 2; // Set default to 2
const MAX_AUTO_CONTINUE = getConfig().maxAutoContinue ?? 0;
let continueCount = 0;
let consecutiveSmallAdds = 0; // counter for consecutive small increments
// snapshot of the original request messages (without continuation-appended ones)
const originalMessages = [...activeCursorReq.messages];
while (MAX_AUTO_CONTINUE > 0 && shouldAutoContinueTruncatedToolResponse(fullResponse, hasTools) && continueCount < MAX_AUTO_CONTINUE) {
continueCount++;
@@ -1343,7 +1466,7 @@ Continue EXACTLY from where you stopped. DO NOT repeat any content already gener
activeCursorReq = {
...activeCursorReq,
messages: [
...originalMessages,
// ★ 续写优化:丢弃所有工具定义和历史消息
{
parts: [{ type: 'text', text: assistantContext }],
id: uuidv4(),
@@ -1407,10 +1530,10 @@ Continue EXACTLY from where you stopped. DO NOT repeat any content already gener
log.warn('Handler', 'truncation', `${MAX_AUTO_CONTINUE}次续写后仍截断 (${fullResponse.length} chars) → stop_reason=max_tokens`);
}
// ★ Thinking block emission: only GUI plugins (enabled) get a thinking content block
// Claude Code (adaptive) requires a cryptographic signature that cannot be forged, so the tags stay in the body text
// ★ Thinking block emission: only emit here if the hybrid stream has not already sent thinking
// blocks already sent via emitAnthropicThinkingBlock are not re-sent
log.startPhase('stream', 'SSE 输出');
if (clientRequestedThinking && thinkingContent) {
if (clientRequestedThinking && thinkingContent && !thinkingBlockEmitted) {
writeSSE(res, 'content_block_start', {
type: 'content_block_start', index: blockIndex,
content_block: { type: 'thinking', thinking: '' },
@@ -1426,6 +1549,32 @@ Continue EXACTLY from where you stopped. DO NOT repeat any content already gener
}
if (hasTools) {
// ★ Truncation guard: if the response was truncated, do not parse incomplete tool calls
// return it as plain text with max_tokens and let the client handle continuation
if (stopReason === 'max_tokens') {
log.info('Handler', 'truncation', '响应截断,跳过工具解析,作为纯文本返回 max_tokens');
// drop the incomplete ```json action block
const incompleteToolIdx = fullResponse.lastIndexOf('```json action');
const textOnly = incompleteToolIdx >= 0 ? fullResponse.substring(0, incompleteToolIdx).trimEnd() : fullResponse;
// send the plain text
if (!hybridAlreadySentText) {
const unsentText = textOnly.substring(sentText.length);
if (unsentText) {
if (!textBlockStarted) {
writeSSE(res, 'content_block_start', {
type: 'content_block_start', index: blockIndex,
content_block: { type: 'text', text: '' },
});
textBlockStarted = true;
}
writeSSE(res, 'content_block_delta', {
type: 'content_block_delta', index: blockIndex,
delta: { type: 'text_delta', text: unsentText },
});
}
}
} else {
let { toolCalls, cleanText } = parseToolCalls(fullResponse);
// ★ tool_choice=any forced retry: if the model emitted no tool-call block, append a forcing message and retry
@@ -1475,20 +1624,23 @@ Continue EXACTLY from where you stopped. DO NOT repeat any content already gener
}
// Any clean text is sent as a single block before the tool blocks
const unsentCleanText = cleanText.substring(sentText.length).trim();
// ★ skip re-sending text that the hybrid stream already delivered
if (!hybridAlreadySentText) {
const unsentCleanText = cleanText.substring(sentText.length).trim();
if (unsentCleanText) {
if (!textBlockStarted) {
writeSSE(res, 'content_block_start', {
type: 'content_block_start', index: blockIndex,
content_block: { type: 'text', text: '' },
if (unsentCleanText) {
if (!textBlockStarted) {
writeSSE(res, 'content_block_start', {
type: 'content_block_start', index: blockIndex,
content_block: { type: 'text', text: '' },
});
textBlockStarted = true;
}
writeSSE(res, 'content_block_delta', {
type: 'content_block_delta', index: blockIndex,
delta: { type: 'text_delta', text: (sentText && !sentText.endsWith('\n') ? '\n' : '') + unsentCleanText }
});
textBlockStarted = true;
}
writeSSE(res, 'content_block_delta', {
type: 'content_block_delta', index: blockIndex,
delta: { type: 'text_delta', text: (sentText && !sentText.endsWith('\n') ? '\n' : '') + unsentCleanText }
});
}
if (textBlockStarted) {
@@ -1526,34 +1678,38 @@ Continue EXACTLY from where you stopped. DO NOT repeat any content already gener
} else {
// False alarm! The tool triggers were just normal text.
// We must send the remaining unsent fullResponse.
let textToSend = fullResponse;
// ★ if the hybrid stream already sent part of the text, send only the remainder
if (!hybridAlreadySentText) {
let textToSend = fullResponse;
// ★ suppress only short responses, or responses whose opening clearly matches a refusal pattern
// fullResponse already has its thinking tags stripped
const isShortResponse = fullResponse.trim().length < 500;
const startsWithRefusal = isRefusal(fullResponse.substring(0, 300));
const isActualRefusal = stopReason !== 'max_tokens' && (isShortResponse ? isRefusal(fullResponse) : startsWithRefusal);
// ★ suppress only short responses, or responses whose opening clearly matches a refusal pattern
// fullResponse already has its thinking tags stripped
const isShortResponse = fullResponse.trim().length < 500;
const startsWithRefusal = isRefusal(fullResponse.substring(0, 300));
const isActualRefusal = stopReason !== 'max_tokens' && (isShortResponse ? isRefusal(fullResponse) : startsWithRefusal);
if (isActualRefusal) {
log.info('Handler', 'sanitize', `抑制无工具的完整拒绝响应`, { preview: fullResponse.substring(0, 200) });
textToSend = 'I understand the request. Let me proceed with the appropriate action. Could you clarify what specific task you would like me to perform?';
}
const unsentText = textToSend.substring(sentText.length);
if (unsentText) {
if (!textBlockStarted) {
writeSSE(res, 'content_block_start', {
type: 'content_block_start', index: blockIndex,
content_block: { type: 'text', text: '' },
});
textBlockStarted = true;
if (isActualRefusal) {
log.info('Handler', 'sanitize', `抑制无工具的完整拒绝响应`, { preview: fullResponse.substring(0, 200) });
textToSend = 'I understand the request. Let me proceed with the appropriate action. Could you clarify what specific task you would like me to perform?';
}
const unsentText = textToSend.substring(sentText.length);
if (unsentText) {
if (!textBlockStarted) {
writeSSE(res, 'content_block_start', {
type: 'content_block_start', index: blockIndex,
content_block: { type: 'text', text: '' },
});
textBlockStarted = true;
}
writeSSE(res, 'content_block_delta', {
type: 'content_block_delta', index: blockIndex,
delta: { type: 'text_delta', text: unsentText },
});
}
writeSSE(res, 'content_block_delta', {
type: 'content_block_delta', index: blockIndex,
delta: { type: 'text_delta', text: unsentText },
});
}
}
} // end else (non-truncated tool parsing)
} else {
// no-tool mode — buffer, then send in one piece (already refusal-checked and retried)
// last line of defense: scrub all Cursor identity references
@@ -1708,7 +1864,6 @@ async function handleNonStream(res: Response, cursorReq: CursorChatRequest, body
const MAX_AUTO_CONTINUE = getConfig().maxAutoContinue;
let continueCount = 0;
let consecutiveSmallAdds = 0; // counter for consecutive small increments
const originalMessages = [...activeCursorReq.messages];
while (MAX_AUTO_CONTINUE > 0 && shouldAutoContinueTruncatedToolResponse(fullText, hasTools) && continueCount < MAX_AUTO_CONTINUE) {
continueCount++;
@@ -1730,9 +1885,9 @@ Continue EXACTLY from where you stopped. DO NOT repeat any content already gener
const continuationReq: CursorChatRequest = {
...activeCursorReq,
messages: [
...originalMessages,
// ★ Continuation optimization: drop all tool definitions and history messages
{
parts: [{ type: 'text', text: fullText }],
parts: [{ type: 'text', text: fullText.length > 2000 ? '...\n' + fullText.slice(-2000) : fullText }],
id: uuidv4(),
role: 'assistant',
},

View File

@@ -154,7 +154,7 @@ app.listen(config.port, () => {
// Tools config summary
const toolsCfg = config.tools;
let toolsInfo = 'default (compact, desc≤50)';
let toolsInfo = 'default (full, desc=full)';
if (toolsCfg) {
const parts: string[] = [];
parts.push(`schema=${toolsCfg.schemaMode}`);

View File

@@ -466,10 +466,11 @@ export class RequestLogger {
.map((c: any) => c.text || '')
.join(' ');
}
// strip injected <system-reminder>...</system-reminder> content
text = text.replace(/<system-reminder>[\s\S]*?<\/system-reminder>/gi, '');
// strip Claude Code's trailing "First, think step by step..." lead-in
// strip <system-reminder>...</system-reminder> and other injected XML content (the backreference ensures the closing tag matches)
text = text.replace(/<([a-zA-Z_-]+)>[\s\S]*?<\/\1>/gi, '');
// strip Claude Code's trailing lead-in phrases
text = text.replace(/First,\s*think\s+step\s+by\s+step[\s\S]*$/i, '');
text = text.replace(/Respond with the appropriate action[\s\S]*$/i, '');
// normalize newlines and redundant whitespace
text = text.replace(/\s+/g, ' ').trim();
this.summary.title = text.length > 80 ? text.substring(0, 77) + '...' : text;

View File

@@ -779,11 +779,12 @@ async function handleOpenAIStream(
let retryCount = 0;
// unified buffering mode: buffer the entire response first, then run refusal detection and processing
const executeStream = async () => {
const executeStream = async (onTextDelta?: (delta: string) => void) => {
fullResponse = '';
await sendCursorRequest(activeCursorReq, (event: CursorSSEEvent) => {
if (event.type !== 'text-delta' || !event.delta) return;
fullResponse += event.delta;
onTextDelta?.(event.delta);
});
};
@@ -793,26 +794,132 @@ async function handleOpenAIStream(
return;
}
await executeStream();
// ★ hybrid streaming: incremental text + tool buffering (same design as the Anthropic handler)
const thinkingEnabled = anthropicReq.thinking?.type === 'enabled';
const hybridStreamer = createIncrementalTextStreamer({
warmupChars: 300, // ★ aligned with the refusal-detection window
transform: sanitizeResponse,
isBlockedPrefix: (text) => isRefusal(text.substring(0, 300)),
});
let toolMarkerDetected = false;
let pendingText = '';
let hybridThinkingContent = '';
let hybridLeadingBuffer = '';
let hybridLeadingResolved = false;
const TOOL_MARKER = '```json action';
const MARKER_LOOKBACK = TOOL_MARKER.length + 2;
let hybridTextSent = false;
let hybridReasoningSent = false;
// 日志记录在详细日志中 (Web UI 可见)
const pushToStreamer = (text: string): void => {
if (!text || toolMarkerDetected) return;
pendingText += text;
const idx = pendingText.indexOf(TOOL_MARKER);
if (idx >= 0) {
const before = pendingText.substring(0, idx);
if (before) {
const d = hybridStreamer.push(before);
if (d) {
if (thinkingEnabled && hybridThinkingContent && !hybridReasoningSent) {
writeOpenAIReasoningDelta(res, id, created, model, hybridThinkingContent);
hybridReasoningSent = true;
}
writeOpenAITextDelta(res, id, created, model, d);
hybridTextSent = true;
}
}
toolMarkerDetected = true;
pendingText = '';
return;
}
const safeEnd = pendingText.length - MARKER_LOOKBACK;
if (safeEnd > 0) {
const safe = pendingText.substring(0, safeEnd);
pendingText = pendingText.substring(safeEnd);
const d = hybridStreamer.push(safe);
if (d) {
if (thinkingEnabled && hybridThinkingContent && !hybridReasoningSent) {
writeOpenAIReasoningDelta(res, id, created, model, hybridThinkingContent);
hybridReasoningSent = true;
}
writeOpenAITextDelta(res, id, created, model, d);
hybridTextSent = true;
}
}
};
const processHybridDelta = (delta: string): void => {
if (!hybridLeadingResolved) {
hybridLeadingBuffer += delta;
const split = splitLeadingThinkingBlocks(hybridLeadingBuffer);
if (split.startedWithThinking) {
if (!split.complete) return;
hybridThinkingContent = split.thinkingContent;
hybridLeadingResolved = true;
hybridLeadingBuffer = '';
pushToStreamer(split.remainder);
return;
}
if (hybridLeadingBuffer.trimStart().length < 10) return;
hybridLeadingResolved = true;
const buffered = hybridLeadingBuffer;
hybridLeadingBuffer = '';
pushToStreamer(buffered);
return;
}
pushToStreamer(delta);
};
await executeStream(processHybridDelta);
// flush 残留缓冲
if (!hybridLeadingResolved && hybridLeadingBuffer) {
hybridLeadingResolved = true;
const split = splitLeadingThinkingBlocks(hybridLeadingBuffer);
if (split.startedWithThinking && split.complete) {
hybridThinkingContent = split.thinkingContent;
pushToStreamer(split.remainder);
} else {
pushToStreamer(hybridLeadingBuffer);
}
}
if (pendingText && !toolMarkerDetected) {
const d = hybridStreamer.push(pendingText);
if (d) {
if (thinkingEnabled && hybridThinkingContent && !hybridReasoningSent) {
writeOpenAIReasoningDelta(res, id, created, model, hybridThinkingContent);
hybridReasoningSent = true;
}
writeOpenAITextDelta(res, id, created, model, d);
hybridTextSent = true;
}
pendingText = '';
}
const hybridRemaining = hybridStreamer.finish();
if (hybridRemaining) {
if (thinkingEnabled && hybridThinkingContent && !hybridReasoningSent) {
writeOpenAIReasoningDelta(res, id, created, model, hybridThinkingContent);
hybridReasoningSent = true;
}
writeOpenAITextDelta(res, id, created, model, hybridRemaining);
hybridTextSent = true;
}
// ★ Thinking 提取(在拒绝检测之前)
const thinkingEnabled = anthropicReq.thinking?.type === 'enabled';
let reasoningContent: string | undefined;
let reasoningContent: string | undefined = hybridThinkingContent || undefined;
if (fullResponse.includes('<thinking>')) {
const { thinkingContent: extracted, strippedText } = extractThinking(fullResponse);
if (extracted) {
if (thinkingEnabled) {
if (thinkingEnabled && !reasoningContent) {
reasoningContent = extracted;
}
fullResponse = strippedText;
// thinking 剥离记录在详细日志中
}
}
// 拒绝检测 + 自动重试(工具模式和非工具模式均生效)
// 拒绝检测 + 自动重试
const shouldRetryRefusal = () => {
if (hybridTextSent) return false; // 已发文字,不可重试
if (!isRefusal(fullResponse)) return false;
if (hasTools && hasToolCalls(fullResponse)) return false;
return true;
@@ -820,22 +927,18 @@ async function handleOpenAIStream(
while (shouldRetryRefusal() && retryCount < MAX_REFUSAL_RETRIES) {
retryCount++;
// 重试记录在详细日志中
const retryBody = buildRetryRequest(anthropicReq, retryCount - 1);
activeCursorReq = await convertToCursorRequest(retryBody);
await executeStream();
await executeStream(); // 重试不传回调
}
if (shouldRetryRefusal()) {
if (!hasTools) {
if (isToolCapabilityQuestion(anthropicReq)) {
// 记录在详细日志
fullResponse = CLAUDE_TOOLS_RESPONSE;
} else {
// 记录在详细日志
fullResponse = CLAUDE_IDENTITY_RESPONSE;
}
} else {
// 记录在详细日志
fullResponse = 'I understand the request. Let me analyze the information and proceed with the appropriate action.';
}
}
@@ -843,7 +946,6 @@ async function handleOpenAIStream(
// 极短响应重试
if (hasTools && fullResponse.trim().length < 10 && retryCount < MAX_REFUSAL_RETRIES) {
retryCount++;
// 记录在详细日志
activeCursorReq = await convertToCursorRequest(anthropicReq);
await executeStream();
}
@@ -854,8 +956,8 @@ async function handleOpenAIStream(
let finishReason: 'stop' | 'tool_calls' = 'stop';
// ★ 发送 reasoning_content如果有
if (reasoningContent) {
// ★ 发送 reasoning_content仅在混合流式未发送时
if (reasoningContent && !hybridReasoningSent) {
writeOpenAISSE(res, {
id, object: 'chat.completion.chunk', created, model,
choices: [{
@@ -872,18 +974,20 @@ async function handleOpenAIStream(
if (toolCalls.length > 0) {
finishReason = 'tool_calls';
// 发送工具调用前的残余文本(清洗后)
let cleanOutput = isRefusal(cleanText) ? '' : cleanText;
cleanOutput = sanitizeResponse(cleanOutput);
if (cleanOutput) {
writeOpenAISSE(res, {
id, object: 'chat.completion.chunk', created, model,
choices: [{
index: 0,
delta: { content: cleanOutput },
finish_reason: null,
}],
});
// 发送工具调用前的残余文本 — 如果混合流式已发送则跳过
if (!hybridTextSent) {
let cleanOutput = isRefusal(cleanText) ? '' : cleanText;
cleanOutput = sanitizeResponse(cleanOutput);
if (cleanOutput) {
writeOpenAISSE(res, {
id, object: 'chat.completion.chunk', created, model,
choices: [{
index: 0,
delta: { content: cleanOutput },
finish_reason: null,
}],
});
}
}
// 增量流式发送工具调用:先发 name+id再分块发 arguments
@@ -929,38 +1033,42 @@ async function handleOpenAIStream(
}
}
} else {
// 误报:发送清洗后的文本
let textToSend = fullResponse;
if (isRefusal(fullResponse)) {
textToSend = 'I understand the request. Let me proceed with the appropriate action. Could you clarify what specific task you would like me to perform?';
} else {
textToSend = sanitizeResponse(fullResponse);
// 误报:发送清洗后的文本(如果混合流式未发送)
if (!hybridTextSent) {
let textToSend = fullResponse;
if (isRefusal(fullResponse)) {
textToSend = 'I understand the request. Let me proceed with the appropriate action. Could you clarify what specific task you would like me to perform?';
} else {
textToSend = sanitizeResponse(fullResponse);
}
writeOpenAISSE(res, {
id, object: 'chat.completion.chunk', created, model,
choices: [{
index: 0,
delta: { content: textToSend },
finish_reason: null,
}],
});
}
writeOpenAISSE(res, {
id, object: 'chat.completion.chunk', created, model,
choices: [{
index: 0,
delta: { content: textToSend },
finish_reason: null,
}],
});
}
} else {
// 无工具模式或无工具调用 — 统一清洗后发送
let sanitized = sanitizeResponse(fullResponse);
// ★ response_format 后处理:剥离 markdown 代码块包裹
if (body.response_format && body.response_format.type !== 'text') {
sanitized = stripMarkdownJsonWrapper(sanitized);
}
if (sanitized) {
writeOpenAISSE(res, {
id, object: 'chat.completion.chunk', created, model,
choices: [{
index: 0,
delta: { content: sanitized },
finish_reason: null,
}],
});
// 无工具模式或无工具调用 — 如果混合流式未发送则统一清洗后发送
if (!hybridTextSent) {
let sanitized = sanitizeResponse(fullResponse);
// ★ response_format 后处理:剥离 markdown 代码块包裹
if (body.response_format && body.response_format.type !== 'text') {
sanitized = stripMarkdownJsonWrapper(sanitized);
}
if (sanitized) {
writeOpenAISSE(res, {
id, object: 'chat.completion.chunk', created, model,
choices: [{
index: 0,
delta: { content: sanitized },
finish_reason: null,
}],
});
}
}
}
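`createIncrementalTextStreamer` itself is not part of this diff. The following is a hypothetical minimal re-implementation of the contract its call sites imply: `push()` returns a delta that is safe to emit now (empty while the `warmupChars` window is still filling, or permanently once `isBlockedPrefix` flags the buffered prefix), and `finish()` flushes whatever remains — e.g. a short response that never reached the warmup threshold:

```javascript
// Sketch only — the real implementation may differ (e.g. re-checking the
// prefix as it grows, or streaming through `transform` chunk by chunk).
function createIncrementalTextStreamer({ warmupChars, transform, isBlockedPrefix }) {
  let buffer = '';
  let warmedUp = false;
  let blocked = false;
  const release = (text) => (transform ? transform(text) : text);
  return {
    // Returns a delta safe to emit, or '' if warming up / blocked.
    push(text) {
      if (blocked) return '';
      buffer += text;
      if (!warmedUp) {
        if (buffer.length < warmupChars) return '';
        warmedUp = true;
        if (isBlockedPrefix && isBlockedPrefix(buffer)) {
          blocked = true;
          return '';
        }
      }
      const out = release(buffer);
      buffer = '';
      return out;
    },
    // Flush the tail; responses shorter than warmupChars land here.
    finish() {
      if (blocked || !buffer) return '';
      const out =
        warmedUp || !(isBlockedPrefix && isBlockedPrefix(buffer))
          ? release(buffer)
          : '';
      buffer = '';
      return out;
    },
  };
}

const s = createIncrementalTextStreamer({
  warmupChars: 10,
  transform: (t) => t,
  isBlockedPrefix: (t) => t.startsWith("I can't"),
});
console.log(s.push('hello'));       // '' — still inside the warmup window
console.log(s.push(' world, hi')); // 'hello world, hi' — warmup passed, released
```

This matches how the hunk uses it: nothing is written to the client until 300 chars accumulate and the refusal check passes, which is why `hybridTextSent` can safely gate the later retry and fallback paths.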

test/test-hybrid-stream.mjs (new file, 216 lines)

@@ -0,0 +1,216 @@
/**
* 混合流式完整性测试
* 验证:
* 1. 文字增量流式 ✓
* 2. 工具调用参数完整 ✓
* 3. 多工具调用 ✓
* 4. 纯文字(无工具调用)✓
* 5. stop_reason 正确 ✓
*/
import http from 'http';
const BASE = process.env.BASE_URL || 'http://localhost:3010';
const url = new URL(BASE);
function runAnthropicTest(name, body, timeout = 60000) {
return new Promise((resolve, reject) => {
const timer = setTimeout(() => { reject(new Error('超时 ' + timeout + 'ms')); }, timeout);
const data = JSON.stringify(body);
const req = http.request({
hostname: url.hostname, port: url.port, path: '/v1/messages', method: 'POST',
headers: {
'Content-Type': 'application/json', 'x-api-key': 'test',
'anthropic-version': '2023-06-01', 'Content-Length': Buffer.byteLength(data),
},
}, (res) => {
const start = Date.now();
let events = [];
let buf = '';
res.on('data', (chunk) => {
buf += chunk.toString();
const lines = buf.split('\n');
buf = lines.pop(); // keep incomplete last line
for (const line of lines) {
if (!line.startsWith('data: ')) continue;
const payload = line.slice(6).trim();
if (payload === '[DONE]') continue;
try {
const ev = JSON.parse(payload);
events.push({ ...ev, _ts: Date.now() - start });
} catch { /* skip */ }
}
});
res.on('end', () => {
clearTimeout(timer);
// 解析结果
const textDeltas = events.filter(e => e.type === 'content_block_delta' && e.delta?.type === 'text_delta');
const toolStarts = events.filter(e => e.type === 'content_block_start' && e.content_block?.type === 'tool_use');
const toolInputDeltas = events.filter(e => e.type === 'content_block_delta' && e.delta?.type === 'input_json_delta');
const msgDelta = events.find(e => e.type === 'message_delta');
const msgStop = events.find(e => e.type === 'message_stop');
const fullText = textDeltas.map(e => e.delta.text).join('');
const tools = toolStarts.map(ts => {
// 收集该工具的 input JSON
const inputChunks = toolInputDeltas
.filter(d => d.index === ts.index)
.map(d => d.delta.partial_json);
let parsedInput = null;
try { parsedInput = JSON.parse(inputChunks.join('')); } catch { }
return {
name: ts.content_block.name,
id: ts.content_block.id,
input: parsedInput,
inputRaw: inputChunks.join(''),
};
});
resolve({
name,
textChunks: textDeltas.length,
textLength: fullText.length,
textPreview: fullText.substring(0, 120).replace(/\n/g, '\\n'),
tools,
stopReason: msgDelta?.delta?.stop_reason || '?',
firstTextMs: textDeltas[0]?._ts ?? -1,
firstToolMs: toolStarts[0]?._ts ?? -1,
doneMs: msgStop?._ts ?? -1,
});
});
res.on('error', (err) => { clearTimeout(timer); reject(err); });
});
req.on('error', (err) => { clearTimeout(timer); reject(err); });
req.write(data);
req.end();
});
}
function printResult(r) {
console.log(`\n 📊 ${r.name}`);
console.log(` 时间: 首字=${r.firstTextMs}ms 首工具=${r.firstToolMs}ms 完成=${r.doneMs}ms`);
console.log(` 文字: ${r.textChunks} chunks, ${r.textLength} chars`);
if (r.textPreview) console.log(` 预览: "${r.textPreview}"`);
console.log(` stop_reason: ${r.stopReason}`);
if (r.tools.length > 0) {
console.log(` 工具调用 (${r.tools.length}个):`);
for (const t of r.tools) {
console.log(` - ${t.name}(${JSON.stringify(t.input)})`);
if (!t.input) console.log(` ⚠️ 参数解析失败! raw: ${t.inputRaw?.substring(0, 100)}`);
}
}
}
const TOOLS = [
{ name: 'Read', description: 'Read file contents', input_schema: { type: 'object', properties: { file_path: { type: 'string' } }, required: ['file_path'] } },
{ name: 'Write', description: 'Write file contents', input_schema: { type: 'object', properties: { file_path: { type: 'string' }, content: { type: 'string' } }, required: ['file_path', 'content'] } },
{ name: 'Bash', description: 'Run bash command', input_schema: { type: 'object', properties: { command: { type: 'string' } }, required: ['command'] } },
];
async function main() {
console.log('\n ⚡ 混合流式完整性测试\n');
// 健康检查
try {
await new Promise((resolve, reject) => {
const req = http.get(`${BASE}/health`, r => { r.resume(); r.on('end', resolve); });
req.on('error', reject);
req.setTimeout(3000, () => { req.destroy(); reject(new Error('timeout')); });
});
} catch {
console.log(' ❌ 服务器未运行\n');
process.exit(1);
}
console.log(' ✅ 服务器在线');
let passed = 0;
let failed = 0;
// ---- Test 1: 单工具调用 ----
console.log('\n ━━━ ① 单工具调用 ━━━');
try {
const r = await runAnthropicTest('单工具调用', {
model: 'claude-3-5-sonnet-20241022', max_tokens: 1024, stream: true,
system: 'You are a coding assistant with file system access. When a user asks to read a file, use the Read tool immediately. Do not refuse or explain limitations.',
tools: TOOLS,
messages: [{ role: 'user', content: 'Read the file /src/index.ts for me please' }],
});
printResult(r);
// 验证
const checks = [];
checks.push({ name: 'stop_reason=tool_use', ok: r.stopReason === 'tool_use' });
checks.push({ name: '至少1个工具调用', ok: r.tools.length >= 1 });
checks.push({ name: '工具名=Read', ok: r.tools.some(t => t.name === 'Read') });
checks.push({ name: '工具参数有 file_path', ok: r.tools.some(t => t.input?.file_path) });
checks.push({ name: '首字延迟<10s', ok: r.firstTextMs >= 0 && r.firstTextMs < 10000 });
for (const c of checks) {
console.log(` ${c.ok ? '✅' : '❌'} ${c.name}`);
c.ok ? passed++ : failed++;
}
} catch (err) {
console.log(` ❌ 失败: ${err.message}`);
failed++;
}
// ---- Test 2: 多工具调用 ----
console.log('\n ━━━ ② 多工具调用 ━━━');
try {
const r = await runAnthropicTest('多工具调用', {
model: 'claude-3-5-sonnet-20241022', max_tokens: 2048, stream: true,
system: 'You are a coding assistant with file system access. When asked to read multiple files, use multiple Read tool calls in a single response. Do not refuse.',
tools: TOOLS,
messages: [{ role: 'user', content: 'Read both /src/index.ts and /src/config.ts for me' }],
});
printResult(r);
const checks = [];
checks.push({ name: 'stop_reason=tool_use', ok: r.stopReason === 'tool_use' });
checks.push({ name: '≥2个工具调用', ok: r.tools.length >= 2 });
checks.push({ name: '工具参数都有 file_path', ok: r.tools.every(t => t.input?.file_path) });
for (const c of checks) {
console.log(` ${c.ok ? '✅' : '❌'} ${c.name}`);
c.ok ? passed++ : failed++;
}
} catch (err) {
console.log(` ❌ 失败: ${err.message}`);
failed++;
}
// ---- Test 3: 纯文字(带工具定义但不需要调用) ----
console.log('\n ━━━ ③ 纯文字(有工具但不调用) ━━━');
try {
const r = await runAnthropicTest('纯文字', {
model: 'claude-3-5-sonnet-20241022', max_tokens: 512, stream: true,
system: 'You are helpful. Answer questions directly without using any tools.',
tools: TOOLS,
messages: [{ role: 'user', content: 'What is 2+2? Just answer with the number.' }],
});
printResult(r);
const checks = [];
checks.push({ name: 'stop_reason=end_turn', ok: r.stopReason === 'end_turn' });
checks.push({ name: '0个工具调用', ok: r.tools.length === 0 });
checks.push({ name: '有文字输出', ok: r.textLength > 0 });
checks.push({ name: '文字含数字4', ok: r.textPreview.includes('4') });
for (const c of checks) {
console.log(` ${c.ok ? '✅' : '❌'} ${c.name}`);
c.ok ? passed++ : failed++;
}
} catch (err) {
console.log(` ❌ 失败: ${err.message}`);
failed++;
}
// ---- 汇总 ----
console.log(`\n ━━━ 汇总 ━━━`);
console.log(` ✅ 通过: ${passed} ❌ 失败: ${failed}\n`);
process.exit(failed > 0 ? 1 : 0);
}
main().catch(err => { console.error('致命错误:', err); process.exit(1); });