- converter: replace the rough overhead formula (tools*70+350) with an
  actual estimateTokens pass over the built few-shot messages plus Cursor
  hidden overhead (1300 base + perTool by schema_mode); remove the
  16000-token output reservation
- cursor-client: sendCursorRequestFull now returns {text, usage?} to
capture real Cursor inputTokens/outputTokens from messageMetadata
- handler: add estimateCursorReqTokens() and a [TokenDiff] log to compare
  the tiktoken estimate against actual Cursor usage; fix non-stream retry
  paths to update usage from the retry result; skip auto-continue when
  the response is < 200 chars
- openai-handler: update 4 call sites for new sendCursorRequestFull return type
- config: raise default maxHistoryTokens from 130000 to 150000
- docs/config: remove incorrect 'tiktoken underestimates 10~20%' claim;
update overhead description and reference range to 130000~170000
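A minimal sketch of the new estimate, assuming an illustrative perTool
table and a tiktoken-style estimateTokens wrapper (neither is the repo's
actual code):

```typescript
// Hidden overhead Cursor adds on top of the visible messages.
// The 1300 base is from this change; perTool values are placeholders.
const CURSOR_BASE_OVERHEAD = 1300;
const PER_TOOL_OVERHEAD: Record<string, number> = {
  full: 120,   // placeholder value
  compact: 60, // placeholder value
};

function estimateCursorReqTokens(
  fewShotMessages: { role: string; content: string }[],
  toolCount: number,
  schemaMode: string,
  estimateTokens: (text: string) => number, // e.g. a tiktoken-based counter
): number {
  // Count the built few-shot messages directly instead of using the
  // old tools*70+350 approximation.
  const messageTokens = fewShotMessages.reduce(
    (sum, m) => sum + estimateTokens(m.content),
    0,
  );
  const perTool = PER_TOOL_OVERHEAD[schemaMode] ?? 100;
  return messageTokens + CURSOR_BASE_OVERHEAD + toolCount * perTool;
}

// [TokenDiff]-style comparison once real usage arrives from Cursor:
function logTokenDiff(estimated: number, actualInputTokens: number): void {
  console.log(
    `[TokenDiff] est=${estimated} actual=${actualInputTokens} ` +
      `diff=${actualInputTokens - estimated}`,
  );
}
```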
Capture the actual input/output token counts from Cursor API's finish
event (messageMetadata.usage) and use them in place of tiktoken
estimates where available. Fall back to tiktoken if not present.
- src/types.ts: extend CursorSSEEvent with finishReason/messageMetadata
- src/handler.ts: capture finish event usage in streaming paths, pass
real counts to updateSummary with tiktoken fallback
- src/logger.ts: add inputTokens/outputTokens fields to RequestSummary
- vue-ui: show ↑/↓ Cursor tokens in RequestList, DetailPanel, PayloadView
- public/logs.js: show ↑/↓ Cursor tokens in scard and prompts summary
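A hedged sketch of the fallback logic; finishReason/messageMetadata match
the bullets above, while the helper name and nested shapes are assumptions:

```typescript
interface CursorUsage {
  inputTokens: number;
  outputTokens: number;
}

// Extended event shape; the messageMetadata nesting is assumed.
interface CursorSSEEvent {
  finishReason?: string;
  messageMetadata?: { usage?: CursorUsage };
}

function resolveUsage(
  finishEvent: CursorSSEEvent | undefined,
  tiktokenEstimate: CursorUsage,
): CursorUsage {
  const usage = finishEvent?.messageMetadata?.usage;
  // Prefer the real Cursor counts; fall back to tiktoken per field.
  return {
    inputTokens: usage?.inputTokens ?? tiktokenEstimate.inputTokens,
    outputTokens: usage?.outputTokens ?? tiktokenEstimate.outputTokens,
  };
}
```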
Add /api/config GET+POST endpoints to read and write config.yaml fields
that support hot-reload. Frontend: config button in AppHeader opens a
650px side drawer with grouped fields, SegSelect/Toggle components,
YAML key names as labels, and descriptions sourced from config.yaml.example.
thinking.enabled supports a 3-way selector (auto/off/on), where auto
deletes the YAML key so the built-in default applies.
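A sketch of the endpoint pair, assuming an Express-style app and the
'yaml' npm package; the hot-reloadable key list and path helpers are
illustrative:

```typescript
import { readFileSync, writeFileSync } from 'node:fs';
import express from 'express';
import YAML from 'yaml';

const app = express();
app.use(express.json());

// Illustrative whitelist of keys the server applies without a restart.
const HOT_RELOADABLE = ['thinking.enabled', 'maxHistoryTokens'];

function setByPath(obj: any, path: string, value: unknown): void {
  const parts = path.split('.');
  const last = parts.pop()!;
  let node = obj;
  for (const p of parts) node = node[p] ??= {};
  node[last] = value;
}

function deleteByPath(obj: any, path: string): void {
  const parts = path.split('.');
  const last = parts.pop()!;
  let node = obj;
  for (const p of parts) {
    node = node?.[p];
    if (node == null) return;
  }
  delete node[last];
}

app.get('/api/config', (_req, res) => {
  res.json(YAML.parse(readFileSync('config.yaml', 'utf8')));
});

app.post('/api/config', (req, res) => {
  const doc = YAML.parse(readFileSync('config.yaml', 'utf8'));
  for (const [key, value] of Object.entries(req.body)) {
    if (!HOT_RELOADABLE.includes(key)) continue;
    // 'auto' removes the key so the built-in default applies.
    if (key === 'thinking.enabled' && value === 'auto') deleteByPath(doc, key);
    else setByPath(doc, key, value);
  }
  writeFileSync('config.yaml', YAML.stringify(doc));
  res.json({ ok: true });
});
```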
Adds line-by-line repeat detection alongside the existing delta-level
degenerate loop guard. When the same non-empty line appears 8+ consecutive
times, the stream is cancelled and a retryable error is thrown.
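A sketch of the line-level guard under these assumptions: the handler
calls it with the accumulated stream text, and the error class name is
illustrative.

```typescript
const MAX_LINE_REPEATS = 8;

class RetryableStreamError extends Error {}

function checkLineRepeats(accumulatedText: string): void {
  // The final line may still be partial mid-stream; a real
  // implementation might exclude it from the check.
  const lines = accumulatedText.split('\n');
  let repeats = 1;
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line !== '' && line === lines[i - 1].trim()) {
      repeats++;
      if (repeats >= MAX_LINE_REPEATS) {
        // Same non-empty line 8+ consecutive times: cancel and retry.
        throw new RetryableStreamError(
          `degenerate line loop: "${line.slice(0, 60)}" repeated ${repeats}x`,
        );
      }
    } else {
      repeats = 1;
    }
  }
}
```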
1. Tool passthrough mode (config: tools.passthrough: true)
- Bypasses multi-namespace few-shot injection
- Embeds raw tool definitions in <tools> tags with minimal 1-shot example
- Cleans conflicting client prompts (provider-native tool calling, XML markup)
- Ideal for Roo Code / Cline clients
2. Enhanced Cursor identity leak sanitization
- New refusal detection patterns for "currently in Cursor context" leaks
- 4 new sanitizeResponse regexes targeting full context leak paragraphs
- Covers "I apologize - it appears I'm in Cursor support assistant context"
3. Enhanced tool_choice=any force message
- Lists available tool names (up to 15) with format example
- Uses collaborative guidance tone to avoid triggering refusal
- Stream and non-stream paths aligned
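A hedged sketch of the force message from item 3; the wording and the
call-format example below are illustrative, not the repo's exact strings:

```typescript
// Builds the tool_choice=any guidance message: lists up to 15 tool
// names and shows a call format, phrased collaboratively rather than
// as a hard command (hard commands tended to trigger refusals).
function buildForceToolMessage(toolNames: string[]): string {
  const listed = toolNames.slice(0, 15).join(', ');
  return [
    'For this turn, please respond by calling one of the available',
    `tools (${listed}).`,
    'Pick whichever tool best fits the request and reply with a single',
    'tool call in this format:',
    '<tool_call>{"name": "<tool_name>", "arguments": { ... }}</tool_call>',
  ].join('\n');
}
```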
After CC compaction, the message body is entirely XML tags (the
compacted context summary); stripping the tags left actualQuery empty.
The fallback kept the full XML content, but the appended suffix was the
generic "Respond with structured format", so the model could not see a
concrete task and answered "Do you have any questions?"
Fix: when the compaction-fallback scenario is detected, switch to a
context-aware guidance instruction: "Based on the context above,
determine the next step and proceed..." plus "Do NOT ask the user what
they want", so the model continues working directly from the compacted
context instead of waiting for new instructions.
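A sketch of the fix, assuming hypothetical helper names; the suffix
strings come from this commit's description (the first is truncated
there, so it is truncated here too), and the tag stripper is naive:

```typescript
const COMPACTION_SUFFIX =
  'Based on the context above, determine the next step and proceed... ' +
  'Do NOT ask the user what they want.';

// Naive tag stripper for illustration only.
function stripXmlTags(text: string): string {
  return text.replace(/<[^>]+>/g, '').replace(/\s+/g, ' ');
}

function buildUserQuery(rawContent: string): string {
  const actualQuery = stripXmlTags(rawContent).trim();
  // Normal case: the message has visible text; the real code appends
  // its usual suffix here.
  if (actualQuery.length > 0) return rawContent;
  // Compaction fallback: the body is all XML (the compaction summary),
  // so a generic "Respond with structured format" suffix leaves the
  // model with no visible task. Use a context-aware directive instead
  // so it resumes work from the summary.
  return `${rawContent}\n\n${COMPACTION_SUFFIX}`;
}
```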
- behaviorRules: append action-first directive reducing narration 37% (32%→20%)
\"Each response must be maximally efficient: omit preamble and planning text
when the next step is clear—go straight to the action block.\"
- Continuation suffix: shorten from ~180 chars to ~30 chars (83% token saving)
\"Based on the output above, continue working...\" → \"Continue with the next action.\"
- Add A/B test harnesses (e2e-prompt-ab.mjs, e2e-prompt-ab2.mjs) for future prompt tuning
Tested 4 variants × 17 scenarios. Candidate B won with:
- Narration ratio: 32% → 20%
- Response latency: 2372ms → 1795ms (↓24%)
- Zero over-action side effects
- 9/9 continuation scenarios passed
Tighten image path normalization, preserve multimodal request content across OpenAI-compatible endpoints, and fail fast on unsupported image_file inputs so clients get predictable behavior instead of silent degradation.
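An illustrative validation for the fail-fast path; the content-part
types follow the OpenAI Chat Completions shape, while the helper name
and error text are assumptions:

```typescript
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } }
  | { type: 'image_file'; image_file: { file_id: string } };

function assertSupportedParts(parts: ContentPart[]): void {
  for (const part of parts) {
    if (part.type === 'image_file') {
      // Reject with a clear error instead of silently dropping the
      // image, so the client sees predictable behavior.
      throw new Error(
        'image_file content is not supported; send image_url instead',
      );
    }
  }
}
```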
Made-with: Cursor
Fix "stream disconnected before completion: stream closed before response.completed"
error when using Codex CLI. The /v1/responses endpoint was outputting Chat Completions
SSE format (data-only) instead of the Responses API SSE format with event: prefixes.
Codex expects specific SSE events:
- response.created / response.in_progress
- response.output_text.delta (incremental text)
- response.function_call_arguments.delta (tool calls)
- response.completed (★ the critical event Codex waits for)
Changes:
- Rewrite handleOpenAIResponses() with dedicated stream/non-stream handlers
- Add writeResponsesSSE() for event: + data: format
- Support function_call output items for tool calls
- Add error recovery that always emits response.completed
- Add identity probe mock responses in Responses API format
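A minimal sketch of the event: + data: framing and the always-emit
guarantee; payload shapes are simplified and the finish helper name is
illustrative:

```typescript
import type { ServerResponse } from 'node:http';

// Responses API SSE framing: an event: line naming the event type,
// followed by a data: line with the JSON payload.
function writeResponsesSSE(
  res: ServerResponse,
  event: string,
  data: unknown,
): void {
  res.write(`event: ${event}\n`);
  res.write(`data: ${JSON.stringify(data)}\n\n`);
}

// Codex blocks until it sees response.completed, so error recovery
// must always emit it before closing the stream.
function finishStream(res: ServerResponse, response: unknown): void {
  try {
    writeResponsesSSE(res, 'response.completed', {
      type: 'response.completed',
      response,
    });
  } finally {
    res.end();
  }
}
```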