63 Commits

Author SHA1 Message Date
Luis Pater
959067edfb feat(usage): introduce executor type tracking in usage reporting
- Replaced `NewUsageReporter` with `NewExecutorUsageReporter` to include executor type in usage records.
- Updated all executors to use the new reporter implementation.
- Extended `UsageReporter` to track and publish executor type.
- Added tests to validate proper executor type recording and handling.
- Enhanced RedisQueue plugin and payload schema with executor type support.
2026-06-02 00:43:16 +08:00
sususu98
aee7a5fbc5 feat: intercept incompatible signature replay 2026-05-29 15:22:57 +08:00
Luis Pater
94c1b25146 feat(executor): add TTFT tracking and reporting for enhanced performance metrics
- Introduced Time-To-First-Token (TTFT) measurement and reporting across major executors.
- Added TTFT calculation to `UsageReporter`, including support for HTTP clients and WebSocket communication.
- Updated tests to validate TTFT tracking in streamed and non-streamed scenarios.
- Ensured integration with `usage` plugin and augmented usage records with TTFT data.
2026-05-28 02:59:24 +08:00
Luis Pater
11f0f906bd feat(logging): add SetTranslatedReasoningEffort to track reasoning levels in usage reporting
- Introduced `SetTranslatedReasoningEffort` method in `UsageReporter` to capture and log reasoning efforts from translated payloads.
- Updated executors to incorporate the new reporting functionality for handling reasoning efforts across various providers.
- Enhanced logging for thinking level extraction with new helper function `ExtractTranslatedReasoningEffort`.
2026-05-28 02:19:45 +08:00
Luis Pater
feebe6c7f2 feat(api): add OpenAI compatibility for image models
- Introduced OpenAI-compatible image model support in the API, enabling integration through image generation and editing endpoints.
- Added registry type for OpenAIImageModelType to classify and validate compatibility.
- Implemented request handling for OpenAI-compatible image models, including JSON and multipart formats.
- Enhanced executor methods to support OpenAI-compatible image streaming and non-streaming requests.
- Included tests to validate model registration, streaming behavior, and multipart payload formatting.
2026-05-19 10:13:26 +08:00
Luis Pater
2007a89594 feat(runtime): enhance payload rule resolution with dynamic path support
- Introduced `resolvePayloadRulePaths` function to dynamically resolve rule paths supporting array queries and complex logic.
- Updated payload processing logic (`apply defaults`, `overrides`, `filters`) to handle resolved paths for better flexibility.
- Added helper functions for path parsing, query matching, and logical resolution to improve modularity and reusability.
- Introduced payload condition match logic, including `match`, `not-match`, `exist`, and `not-exist` rules in `PayloadConfig`.
- Enhanced `payloadModelRulesMatch` function to support conditional checks at various levels.
- Added helper methods for evaluating JSON path conditions and values.
- Updated tests to validate new conditional rules against different payload scenarios.
2026-05-17 23:06:43 +08:00
Luis Pater
66c3dae06b feat(home): implement count for home auth dispatch requests and enable usage statistics
- Added `count` attribute to `homeAuthCount` requests to improve home message batching.
- Enabled usage statistics for home mode by default and added config-level enforcement.
- Adjusted failure logging to include detailed metadata in `UsageReporter`.
- Updated multiple executors to pass error details to `PublishFailure` for better debugging.
- Enhanced unit tests to validate `count` behavior and usage statistics enforcement across components.
2026-05-10 01:30:43 +08:00
Luis Pater
e50cabac4b chore: upgrade CLIProxyAPI dependency to v7 across the project
- Updated all references from v6 to v7 for `github.com/router-for-me/CLIProxyAPI`.
- Ensured consistency in imports within core libraries, tests, and integration tests.
- Added missing tests for new features in Redis Protocol integration.
2026-05-08 11:46:46 +08:00
Luis Pater
e4a93c02c5 fix(executor): enhance parsing of OpenAI stream data lines
- Added trimming for stream input lines to prevent processing of unnecessary whitespace.
- Improved handling of unsupported prefixes and malformed JSON responses, ensuring errors are recorded and propagated appropriately.

Fixed: #2690
2026-05-04 23:42:26 +08:00
Luis Pater
162897e02a Merge remote-tracking branch 'origin/pr/3205' into dev 2026-05-04 21:17:01 +08:00
Luis Pater
89d80bfff4 fix(executor): adjust ApplyThinking order and add payload override test
- Moved `ApplyThinking` logic earlier in `openai_compat_executor` to align with configuration application sequence.
- Added test to verify payload override precedence over Thinking suffix configuration.
2026-05-04 16:45:25 +08:00
1137043480
bf0e5c23f7 fix: prevent goroutine leaks in streaming executors via context-aware channel sends
All streaming executors use bare channel sends (out <- chunk) inside goroutines
that process upstream SSE responses. When the downstream consumer disconnects
(client timeout, network drop, etc.), these sends block indefinitely, causing
the goroutine and all associated resources (HTTP response body, scanner buffers,
translation state) to leak permanently.

Over time, leaked goroutines accumulate monotonically, leading to RSS growth
from ~30MB to 3.7GB+ and eventual OOM kills on resource-constrained VPS hosts.

Fix: Replace all bare 'out <- ...' sends with:
  select {
  case out <- ...:
  case <-ctx.Done():
    return
  }

This ensures goroutines terminate promptly when the request context is canceled,
allowing GC to reclaim all associated resources.

Affected executors (9 files, 36+ send sites):
- antigravity_executor.go (5 sites)
- gemini_cli_executor.go (6 sites)
- gemini_vertex_executor.go (6 sites)
- aistudio_executor.go (4 sites)
- gemini_executor.go (3 sites)
- openai_compat_executor.go (3 sites)
- claude_executor.go (4 sites)
- codex_executor.go (2 sites)
- kimi_executor.go (3 sites)
2026-05-03 11:25:04 -04:00
Luis Pater
f56a19e5b8 feat: add tri-state support for disable-image-generation configuration
- Introduced `DisableImageGenerationMode` with support for `false`, `true`, and `chat` values.
- Updated payload handling to preserve `image_generation` on images endpoints when `chat` mode is enabled.
- Modified OpenAI image handlers (`ImagesGenerations`, `ImagesEdits`) to respect tri-state logic.
- Added unit tests for `DisableImageGenerationMode` behavior and endpoint-specific handling.
- Enhanced configuration diff logging to support `DisableImageGenerationMode`.
2026-04-30 12:10:27 +08:00
Luis Pater
38573050aa feat(config): add support for disabling OpenAI compatibility providers
- Introduced a `Disabled` flag to OpenAI compatibility configurations.
- Updated routing, auth selection, and API handling logic to respect the `Disabled` state.
- Extended relevant APIs, YAML configurations, and data structures to include the `Disabled` field.
- Adjusted all relevant loops and filters to skip disabled providers.

Closes: #3060 #3059 #2977
2026-04-26 21:49:36 +08:00
James
65e9e892a4 Fix missing response.completed.usage for late-usage OpenAI-compatible streams 2026-04-04 05:58:04 +00:00
Luis Pater
d2c7e4e96a refactor(runtime): move executor utilities to helps package and update references 2026-04-01 03:08:20 +08:00
Luis Pater
2bd646ad70 refactor: replace sjson.Set usage with sjson.SetBytes to optimize mutable JSON transformations 2026-03-19 17:58:54 +08:00
Zhenyu Qi
aec65e3be3 fix(openai_compat): add stream_options.include_usage for streaming usage tracking 2026-03-13 00:48:17 -07:00
Kirill Turanskiy
1f8f198c45 feat: passthrough upstream response headers to clients
CPA previously stripped ALL response headers from upstream AI provider
APIs, preventing clients from seeing rate-limit info, request IDs,
server-timing and other useful headers.

Changes:
- Add Headers field to Response and StreamResult structs
- Add FilterUpstreamHeaders helper (hop-by-hop + security denylist)
- Add WriteUpstreamHeaders helper (respects CPA-set headers)
- ExecuteWithAuthManager/ExecuteCountWithAuthManager now return headers
- ExecuteStreamWithAuthManager returns headers from initial connection
- All 11 provider executors populate Response.Headers
- All handler call sites write filtered upstream headers before response

Filtered headers (not forwarded):
- RFC 7230 hop-by-hop: Connection, Transfer-Encoding, Keep-Alive, etc.
- Security: Set-Cookie
- CPA-managed: Content-Length, Content-Encoding
2026-02-18 00:16:22 +03:00
Luis Pater
a5a25dec57 refactor(translator, executor): remove redundant bytes.Clone calls for improved performance
- Replaced all instances of `bytes.Clone` with direct references to enhance efficiency.
- Simplified payload handling across executors and translators by eliminating unnecessary data duplication.
2026-02-06 03:26:29 +08:00
Luis Pater
09ecfbcaed refactor(executor): optimize payload cloning and streamline SDK translator usage
- Replaced unnecessary `bytes.Clone` calls for `opts.OriginalRequest` throughout executors.
- Introduced intermediate variable `originalPayloadSource` to simplify payload processing.
- Ensured better clarity and structure in request translation logic.
2026-02-06 01:44:20 +08:00
Shady Khalifa
53920b0399 fix(openai): drop stream for responses/compact 2026-01-27 18:27:34 +02:00
Shady Khalifa
95096bc3fc feat(openai): add responses/compact support 2026-01-26 16:36:01 +02:00
hkfires
f30ffd5f5e feat(executor): add request_id to error logs
Extract error.message from JSON error responses when summarizing error bodies for debug logs
2026-01-25 21:31:46 +08:00
hkfires
ecc850bfb7 feat(executor): apply payload rules using requested model 2026-01-23 16:38:41 +08:00
hkfires
e641fde25c feat(registry): support provider-specific model info lookup 2026-01-20 10:01:17 +08:00
hkfires
c7e8830a56 refactor(thinking): pass source and target formats to ApplyThinking for cross-format validation
Update ApplyThinking signature to accept fromFormat and toFormat parameters
instead of a single provider string. This enables:

- Proper level-to-budget conversion when source is level-based (openai/codex)
  and target is budget-based (gemini/claude)
- Strict budget range validation when source and target formats match
- Level clamping to nearest supported level for cross-format requests
- Format alias resolution in SDK translator registry for codex/openai-response

Also adds ErrBudgetOutOfRange error code and improves iflow config extraction
to fall back to openai format when iflow-specific config is not present.
2026-01-18 10:30:15 +08:00
hkfires
72f2125668 fix(executor): properly handle thinking application errors 2026-01-15 13:06:39 +08:00
hkfires
0b06d637e7 refactor: improve thinking logic 2026-01-15 13:06:39 +08:00
Luis Pater
e8e3bc8616 feat(executor): add HttpRequest support across executors for better http request handling 2026-01-10 16:25:25 +08:00
Luis Pater
af6bdca14f Fixed: #942
fix(executor): ignore non-SSE lines in OpenAI-compatible streams
2026-01-09 23:41:50 +08:00
Luis Pater
2a663d5cba feat(executor): enhance payload translation with original request context
Refactored `applyPayloadConfig` to `applyPayloadConfigWithRoot`, adding support for default rule validation against the original payload when available. Updated all executors to use `applyPayloadConfigWithRoot` and incorporate an optional original request payload for translations.
2026-01-02 00:03:26 +08:00
hkfires
96340bf136 refactor(executor): resolve upstream model at conductor level before execution 2025-12-30 19:31:54 +08:00
hkfires
367a05bdf6 refactor(thinking): export thinking helpers
Expose thinking/effort normalization helpers from the executor package
so conversion tests use production code and stay aligned with runtime
validation behavior.
2025-12-15 09:16:15 +08:00
Luis Pater
660aabc437 fix(executor): add allowCompat support for reasoning effort normalization
Introduced `allowCompat` parameter to improve compatibility handling for reasoning effort in payloads across OpenAI and similar models.
2025-12-13 04:06:02 +08:00
huynguyen03.dev
15c3cc3a50 fix(openai-compat): prevent model alias from being overwritten by ResolveOriginalModel
When using OpenAI-compatible providers with model aliases (e.g., glm-4.6-zai -> glm-4.6),
the alias resolution was correctly applied but then immediately overwritten by
ResolveOriginalModel, causing 'Unknown Model' errors from upstream APIs.

This fix skips the ResolveOriginalModel override when a model alias has already
been resolved, ensuring the correct model name is sent to the upstream provider.

Co-authored-by: Amp <amp@ampcode.com>
2025-12-12 17:20:24 +07:00
Luis Pater
a74ee3f319 Merge pull request #481 from sususu98/fix/increase-buffer-size
fix: increase buffer size for stream scanners to 50MB across multiple executors
2025-12-11 21:20:54 +08:00
hkfires
3a81ab22fd fix(runtime): unify reasoning effort metadata overrides 2025-12-11 14:35:05 +08:00
hkfires
519da2e042 fix(runtime): validate reasoning effort levels 2025-12-11 12:36:54 +08:00
hkfires
3ffd120ae9 feat(runtime): add thinking config normalization 2025-12-11 11:51:33 +08:00
Luis Pater
423ce97665 feat(util): implement dynamic thinking suffix normalization and refactor budget resolution logic
- Added support for parsing and normalizing dynamic thinking model suffixes.
- Centralized budget resolution across executors and payload helpers.
- Retired legacy Gemini-specific thinking handlers in favor of unified logic.
- Updated executors to use metadata-based thinking configuration.
- Added `ResolveOriginalModel` utility for resolving normalized upstream models using request metadata.
- Updated executors (Gemini, Codex, iFlow, OpenAI, Qwen) to incorporate upstream model resolution and substitute model values in payloads and request URLs.
- Ensured fallbacks handle cases with missing or malformed metadata to derive models robustly.
- Refactored upstream model resolution to dynamically incorporate metadata for selecting and normalizing models.
- Improved handling of thinking configurations and model overrides in executors.
- Removed hardcoded thinking model entries and migrated logic to metadata-based resolution.
- Updated payload mutations to always include the resolved model.
2025-12-11 03:10:50 +08:00
sususu
76c563d161 fix(executor): increase buffer size for stream scanners to 50MB across multiple executors 2025-12-10 23:20:04 +08:00
Luis Pater
d50b0f7524 **refactor(executor): simplify Gemini CLI execution and remove internal retry logic**
- Removed nested retry handling for 429 rate limit errors.
- Simplified request/response handling by cleaning redundant retry-related code.
- Eliminated `parseRetryDelay` function and max retry configuration logic.
2025-11-20 17:49:37 +08:00
Luis Pater
db2d22c978 **fix(runtime): simplify scanner buffer allocation in executor implementations** 2025-11-18 10:59:49 +08:00
Luis Pater
fcd98f4f9b **feat(runtime): add payload configuration support for executors**
Introduce `PayloadConfig` in the configuration to define default and override rules for modifying payload parameters. Implement `applyPayloadConfig` and `applyPayloadConfigWithRoot` to apply these rules across various executors, ensuring consistent parameter handling for different models and protocols. Update all relevant executors to utilize this functionality.
2025-11-13 23:27:40 +08:00
Luis Pater
ef7e8206d3 fix(executor): ensure usage reporting for upstream responses lacking usage data
Add `ensurePublished` to guarantee request counting even when usage fields (e.g., tokens) are absent in OpenAI-compatible executor responses, particularly for streaming paths.
2025-11-09 17:24:47 +08:00
hkfires
cfb9cb8951 feat(config): support HTTP headers across providers 2025-11-08 20:52:05 +08:00
hkfires
a517290726 refactor(executor): summarize API error bodies of html in debug logs 2025-10-31 06:58:38 +08:00
Luis Pater
a552a45b81 Fixed: #140 #133 #80
feat(translator): add token counting functionality for Gemini, Claude, and CLI

- Introduced `TokenCount` handling across various Codex translators (Gemini, Claude, CLI) with respective implementations.
- Added utility methods for token counting and formatting responses.
- Integrated `tiktoken-go/tokenizer` library for tokenization.
- Updated CodexExecutor with token counting logic to support multiple models including GPT-5 variants.
- Refined go.mod and go.sum to include new dependencies.

feat(runtime): add token counting functionality across executors

- Implemented token counting in OpenAICompatExecutor, QwenExecutor, and IFlowExecutor.
- Added utilities for token counting and response formatting using `tiktoken-go/tokenizer`.
- Integrated token counting into translators for Gemini, Claude, and Gemini CLI.
- Enhanced multiple model support, including GPT-5 variants, for token counting.

docs: update environment variable instructions for multi-model support

- Added details for setting `ANTHROPIC_DEFAULT_OPUS_MODEL`, `ANTHROPIC_DEFAULT_SONNET_MODEL`, and `ANTHROPIC_DEFAULT_HAIKU_MODEL` for version 2.x.x.
- Clarified usage of `ANTHROPIC_MODEL` and `ANTHROPIC_SMALL_FAST_MODEL` for version 1.x.x.
- Expanded examples for setting environment variables across different models including Gemini, GPT-5, Claude, and Qwen3.
2025-10-26 05:39:15 +08:00
Luis Pater
20985d1a10 Refactor executor error handling and usage reporting
- Updated the Execute methods in various executors (GeminiCLIExecutor, GeminiExecutor, IFlowExecutor, OpenAICompatExecutor, QwenExecutor) to return a response and error as named return values for improved clarity.
- Enhanced error handling by deferring failure tracking in usage reporters, ensuring that failures are reported correctly.
- Improved response body handling by ensuring proper closure and error logging for HTTP responses across all executors.
- Added failure tracking and reporting in the usage reporter to capture unsuccessful requests.
- Updated the usage logging structure to include a 'Failed' field for better tracking of request outcomes.
- Adjusted the logic in the RequestStatistics and Record methods to accommodate the new failure tracking mechanism.
2025-10-21 11:22:24 +08:00