548 Commits

Author SHA1 Message Date
Luis Pater
fb08b92402 feat(executor): add upstream disconnect handling for Codex WebSocket sessions
- Introduced `UpstreamDisconnectChan` for Codex WebSocket sessions to notify downstream connections of upstream disconnections.
- Implemented `notifyUpstreamDisconnect` to signal errors and close channels on disconnect events.
- Added integration tests to validate WebSocket session behavior on upstream disconnect.
- Updated OpenAI WebSocket response handlers to properly close connections upon upstream disconnect notifications.
2026-05-06 22:09:33 +08:00
Luis Pater
ba5d8ca733 feat(usage): add support for requested model alias handling
- Introduced methods for setting and retrieving model aliases in execution and usage contexts.
- Enhanced `UsageReporter` and related structures to include client-requested aliases.
- Updated tests to validate alias propagation and ensure correct usage reporting.
- Adjusted metadata handling in CLIProxyAPI executors to address alias integration.
2026-05-05 01:47:53 +08:00
Luis Pater
28b4b19e7e Merge pull request #3208 from kdcokenny/codex-websocket-protocol-parity
Align Codex websocket protocol semantics
2026-05-05 01:29:19 +08:00
Luis Pater
bdc424007e Merge pull request #2896 from edlsh/fix/oauth-tool-rename-per-request-map
fix(amp): smart-mode tool name fixes + deep-mode response repair
2026-05-05 00:58:39 +08:00
Luis Pater
e4a93c02c5 fix(executor): enhance parsing of OpenAI stream data lines
- Added trimming for stream input lines to prevent processing of unnecessary whitespace.
- Improved handling of unsupported prefixes and malformed JSON responses, ensuring errors are recorded and propagated appropriately.

Fixed: #2690
2026-05-04 23:42:26 +08:00
Luis Pater
8262a03f29 Merge PR #2568: fix Claude refresh backoff 2026-05-04 21:44:11 +08:00
Luis Pater
ecf1c2590c fix: preserve Antigravity cancellation errors 2026-05-04 21:18:18 +08:00
Luis Pater
162897e02a Merge remote-tracking branch 'origin/pr/3205' into dev 2026-05-04 21:17:01 +08:00
Luis Pater
bf6fa402e2 fix(executor): strip Vertex OpenAI response tool call IDs for consistency
- Integrated `StripVertexOpenAIResponsesToolCallIDs` to remove tool call ID data from request bodies and translated requests.
- Ensures uniformity and avoids unnecessary payload data propagation.

Fixed: #2549
2026-05-04 17:54:16 +08:00
Luis Pater
89d80bfff4 fix(executor): adjust ApplyThinking order and add payload override test
- Moved `ApplyThinking` logic earlier in `openai_compat_executor` to align with configuration application sequence.
- Added test to verify payload override precedence over Thinking suffix configuration.
2026-05-04 16:45:25 +08:00
Kenny
6b4bc0a9a8 Align Codex default identity and docs 2026-05-03 21:13:37 -07:00
Kenny
08b0fe6816 Fix Codex websocket retry metadata 2026-05-03 19:01:44 -07:00
Kenny
c19ae1d5be Align Codex websocket protocol semantics 2026-05-03 15:56:39 -07:00
Luis Pater
2753d9fb71 feat: add validation for Claude streaming responses
- Implemented `validateClaudeStreamingResponse` to ensure upstream streaming data integrity.
- Added new tests to verify response validation, including empty streams, error events, incomplete streams, and valid streams.
- Integrated validation logic into the Claude executor's streaming handler, returning detailed errors for malformed upstream data.

Fixed: #2193
2026-05-04 03:37:31 +08:00
1137043480
bf0e5c23f7 fix: prevent goroutine leaks in streaming executors via context-aware channel sends
All streaming executors use bare channel sends (out <- chunk) inside goroutines
that process upstream SSE responses. When the downstream consumer disconnects
(client timeout, network drop, etc.), these sends block indefinitely, causing
the goroutine and all associated resources (HTTP response body, scanner buffers,
translation state) to leak permanently.

Over time, leaked goroutines accumulate monotonically, leading to RSS growth
from ~30MB to 3.7GB+ and eventual OOM kills on resource-constrained VPS hosts.

Fix: Replace all bare 'out <- ...' sends with:
  select {
  case out <- ...:
  case <-ctx.Done():
    return
  }

This ensures goroutines terminate promptly when the request context is canceled,
allowing GC to reclaim all associated resources.

Affected executors (9 files, 36+ send sites):
- antigravity_executor.go (5 sites)
- gemini_cli_executor.go (6 sites)
- gemini_vertex_executor.go (6 sites)
- aistudio_executor.go (4 sites)
- gemini_executor.go (3 sites)
- openai_compat_executor.go (3 sites)
- claude_executor.go (4 sites)
- codex_executor.go (2 sites)
- kimi_executor.go (3 sites)
2026-05-03 11:25:04 -04:00
Luis Pater
672fdd14ed feat: filter and drop empty assistant messages in Kimi executor
- Added `filterKimiEmptyAssistantMessages` to identify and remove empty assistant messages with no content, tool links, or reasoning.
- Integrated logging to track the number of dropped messages.
- Updated tests to validate the filtering logic for both empty and valid assistant messages.

Fixed: #1730
2026-05-03 22:40:42 +08:00
Luis Pater
6ba7c810a7 feat: apply image_generation filtering before payload rules
- Updated `ApplyPayloadConfigWithRoot` to prioritize `disable-image-generation` filtering before applying payload rules.
- Ensured payload overrides can explicitly re-enable `image_generation` when required.
- Added unit tests to validate `image_generation` restoration through overrides.
2026-04-30 12:42:08 +08:00
Luis Pater
f56a19e5b8 feat: add tri-state support for disable-image-generation configuration
- Introduced `DisableImageGenerationMode` with support for `false`, `true`, and `chat` values.
- Updated payload handling to preserve `image_generation` on images endpoints when `chat` mode is enabled.
- Modified OpenAI image handlers (`ImagesGenerations`, `ImagesEdits`) to respect tri-state logic.
- Added unit tests for `DisableImageGenerationMode` behavior and endpoint-specific handling.
- Enhanced configuration diff logging to support `DisableImageGenerationMode`.
2026-04-30 12:10:27 +08:00
Luis Pater
46018417ad feat: remove tool_choice for image_generation when disabled
- Added logic to remove `tool_choice` entries of type `image_generation` from payloads when `disable-image-generation` is enabled.
- Updated `ApplyPayloadConfigWithRoot` to handle new removal logic.
- Added unit tests to verify `tool_choice` removal behavior.
2026-04-30 08:24:14 +08:00
Luis Pater
e3e60f914b feat: support disabling image generation globally
- Added `disable-image-generation` configuration flag to disable the `image_generation` tool globally.
- Updated payload handling to remove `image_generation` tools from request payload arrays when the flag is enabled.
- Modified OpenAI image handlers (`ImagesGenerations`, `ImagesEdits`) to return 404 when the feature is disabled.
- Enhanced configuration diff logging to track changes for the `disable-image-generation` flag.
- Added accompanying unit tests for the new feature in payload helpers and image handler logic.
2026-04-30 03:42:27 +08:00
Luis Pater
a1f0ed9575 Merge pull request #3071 from sususu98/fix/antigravity-credits-log
Mark Antigravity credits requests in access logs
2026-04-29 22:56:41 +08:00
sususu98
4982512da2 fix: parse gemini cli usage metadata variants 2026-04-29 13:10:53 +08:00
sususu98
0e1235122e fix antigravity client agent headers 2026-04-28 19:04:40 +08:00
sususu98
e78d45acc9 fix antigravity user agent handling 2026-04-28 19:04:40 +08:00
xbang
a992dee4e8 fix(antigravity): use real antigravity UA when polling credits balance
The loadCodeAssist polling call hardcoded the User-Agent to
google-api-nodejs-client/9.15.1. Google Cloud Code returns the
paidTier object WITHOUT the availableCredits array for that UA,
so updateAntigravityCreditsBalance always saw "no credits", set the
hint to Available=false for every Google One AI Ultra account, and
the conductor-level credits fallback could never find a candidate.

Switching to resolveUserAgent(auth) (the same UA used for
streamGenerateContent / generateContent) makes the response include
availableCredits, so the credits hint is populated correctly and the
fallback can actually inject enabledCreditTypes:["GOOGLE_ONE_AI"]
when free tier is exhausted.
2026-04-28 16:21:15 +08:00
Luis Pater
04a336f7df fix(usage_helpers): skip zero-token usage in additional model records
- Added `buildAdditionalModelRecord` to filter out zero-token usage details.
- Introduced `hasNonZeroTokenUsage` helper function for token usage validation.
- Updated tests to cover scenarios for zero and non-zero token usage.
2026-04-27 10:56:22 +08:00
sususu98
6fc23568df logging: mark antigravity credits requests 2026-04-26 23:04:27 +08:00
Luis Pater
c5bea6f6f8 Merge pull request #3020 from Matthias319/fix/codex-error-classification
fix(codex): classify context, thinking-signature, previous-response, and auth failures
2026-04-26 22:26:40 +08:00
Luis Pater
c7b28ba058 feat(executor): add support for Codex image generation tool usage tracking
- Introduced `publishCodexImageToolUsage` to report image generation tool metrics.
- Updated executor logic to handle image generation tool events and defaults.
- Added parsing logic for `image_gen` tool usage details in `helps/usage_helpers.go`.
- Updated `UsageReporter` for additional model-specific usage publishing.
- Refactored usage detail normalizations.

Closes: #3063
2026-04-26 22:19:03 +08:00
Luis Pater
38573050aa feat(config): add support for disabling OpenAI compatibility providers
- Introduced a `Disabled` flag to OpenAI compatibility configurations.
- Updated routing, auth selection, and API handling logic to respect the `Disabled` state.
- Extended relevant APIs, YAML configurations, and data structures to include the `Disabled` field.
- Adjusted all relevant loops and filters to skip disabled providers.

Closes: #3060 #3059 #2977
2026-04-26 21:49:36 +08:00
Enzo Lucchesi
fc1ddf365f fix(claude): centralize oauth tool-name transform flow 2026-04-25 17:45:03 -04:00
edlsh
03ea4e569f perf(claude): pre-allocate reverseMap capacity
Address Gemini code review suggestion: the reverseMap can contain at
most len(oauthToolRenameMap) entries, so pre-allocating avoids
reallocations as entries are added.
2026-04-25 17:45:03 -04:00
Enzo Lucchesi
e707cf7d46 fix(claude): only reverse-remap OAuth tool names that were forward-renamed
remapOAuthToolNames renames lowercase client-sent tools (e.g. `glob` →
`Glob`) to Claude Code equivalents on OAuth requests to avoid tool-name
fingerprinting. The reverse pass previously ran against a *global*
reverse map and rewrote every tool_use block whose name matched any
value in oauthToolRenameMap — regardless of what the client actually
sent.

For clients that send mixed casing (notably Amp CLI — `Bash`, `Read`,
`Grep`, `Task` alongside `glob`, `skill`, etc.) this corrupted the
response. Any forward rename in the request set the "renamed" flag,
which then unconditionally lowercased every `Bash` in the response to
`bash`. Amp's tool registry has `Bash`, not `bash`, so it rejected the
tool_use with `tool "bash" is not allowed for smart mode` and tool
execution failed.

Fix: `remapOAuthToolNames` now returns a per-request map keyed on the
upstream (TitleCase) name valued with the original client-sent name.
The reverse functions take this map and only touch entries in it.
Names the client sent in TitleCase pass through untouched in both
directions.

- Change remapOAuthToolNames signature from `([]byte, bool)` to
  `([]byte, map[string]string)`; populate at every rename site
  (tools[], tool_choice.name, message tool_use, tool_reference,
  nested tool_reference inside tool_result).
- Change reverseRemapOAuthToolNames and
  reverseRemapOAuthToolNamesFromStreamLine to accept and consume the
  per-request map; remove the global oauthToolRenameReverseMap.
- Update all three executor call sites (Execute, ExecuteStream direct
  passthrough, ExecuteStream translated) + count_tokens.
- Add regression tests for the mixed-case scenario in both the
  non-streaming and SSE code paths.
2026-04-25 17:45:03 -04:00
Luis Pater
28d78273e4 feat(api): implement protocol multiplexer and Redis queue for usage integration
- Added `protocol_multiplexer.go`, enabling support for both HTTP and Redis protocols on a single listener.
- Introduced `redis_queue_protocol.go` to handle Redis-compatible RESP commands for queue management.
- Integrated `redisqueue` package, supporting in-memory queuing with expiration pruning.
- Updated server initialization to manage a shared listener and multiplex connections.
- Adjusted `Handler` to adopt `AuthenticateManagementKey` for modular key validation, supporting both HTTP and Redis flows.
2026-04-25 18:52:24 +08:00
Luis Pater
a7e92e2639 feat(auth): disallow free-tier Codex auth during selection process
- Introduced `disallowFreeAuthFromMetadata` and `isFreeCodexAuth` to enforce skipping free-tier credentials.
- Modified scheduler logic to honor `DisallowFreeAuthMetadataKey` during auth selection.
- Updated `ensureImageGenerationTool` to skip tool injection for free-tier Codex auth.
- Added context utility `WithDisallowFreeAuth` and integrated with image handlers.
- Augmented relevant tests to cover free-tier exclusion scenarios.
2026-04-24 23:18:56 +08:00
Matthias319
4056c2590b fix(codex): classify known upstream failures
Normalize Codex context, thinking-signature, previous-response, and auth failures to explicit error codes: context_too_large, thinking_signature_invalid, previous_response_not_found, auth_unavailable.

Refs #2596.
2026-04-24 17:13:23 +02:00
Luis Pater
f1ba6151a9 feat(codex): pass base model to enable conditional image_generation tool injection
- Modified `ensureImageGenerationTool` to accept `baseModel` for conditional logic.
- Ensured `gpt-5.3-codex-spark` models bypass image_generation tool injection.
- Updated relevant tests and executor logic to reflect changes.
2026-04-24 07:21:03 +08:00
sususu98
12195a276e Merge pull request #2971 from sususu98/feat/antigravity-credits-fallback
feat(antigravity): conductor-level credits fallback for Claude models
2026-04-24 00:15:23 +08:00
sususu98
7ad1900041 perf(antigravity): async credits hint refresh for warm tokens 2026-04-23 23:58:10 +08:00
sususu98
920b6efffa refactor(logging): strip unrelated deferred body changes, keep credits-only logging
Remove deferred body optimization and maxErrorLog constants that were
unrelated to credits fallback. Keep only MarkCreditsUsed/CreditsUsed
helpers for flagging requests that consumed AI credits.
2026-04-23 17:41:54 +08:00
sususu98
e75daa299b fix(antigravity): respect pinned auth in credits fallback, release deferred body on success
- findAllAntigravityCreditsCandidateAuths now filters by PinnedAuthMetadataKey
  to prevent credential isolation violations during credits fallback
- Release deferredBody reference on success path to avoid holding large
  payloads in memory for the lifetime of the gin context
2026-04-23 17:38:02 +08:00
sususu98
4de5c29f86 fix(antigravity): remove credits fallback from CountTokens, fix gofmt
CountTokens upstream API does not support enabledCreditTypes, so
remove the dead credits fallback path from ExecuteCount and delete
the unused tryAntigravityCreditsExecuteCount method. Fix gofmt on
credits test file.
2026-04-23 15:17:00 +08:00
sususu98
14d46a0a5d feat(antigravity): conductor-level credits fallback for Claude models
Move credits handling from executor-level retry to conductor-level
orchestration. When all free-tier auths are exhausted (429/503), the
conductor discovers auths with available Google One AI credits and
retries with enabledCreditTypes injected via context flag.

Key changes:
- Add AntigravityCreditsHint system for tracking per-auth credits state
- Conductor tries credits fallback after all auths fail (Execute/Stream/Count)
- Executor injects enabledCreditTypes only when conductor sets context flag
- Credits fallback respects provider scope (requires antigravity in providers)
- Add context cancellation check in credits fallback to avoid wasted requests
- Remove executor-level attemptCreditsFallback and preferCredits machinery
- Restructure 429 decision logic (parse details first, keyword fallback)
- Expand shouldAbort to cover INVALID_ARGUMENT/FAILED_PRECONDITION/500+UNKNOWN
- Support human-readable retry delay parsing (e.g. "1h43m56s")
2026-04-23 13:44:20 +08:00
MoYeRanQianZhi
31934ae04c feat(codex): enable image generation for all Codex upstream requests
Codex CLI gates the built-in image_generation tool behind
AuthMode::Chatgpt (OAuth only). When clients connect via API key
auth through CPA, the tool is absent from requests, making image
generation unavailable through the reverse proxy.

Changes:

1. Inject image_generation tool (codex_executor.go):
   Add ensureImageGenerationTool() that appends
   {"type":"image_generation","output_format":"png"} to the tools
   array if not already present. Applied to all three execution
   paths: Execute, executeCompact, and ExecuteStream.

2. Route aliases for Codex CLI direct access (server.go):
   Add /backend-api/codex/responses routes that map to the same
   OpenAI Responses API handlers as /v1/responses. This allows
   Codex CLI to connect via chatgpt_base_url config while keeping
   AuthMode::Chatgpt, which enables the built-in image_generation
   tool on the client side.

3. Unit tests (codex_executor_imagegen_test.go):
   Cover no-tools, existing tools, already-present, empty array,
   and mixed built-in tool scenarios.
2026-04-23 01:24:40 +08:00
stringer07
b6781d69be perf(codex): avoid repeated output patch writes 2026-04-21 16:29:54 +08:00
stringer07
bb8408cef5 fix(codex): backfill streaming response output 2026-04-21 16:03:56 +08:00
octo-patch
f4eb16102b fix(executor): drop obsolete context-1m-2025-08-07 beta header (fixes #2866)
Anthropic has moved the 1M-context-window feature to General Availability,
so the context-1m-2025-08-07 beta flag is no longer accepted and now causes
400 Bad Request errors when forwarded upstream.

Remove the X-CPA-CLAUDE-1M detection and the corresponding injection of the
now-invalid beta header.  Also drop the unused net/textproto import that was
only needed for the header-key lookup.
2026-04-19 10:38:16 +08:00
hkfires
d9a3b3e5f3 fix(tests): update model lookup references and enhance Claude executor tests 2026-04-17 08:32:07 +08:00
Luis Pater
f5dc6483d5 chore: remove iFlow-related modules and dependencies
- Deleted `iflow` provider implementation, including thinking configuration (`apply.go`) and authentication modules.
- Removed iFlow-specific tests, executors, and helpers across SDK and internal components.
- Updated all references to exclude iFlow functionality.
2026-04-17 01:07:12 +08:00
Luis Pater
d949921143 feat(auth): add proxy URL override support to auth constructors and executors
- Introduced `WithProxyURL` variants for `CodexAuth`, `ClaudeAuth`, `IFlowAuth`, and `DeviceFlowClient`.
- Updated executors to use proxy-aware constructors for improved configurability.
- Added unit tests to validate proxy override precedence and functionality.

Closes: #2823
2026-04-16 22:11:39 +08:00