- Integrated `StripVertexOpenAIResponsesToolCallIDs` to remove tool call ID data from request bodies and translated requests.
- Ensures uniformity and avoids unnecessary payload data propagation.
Fixed: #2549
- Introduced `claudeUsageTokens` struct for detailed token usage tracking.
- Replaced `calculateClaudeUsageTokens` with `Merge` and `OpenAIUsage` methods for better modularity.
- Enhanced integration of usage tokens into response processing, enabling more accurate reporting of token details.
Fixed: #2419
- Moved `ApplyThinking` logic earlier in `openai_compat_executor` to align with configuration application sequence.
- Added test to verify payload override precedence over Thinking suffix configuration.
- Added `setToolCallOutputContent` to process various content types, including arrays and fallback cases.
- Implemented robust handling for specific tool output types like text, image URLs, and files, ensuring proper serialization.
- Improved fallback logic to handle unexpected or missing data.
Fixed: #2313Closes: #2349
- Implemented `validateClaudeStreamingResponse` to ensure upstream streaming data integrity.
- Added new tests to verify response validation, including empty streams, error events, incomplete streams, and valid streams.
- Integrated validation logic into the Claude executor's streaming handler, returning detailed errors for malformed upstream data.
Fixed: #2193
All streaming executors use bare channel sends (out <- chunk) inside goroutines
that process upstream SSE responses. When the downstream consumer disconnects
(client timeout, network drop, etc.), these sends block indefinitely, causing
the goroutine and all associated resources (HTTP response body, scanner buffers,
translation state) to leak permanently.
Over time, leaked goroutines accumulate monotonically, leading to RSS growth
from ~30MB to 3.7GB+ and eventual OOM kills on resource-constrained VPS hosts.
Fix: Replace all bare 'out <- ...' sends with:
select {
case out <- ...:
case <-ctx.Done():
return
}
This ensures goroutines terminate promptly when the request context is canceled,
allowing GC to reclaim all associated resources.
Affected executors (9 files, 36+ send sites):
- antigravity_executor.go (5 sites)
- gemini_cli_executor.go (6 sites)
- gemini_vertex_executor.go (6 sites)
- aistudio_executor.go (4 sites)
- gemini_executor.go (3 sites)
- openai_compat_executor.go (3 sites)
- claude_executor.go (4 sites)
- codex_executor.go (2 sites)
- kimi_executor.go (3 sites)
- Added `filterKimiEmptyAssistantMessages` to identify and remove empty assistant messages with no content, tool links, or reasoning.
- Integrated logging to track the number of dropped messages.
- Updated tests to validate the filtering logic for both empty and valid assistant messages.
Fixed: #1730
- Introduced `redis-usage-queue-retention-seconds` config parameter with a default of 60 seconds and a max of 3600 seconds.
- Updated logic in `redisqueue` to honor configurable retention periods for enqueued usage data.
- Modified config validation and initialization to support and enforce retention limits.
- Enhanced change tracking in `config_diff` to detect updates to this parameter.
- Deleted the `LoggerPlugin` along with associated usage tracking and in-memory statistics logic.
- Removed all related tests (`logger_plugin_test.go`, `usage_tab_test.go`) and external-facing handler (`usage.go`) for usage statistics export/import.
- Cleaned up TUI integration by deleting `usage_tab.go`.
- Introduced `Success` and `Failed` fields in auth records to track request outcomes.
- Updated `/v0/management/auth-files` and `/v0/management/api-key-usage` responses to include success and failure counts.
- Enhanced tests to validate tracking logic and API responses.
- Updated `GetAPIKeyUsage` to group API key usage by "base_url|api_key" composite keys.
- Adjusted logic to handle `base_url` extraction from auth attributes.
- Revised unit tests to validate "base_url|api_key" grouping behavior.
- Implemented `GetAPIKeyUsage` to expose recent request data grouped by provider and API key.
- Added supporting function `mergeRecentRequestBuckets` for bucket aggregation.
- Registered new endpoint `/v0/management/api-key-usage` in the management API.
- Included extensive unit tests for provider and key-based grouping validation.
- Updated `formatRecentRequestBucketLabel` to support configurable bucket duration.
- Implemented `RecentRequestsSnapshot` in `Auth` to capture bucketed recent request data.
- Added new fields and methods to `Auth` for tracking request success and failure counts over time.
- Updated `/v0/management/auth-files` response to include recent request data for each auth record.
- Introduced unit tests to validate request tracking and snapshot generation logic.
- Introduced reusable utilities in `requestmeta` to manage endpoint and response status in request contexts.
- Refactored plugins and handlers to use context-based metadata, removing direct dependency on `gin`.
- Updated tests to validate new context utilities and replaced `gin`-based context handling.
Fixed: #3166
- Introduced a new test file for validating the conversion of OpenAI responses to chat completions.
- Implemented tests to ensure correct merging of consecutive function calls and proper handling of interrupted function calls.
- Enhanced the main conversion function to buffer consecutive function calls and emit them as a single assistant message.
- Updated `ApplyPayloadConfigWithRoot` to prioritize `disable-image-generation` filtering before applying payload rules.
- Ensured payload overrides can explicitly re-enable `image_generation` when required.
- Added unit tests to validate `image_generation` restoration through overrides.
- Introduced `DisableImageGenerationMode` with support for `false`, `true`, and `chat` values.
- Updated payload handling to preserve `image_generation` on images endpoints when `chat` mode is enabled.
- Modified OpenAI image handlers (`ImagesGenerations`, `ImagesEdits`) to respect tri-state logic.
- Added unit tests for `DisableImageGenerationMode` behavior and endpoint-specific handling.
- Enhanced configuration diff logging to support `DisableImageGenerationMode`.
- Added logic to remove `tool_choice` entries of type `image_generation` from payloads when `disable-image-generation` is enabled.
- Updated `ApplyPayloadConfigWithRoot` to handle new removal logic.
- Added unit tests to verify `tool_choice` removal behavior.
- Added `disable-image-generation` configuration flag to disable the `image_generation` tool globally.
- Updated payload handling to remove `image_generation` tools from request payload arrays when the flag is enabled.
- Modified OpenAI image handlers (`ImagesGenerations`, `ImagesEdits`) to return 404 when the feature is disabled.
- Enhanced configuration diff logging to track changes for the `disable-image-generation` flag.
- Added accompanying unit tests for the new feature in payload helpers and image handler logic.
The loadCodeAssist polling call hardcoded the User-Agent to
google-api-nodejs-client/9.15.1. Google Cloud Code returns the
paidTier object WITHOUT the availableCredits array for that UA,
so updateAntigravityCreditsBalance always saw "no credits", set the
hint to Available=false for every Google One AI Ultra account, and
the conductor-level credits fallback could never find a candidate.
Switching to resolveUserAgent(auth) (the same UA used for
streamGenerateContent / generateContent) makes the response include
availableCredits, so the credits hint is populated correctly and the
fallback can actually inject enabledCreditTypes:["GOOGLE_ONE_AI"]
when free tier is exhausted.
- Implemented `appendReasoningContent` to support processing of `thinking` signature and text as reasoning input.
- Added test cases to validate reasoning content conversion with and without text.
- Added `buildAdditionalModelRecord` to filter out zero-token usage details.
- Introduced `hasNonZeroTokenUsage` helper function for token usage validation.
- Updated tests to cover scenarios for zero and non-zero token usage.
- Introduced a `Disabled` flag to OpenAI compatibility configurations.
- Updated routing, auth selection, and API handling logic to respect the `Disabled` state.
- Extended relevant APIs, YAML configurations, and data structures to include the `Disabled` field.
- Adjusted all relevant loops and filters to skip disabled providers.
Closes: #3060#3059#2977
GPT-5.5 was correctly removed from codex-free tier in 7b89583c
(since free accounts cannot access it), but the test was not updated
to reflect this. This caused TestCodexStaticModelsIncludeGPT55 to
fail on the free subtest.
Changes:
- Remove free tier from GPT-5.5 inclusion test
- Add new TestCodexFreeModelsExcludeGPT55 to explicitly verify
that free tier does NOT include GPT-5.5
Address Gemini code review suggestion: the reverseMap can contain at
most len(oauthToolRenameMap) entries, so pre-allocating avoids
reallocations as entries are added.
remapOAuthToolNames renames lowercase client-sent tools (e.g. `glob` →
`Glob`) to Claude Code equivalents on OAuth requests to avoid tool-name
fingerprinting. The reverse pass previously ran against a *global*
reverse map and rewrote every tool_use block whose name matched any
value in oauthToolRenameMap — regardless of what the client actually
sent.
For clients that send mixed casing (notably Amp CLI — `Bash`, `Read`,
`Grep`, `Task` alongside `glob`, `skill`, etc.) this corrupted the
response. Any forward rename in the request set the "renamed" flag,
which then unconditionally lowercased every `Bash` in the response to
`bash`. Amp's tool registry has `Bash`, not `bash`, so it rejected the
tool_use with `tool "bash" is not allowed for smart mode` and tool
execution failed.
Fix: `remapOAuthToolNames` now returns a per-request map keyed on the
upstream (TitleCase) name valued with the original client-sent name.
The reverse functions take this map and only touch entries in it.
Names the client sent in TitleCase pass through untouched in both
directions.
- Change remapOAuthToolNames signature from `([]byte, bool)` to
`([]byte, map[string]string)`; populate at every rename site
(tools[], tool_choice.name, message tool_use, tool_reference,
nested tool_reference inside tool_result).
- Change reverseRemapOAuthToolNames and
reverseRemapOAuthToolNamesFromStreamLine to accept and consume the
per-request map; remove the global oauthToolRenameReverseMap.
- Update all three executor call sites (Execute, ExecuteStream direct
passthrough, ExecuteStream translated) + count_tokens.
- Add regression tests for the mixed-case scenario in both the
non-streaming and SSE code paths.