- Added `RefreshAPIKeyModelAlias` for explicit alias table rebuilds.
- Introduced deferred rebuild support with `WithDeferredAPIKeyModelAliasRebuild` and context flag validation.
- Implemented `openAICompatibilityRegistrationCache` to streamline OpenAI compatibility model registrations.
- Updated executor and model registration workflows to utilize cached compatibility data, improving efficiency in batch operations.
- Adjusted max worker limits dynamically based on model categories.
Closes: #3953
- Added `rpcPluginError` to encapsulate plugin errors with HTTP status codes.
- Enhanced `decodeEnvelopeResult` to preserve and return detailed plugin errors with status codes.
- Introduced `isPluginErrorEnvelope` to identify plugin error envelopes.
- Updated plugin call logic in Unix and Windows loaders to differentiate plugin errors from system errors.
- Added unit tests to verify error handling and status code preservation.
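
A minimal sketch of how a caller might tell a plugin error that carries an HTTP status apart from a system error. Only the `rpcPluginError` name comes from the notes above; its fields, the `statusFromError` helper, and the fallback to 500 are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// rpcPluginError is a hypothetical shape: the real type's fields are not
// shown in the changelog.
type rpcPluginError struct {
	StatusCode int
	Message    string
}

func (e *rpcPluginError) Error() string {
	return fmt.Sprintf("plugin error (%d): %s", e.StatusCode, e.Message)
}

// statusFromError returns the plugin-supplied status when err wraps an
// rpcPluginError, and 500 for anything else (assumed to be a system error).
func statusFromError(err error) int {
	var pe *rpcPluginError
	if errors.As(err, &pe) {
		return pe.StatusCode
	}
	return 500
}

func main() {
	pluginErr := fmt.Errorf("call failed: %w", &rpcPluginError{StatusCode: 429, Message: "rate limited"})
	systemErr := errors.New("pipe closed")
	fmt.Println(statusFromError(pluginErr)) // plugin status is preserved through wrapping
	fmt.Println(statusFromError(systemErr)) // falls back to 500
}
```

Using `errors.As` rather than a direct type assertion keeps the status recoverable even when a loader wraps the error with extra context.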
- Added logic to expand single auth JSON payloads into multiple plugin virtual auth records.
- Updated related API endpoints such as `PatchAuthFileStatus` and `DeleteAuthFile` to handle plugin virtual auths with rollback mechanisms.
- Introduced `NormalizePluginOAuthCallbackProvider` and other normalization functions for better handling of OAuth callbacks.
- Enhanced tests to validate multi-auth parsing, rollback behavior, and API response consistency.
- Added `applyCodexClientNonTemplatePriorities` to assign higher priorities to non-template Codex client models dynamically.
- Implemented `maxCodexClientTemplatePriority` to set base priority for non-template models relative to template models.
- Updated unit tests to validate priority calculation for custom models.
- Introduced a new "max" level for reasoning depth in Codex client model configuration, providing maximum problem-solving capability.
- Added `service_tiers` field to model responses for better tier categorization.
- Updated unit tests to validate the inclusion and default behavior of `service_tiers` and the new "max" reasoning depth.
- Introduced a new `/reset-quota` API endpoint in the management handler to clear quota and cooldown state for auth records.
- Implemented `ResetQuota` method in the auth manager to handle runtime and registry state resets for affected models.
- Added tests to validate quota reset behavior, including proper state updates and registry consistency.
- Refactored utility functions to support deduplication and registered models handling in quota resets.
Closes: #3866
- Introduced `SetOAuthModelAliasesAttribute` and `OAuthModelAliasesFromAttributes` for managing per-auth model aliases.
- Enhanced OAuth model resolution logic to prioritize per-auth aliases over global aliases.
- Updated metadata handling to extract and sanitize per-account model aliases.
- Added tests to validate alias precedence, empty attributes, and conflict scenarios.
Closes: #3764
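
The precedence rule above (per-auth aliases win over global ones) could look roughly like the following. `resolveModelAlias`, the map shapes, and the alias-to-target direction are hypothetical; the real resolution logic is not shown in these notes.

```go
package main

import "fmt"

// resolveModelAlias is a hypothetical helper: it looks up the requested model
// in the per-auth alias table first and only then in the global table.
func resolveModelAlias(requested string, perAuth, global map[string]string) string {
	if target, ok := perAuth[requested]; ok && target != "" {
		return target
	}
	if target, ok := global[requested]; ok && target != "" {
		return target
	}
	return requested // no alias: keep the requested model
}

func main() {
	global := map[string]string{"fast": "model-a", "smart": "model-b"}
	perAuth := map[string]string{"fast": "model-c"}

	fmt.Println(resolveModelAlias("fast", perAuth, global))  // per-auth entry wins
	fmt.Println(resolveModelAlias("smart", perAuth, global)) // falls back to global
	fmt.Println(resolveModelAlias("other", perAuth, global)) // unchanged
}
```

Empty targets are skipped here so that an empty per-auth attribute does not mask a valid global alias; whether the real code treats empty values this way is an assumption.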
- Replaced direct `strings.ToLower` usage with `util.OpenAICompatibleProviderKey` for generating provider keys.
- Updated auth and executor workflows to use namespaced keys for OpenAI-compatible providers.
- Adjusted tests to validate namespaced key handling, including new test cases for provider registration and execution logic.
- Added `OpenAICompatibleProviderKey` helper in `util` for consistent key transformations.
Closes: #3600
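
As an illustration of why a namespaced key differs from a plain `strings.ToLower`, here is a small sketch. The `openai-compat:` prefix and the `providerKey` function are invented placeholders; the actual format produced by `util.OpenAICompatibleProviderKey` is not stated in these notes.

```go
package main

import (
	"fmt"
	"strings"
)

// keyPrefix is an assumed namespace; the real prefix is not stated.
const keyPrefix = "openai-compat:"

// providerKey normalizes a configured provider name and places it in its own
// namespace so it cannot collide with a built-in provider of the same name.
func providerKey(name string) string {
	normalized := strings.ToLower(strings.TrimSpace(name))
	if normalized == "" {
		return ""
	}
	return keyPrefix + normalized
}

func main() {
	fmt.Println(providerKey(" OpenRouter "))
	fmt.Println(providerKey("Codex") == "codex") // false: namespaced key differs from a built-in key
}
```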
- Introduced `CooldownStateStore` interface for managing independent cooldown state persistence.
- Implemented `FileCooldownStateStore` for storing cooldown states as per-auth `.cds` files with atomic writes and stale file cleanup.
- Enhanced `Manager` to support restoring state from `CooldownStateStore` and persisting state changes during auth updates.
- Updated tests to validate cooldown state saving, loading, concurrency handling, and error scenarios.
Closes: #3368
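
The atomic-write approach mentioned above is commonly done by writing to a temporary file and renaming it over the target. The sketch below assumes a JSON payload and `Save`/`Load` method names; the real `CooldownStateStore` interface, the `.cds` file schema, and the stale-file cleanup are not shown in these notes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// cooldownState is a hypothetical payload; the real schema is not shown.
type cooldownState struct {
	AuthID         string `json:"auth_id"`
	NextRetryAfter int64  `json:"next_retry_after_unix"`
}

// cooldownStateStore mirrors the idea of the interface; method names are assumed.
type cooldownStateStore interface {
	Save(state cooldownState) error
	Load(authID string) (cooldownState, error)
}

type fileStore struct{ dir string }

func (s fileStore) path(authID string) string {
	return filepath.Join(s.dir, authID+".cds")
}

// Save writes to a temp file in the same directory, then renames it over the
// target so readers never observe a half-written file.
func (s fileStore) Save(state cooldownState) error {
	data, err := json.Marshal(state)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(s.dir, "*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), s.path(state.AuthID))
}

func (s fileStore) Load(authID string) (cooldownState, error) {
	var state cooldownState
	data, err := os.ReadFile(s.path(authID))
	if err != nil {
		return state, err
	}
	err = json.Unmarshal(data, &state)
	return state, err
}

// roundTrip saves one state into a fresh temp directory and loads it back.
func roundTrip() (cooldownState, error) {
	dir, err := os.MkdirTemp("", "cds-demo")
	if err != nil {
		return cooldownState{}, err
	}
	defer os.RemoveAll(dir)
	var store cooldownStateStore = fileStore{dir: dir}
	if err := store.Save(cooldownState{AuthID: "auth-1", NextRetryAfter: 1700000000}); err != nil {
		return cooldownState{}, err
	}
	return store.Load("auth-1")
}

func main() {
	state, err := roundTrip()
	fmt.Println(state, err)
}
```

The temp file is created in the target directory so the rename stays on one filesystem. The sketch does not sanitize `AuthID` before using it as a file name, which a real store would need to do.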
- Introduced `SetTransientErrorCooldownSeconds` to enable configurable cooldowns for transient errors (e.g., 408/500/502/503/504).
- Updated retry scheduling logic to use the new `nextTransientErrorRetryAfter` function.
- Modified config parsing to include `transient-error-cooldown-seconds` with support for disabling or defaulting to legacy behavior.
- Expanded tests to validate transient cooldown logic with various configurations and edge cases.
Closes: #3315
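
One way the configurable cooldown could be modelled is sketched below. The status list comes from the entry above; everything else is an assumption: the `transientCooldownSeconds` name, the use of a nil pointer for "key absent", the rule that values of zero or less disable the cooldown, and the `legacySeconds` parameter (the legacy default itself is not stated here).

```go
package main

import "fmt"

// isTransientStatus reports whether status is one of the transient codes
// listed in the changelog entry.
func isTransientStatus(status int) bool {
	switch status {
	case 408, 500, 502, 503, 504:
		return true
	}
	return false
}

// transientCooldownSeconds returns how long to wait before retrying.
// Assumed semantics (not confirmed by the source):
//   - configured == nil: key absent, use legacySeconds
//   - *configured <= 0:  cooldown disabled
//   - *configured > 0:   use the configured value
func transientCooldownSeconds(status int, configured *int, legacySeconds int) int {
	if !isTransientStatus(status) {
		return 0
	}
	if configured == nil {
		return legacySeconds
	}
	if *configured <= 0 {
		return 0
	}
	return *configured
}

func main() {
	ten, zero := 10, 0
	fmt.Println(transientCooldownSeconds(503, &ten, 60))  // configured value
	fmt.Println(transientCooldownSeconds(503, &zero, 60)) // disabled
	fmt.Println(transientCooldownSeconds(503, nil, 60))   // legacy fallback
	fmt.Println(transientCooldownSeconds(404, &ten, 60))  // not a transient status
}
```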
- Refactored `ConfigReloadHook` to use `reloadConfigFromWatcher` for consistency.
- Added async `reloadConfigAfterManagementSaveAsync` to handle post-save operations.
- Introduced `ReloadConfigIfChanged` in watcher for manual trigger support.
- Enhanced config reload paths to separate auth synthesis from standard updates.
- Updated `applyConfigUpdate` logic to allow more granular reload behaviors.
Closes: #3235
- Updated error handling in `RPopAuth` to distinguish `auth_not_found` from transport errors.
- Added a new test, `TestPickNextViaHomeClassifiesTransportErrorsAsHomeUnavailable`, to validate error classification and the retryable property.
- Introduced the `gpt-image-2` model in Codex built-ins and updated visibility logic in the registry.
- Added direct proxy support for OpenAI image generation and editing endpoints.
- Implemented new execution paths for `/images/generations` and `/images/edits`, handling both JSON and multipart payloads.
- Expanded test coverage to validate the new model and direct proxy features, including streaming scenarios and error handling.
- Deleted `geminicli` provider and related `Apply` logic.
- Removed all translator packages specific to Gemini CLI (Claude, Codex integrations).
- Purged associated test files for Gemini CLI translation.
- Removed `GeminiAuthenticator` and all associated authentication logic (OAuth flows, token handling, refresh logic).
- Deleted Gemini OAuth support from `internal/executor`, including bearer token handling and runtime API logic.
- Purged all tests, configs, and command-line flags specific to Gemini OAuth flows.
- Updated documentation and aliases to reflect Gemini removal.
- Renamed `parseRetryDelay` to `ParseRetryDelay` and `deleteJSONField` to `DeleteJSONField`.
- Updated references in `antigravity_executor` and tests to use the new `helps` package.
- Adjusted import paths and test cases to ensure compatibility with the new location.
- Updated README files to reflect changes in the retry logic references.
- Updated `.github/ISSUE_TEMPLATE/bug_report.md` to remove deprecated Gemini CLI mention.
- Added auth binding logic to tie video requests to specific authentication IDs.
- Enhanced video content handlers to support proxy configuration based on selected auth.
- Introduced helper functions for creating HTTP clients with direct or global proxy fallback.
- Expanded unit tests to validate auth binding, proxy usage, and fallback behavior.
- Updated `openai_videos_handlers` to extract and set `video_url` from payloads when available.
- Enhanced unit tests to validate correct `video_url` extraction and inclusion in responses.
* feat(plugin): add ModelRouter before auth with single-slot routing targets
## Motivation
Plugins that need to change execution based on the **original inbound request**
(protocol format, raw body, headers, query, stream flag, metadata, etc.) often
resorted to virtual/trampoline models or routing inside interceptors. This
commit adds **ModelRouter**: a pluggable layer **before** model-to-provider
resolution and AuthManager credential selection, so plugins can declare who
executes a request without spoofing the client model name.
This is a **new capability**, not a bugfix on the existing chain. With no
ModelRouter plugins loaded, behavior matches upstream.
## Pipeline placement
- `execute`, `stream`, and `count` (and image paths via AuthManager) call
`applyModelRouter()` before building `coreexecutor.Request`.
- Routing runs **before** the request interceptor (before auth), so routers see
the client’s original context. After a plugin executor is chosen, the existing
**after-auth interceptor → response/stream interceptor** chain still applies.
- Internal `ExecuteModel` / `ExecuteModelStream` (host callbacks) support
`SkipRouterPluginID` so nested calls do not re-enter the same router.
## Routing API (single slot, mutually exclusive)
`ModelRouteResponse` uses **one target slot**. This avoids the ambiguity of separate
`TargetExecutorPluginID` and `TargetProvider` fields, where both could be set and the host would ignore one:
| Field | Meaning |
|-------|---------|
| `Handled` | `false`: this router declines; try the next router or default path |
| `TargetKind` | `self` \| `executor` \| `provider` (pick one) |
| `Target` | `self`/`executor`: plugin ID; `provider`: built-in provider key |
| `TargetModel` | Optional on `provider` only; empty keeps client `RequestedModel` |
| `Reason` | Optional diagnostic text |
- **self**: the router plugin’s own executor (`Target` normalized to the router’s plugin ID).
- **executor**: another plugin’s executor; host pre-checks with `executorPluginReady()`
(executor declared and provider identifier resolvable) to avoid handled routes that 500 at execution.
- **provider**: skip registry model resolution; fixed built-in AuthManager path; optional
`TargetModel` for execution model only—**does not** change outward requested-model metadata.
Routers run in **descending plugin priority** (tie-break: ascending plugin ID). Panic, error,
invalid target, or unavailable executor/provider → log and **fall through to the next router**;
if none handle, use the original provider+auth flow.
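
The ordering and fall-through rules above can be sketched as follows. The type and function names (`router`, `routeResponse`, `pickRoute`, `safeRoute`) are illustrative stand-ins rather than the real `pluginhost` API, and the executor/provider availability pre-check is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

type routeResponse struct {
	Handled    bool
	TargetKind string // "self" | "executor" | "provider"
	Target     string
}

type router struct {
	PluginID string
	Priority int
	Route    func(model string) (routeResponse, error)
}

// safeRoute converts a panic inside a router into an ordinary error.
func safeRoute(r router, model string) (resp routeResponse, err error) {
	defer func() {
		if rec := recover(); rec != nil {
			err = fmt.Errorf("router %s panicked: %v", r.PluginID, rec)
		}
	}()
	return r.Route(model)
}

func validKind(kind string) bool {
	return kind == "self" || kind == "executor" || kind == "provider"
}

// pickRoute tries routers by descending priority, then ascending plugin ID.
// Errors, panics, declined routes, and invalid targets fall through.
func pickRoute(routers []router, model string) (routeResponse, bool) {
	ordered := append([]router(nil), routers...)
	sort.SliceStable(ordered, func(i, j int) bool {
		if ordered[i].Priority != ordered[j].Priority {
			return ordered[i].Priority > ordered[j].Priority
		}
		return ordered[i].PluginID < ordered[j].PluginID
	})
	for _, r := range ordered {
		resp, err := safeRoute(r, model)
		if err != nil || !resp.Handled || !validKind(resp.TargetKind) {
			continue
		}
		if resp.TargetKind == "self" {
			resp.Target = r.PluginID // normalized to the router's own ID
		} else if resp.Target == "" {
			continue
		}
		return resp, true
	}
	return routeResponse{}, false // caller uses the default provider+auth flow
}

// demoRouters returns two failing high-priority routers and one working one.
func demoRouters() []router {
	return []router{
		{PluginID: "low", Priority: 1, Route: func(string) (routeResponse, error) {
			return routeResponse{Handled: true, TargetKind: "provider", Target: "codex"}, nil
		}},
		{PluginID: "b-panics", Priority: 10, Route: func(string) (routeResponse, error) {
			panic("boom")
		}},
		{PluginID: "a-errors", Priority: 10, Route: func(string) (routeResponse, error) {
			return routeResponse{}, errors.New("backend down")
		}},
	}
}

func main() {
	resp, ok := pickRoute(demoRouters(), "some-model")
	fmt.Println(resp.TargetKind, resp.Target, ok)
}
```

In the demo, `a-errors` and `b-panics` are tried first (same priority, ordered by ID), both fall through, and the lower-priority router's provider route is used.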
## Context exposed to routers
`ModelRouteRequest` includes:
- `SourceFormat`, `RequestedModel`, `Stream`
- `Headers`, `Query`, `Body` (defensive copies)
- `Metadata` (best-effort read-only context snapshot)
- `AvailableProviders`: built-in provider keys with at least one **non-disabled** auth
(`AuthManager.AvailableProviders()`). **Does not** reflect per-model cooldown or transient
unavailability—treat as an optimistic snapshot.
Adds `AuthManager.HasProviderAuth()` and `AvailableProviders()`, excluding `Disabled` and
`StatusDisabled` auths consistently with credential selection.
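
A rough sketch of the "at least one non-disabled auth" rule behind `AvailableProviders`. The `auth` struct and the string `"disabled"` standing in for `StatusDisabled` are assumptions; the real auth record type is not shown here.

```go
package main

import (
	"fmt"
	"sort"
)

// auth is a simplified stand-in for an auth record.
type auth struct {
	Provider string
	Disabled bool
	Status   string // "disabled" stands in for StatusDisabled
}

// usable excludes auths that are disabled by either flag.
func usable(a auth) bool {
	return !a.Disabled && a.Status != "disabled"
}

// availableProviders returns the sorted provider keys that have at least one
// usable auth. It ignores cooldowns, so the result is an optimistic snapshot.
func availableProviders(auths []auth) []string {
	seen := map[string]bool{}
	for _, a := range auths {
		if usable(a) && a.Provider != "" {
			seen[a.Provider] = true
		}
	}
	out := make([]string, 0, len(seen))
	for p := range seen {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	auths := []auth{
		{Provider: "codex"},
		{Provider: "codex", Disabled: true},
		{Provider: "claude", Status: "active"},
		{Provider: "xai", Status: "disabled"},
	}
	fmt.Println(availableProviders(auths))
}
```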
## Host and RPC
- Go plugins: `pluginapi.ModelRouter` + `RouteModel()`.
- RPC plugins: `pluginabi.MethodModelRoute` (`model.route`), capability flag `model_router`.
- `pluginhost.Host` implements `RouteModel` / `RouteModelExcept`; handlers use
`SetModelRouterHost` or a `PluginHost` type assertion; **direct executor** paths use
`ExecutePluginExecutor*` / `CountPluginExecutor`.
- No bundled example ModelRouter plugin; capability is active only when a third-party plugin
declares `model_router` and loads.
## Plugin RPC schema (policy A, upstream-aligned)
- `pluginabi.SchemaVersion` stays **1**: capability additions (`model_router`, `model.route`)
do not bump the number; increment only on breaking RPC JSON changes.
- Host sends `schema_version` at register; reject only if the plugin declares a **higher**
version than the host.
- There is no “ModelRouter requires schema ≥ 3” gate: the v3 single-slot API was never published.
- Existing plugins and examples without `model_router` (`schema_version: 1`) need no changes.
- RPC ModelRouter: `schema_version: 1` + `model_router: true` + implement `model.route`.
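
The registration rule above (reject only when the plugin declares a higher version than the host) reduces to a single comparison. `acceptPlugin` and `hostSchemaVersion` are hypothetical names for this sketch.

```go
package main

import "fmt"

// hostSchemaVersion mirrors the statement that the schema version stays at 1.
const hostSchemaVersion = 1

// acceptPlugin rejects only plugins that declare a newer schema than the host.
// Capabilities such as model_router do not affect the check.
func acceptPlugin(pluginSchemaVersion int) error {
	if pluginSchemaVersion > hostSchemaVersion {
		return fmt.Errorf("plugin schema_version %d is newer than host schema_version %d",
			pluginSchemaVersion, hostSchemaVersion)
	}
	return nil
}

func main() {
	fmt.Println(acceptPlugin(1)) // accepted
	fmt.Println(acceptPlugin(2)) // rejected: newer than the host
}
```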
## Path consistency within this commit
- Provider routes reuse image-only model checks (e.g. `gpt-image-2`) on the normalized model,
same as the default AuthManager path.
- `count` aligned with execute/stream: `SkipRouterPluginID`, query/headers injection,
interceptor skip semantics.
- Handlers: `modelRoutersEnabled` treats hosts without `HasModelRouters` as disabled
(same as before ModelRouter existed); `pluginhost.Host` implements the detector.
- API docs: `ModelRouter` explicitly includes built-in **provider** targets (in addition to
plugin executors and the router’s own executor).
## Testing
    go test ./internal/pluginhost ./sdk/api/handlers ./sdk/pluginapi ./sdk/pluginabi ./sdk/cliproxy/auth
    go build -o test-output ./cmd/server && rm test-output
    go test ./...
* fix(handlers): address ModelRouter review feedback
- Use modelExecutionQuery for plugin executor and AuthManager paths so
inbound URL query matches router/header behavior
- Guard queryFromContext when gin Request.URL is nil
- Read plugin executor stream chunks via nextStreamChunk to exit on cancel
- Drop redundant clonePluginMetadata on capability record meta
Tests cover query propagation, stream cancel, and nil URL safety.
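
The "exit on cancel" behaviour for stream reads is typically a `select` over the context and the chunk channel. The sketch below uses a `nextChunk` function as a stand-in; the real `nextStreamChunk` signature is not shown in this commit message.

```go
package main

import (
	"context"
	"fmt"
)

// nextChunk waits for the next chunk but returns early when ctx is cancelled.
// ok is false when the stream is closed or the context is done.
func nextChunk(ctx context.Context, chunks <-chan []byte) (chunk []byte, ok bool) {
	select {
	case <-ctx.Done():
		return nil, false
	case c, open := <-chunks:
		return c, open
	}
}

// drain collects chunks until the channel closes or ctx is cancelled.
func drain(ctx context.Context, chunks <-chan []byte) []string {
	var out []string
	for {
		chunk, ok := nextChunk(ctx, chunks)
		if !ok {
			return out
		}
		out = append(out, string(chunk))
	}
}

// demo reads a finished stream, then shows that a cancelled context unblocks
// a read on a channel that never produces anything.
func demo() (int, bool) {
	ch := make(chan []byte, 2)
	ch <- []byte("a")
	ch <- []byte("b")
	close(ch)
	n := len(drain(context.Background(), ch))

	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, ok := nextChunk(ctx, make(chan []byte)) // a bare receive here would block forever
	return n, ok
}

func main() {
	n, ok := demo()
	fmt.Println(n, ok)
}
```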
* feat(plugin): add Claude web search router example
Add a Claude Code web_search ModelRouter example that can route matching Claude requests through Antigravity, Codex, xAI, or Tavily.
The plugin includes executor orchestration, backend fallback/penalty handling, Tavily API key support, Claude-compatible response assembly, stream forwarding, and focused unit coverage for detection, fallback routing, model resolution, penalties, stream forwarding, and Tavily behavior.
Verification: `go test -count=1 ./...` in `examples/plugin/claude-web-search-router/go`; `go build -buildmode=c-shared` for the plugin; `go build ./cmd/server`; live local CPA curl coverage for plugin load, four explicit routes, fallback, and Codex spark routing.
* fix(pluginhost): validate executor routes before fallback
* fix(pluginhost): skip oauth-only executor routes
- Introduced `videoAuthBindingStore` for managing mappings of video IDs to credentials with TTL support.
- Updated video creation and retrieval handlers to bind and utilize credentials for authentication.
- Enhanced response models to include upstream models and adjusted request preparation logic.
- Added test coverage for video auth binding, TTL configuration, and expiration handling.
- Implemented helper methods `IsConfigAPIKeyAuth` and `toggleConfigAPIKeyExcludedAll` for managing config API key exclusions.
- Updated API request handling to support enabling/disabling config API key exclusion patterns.
- Added test coverage to validate exclusion toggling logic and persistence behavior.
- Refactored duplicate code for identifying config API key auth entries into reusable utilities.
- Developed `XAIWebsocketsExecutor` for handling xAI Responses via WebSocket transport.
- Introduced session and state management with `codexWebsocketSessionStore` and `xaiWebsocketIDStateStore`.
- Added robust ID mapping for upstream and downstream request/response sequences.
- Enhanced error propagation and handling of WebSocket terminal events.
- Included utility methods for WebSocket request preparation, connection management, and state tracking.
- Added foundational support for compact and streamed responses via enhanced session tracking.
- Implemented `websocketDirectCaptureExecutor` for Codex websocket passthrough functionality.
- Added logic to bypass incremental state handling for passthrough models.
- Updated normalization, compaction, and replay handling to support passthrough mode.
- Introduced `responsesWebsocketUsesCodexWebsocketPassthrough` utility for model-specific passthrough determination.
- Expanded test coverage for websocket passthrough scenarios, including compaction and response validation.
- Introduced `/openai/v1/videos` endpoint to support OpenAI-specific video generation.
- Added error normalization and handling for OpenAI video resources, including detailed error propagation.
- Enhanced response structure to include OpenAI-specific fields for status, progress, and model mappings.
- Implemented new handlers for video content retrieval and error scenarios.
- Expanded test coverage to validate OpenAI video support, error handling, and backend compatibility.
- Enhanced Codex Websockets Executor to capture `response.done` as a terminal event, alongside `response.completed` and `error`.
- Improved error propagation for upstream websocket errors with comprehensive message handling.
- Introduced utility functions for recognizing terminal events and extracting error messages.
- Expanded tests to validate new websocket event logic, including terminal event handling and upstream error propagation.
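
A small sketch of terminal-event recognition for the three event types named above. The helper names are hypothetical, and the code assumes each websocket message is a JSON object with a top-level `type` field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isTerminalEvent reports whether an event type ends the response stream.
func isTerminalEvent(eventType string) bool {
	switch eventType {
	case "response.completed", "response.done", "error":
		return true
	}
	return false
}

// eventType extracts the "type" field from a raw JSON message, returning an
// empty string when the message cannot be parsed.
func eventType(raw []byte) string {
	var envelope struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(raw, &envelope); err != nil {
		return ""
	}
	return envelope.Type
}

func main() {
	messages := []string{
		`{"type":"response.output_text.delta","delta":"hi"}`,
		`{"type":"response.done"}`,
	}
	for _, m := range messages {
		t := eventType([]byte(m))
		fmt.Println(t, isTerminalEvent(t))
	}
}
```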
- Introduced `auth_callbacks` for handling host authentication list, get, runtime, and save operations.
- Added extensive unit tests to validate functionality, including disk fallback and runtime-specific cases.
- Created example implementation in Go to demonstrate host callback integrations.
- Updated Antigravity Credits fallback to handle KV store unavailability as a service error.
- Enhanced signature caching mechanisms with request-time KV access and sliding expiration.
- Added and improved tests for KV client interactions, including error handling and expiration behaviors.
- Introduced `CacheSignatureBestEffort` for non-critical signature caching and clarified function flows with required context.
- Ensured consistent error reporting for missing or unavailable KV stores in various scenarios.
- Replaced direct `homekv` calls with injectable KV client interfaces for `antigravity` and `codex_reasoning_replay` modules.
- Improved error reporting and handling for KV operations, including `KVGet`, `KVSet`, `KVDel`, and `KVExpire`.
- Introduced dedicated fake KV clients for expanded and granular test coverage.
- Added new unit tests to validate KV client behaviors and error scenarios, ensuring robustness and sliding expiration functionality.
Add a native Antigravity WebSearch path for Claude typed WebSearch requests.
Detect Claude Messages requests whose tools are only typed WebSearch tools
(web_search_20250305 / web_search_20260209), and convert them into an
Antigravity requestType=web_search payload instead of sending the request
through the normal tool-calling path.
Preserve the user's requested model. The native path is enabled only when that
Antigravity model is known to support Google Search. Capability data fetched
from Antigravity model info is used only as an enhancement to the local model
registry, not as a replacement for the existing registry fallback behavior.
Unsupported models keep the existing Antigravity request behavior and are not
silently rerouted to another web-search-capable model.
Translate Claude WebSearch request options to the verified Antigravity
googleSearch shape:
- max_uses -> googleSearch.enhancedContent.imageSearch.maxResultCount
- allowed_domains -> googleSearch.includedDomains
Leave blocked_domains and user_location unmapped because the Antigravity
googleSearch request shape has no verified equivalent for them. This avoids
sending speculative fields or pretending unsupported Claude WebSearch options
are enforced upstream.
Translate Antigravity web-search responses back into Claude-compatible output:
server_tool_use blocks, web_search_tool_result blocks, cited text blocks,
grounding URLs, and usage-compatible stream/non-stream responses.
Cover the behavior with tests for request conversion, response conversion,
grounding URL resolution, domain filter mapping, fetched capability hints,
excluded-model handling, and unsupported-model behavior.
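
The detection and option mapping described above can be sketched as follows. The struct shapes are simplified assumptions (the real request types are nested JSON), and the function names are hypothetical. Only the two tool type strings and the two field mappings are taken from this message.

```go
package main

import "fmt"

// claudeTool is a simplified stand-in for a Claude Messages tool entry.
type claudeTool struct {
	Type           string
	MaxUses        int
	AllowedDomains []string
	BlockedDomains []string
}

// typedWebSearch lists the typed WebSearch tool versions named above.
var typedWebSearch = map[string]bool{
	"web_search_20250305": true,
	"web_search_20260209": true,
}

// onlyTypedWebSearch reports whether tools is non-empty and every entry is a
// typed WebSearch tool; any other tool keeps the request on the normal path.
func onlyTypedWebSearch(tools []claudeTool) bool {
	if len(tools) == 0 {
		return false
	}
	for _, t := range tools {
		if !typedWebSearch[t.Type] {
			return false
		}
	}
	return true
}

// googleSearch flattens the two mapped fields for illustration.
type googleSearch struct {
	MaxResultCount  int      // googleSearch.enhancedContent.imageSearch.maxResultCount
	IncludedDomains []string // googleSearch.includedDomains
}

// toGoogleSearch maps max_uses and allowed_domains. blocked_domains is
// intentionally dropped because no verified equivalent exists.
func toGoogleSearch(t claudeTool) googleSearch {
	return googleSearch{
		MaxResultCount:  t.MaxUses,
		IncludedDomains: append([]string(nil), t.AllowedDomains...),
	}
}

func main() {
	tools := []claudeTool{{
		Type:           "web_search_20250305",
		MaxUses:        3,
		AllowedDomains: []string{"example.com"},
		BlockedDomains: []string{"ignored.example"},
	}}
	if onlyTypedWebSearch(tools) {
		fmt.Printf("%+v\n", toGoogleSearch(tools[0]))
	}
}
```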
- Introduced logic to handle plugin unloading during updates to prevent conflicts with loaded plugins.
- Preserved existing plugin configurations during updates, ensuring seamless transitions and maintaining custom fields.
- Added support for reloading the configuration after management saves changes.
- Enhanced unit tests to validate unloading, configuration preservation, and reloading behaviors.
- Introduced test scenarios to validate `previous_response_id` injection during incremental and non-incremental requests.
- Verified behavior for pending tool calls, including proper inclusion or exclusion in websocket requests.
- Updated websocket handling logic to track `lastResponseID` and `pendingToolCallIDs`.
- Added utility functions for pending tool call validation and cleanup.
- Updated host model callback logic to skip originating plugin's interceptors during nested model executions.
- Added `SkipInterceptorPluginID` field to plugin API structs for controlling interceptor bypass behavior.
- Introduced supporting logic in host API handlers, plugin host registry, and callback contexts to identify and skip specific plugins.
- Enhanced unit tests across plugin host, API handlers, and execution paths to verify interceptor skipping behavior and plugin isolation.
- Revised documentation to clarify non-recursive behavior of host model callbacks and the use of `SkipInterceptorPluginID`.
- Added an example plugin `host-model-callback` in Go to demonstrate host model callbacks.
- Implemented `cliproxy_plugin_init`, `cliproxyPluginCall`, and other plugin functions for callback handling.
- Introduced API handlers for `ModelExecution` and `ModelExecutionStream` with support for both streaming and non-streaming requests.
- Included unit tests (`model_execution_test.go`) to validate execution logic and streaming responses.
- Introduced `applyRequestAfterAuthInterceptor` to modify requests after credential selection and before executor translation.
- Added `InterceptRequestAfterAuth` method across plugin adapters with corresponding tests for context validation.
- Enhanced format resolution logic (`requestToFormat`) to support additional providers and formats.
- Updated JavaScript handler to include a new `on_after_auth_request` hook for post-auth request handling.
- Refactored interceptor methods for clarity and better encapsulation of request/response lifecycles.
- Added `resources` field in `management.register` for defining browser-accessible resources.
- Updated examples and documentation to reflect resource-based paths under `/v0/resource/plugins/<pluginID>/...`.
- Replaced legacy `GET` menu routes with resource-based implementations for consistent plugin behavior.
- Enhanced request handling for resource paths, including proper response headers and streamlined test coverage.
- Removed `pluginDelegateRoundRobin` and related logic to streamline plugin scheduler management.
- Consolidated scheduler strategies under `builtinSchedulerStrategy` with `pickViaBuiltinScheduler`.
- Introduced new methods `pickSingleWithStrategy` and `pickMixedWithStrategy` for strategy-specific behavior.
- Updated tests to reflect changes, including added coverage for round-robin and mixed-provider scenarios.
- Improved maintainability by unifying scheduling logic and reducing redundant structures.
- Added `opts.OriginalRequest` handling to `applyResponseInterceptors` for improved context passing.
- Introduced new test `TestApplyJSBeforeRequestUsesReturnedCtxBody` to validate JavaScript interceptor behavior.
- Updated JavaScript-based handler to safely rewrite sensitive content and headers in requests.
- Refined interceptor logic to ensure consistent state retention across request processing.
- Added `pluginSchedulerState` interface with `HasScheduler` method for improved plugin scheduler state checks.
- Updated `Manager.hasPluginScheduler` to handle `HasScheduler` logic.
- Implemented and tested fast-path handling for inactive plugin schedulers, including mixed provider scenarios.
- Expanded unit test coverage to ensure correct behavior in various scheduler states.
- Added a Go scheduler plugin demonstrating CLIProxyAPI capabilities, such as `plugin.register`, `plugin.reconfigure`, and `scheduler.pick`.
- Implemented methods for plugin configuration, built-in scheduler delegation (`fill-first`, `round-robin`), dynamic candidate selection, and error handling.
- Extended `pluginhost` with scheduler handling, candidate normalization, and fallback mechanisms.
- Included examples, tests, and detailed documentation for scheduler usage and implementation.
- Introduced `FrontendAuthProviderExclusive` capability to restrict authentication to a single selected provider.
- Added `SetExclusiveProvider` and `ClearExclusiveProvider` methods for managing exclusive providers in the access registry.
- Updated `pluginhost` to prioritize and enforce exclusive providers based on plugin priority and ID.
- Enhanced RPC capabilities schema to include `FrontendAuthProviderExclusive` field.
- Added example plugin and tests for exclusive frontend auth behavior.
- Implemented `RequestInterceptor`, `ResponseInterceptor`, and `StreamChunkInterceptor` capabilities.
- Added `sanitizePluginMetadata` to clean metadata for RPC compatibility.
- Enhanced interceptor chaining, error handling, and test coverage.
- Updated plugin configuration to register and dispatch interceptor methods.
- Removed `examples/plugin/main.go` and `internal/pluginhost/loader_plugin.go` after migrating to a more modular system.
- Introduced `streamBridge` in `internal/pluginhost/stream_bridge.go` for efficient stream handling and communication.
- Added examples of `thinking` plugins written in both Rust and Go under `examples/plugin/thinking`.
- Enhanced test coverage for plugin host system changes, including stream chunk translation and thinking logic.
- Improved API compatibility and ensured backward-compatible upgrades for plugin execution.
- Implemented command-line flag registration and execution for plugins with priority-based conflict resolution.
- Enabled plugin-owned command-line flag execution and persistence of plugin-auth data.
- Added new `Host` methods to support command-line capabilities, including flag normalization, validation, and execution state management.
- Introduced unit tests to ensure coverage for command-line plugin functionality, including auth data persistence.
- Updated configs to normalize plugins during initialization.