Model Management
This page answers the most practical first-time question:
what do you need to configure first so Prompt Optimizer can actually run?
If you are still going through first-time setup, read this together with Quick Start.
The entry point is the Model Management button in the top-right corner.
!!! note For your first run, do not configure too many providers at once. One working text model is more useful than a long unfinished provider list.
First-time users: only do these 3 steps
- Add one text model
- Run one optimize / test / evaluate flow in a text workspace
- Only then decide whether you need a second text model or an image model
Most first-time users do not need a large model list.
Minimum working setup
| Your goal | Minimum setup |
|---|---|
| Start using text workspaces | 1 text model |
| Compare results | 2 text models |
| Use image workspaces | 1 text model + 1 image model |
| Use reference-image replication or style learning in text-to-image | 1 text model + 1 image model + 1 image recognition model |
These two distinctions are enough to understand at first
Text model vs image model
- Text models handle left-side analysis, optimization, iteration, and text-side testing/evaluation
- Image models only handle actual image generation on the right side
Left-side model vs right-side model
In text workspaces:
- left-side model: analyzes and improves prompts
- right-side model: executes prompts and produces evidence
They can be the same model, but they do not have to be.
How to configure models for the first run
Case A: you just want the app to work
Configure one text model.
That one model is enough to start:
- left-side analysis / optimization
- right-side testing
- right-side Result Evaluation
- right-side Compare Evaluation
Case B: you want real result comparison
Configure two text models:
- one main model
- one comparison model
This makes it easier to tell whether the difference comes from the prompt or from the model.
Case C: you want image workspaces
Configure at least:
- one text model
- one image model
Because:
- the left side still uses a text model to improve image prompts
- the right side uses an image model to generate the actual image
Case D: you want reference-image actions inside text-to-image
If you want to use:
- reference-image replication
- style learning
- prompt-variable extraction from images
you also need an image recognition model.
Those actions are not normal image generation. They first require a model that can understand the image and turn it into prompt clues or variables.
Recommended setup order
Step 1: add one text model
Choose the provider you know best and can connect with the least friction.
Step 2: make sure connection testing succeeds
After you add the model, run Test Connection.
Step 3: run one text workspace
Start with the simplest text workspace.
If you can complete:
- left-side optimization
- right-side testing
- one evaluation
then your minimum setup is already good enough.
Step 4: add more models only when needed
Add a second text model only if you want comparison. Add an image model only if you are entering image workspaces.
Three common connection patterns
1. Public model platforms
Examples:
- OpenAI
- Gemini
- DeepSeek
- SiliconFlow
In most cases you only need:
- choose the provider
- paste the API key
- select the model
- run connection testing
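Under the hood, the connection-test step is conceptually just one minimal chat request against the provider's endpoint. A hedged sketch of what such a request looks like (the base URL, key, and model name below are placeholders, and this is an illustration, not the app's actual implementation):

```python
import json
import urllib.request

def build_connection_test(base_url: str, api_key: str, model: str) -> urllib.request.Request:
    """Build a minimal OpenAI-style chat request usable as a connectivity check."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,  # keep the test cheap
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_connection_test("https://api.example.com/v1", "sk-test", "my-model")
# Sending this with urllib.request.urlopen(req) should return a JSON body
# containing a `choices` array if the endpoint is OpenAI-compatible.
```

If this one request succeeds against your provider, the in-app connection test for the same base URL, key, and model should succeed too.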
Some providers have provider-specific request details:
- OpenAI-compatible text models may use either Chat Completions or Responses request style, depending on the configured provider and model capability.
- DeepSeek configurations can expose thinking or reasoning parameters in advanced settings. If output behavior looks different from what you expect, check whether those parameters are enabled.
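The two request styles differ mainly in body shape and endpoint path. A rough comparison (field names follow the public OpenAI API; which style a given provider actually accepts is something to verify against that provider's docs):

```python
# Chat Completions style: a list of role-tagged messages.
chat_completions_body = {
    "model": "my-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
}

# Responses style: a single `input` field (a string or structured items).
responses_body = {
    "model": "my-model",
    "input": "Hello",
}

# The endpoint paths differ as well:
#   Chat Completions -> POST {base_url}/chat/completions
#   Responses        -> POST {base_url}/responses
```

If a provider rejects one shape with a schema or 404 error, the configured request style is the first thing to check.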
2. Ollama
If you run Ollama locally, use the built-in Ollama provider.
Typical behavior:
- default endpoint: http://localhost:11434/v1
- API key often not required
- model list can refresh from your installed local models
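Refreshing the model list maps to a plain GET against the local endpoint. A sketch under the assumption that Ollama is on its default port and exposing the OpenAI-compatible `/v1/models` listing (the helper names are mine, not the app's):

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434/v1"

def model_ids(listing: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /models listing payload."""
    return [m["id"] for m in listing.get("data", [])]

def refresh_models(base_url: str = OLLAMA_BASE) -> list[str]:
    """Fetch installed models from a local Ollama; no API key header needed."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return model_ids(json.load(resp))

# Trimmed shape of the payload the listing endpoint returns:
sample = {"object": "list", "data": [{"id": "llama3", "object": "model"}]}
```

If `refresh_models()` fails here, the in-app model refresh will fail for the same reason, so this is a quick way to isolate Ollama problems from app problems.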
3. Custom
If your service is OpenAI-compatible, use Custom.
Typical cases:
- LM Studio
- internal company gateway
- self-hosted OpenAI-compatible service
- any service that needs a custom base URL
Example:
Provider: Custom
Base URL: https://your-api.example.com/v1
Model: your-model-name
API Key: fill based on your service
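A common pitfall with Custom is the base URL shape: most OpenAI-compatible services expect the base URL to end in /v1, with the client appending paths like /chat/completions. A small sketch of that normalization (an illustrative helper, not the app's actual code, and some gateways do use a different path prefix):

```python
def normalize_base_url(url: str) -> str:
    """Ensure an OpenAI-compatible base URL ends with /v1 exactly once."""
    url = url.rstrip("/")
    if not url.endswith("/v1"):
        url += "/v1"
    return url

# Both spellings resolve to the same endpoint root:
#   normalize_base_url("https://your-api.example.com")
#   normalize_base_url("https://your-api.example.com/v1/")
```

When a Custom connection test returns 404, comparing your configured base URL against this convention is usually the fastest fix.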
If connection fails, check deployment and environment
Web / hosted version
The browser sends requests directly to your model service, so you may hit:
- CORS
- mixed content when HTTPS pages call local HTTP endpoints
Desktop app
Usually better for:
- Ollama
- LM Studio
- local network services
- internal APIs
- custom gateways with browser restrictions
Docker
Docker packages the web UI and MCP together, but the page still runs in the browser, so browser restrictions still matter.
Supported text providers
The current codebase includes:
- OpenAI
- Gemini
- Anthropic
- DeepSeek
- SiliconFlow
- Zhipu AI
- DashScope
- OpenRouter
- ModelScope
- MiniMax
- Ollama
- Custom (OpenAI-compatible endpoints)
What the model manager can do
In addition to add / edit / delete, the text-side manager supports:
- connection testing
- cloning configs
- refreshing model lists
- advanced parameters
- provider-specific API-key links for some providers
The image-side manager supports:
- add / edit / clone / delete
- enable / disable
- connection testing
- preview test image
- provider / model / capability tags
Built-in image presets may expose capability differences between model versions. For example, Seedream 4.5 supports multi-image scenarios, while Seedream 5.0 Lite has its own default settings. Prefer checking the capability tags in the model manager instead of assuming from the model name alone.
There is also a function-model area for image recognition.
If you want image extraction, reference-image replication, or style learning, do not stop at text and image generation models. Make sure the image recognition model is configured too.
How to tell whether setup is already good enough
You can stop tuning model setup for now if all three are true:
- at least one text model passes connection testing
- you can produce one real result in a text workspace
- you can run one evaluation on that result
Where configuration is stored
- web / hosted version: current browser storage
- desktop app: local application data
- extension: extension-local storage
If you need backup or migration, use Data Management.
Common questions
Connection test passes, but real runs still fail
Common reasons:
- quota or billing limits
- wrong model name
- browser-side CORS / mixed-content blocking
- left-side model and right-side model are not what you thought they were
Do I need many models on day one?
No. In most cases:
- one text model is enough for text workspaces
- add a second text model only for comparison
- add image models only for image workspaces
I configured a model, but the app still won’t run
Check these first:
- did connection testing actually succeed?
- is this a text model when the page expects text?
- are you in a browser trying to call a local HTTP endpoint?
- does this workspace also need an image model or additional inputs?