zqq61 1ad755aa4e feat: Add vLLM engine support and API enhancements
Major updates:
- Add vLLM engine support with automatic fallback to HuggingFace
- Complete REST API implementation with sync/async modes
- Add comprehensive API documentation
- Organize scripts into dedicated directory

API Features:
- Support both HuggingFace and vLLM inference engines
- Sync and async generation endpoints
- Task queue management with concurrency control
- Health check with engine information
- Automatic file cleanup
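
One way the task queue with concurrency control could work is sketched below, using an `asyncio.Semaphore` to cap how many generation jobs run at once. This is an illustrative sketch, not the code from this commit: `TaskQueue`, `submit`, `wait`, and the status strings are hypothetical names.

```python
import asyncio
import uuid


class TaskQueue:
    """Hypothetical in-memory queue that limits concurrently running jobs."""

    def __init__(self, max_concurrency: int):
        # The semaphore admits at most `max_concurrency` jobs at a time.
        self._sem = asyncio.Semaphore(max_concurrency)
        self._tasks = {}
        self.status = {}   # task_id -> "queued" | "running" | "done" | "failed"
        self.results = {}  # task_id -> return value, or the raised exception

    def submit(self, job) -> str:
        """Schedule `job` (a zero-argument coroutine function); return its id.

        Must be called from inside a running event loop.
        """
        task_id = uuid.uuid4().hex
        self.status[task_id] = "queued"
        self._tasks[task_id] = asyncio.create_task(self._run(task_id, job))
        return task_id

    async def _run(self, task_id: str, job) -> None:
        async with self._sem:  # wait here until a slot is free
            self.status[task_id] = "running"
            try:
                self.results[task_id] = await job()
                self.status[task_id] = "done"
            except Exception as exc:
                self.results[task_id] = exc
                self.status[task_id] = "failed"

    async def wait(self, task_id: str):
        """Block until the task finishes, then return its stored result."""
        await self._tasks[task_id]
        return self.results[task_id]
```

In this sketch an async endpoint would call `submit` and return the id immediately, while a sync endpoint would call `submit` followed by `wait`.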

Configuration:
- Environment variable based configuration
- Engine validation and auto-fallback
- Configurable concurrency limits
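
The configuration behaviour listed above might look roughly like the following sketch. The environment variable names `INFERENCE_ENGINE` and `MAX_CONCURRENT_TASKS`, the `ApiConfig` class, and the helper functions are assumptions made for illustration; the actual names used by this commit are not shown here.

```python
import importlib.util
import os
from dataclasses import dataclass

VALID_ENGINES = ("huggingface", "vllm")
# Python package assumed to back each engine.
ENGINE_MODULES = {"huggingface": "transformers", "vllm": "vllm"}


@dataclass(frozen=True)
class ApiConfig:
    engine: str
    max_concurrency: int


def engine_available(name: str) -> bool:
    """Return True if the package backing the engine can be found."""
    return importlib.util.find_spec(ENGINE_MODULES[name]) is not None


def load_config(env=None, is_available=engine_available) -> ApiConfig:
    """Build the config from environment variables (hypothetical names)."""
    env = os.environ if env is None else env

    # Engine validation: unknown or unavailable engines fall back to HuggingFace.
    engine = env.get("INFERENCE_ENGINE", "huggingface").strip().lower()
    if engine not in VALID_ENGINES or not is_available(engine):
        engine = "huggingface"

    # Concurrency limit: non-numeric values fall back to 1, minimum is 1.
    try:
        limit = int(env.get("MAX_CONCURRENT_TASKS", "1"))
    except ValueError:
        limit = 1
    return ApiConfig(engine=engine, max_concurrency=max(1, limit))
```

Passing `env` and `is_available` as parameters keeps the fallback logic testable without installing vLLM or mutating the real environment.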

Documentation:
- README_API.md: Complete API usage guide
- CHANGELOG_API.md: API version history
- VLLM_UPGRADE_SUMMARY.md: Detailed upgrade guide
- scripts/README.md: Scripts documentation

Scripts Organization:
- Move all test and utility scripts to scripts/
- Add configuration test script
- Add singleton pattern test

Performance:
- vLLM engine provides 2-3x speedup over the HuggingFace engine
- Better GPU memory utilization
- Support for prefix caching

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-04 00:22:04 +08:00