lywsvip/autoclip

mirror of https://github.com/zhouxiaoka/autoclip.git synced 2026-05-06 14:04:32 +08:00

Kris Ka 308b442547 Fix various issues

2025-08-21 01:05:23 +08:00

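🎤 Speech Recognition Module Redesign

📋 Overview

Based on user requirements, we have completely redesigned the speech recognition module. The main improvements are:

Test subtitle data removed - a failed transcription now fails the task outright instead of falling back to mock data
Multi-language recognition - supports Chinese, English, Japanese, Korean, and many other languages
Multiple API backends - supports local Whisper, the OpenAI API, Azure Speech Services, and more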

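🔧 Key Improvements

1. Test Subtitle Generation Removed

Previous problems:

When speech recognition failed, the system generated a test subtitle file
The test subtitle content was inaccurate and degraded downstream processing
Users could mistakenly believe that processing had succeeded

What changed:

Test subtitle generation has been removed entirely
A speech recognition failure now raises an exception
This protects data quality in production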

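import logging

from shared.utils.speech_recognizer import generate_subtitle_for_video, SpeechRecognitionError

logger = logging.getLogger(__name__)

# Before: returned None or test subtitles
result = generate_subtitle_for_video(video_path)
if result is None:
    ...  # generated test subtitles (removed)

# Now: raise an exception on failure
try:
    result = generate_subtitle_for_video(video_path)
except SpeechRecognitionError as e:
    # Handle the speech recognition failure
    logger.error(f"Speech recognition failed: {e}")
    raise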
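Here is a minimal sketch of the fail-fast contract described above. The document does not show the module's internals. `run_recognition` and the `transcribe` callable it takes are therefore hypothetical. `SpeechRecognitionError` is redefined locally only so that the snippet runs on its own.

```python
from pathlib import Path
from typing import Callable, Optional


class SpeechRecognitionError(Exception):
    """Local stand-in for the module's exception type (illustration only)."""


def run_recognition(video_path: str, transcribe: Callable[[Path], Optional[str]]) -> str:
    """Return subtitle text, or raise instead of returning None or mock subtitles.

    `transcribe` is a hypothetical backend call used only for this sketch.
    """
    path = Path(video_path)
    if not path.is_file():
        raise SpeechRecognitionError(f"video file not found: {path}")
    text = transcribe(path)
    if not text:
        # The old behaviour would have written placeholder subtitles here.
        raise SpeechRecognitionError(f"transcription produced no output for {path}")
    return text
```

Because an empty result raises instead of returning, callers can no longer mistake a failed run for a successful one.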

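2. Multi-Language Support

Supported languages:

Chinese (Simplified/Traditional)
English (US/UK)
Japanese
Korean
French
German
Spanish
Russian
Arabic
Portuguese
Italian
Automatic detection

Usage: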

from shared.utils.speech_recognizer import generate_subtitle_for_video, LanguageCode

# Specify the language explicitly
result = generate_subtitle_for_video(
    video_path,
    language=LanguageCode.CHINESE_SIMPLIFIED
)

# Detect the language automatically
result = generate_subtitle_for_video(
    video_path,
    language=LanguageCode.AUTO
)
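The document does not list the members or values of `LanguageCode`. The hypothetical enum below reuses the codes reported by the status endpoint (`zh`, `en`, `ja`, `ko`, `auto`). It also shows one plausible way to map `AUTO` onto Whisper's own language detection, which Whisper performs when no language is passed.

```python
from enum import Enum
from typing import Optional


class LanguageCode(str, Enum):
    # Hypothetical members: only the string codes come from the status endpoint.
    CHINESE_SIMPLIFIED = "zh"
    ENGLISH = "en"
    JAPANESE = "ja"
    KOREAN = "ko"
    AUTO = "auto"


def parse_language(value: str) -> LanguageCode:
    """Map a config string such as "zh" or "AUTO" to a LanguageCode."""
    try:
        return LanguageCode(value.strip().lower())
    except ValueError:
        raise ValueError(f"unsupported language code: {value!r}") from None


def whisper_language_arg(lang: LanguageCode) -> Optional[str]:
    """Whisper detects the language itself when no language is given, so AUTO maps to None."""
    return None if lang is LanguageCode.AUTO else lang.value
```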

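3. Multiple Speech Recognition Services

Supported services:

Service	Highlights	Requirements
Local Whisper	Free, offline, high accuracy	Install whisper and ffmpeg
OpenAI API	Highest accuracy, broad language support	OpenAI API key
Azure Speech	Enterprise-grade, feature-rich	Azure account and API key
Google Speech	High accuracy, advanced features	Google Cloud account
Alibaba Cloud Speech	Strong Chinese recognition	Alibaba Cloud account and API key

Automatic selection strategy (in order of preference):

Local Whisper (recommended)
OpenAI API
Azure Speech Services
Google Speech-to-Text
Alibaba Cloud Speech Recognition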
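The selection strategy above amounts to a priority list. The sketch below makes a few assumptions:

- It uses the method keys reported by the status endpoint.
- If a method is configured explicitly but unavailable, it falls back to the next one instead of failing. This behaviour is our assumption.
- It raises `RuntimeError`, whereas the real module presumably raises `SpeechRecognitionError`.

```python
from typing import Dict, Optional

# Order taken from the selection strategy above; keys match the status endpoint.
PRIORITY = ["whisper_local", "openai_api", "azure_speech", "google_speech", "aliyun_speech"]


def select_method(available: Dict[str, bool], preferred: Optional[str] = None) -> str:
    """Pick the preferred method if usable, else the first available one in priority order."""
    if preferred and available.get(preferred):
        return preferred
    for method in PRIORITY:
        if available.get(method):
            return method
    raise RuntimeError("no speech recognition method is available")
```

The `available` mapping can be the `available_methods` object returned by `GET /api/v1/speech-recognition/status`.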

🚀 New API Endpoints

Speech recognition status

GET /api/v1/speech-recognition/status

{
  "available_methods": {
    "whisper_local": true,
    "openai_api": false,
    "azure_speech": false,
    "google_speech": false,
    "aliyun_speech": false
  },
  "supported_languages": ["zh", "en", "ja", "ko", "auto"],
  "whisper_models": ["tiny", "base", "small", "medium", "large"],
  "default_config": {
    "method": "whisper_local",
    "language": "auto",
    "model": "base",
    "timeout": 300
  }
}
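Here is a small client for the status endpoint, as a sketch. The document gives only the path, so the base URL (for example `http://localhost:8000`) is an assumption. `enabled_methods` simply reads the `available_methods` object from a response shaped like the one above.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen

STATUS_PATH = "/api/v1/speech-recognition/status"


def fetch_status(base_url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """GET the status endpoint; the base URL is an assumption, not documented."""
    with urlopen(base_url.rstrip("/") + STATUS_PATH, timeout=timeout) as resp:
        return json.load(resp)


def enabled_methods(status: Dict[str, Any]) -> List[str]:
    """List the methods the server reports as usable, in response order."""
    return [name for name, ok in status.get("available_methods", {}).items() if ok]
```

For example, `enabled_methods(fetch_status("http://localhost:8000"))` would return `["whisper_local"]` for the sample response above.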

Configuration test

POST /api/v1/speech-recognition/test

Request body:

{
  "method": "whisper_local",
  "language": "zh",
  "model": "base",
  "timeout": 300
}
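The request body above can be built with the standard library alone. `build_test_request` is a hypothetical helper. Its client-side checks (a known Whisper model, a positive timeout) are our own additions, not documented server behaviour. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request

TEST_PATH = "/api/v1/speech-recognition/test"
WHISPER_MODELS = {"tiny", "base", "small", "medium", "large"}


def build_test_request(base_url: str, method: str = "whisper_local", language: str = "zh",
                       model: str = "base", timeout: int = 300) -> Request:
    """Build (but do not send) a POST request for the configuration test endpoint."""
    if method == "whisper_local" and model not in WHISPER_MODELS:
        raise ValueError(f"unknown Whisper model: {model}")
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    body = {"method": method, "language": language, "model": model, "timeout": timeout}
    return Request(
        base_url.rstrip("/") + TEST_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send the request, pass the result to `urllib.request.urlopen`.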

Installation guide

GET /api/v1/speech-recognition/install-guide?method=whisper_local

📝 Configuration Management

Environment variables

# Speech recognition method
export SPEECH_RECOGNITION_METHOD="whisper_local"

# Language
export SPEECH_RECOGNITION_LANGUAGE="zh"

# Whisper model
export SPEECH_RECOGNITION_MODEL="base"

# Timeout (seconds)
export SPEECH_RECOGNITION_TIMEOUT="300"

# API keys (depending on the chosen service)
export OPENAI_API_KEY="your-openai-key"
export AZURE_SPEECH_KEY="your-azure-key"
export AZURE_SPEECH_REGION="your-region"
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"
export ALIYUN_ACCESS_KEY_ID="your-access-key"
export ALIYUN_ACCESS_KEY_SECRET="your-secret-key"
export ALIYUN_SPEECH_APP_KEY="your-app-key"
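One way to read these variables is to fall back to the defaults that the status endpoint reports in `default_config`. `SpeechConfig` and `config_from_env` are illustrative names, not part of the documented module.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class SpeechConfig:
    # Defaults mirror "default_config" in the status response.
    method: str = "whisper_local"
    language: str = "auto"
    model: str = "base"
    timeout: int = 300


def config_from_env(env: Optional[Mapping[str, str]] = None) -> SpeechConfig:
    """Read the SPEECH_RECOGNITION_* variables, falling back to the defaults above."""
    env = os.environ if env is None else env
    defaults = SpeechConfig()
    return SpeechConfig(
        method=env.get("SPEECH_RECOGNITION_METHOD", defaults.method),
        language=env.get("SPEECH_RECOGNITION_LANGUAGE", defaults.language),
        model=env.get("SPEECH_RECOGNITION_MODEL", defaults.model),
        timeout=int(env.get("SPEECH_RECOGNITION_TIMEOUT", defaults.timeout)),
    )
```

The optional `env` argument makes the function easy to test without touching the real process environment.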

Configuration file

The following settings can also be configured in data/settings.json:

{
  "speech_recognition_method": "whisper_local",
  "speech_recognition_language": "zh",
  "speech_recognition_model": "base",
  "speech_recognition_timeout": 300
}
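Here is a sketch of loading these keys. It assumes that a missing file means "use the defaults" and that unrelated keys in settings.json are ignored. The document does not say whether the file or the environment variables take precedence, so this sketch handles the file only.

```python
import json
from pathlib import Path
from typing import Any, Dict

DEFAULTS: Dict[str, Any] = {
    "speech_recognition_method": "whisper_local",
    "speech_recognition_language": "auto",
    "speech_recognition_model": "base",
    "speech_recognition_timeout": 300,
}


def load_speech_settings(path: str = "data/settings.json") -> Dict[str, Any]:
    """Merge the speech_recognition_* keys from settings.json over the defaults."""
    settings = dict(DEFAULTS)  # copy so DEFAULTS itself is never mutated
    file = Path(path)
    if file.is_file():
        data = json.loads(file.read_text(encoding="utf-8"))
        settings.update({k: v for k, v in data.items() if k in DEFAULTS})
    return settings
```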

🔍 Error Handling

New exception type

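import logging

from shared.utils.speech_recognizer import generate_subtitle_for_video, SpeechRecognitionError

logger = logging.getLogger(__name__)

try:
    result = generate_subtitle_for_video(video_path)
except SpeechRecognitionError as e:
    # Handle the speech recognition error
    logger.error(f"Speech recognition failed: {e}")
    # Optionally retry, or fall back to another method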

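Error categories

Service unavailable - the selected speech recognition service is not installed or not configured
File not found - the video file does not exist or cannot be accessed
Timeout - speech recognition processing exceeded the timeout
Execution failure - the speech recognition service failed while running
Configuration error - the supplied parameters are invalid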
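The five categories could map onto subclasses of `SpeechRecognitionError`. The subclass names and the `is_retryable` policy below are hypothetical. The document only defines the base exception.

```python
class SpeechRecognitionError(Exception):
    """Base error; the subclasses below are hypothetical, mirroring the categories above."""


class ServiceUnavailableError(SpeechRecognitionError):
    """The selected service is not installed or not configured."""


class VideoFileNotFoundError(SpeechRecognitionError):
    """The video file does not exist or cannot be accessed."""


class RecognitionTimeoutError(SpeechRecognitionError):
    """Processing exceeded the configured timeout."""


class RecognitionExecutionError(SpeechRecognitionError):
    """The service ran but failed."""


class RecognitionConfigError(SpeechRecognitionError):
    """Invalid parameters were supplied."""


def is_retryable(err: SpeechRecognitionError) -> bool:
    """Timeouts and execution failures may succeed on retry; the others need a fix first."""
    return isinstance(err, (RecognitionTimeoutError, RecognitionExecutionError))
```

Existing `except SpeechRecognitionError` handlers would keep working, because every category is a subclass of the base type.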

📊 Performance Tuning

Choosing a Whisper model

Model	Parameters	Speed	Accuracy	Use case
tiny	39M	⭐⭐⭐⭐⭐	⭐⭐	Quick tests
base	74M	⭐⭐⭐⭐	⭐⭐⭐	Everyday use
small	244M	⭐⭐⭐	⭐⭐⭐⭐	Higher-quality needs
medium	769M	⭐⭐	⭐⭐⭐⭐⭐	Professional use
large	1550M	⭐	⭐⭐⭐⭐⭐	Best quality

Timeout settings

Short videos (<5 min): 60 s
Medium videos (5-30 min): 300 s
Long videos (>30 min): 600 s
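Here is the timeout guideline as a function. `timeout_for_duration` is illustrative, and treating exactly 5 and 30 minutes as "medium" is our reading of the ranges. `probe_duration` calls `ffprobe`, which normally ships with ffmpeg, to read the video length in seconds.

```python
import subprocess


def probe_duration(video_path: str) -> float:
    """Return the container duration in seconds as reported by ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", video_path],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())


def timeout_for_duration(duration_seconds: float) -> int:
    """Map video length to the suggested recognition timeout, in seconds."""
    minutes = duration_seconds / 60
    if minutes < 5:
        return 60
    if minutes <= 30:
        return 300
    return 600
```

Typical usage is `timeout_for_duration(probe_duration("input.mp4"))`.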

🛠️ Installation Guide

Installing local Whisper

# Install the Python package
pip install openai-whisper

# Install system dependencies
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download ffmpeg and add it to PATH

# Verify the installation
whisper --help
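The same checks can be done from Python before choosing `whisper_local`. This is a sketch, and `check_whisper_install` is a hypothetical helper. An unrelated PyPI package is also named `whisper`, so a positive `whisper_package` result is only a hint.

```python
import importlib.util
import shutil
from typing import Dict


def check_whisper_install() -> Dict[str, bool]:
    """Report whether the pieces that local Whisper needs are present."""
    return {
        # `pip install openai-whisper` provides the `whisper` Python package...
        "whisper_package": importlib.util.find_spec("whisper") is not None,
        # ...and a `whisper` console script on PATH.
        "whisper_cli": shutil.which("whisper") is not None,
        # Whisper shells out to ffmpeg to decode audio.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_whisper_install().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```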

API service configuration

OpenAI API

export OPENAI_API_KEY="your-api-key"

Azure Speech Services

export AZURE_SPEECH_KEY="your-api-key"
export AZURE_SPEECH_REGION="your-region"

Google Speech-to-Text

export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"

Alibaba Cloud Speech Recognition

export ALIYUN_ACCESS_KEY_ID="your-access-key"
export ALIYUN_ACCESS_KEY_SECRET="your-secret-key"
export ALIYUN_SPEECH_APP_KEY="your-app-key"
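This gives a quick view of which cloud services have credentials in the environment, using the variable names listed above. It checks only that the variables are present, not that they are valid; for example, it does not confirm that the Google credentials file exists. `configured_services` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variables each cloud service needs, as listed in this guide.
REQUIRED_ENV = {
    "openai_api": ["OPENAI_API_KEY"],
    "azure_speech": ["AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"],
    "google_speech": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "aliyun_speech": ["ALIYUN_ACCESS_KEY_ID", "ALIYUN_ACCESS_KEY_SECRET", "ALIYUN_SPEECH_APP_KEY"],
}


def configured_services(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """True for each service whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    return {svc: all(env.get(var) for var in names) for svc, names in REQUIRED_ENV.items()}
```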

🔄 Migration Guide

Migrating from the old version

Update imports

# Old version
from shared.utils.speech_recognizer import generate_subtitle_for_video

# New version
from shared.utils.speech_recognizer import (
    generate_subtitle_for_video,
    SpeechRecognitionError,
    LanguageCode
)

Update error handling

# Old version
result = generate_subtitle_for_video(video_path)
if result is None:
    ...  # handle the failure

# New version
try:
    result = generate_subtitle_for_video(video_path)
except SpeechRecognitionError as e:
    ...  # handle the failure

Remove test-subtitle code

# Delete code like this
if method == "simple":
    return recognizer.generate_subtitle_simple(video_path, output_path)

📈 Monitoring and Logging

Logging

import logging
logger = logging.getLogger(__name__)

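# Speech recognition started
logger.info(f"Speech recognition started: {video_path}")

# Speech recognition succeeded
logger.info(f"Speech recognition succeeded: {output_path}")

# Speech recognition failed
logger.error(f"Speech recognition failed: {error}")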
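The three log calls above can be folded into one wrapper that also records elapsed time. `logged_recognition` and its `run` argument are hypothetical. The sketch passes `%s`-style arguments so that formatting is deferred until a record is actually emitted.

```python
import logging
import time
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def logged_recognition(video_path: str, run: Callable[[str], T]) -> T:
    """Wrap a recognition call with start / success / failure log lines and timing."""
    logger.info("Speech recognition started: %s", video_path)
    start = time.perf_counter()
    try:
        result = run(video_path)
    except Exception as error:
        logger.error("Speech recognition failed after %.1fs: %s", time.perf_counter() - start, error)
        raise  # re-raise so the task still fails
    logger.info("Speech recognition succeeded in %.1fs: %s", time.perf_counter() - start, result)
    return result
```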

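Performance monitoring

We recommend monitoring the following metrics:

Speech recognition success rate
Processing time
Distribution of error types
Usage by service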
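Here is a minimal in-memory collector for those four metrics, as a sketch. In production these counters would more likely be exported to an existing monitoring system. The class and field names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecognitionMetrics:
    """In-memory counters for success rate, timing, error types, and service usage."""
    attempts: int = 0
    successes: int = 0
    durations: List[float] = field(default_factory=list)
    errors: Counter = field(default_factory=Counter)
    services: Counter = field(default_factory=Counter)

    def record(self, service: str, duration: float, error: Optional[BaseException] = None) -> None:
        """Record one attempt; pass the exception if it failed."""
        self.attempts += 1
        self.services[service] += 1
        self.durations.append(duration)
        if error is None:
            self.successes += 1
        else:
            self.errors[type(error).__name__] += 1

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0

    @property
    def mean_duration(self) -> float:
        return sum(self.durations) / len(self.durations) if self.durations else 0.0
```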

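🎯 Best Practices

Production recommendations
- Use the small or medium model
- Set a reasonable timeout
- Configure a retry mechanism for errors
Multi-language processing
- Prefer automatic language detection
- For content in a known language, specify the language code explicitly
- Consider a dedicated speech recognition service
Error handling
- Handle errors gracefully
- Provide user-friendly error messages
- Consider a fallback strategy
Performance tuning
- Choose a model appropriate to the video length
- Use GPU acceleration (if available)
- Consider processing multiple videos in parallel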
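Retrying and falling back to another service can be combined in one loop. The sketch assumes:

- Each backend is a callable that takes a video path and returns subtitle output.
- The backend names and `recognize_with_fallback` are hypothetical.
- Every `SpeechRecognitionError` is retried, although configuration errors are usually not worth retrying.

```python
import time
from typing import Callable, Dict, List


class SpeechRecognitionError(Exception):
    """Local stand-in so the sketch runs on its own."""


def recognize_with_fallback(
    video_path: str,
    backends: Dict[str, Callable[[str], str]],
    order: List[str],
    retries: int = 2,
    backoff: float = 1.0,
) -> str:
    """Try each backend in `order`, retrying each one up to `retries` extra times."""
    failures: List[str] = []
    for name in order:
        backend = backends.get(name)
        if backend is None:
            continue  # not installed / not configured
        for attempt in range(retries + 1):
            try:
                return backend(video_path)
            except SpeechRecognitionError as err:
                failures.append(f"{name} attempt {attempt + 1}: {err}")
                if attempt < retries:
                    time.sleep(backoff * 2 ** attempt)  # exponential backoff
    raise SpeechRecognitionError("all backends failed: " + "; ".join(failures))
```

If every backend fails, the final exception lists every attempt. No mock subtitles are produced, which is consistent with the redesign.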

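🔮 Roadmap

More API services
- Baidu Speech Recognition
- Tencent Cloud Speech Recognition
- Huawei Cloud Speech Recognition
Enhanced features
- Speaker diarization
- Emotion recognition
- Keyword extraction
Performance
- Streaming processing
- Caching
- Distributed processing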

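📞 Technical Support

If you run into problems:

Check the log files for error messages
Verify that the speech recognition service is installed correctly
Confirm that the configuration file is correct
Review the API documentation and installation guide

For more help, see: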
