mirror of https://github.com/timerring/bilive.git synced 2026-05-08 22:37:16 +08:00

John Howe 4ee9f9b476 refactor: refactor configurations (#135)

* refactor: refactor configurations
* docs: revise docs

2024-12-13 19:22:32 +08:00

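Whisper model parameters

This project uses OpenAI's open-source whisper model for Automatic Speech Recognition (ASR).

Model information

The basic parameters of each model and their links are listed below. Note that the GPU's VRAM must be larger than the Required VRAM:

Tip

If recognition accuracy matters, a model of size small or larger is recommended.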

Size	Parameters	Multilingual model	Required VRAM
tiny	39 M	`tiny`	~1 GB
base	74 M	`base`	~1 GB
small	244 M	`small`	~2 GB
medium	769 M	`medium`	~5 GB
large	1550 M	`large`	~10 GB

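When an Nvidia GPU is used to accelerate ffmpeg rendering, each task needs about 180 MB of VRAM. The VRAM needed to run the whisper model is shown in the table above, so the total VRAM requirement can be estimated roughly.

Taking the small model as an example: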
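The estimate can be sketched in Python as below. This is only a rough sketch: it assumes the total is one loaded whisper model plus 180 MB per concurrent ffmpeg render task, and it converts the table's GB values with 1 GB = 1024 MB. The names `estimate_vram_mb`, `MODEL_VRAM_MB`, and `FFMPEG_TASK_VRAM_MB` are hypothetical and not part of the project.

```python
# Approximate VRAM per model from the table above.
# Assumption: converted with 1 GB = 1024 MB.
MODEL_VRAM_MB = {
    "tiny": 1 * 1024,
    "base": 1 * 1024,
    "small": 2 * 1024,
    "medium": 5 * 1024,
    "large": 10 * 1024,
}

# Per-task cost of GPU-accelerated ffmpeg rendering, as stated in the text.
FFMPEG_TASK_VRAM_MB = 180


def estimate_vram_mb(model_size: str, render_tasks: int) -> int:
    """Rough total: one whisper model plus all concurrent render tasks."""
    if render_tasks < 0:
        raise ValueError("render_tasks must be non-negative")
    return MODEL_VRAM_MB[model_size] + FFMPEG_TASK_VRAM_MB * render_tasks


# small model with three concurrent render tasks: 2048 + 3 * 180
print(estimate_vram_mb("small", 3))  # 2588
```

Because the table values are approximate, treat the result as a lower bound and leave some headroom.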

Warning

Make sure the GPU's VRAM is larger than the calculated result; otherwise the GPU will run out of memory and raise RuntimeError: CUDA out of memory.
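One way to check this before starting is to compare the estimate with the GPU's total memory. The sketch below assumes PyTorch is installed (whisper runs on it) and uses `torch.cuda.get_device_properties`; the helper names `total_gpu_memory_mb` and `has_enough_vram` are hypothetical. Note that it reads total memory, not free memory, so other processes already using the GPU are not accounted for.

```python
from typing import Optional


def total_gpu_memory_mb(device_index: int = 0) -> Optional[int]:
    """Return the total memory of a CUDA device in MB, or None if unknown."""
    try:
        import torch  # imported lazily so the sketch loads without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(device_index)
    return props.total_memory // (1024 * 1024)


def has_enough_vram(required_mb: int, total_mb: Optional[int]) -> bool:
    """True only when the total is known and strictly greater than required."""
    return total_mb is not None and total_mb > required_mb
```

For example, `has_enough_vram(2588, total_gpu_memory_mb())` checks an estimate of 2588 MB (the small model's ~2 GB plus three 180 MB render tasks, taking 1 GB as 1024 MB) against the first GPU.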

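Set the Inference_Model parameter in src/config.py to the Size name of the chosen model, for example tiny, base, small, medium, or large.
Download the corresponding model file and place it in the src/subtitle/models folder.
Re-run the ./scan.sh script.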
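The first two steps can be sanity-checked with a short script like the one below. It is a sketch under assumptions: the helper `expected_model_path` is hypothetical, and the `<size>.pt` file name is a guess that the source does not confirm, so adjust it to match the file actually downloaded.

```python
from pathlib import Path

VALID_SIZES = ("tiny", "base", "small", "medium", "large")
MODEL_DIR = Path("src/subtitle/models")  # folder named in the steps above

# In src/config.py this would be the Inference_Model setting.
Inference_Model = "small"


def expected_model_path(size: str, model_dir: Path = MODEL_DIR) -> Path:
    """Validate the size name and build the path where the file is expected.

    Assumption: the model file is named "<size>.pt".
    """
    if size not in VALID_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    return model_dir / f"{size}.pt"


path = expected_model_path(Inference_Model)
print(path, "found" if path.exists() else "missing")
```

Run it from the repository root so the relative path resolves correctly.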