# whisper model parameters

This project uses the [OpenAI's open source whisper model](https://github.com/openai/whisper) for Automatic Speech Recognition (ASR) tasks.

## Model information
The basic parameters and links of the model are as follows, note that the GPU VRAM must be greater than the required VRAM:

> [!TIP]
> If you pursue recognition accuracy, it is recommended to use models with parameters `small` or above.

|  Size  | Parameters | Multilingual model | Required VRAM |
|:------:|:----------:|:------------------:|:-------------:|
|  tiny  |    39 M    |       [`tiny`](https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt)       |     ~1 GB     |
|  base  |    74 M    |       [`base`](https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt)       |     ~1 GB     |
| small  |   244 M    |      [`small`](https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt)       |     ~2 GB     |
| medium |   769 M    |      [`medium`](https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt)      |     ~5 GB     |
| large  |   1550 M   |      [`large`](https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt)       |    ~10 GB     |


## Calculate VRAM requirements

Use Nvidia GPU to accelerate the rendering process of ffmpeg, each task requires approximately 180 MB of VRAM. The VRAM required for the `whisper` model is as shown in the table above.

Therefore, you can roughly calculate the required VRAM.

For example, using the `small` model:
+ If using the `pipeline` mode, since it runs in parallel, at least 180 + 2620 = 2800 MB of VRAM is required.
+ If using the `append` or `merge` mode, at least 2620 MB of VRAM is required.

> [!WARNING]
> Please ensure that the GPU VRAM is greater than the calculated result, otherwise the VRAM will be exhausted, resulting in `RuntimeError: CUDA out of memory.`

## Change model method

1. Please set the `Inference_Model` parameter in the `bilive.toml` file to the corresponding model size name, such as `tiny`, `base`, `small`, `medium`, `large`.
2. Download the corresponding model file and place it in the `src/subtitle/models` folder.
3. Re-run the `./scan.sh` script.