forked from ecker/tortoise-tts
Noticed that the autoregressive batch size was being chosen based on available VRAM. I adjusted it to scale up for the VRAM capacity of 90-series GPUs, in this case from a batch size of 16 to 32. Using the standard preset with ChungusVGAN, this cut the autoregressive pass from 16 steps to 8. Averaged over 3 runs, generation time dropped from 294 seconds with a batch size of 16 to 234 seconds with 32. Can't complain at a roughly 1.25x speedup from functionally 2 lines of code. I restarted tortoise before each run and executed `torch.cuda.empty_cache()` just before loading the autoregressive model, to clear the memory cache each time.
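A minimal sketch of the idea, assuming a simple threshold table that maps available VRAM (in GiB) to an autoregressive batch size. The helper names and the cutoffs below are hypothetical and for illustration only; they are not the fork's actual code. The step count assumes the standard preset draws 256 autoregressive samples, as in upstream tortoise-tts, which is consistent with 256 / 16 = 16 steps and 256 / 32 = 8 steps.

```python
import math


def pick_autoregressive_batch_size(vram_gb: float) -> int:
    """Hypothetical helper: map available VRAM (GiB) to an autoregressive batch size.

    The thresholds are assumptions for illustration, not the fork's exact values.
    """
    if vram_gb > 20:  # assumed cutoff for 24 GB cards such as 90-series GPUs
        return 32
    if vram_gb > 14:
        return 16
    if vram_gb > 10:
        return 8
    if vram_gb > 7:
        return 4
    return 1


def autoregressive_steps(num_samples: int, batch_size: int) -> int:
    """Number of batched passes needed to draw num_samples candidates."""
    return math.ceil(num_samples / batch_size)


def speedup(old_seconds: float, new_seconds: float) -> float:
    """Ratio of old to new wall-clock time; a value above 1 means the new run is faster."""
    return old_seconds / new_seconds


if __name__ == "__main__":
    print(pick_autoregressive_batch_size(24.0))  # 32
    print(autoregressive_steps(256, 16))  # 16
    print(autoregressive_steps(256, 32))  # 8
    print(round(speedup(294, 234), 2))  # 1.26
```

Doubling the batch size halves the number of autoregressive passes, but the end-to-end time does not halve because the diffusion and vocoder stages are unaffected by this setting.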
| File |
|---|
| `__init__.py` |
| `audio.py` |
| `device.py` |
| `diffusion.py` |
| `stft.py` |
| `text.py` |
| `tokenizer.py` |
| `torch_intermediary.py` |
| `typical_sampling.py` |
| `wav2vec_alignment.py` |