And you can only really get good compute with Ada cards (4070Ti and up) or multiple Ampere cards
I mean if that's the case I'll have a second 3090 with NVLINK sometime next month, so maybe…
For zero-shot inference applications, diversity (ick) is a HUGE factor in having a good model. There's only so much data to sample from when trying to mimic voices. I worry that when I finally…
Each new line restarts the voice process. IMO, if you find a line you like, you should use that as your voice.
Otherwise you're asking for a complete system rework.
CUDA can keep things cached. I have torch.cuda.empty_cache() added to get_device so it triggers every time the TTS system is reloaded. Supposedly stuff can remain cached, which can cause weird…
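For reference, a minimal sketch of what that looks like; the `get_device` helper here is an assumption modeled on the description above, not the project's actual code:

```python
import torch

def get_device() -> str:
    # Hypothetical helper mirroring the described setup: clear CUDA's
    # allocator cache on every TTS reload so stale allocations don't
    # linger between runs.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # releases unused cached GPU memory
        return "cuda"
    return "cpu"
```

`empty_cache()` only frees memory the caching allocator is holding but not using, so it's cheap to call on every reload.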
There's a separate config file for it. Here's the raw JSON.
Also, funny joke ;)
config.json
{
"resblock": "1",
"num_gpus": 0,
"batch_size": 32,
"learning_rate":…
You can jury-rig it a bit. It's to do with whisperX. If you're not using it, just do the following in PowerShell. I had the same issue on Linux and this worked for me.
./venv/scripts/activate…
Yeah, that sounds like a good middle ground. It's only the English model that gets this benefit anyway.
Thanks for implementing this so quickly, and it's pretty neato that it's having a noticeable effect.
I had this error on windows. I fixed it by dropping ffmpeg.exe into the root folder of the repo.
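If you want to check which ffmpeg (if any) will actually be picked up before dropping the exe in, something like this works; it's just a generic PATH check, not part of the repo:

```python
import shutil

# shutil.which resolves a command the same way the shell would,
# so a None result means ffmpeg isn't on PATH and a local
# ffmpeg.exe next to the scripts may be the fallback.
path = shutil.which("ffmpeg")
print(path if path else "ffmpeg not found on PATH")
```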