Hi @mrq, thank you for the response. I've already tried the tortoise-tts-fast fork, and it behaves the same. When I talk about not being able to control the output speaker, I don't have the…
Thank you very much for this awesome new feature; I wasn't aware of that possibility. I've tried it, and since I'm trying to add the clone of a heavily accented speaker, the merged model nails the…
Hi @mrq, thanks for the response. I have tried dumping all speakers into the same folder (I used the LibriTTS dev-clean subset), and after 10 epochs the model is still solid in the outputs (which means…
Has anybody tried fine-tuning on a large multi-speaker dataset yet? I've read on the other repo that we can put all the speakers' wavs in the same folder and the model figures out by itself how to…
Thank you very much for the extended response. I've tried training with an Italian dataset, with Text LR Ratio set to 1 and with no change to the tokenizer. The results are decent, but there is a…
I tried training the unofficial enhuiz Vall-E implementation, but with my resources I wasn't getting anywhere, unfortunately, so I gave up. Have you had any success in training it?