! RETRAIN YOUR MODELS ! #103
Reference: mrq/ai-voice-cloning#103
It seems I've made a grave mistake in not looking at the other DLAS repo, as it contained a small tweak that helps finetunes that otherwise end up sounding like total trash.
The improvement from implementing it is big enough that I have to bring attention to it somehow, although I don't have a good way to go about it.
If you've also been affected by models sounding like garbage (I'm not sure if there's a criterion for which voices cause it; it seemed more likely to happen with non-male voices), please, please, please retrain your finetunes after updating.
If you already finetuned with that repo, you're golden and don't need to retrain.
For smaller datasets (sub-100), I would suggest:
[9, 18, 25, 33, 50, 59]
to quickly train something with decent output.
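If that list is used as step-decay epoch milestones (as DLAS-style `gen_lr_steps` schedules are), the effective learning rate at any epoch can be sketched like this. This is a hedged illustration, not the repo's actual code; the decay factor `gamma=0.5` is my assumption.

```python
def lr_at_epoch(base_lr: float, epoch: int,
                milestones=(9, 18, 25, 33, 50, 59), gamma=0.5) -> float:
    # Step decay: multiply the base LR by gamma once for every
    # milestone epoch that has already been passed.
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * (gamma ** passed)

print(lr_at_epoch(1e-4, 0))   # 0.0001 (no milestones passed yet)
print(lr_at_epoch(1e-4, 20))  # 2.5e-05 (epochs 9 and 18 passed)
```

With six milestones, a run that goes past epoch 59 ends at base_lr / 64, which is why a short milestone list like this suits quick runs on small datasets.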
I am getting decent results at LR 0.00009 over 100-200 epochs on smaller datasets of, say, 10 to 30 minutes. All other settings are default, apart from validating the training settings.
Zapp Brannigan: https://vocaroo.com/1lT5i70dMj33 (roughly a 15-minute dataset)
Unless I'm mistaken, you did not implement it the same way; you set -sub where he set sub:
```python
return text_logits[:, :sub]
```
vs
```python
return text_logits[:, :-sub]
```
Is this intended?
I still don't really understand the tortoise_compat setting either.
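For what it's worth, the two slices above are not equivalent. A minimal sketch, with a plain list standing in for one row of `text_logits` along its second dimension:

```python
# A list stands in for one row of text_logits; sub is the number of
# positions the fix is meant to trim.
row = [10, 20, 30, 40, 50]
sub = 2

keep_first_sub = row[:sub]    # [:sub]  -> keeps only the FIRST `sub` items
drop_last_sub  = row[:-sub]   # [:-sub] -> drops the LAST `sub` items

print(keep_first_sub)  # [10, 20]
print(drop_last_sub)   # [10, 20, 30]
```

So one variant keeps a short prefix while the other trims a suffix; for any sequence longer than 2*sub they produce different results.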
If I did botch that, then I'm going to scream, but that's what consistently staying up until 2AM gets you. I'll check when I get a moment.
Anyways, the compat fix simply makes the unified_voice2 model more in line with its implementation in tortoise-tts, as it's something like 80% similar in code.
I just lazily applied his fix to it last night rather than deriving it myself.
So I did. I suppose I'll have to retrain what I've been training today, since I imagine that's a pretty big problem.
I don't know what I did to cause this, to be honest. I trained a model, switched to it in settings, and then in the generation tab selected the voice from the 5 minutes of audio I had. I clicked the recompute voice latents button, selected the standard preset, and hit generate. Now it's sitting on "Generating autoregressive samples" and is much slower than usual, so I don't know if I did something wrong with recomputing the voice latents.
Update: it generated no voice, nothing, after all that waiting. Not sure what went wrong.
Update on the update: it now generates a voice, though not very close to mine; 50 steps was closer than 100 steps, hmm. It's still slow at generating, and I don't know why.
I reverted my change to the routine that deduces sample batch sizes for generation (it seems to be haunted: it breaks whenever it gets touched), so you should be fine to update now.
A remedy is to manually set your sample batch size (which I heavily encourage you to do, as the default tiers are very conservative).
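Since the actual deduction routine isn't shown here, this is only a hypothetical sketch of what such a heuristic could look like (the function name and the divisor strategy are mine, not the repo's): pick the largest batch size under some cap that divides the sample count evenly, so no batch runs partially full.

```python
def deduce_sample_batch_size(num_samples: int, max_batch: int) -> int:
    # Largest divisor of num_samples that does not exceed max_batch,
    # so every autoregressive batch is exactly full.
    for size in range(min(max_batch, num_samples), 0, -1):
        if num_samples % size == 0:
            return size
    return 1

print(deduce_sample_batch_size(16, 6))   # 4
print(deduce_sample_batch_size(50, 16))  # 10
```

The point of manually overriding such a value is simply that a conservative tier (small `max_batch`) leaves VRAM idle and makes sampling slower than it needs to be.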
We could use a discussion tab for this git. A central place to compare notes and whatnot.
I noticed slowness, too, so I upped the batch size, and that helped. I've got lots of VRAM.
Are we sure that large datasets are the way to go for training? To capture a character with the Kohya SD script I use 16 images, that's it, and it works well. Very large datasets can actually cause trouble, not to mention slow your training.
I get good resemblance out of a minute and a half of speech, which is what, 15 chunks?
Gitea doesn't have any feature like that. That's on me for using it over a GitLab instance, but oh well.
I've had great luck training against small datasets, sub-200 and even sub-100. I've just been having issues with a large dataset, since multi-GPU training is very particular when it comes to large datasets, so I've been training my Japanese dataset on a Paperspace A4000 instead. It's been training fine, but I haven't had a chance to test it.
What do you think is the best way to transcribe Japanese speech? Is there a Japanese Whisper model? Do you have to transcribe to katakana?
All three whisper implementations can transcribe Japanese; just set the Language field to ja (or leave it blank to auto-deduce). I wouldn't use the default openai/whisper implementation, for accuracy reasons: it trims the clips too liberally. WhisperX and WhisperCPP both work better than base Whisper.
I didn't do any editing desu, since it would be a pain to curate 15k lines for what would amount to maybe replacing a wrong kanji that sounds the same anyway. I wouldn't bother coercing them into bare kana, since the kanji should help train the text side of the AR model.
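The "blank means auto-deduce" behavior could be wired up like this small sketch; the function name is mine, not the repo's, and it only illustrates turning the UI field into keyword arguments for a transcribe call (Whisper implementations auto-detect language when none is given).

```python
def language_kwargs(language_field: str) -> dict:
    # Empty/whitespace field -> pass nothing, letting whisper auto-detect;
    # otherwise forward the ISO code (e.g. "ja") to transcribe(**kwargs).
    language = language_field.strip()
    return {"language": language} if language else {}

print(language_kwargs("ja"))  # {'language': 'ja'}
print(language_kwargs("  "))  # {}
```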
I'm waiting for a few more epochs of baking my Japanese finetune before testing it, although it looks pretty ready anyhow, as my reported loss is nearing the de facto target loss.
Not gonna lie, I don't really understand the graphs... what is good and what is bad, lol?
#82 (comment)
If this project is going to take off, and there are better features elsewhere, now is the perfect time to move. A discussion place would be nice, rather than holding these conversations in issues.