Epochs, iterations, and datasets #197
Reference: mrq/ai-voice-cloning#197
I'm having a tough time wrapping my head around this process...
An epoch is one pass through the dataset, right?
Given a quality dataset, do more epochs during training equal a better clone, or is it more iterations per epoch?
Then once trained, the voices folder is used to create a "template" for how the entered text is performed?
So a happy example.wav is more likely to yield a happy performance, and a varied vocal tone yields a varied vocal performance. Are multiple wavs required in the voices folder, or just one good example? How long should these files be, or are they just pulled from the dataset?
Wish there was a discussion forum/discord/chatroom for folks to exchange experiences more easily.
AIUI more iterations per epoch just means a smaller batch size.
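A minimal sketch of the arithmetic, assuming the trainer takes one optimizer step per batch (all numbers below are made up for illustration):

```python
import math

def iterations_per_epoch(dataset_size: int, batch_size: int) -> int:
    """One epoch is one full pass over the dataset, so iterations per
    epoch is simply how many batches it takes to cover it."""
    return math.ceil(dataset_size / batch_size)

# Made-up numbers: 400 clips in the dataset.
print(iterations_per_epoch(400, 128))        # 4 steps per epoch
print(iterations_per_epoch(400, 32))         # 13 steps per epoch (smaller batch -> more iterations)

# Total optimizer steps depend on both knobs:
print(200 * iterations_per_epoch(400, 128))  # 200 epochs -> 800 total iterations
```

For a fixed number of epochs, shrinking the batch size only means more (smaller) steps per pass; the model still sees each sample the same number of times.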
Just one good example is enough.
When the latents are calculated, it uses every .wav in the folder for that voice.
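Roughly what that looks like, as a sketch: the per-clip conditioning embeddings get pooled into one set of latents for the whole voice folder. `encode_clip` below is a hypothetical stand-in for the model's conditioning encoder, not the actual tortoise API, and the mean-pooling is only illustrative:

```python
from pathlib import Path
import torch

def encode_clip(wav_path: Path) -> torch.Tensor:
    """Hypothetical stand-in for the conditioning encoder; returns a
    random embedding so the sketch runs without the real model."""
    return torch.randn(1, 1024)

def voice_latents(voice_dir: str) -> torch.Tensor:
    """Pool the embeddings of every .wav in voices/<name>/ into a single
    conditioning latent (mean-pooled here for illustration)."""
    clips = sorted(Path(voice_dir).glob("*.wav"))
    if not clips:
        raise FileNotFoundError(f"no .wav files found in {voice_dir}")
    embeddings = torch.cat([encode_clip(p) for p in clips], dim=0)
    return embeddings.mean(dim=0, keepdim=True)
```

So every clip you drop into the voice folder nudges the pooled latents, which is why one clean, representative example can be enough.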
That all makes sense. I have found that more epochs lead to cleaner audio...but I'm still getting a smoothed-out version of the voice when I generate. What I want is a thick accent like the dataset files have, but what I get is either no accent or just a light accent.
Can you make your own autoregressive.pth model with an accent? Or train your dataset back on itself to refine the accent?
Oddly, sometimes when I look for no accent, I get a slight British accent, which I see is fairly common... However, I trained on a voice with a British accent and got no accent in the end.
I tried an Indian accent and it worked well. Confusing.
I'm training anywhere from 100 to 200 epochs, and have between 300 and 500 wavs in the dataset. Always a single speaker. I feel like I'm doing something wrong.
If your dataset has a thick accent you might need to check the transcriptions to make sure that they're accurate.
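A quick way to spot-check is to sample a few random lines from the generated transcript list and listen to the matching clips. This sketch assumes the common `audio_path|transcription` line format and a made-up file path; adjust both to whatever your setup actually produces:

```python
import random
from pathlib import Path

def sample_transcripts(train_file: str, n: int = 5) -> None:
    """Print a few random (audio path, transcription) pairs so you can
    listen to each clip and confirm the text really matches."""
    lines = [l for l in Path(train_file).read_text(encoding="utf-8").splitlines() if l.strip()]
    for line in random.sample(lines, min(n, len(lines))):
        audio_path, _, text = line.partition("|")
        print(f"{audio_path}\n  -> {text}\n")

# e.g. sample_transcripts("training/myvoice/train.txt")  # hypothetical path
```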
Will check again, but at first glance, they looked good.
Transcriptions are accurate. Having the same problem generating a British accent now, which seems weird.
You could try restarting with a higher learning rate for a lower number of iterations and see if it makes a difference.
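As a generic illustration of that trade-off in plain PyTorch (not the project's actual training config, and the numbers are placeholders): restart with a larger initial learning rate and decay it over fewer total steps.

```python
import torch

model = torch.nn.Linear(10, 10)  # placeholder for the fine-tuned model

# Restarted run: higher starting learning rate, fewer total iterations,
# with a step decay so the rate doesn't stay high for the whole run.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[200, 400], gamma=0.5)

for step in range(600):  # fewer total iterations than the original run
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()  # dummy loss for the sketch
    loss.backward()
    optimizer.step()
    scheduler.step()
```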