Epochs, iterations, and datasets #197

Open
opened 2023-04-09 15:32:56 +07:00 by demonauthor · 6 comments

I'm having a tough time wrapping my head around this process...

The epoch is one pass through the dataset, right?
Given a quality dataset, do more epochs during training equal a better clone, or is it more iterations per epoch?

Then once trained, the voices folder is used to create a "template" for how the entered text is performed?
So a happy example.wav is more likely to yield a happy performance...and a varied vocal tone for a varied vocal performance. Are multiple wavs required in the voices folder, or just one good example? How long should these files be, or are they just pulled from the dataset?

Wish there was a discussion forum/discord/chatroom for folks to exchange experiences more easily.


> Or is it more iterations per epoch?

AIUI more iterations per epoch just means a smaller batch size.
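To make the epoch/iteration/batch-size relationship concrete, here's a small sketch (the function name and numbers are illustrative, not from the tool):

```python
# One epoch = one full pass over the dataset, processed in batches.
# Each batch consumed is one iteration (one optimizer step).
import math

def iterations_per_epoch(dataset_size, batch_size):
    """Optimizer steps needed to see every sample once."""
    return math.ceil(dataset_size / batch_size)

# Same 400-clip dataset: a smaller batch size means more iterations
# per epoch, but each epoch still covers the same data.
print(iterations_per_epoch(400, 100))  # 4
print(iterations_per_epoch(400, 25))   # 16
```

So raising "iterations per epoch" on its own doesn't show the model more data; it just splits the same pass into smaller steps.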

> Are multiple wavs required in the voices folder, or just one good example?

Just one.

> How long should these files be, or are they just pulled from the dataset?

When the latents are calculated, it uses every .wav in the folder for that voice.


That all makes sense. I have found that more epochs lead to cleaner audio...but I'm still getting a smoothed-out version of the voice when I generate. What I want is a thick accent like the dataset files...but what I get is either no accent or just a light accent.

Can you make your own autoregressive.pth model with an accent? Or train your dataset back on itself to refine the accent?

Oddly, sometimes when I look for no accent, I get a slight British accent, which I see is fairly common... However, I trained on a voice with a British accent and got no accent in the end.

I tried an Indian accent and it worked well. Confusing.

I'm training anywhere from 100 to 200 epochs, and have between 300 and 500 wavs in the dataset. Always a single speaker. I feel like I'm doing something wrong.
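For scale, a back-of-envelope count of the total optimizer steps that setup implies (the batch size here is a guess, not stated in the thread):

```python
# Rough arithmetic for the training runs described above.
import math

dataset_size = 400   # midpoint of the 300-500 wavs mentioned
epochs = 150         # midpoint of the 100-200 epochs mentioned
batch_size = 64      # hypothetical; pick whatever the config actually uses

steps_per_epoch = math.ceil(dataset_size / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 7 1050
```

If the accent is being smoothed away at that scale, the issue is more likely the data or transcriptions than raw step count.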


If your dataset has a thick accent you might need to check the transcriptions to make sure that they're accurate.


Will check again, but at first glance, they looked good.


Transcriptions are accurate. Having the same problem generating a British accent now, which seems weird.


You could try restarting with a higher learning rate for a lower number of iterations and see if it makes a difference.
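A rough illustration of the trade-off being suggested (all numbers are made up, and "raise LR, cut iterations" is a rule of thumb to experiment with, not a guarantee):

```python
# Illustrative only: scale the learning rate up and the iteration
# budget down by the same factor, then compare results.
base_lr = 1e-5       # hypothetical starting learning rate
base_iters = 20_000  # hypothetical iteration count
scale = 4            # how aggressively to raise the LR

new_lr = base_lr * scale          # larger steps per update
new_iters = base_iters // scale   # shorter run overall
print(new_lr, new_iters)
```

Larger steps early can help the model latch onto coarse features like accent before it starts polishing them away.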

Reference: mrq/ai-voice-cloning#197