Best ways to get rid of static? #352

Closed
opened 2023-08-28 05:48:53 +07:00 by drew · 1 comment

I've trained a model for 400 epochs and the loss graph looked really solid (![ttsLOSS.PNG](/attachments/1b30c432-be53-4c46-bfbb-845d1eec3212)), but unfortunately there is a lot of static in the inference outputs. I've tried a bunch of inference settings (even the high-quality and standard presets) and can't get rid of the static. The only improvement I can think of is to run the training audio through Ultimate Vocal Remover and then retrain the model. Any other suggestions?

### Context:

What I'm after is a model that captures the prosody/tone/cadence of a voice well. I don't care too much about the quality of the voice itself (as long as it doesn't contain a ton of static and distortion), because I'm going to take the output and feed it into RVC to match the target voice's pitch and quality. I'm building a chatbot app, so inference time matters to me (it doesn't need to be realtime, but it needs to be reasonable). I was inspired by Jarrod's video here: https://www.youtube.com/watch?v=IcpRfHod1ic
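UVR is a GUI tool, but some dataset cleanup before retraining can also be scripted. As a minimal sketch (not anything from this repo; the thresholds are hypothetical starting points), here's the kind of pass that peak-normalizes clips and drops near-silent ones so quiet noise-floor segments don't get amplified into the training set:

```python
import math

def clean_clip(samples, silence_rms=0.01, target_peak=0.95):
    """Peak-normalize a clip of float samples in [-1, 1].

    Returns None for near-silent clips so they can be dropped from the
    training set. Both thresholds are hypothetical starting points.
    """
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms < silence_rms:
        return None  # near-silence: drop it instead of amplifying hiss
    peak = max(abs(s) for s in samples)
    return [s * (target_peak / peak) for s in samples]

# A half-scale 440 Hz tone gets boosted so its peak sits at 0.95...
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
print(max(abs(s) for s in clean_clip(tone)))  # 0.95

# ...while a whisper-quiet clip is rejected outright.
hiss = [0.001 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
print(clean_clip(hiss))  # None
```

This won't remove static that's baked into otherwise-loud clips (that's what UVR is for), but it keeps low-level hiss from being normalized up into the dataset.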


Given the loss and LR curves, I think your LR scheduling was too lax: the LR decayed so slowly that it ended up frying the finetune. The default scheduling should be fine, especially if you're going to train for 400 epochs.

I'd start training from scratch, but with the default scheduling.
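To illustrate why a "lax" schedule matters over a long run (these numbers are hypothetical, not this repo's actual config), here's how the decay factor of a simple exponential schedule changes the LR reached by epoch 400:

```python
# Hypothetical illustration of exponential LR decay: lr = base * gamma**epoch.
def lr_at_epoch(base_lr, gamma, epoch):
    return base_lr * gamma ** epoch

base_lr = 1e-4  # hypothetical starting LR

# A lax schedule (gamma very close to 1) barely decays in 400 epochs,
# so late epochs keep training at nearly full LR.
lax = lr_at_epoch(base_lr, 0.999, 400)

# A faster decay reaches a far smaller LR by epoch 400.
faster = lr_at_epoch(base_lr, 0.99, 400)

print(f"lax (gamma=0.999):    {lax:.2e}")  # ~6.70e-05, still ~67% of base
print(f"faster (gamma=0.99):  {faster:.2e}")  # ~1.80e-06, ~1.8% of base
```

With the lax schedule the model is still taking near-full-size steps hundreds of epochs in, which is consistent with an over-trained, artifact-heavy finetune.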

drew closed this issue 2023-09-01 17:31:59 +07:00
Reference: mrq/ai-voice-cloning#352