I've trained a model for 400 epochs and the loss graph looked really solid ![ttsLOSS.PNG](/attachments/1b30c432-be53-4c46-bfbb-845d1eec3212), but unfortunately there is a lot of static in the inference outputs. I've tried a bunch of inference settings (even the high-quality and standard presets) and can't get rid of the static. The only improvement I can think of is to run the training audio through Ultimate Vocal Remover and then retrain the model. Any other suggestions?
### Context:
___
What I'm after is a model that captures the porosity/tone/cadence of a voice well. I don't care too much about the raw quality of the voice (as long as it doesn't contain a ton of static and distortion), because I'm going to take the output and feed it into RVC, which matches the voice's pitch and quality really nicely. I'm building a chatbot app, so inference time matters to me (it doesn't need to be real-time, but it needs to be reasonable). I was inspired by Jarrod's video here: https://www.youtube.com/watch?v=IcpRfHod1ic
Given the loss curve and LR curve in your graph, I think your LR scheduling may have been too lax: the LR decayed so slowly that it ended up frying the finetune. The default scheduling should be fine, especially if you're going to train for 400 epochs.

I'd retrain from scratch, but with the default scheduling.
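For reference, "too-lax scheduling" just means the LR stays high for too long. A minimal sketch of the alternative, assuming a PyTorch-style training loop (the actual trainer's config names and milestones will differ): step the LR down sharply partway through the run so the later epochs take much smaller updates instead of continuing to churn the weights at a high LR.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

# Dummy parameter/optimizer standing in for the finetune's trainable weights.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.AdamW([param], lr=1e-4)

# Decay the LR by 10x at roughly 50% and 80% of a 400-epoch run
# (milestones here are illustrative, not the trainer's defaults).
scheduler = MultiStepLR(optimizer, milestones=[200, 320], gamma=0.1)

for epoch in range(400):
    optimizer.step()   # the per-epoch training steps would go here
    scheduler.step()

# After both decays the LR is 1e-4 * 0.1 * 0.1 = 1e-6.
print(optimizer.param_groups[0]["lr"])
```

Compare that against a schedule that only reaches, say, half the initial LR by epoch 400 — with the aggressive decay, the final epochs barely move the weights, which is usually what you want when the loss has already flattened.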