All finetuned models are unstable when synthesizing lengthy content #224
Reference: mrq/ai-voice-cloning#224
I ran into a problem while working with fine-tuned models: when trying to synthesize longer text using models previously shared on this platform, I observed that their output is unstable. The majority of the generated outputs contained:
1: noticeable "artifacts",
2: repetitions or garbled sound following sentences,
3: sentences that were sometimes completely omitted from synthesis.
It's almost as though these models struggle to generate coherent long-form content without losing clarity.
While this is the case with the fine-tuned models, the autoregressive model interestingly worked flawlessly, without any of the above issues.
I don't know what can be done so that we get fine-tuned models that are as close as possible to the autoregressive model and produce output without any of the issues mentioned.
Also worth mentioning: all the fine-tuned models that I trained have a mel loss CE value between 0.2 and 0.8 and were trained for 100 to 300 epochs.
Not a solution, but might be a temporary "workaround":
I've noticed that adding the line delimiter character as often as possible (after every period or comma) essentially turns your large request into a lot of small requests. For my use case, this works great.
What are you doing to achieve this? The line delimiter by default is set to "\n" in the repo. Do you change this?
I did not change it, it's still "\n".
But I make sure to add a new line after every period or comma (the default line delimiter "\n" means a new line).
This way, longer texts get cut up into smaller texts, and combined into 1 wav automatically.
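For anyone who wants to automate the workaround above instead of adding the new lines by hand, here is a minimal sketch of a preprocessing step. It is not part of the repo: the function name `split_for_synthesis` and its `delimiter` parameter are hypothetical, and it only assumes that the line delimiter is left at its default of "\n". It breaks the text after every period or comma that is followed by whitespace, so numbers such as "3.14" or "1,000" stay intact.

```python
import re


def split_for_synthesis(text: str, delimiter: str = "\n") -> str:
    """Insert the line delimiter after every period or comma.

    Hypothetical helper, not part of the repo. Only punctuation that is
    followed by whitespace is treated as a break point, so "3.14" and
    "1,000" are left alone.
    """
    # Replace the whitespace after "." or "," with the delimiter.
    # A function is used as the replacement so the delimiter is
    # inserted literally, whatever characters it contains.
    return re.sub(
        r"([.,])\s+",
        lambda match: match.group(1) + delimiter,
        text.strip(),
    )


if __name__ == "__main__":
    sample = "Hello, world. This is a test."
    # Each resulting line would be handled as its own small request.
    print(split_for_synthesis(sample))
```

The result can be pasted into the text prompt as usual. Very short fragments may sound choppy, so splitting only on periods (changing the character class to `[.]`) might be worth trying if the commas cut things up too much.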