All finetuned models are unstable when synthesizing lengthy content #224

Open
opened 2023-05-01 07:11:23 +00:00 by pheonis · 3 comments

I discovered an intriguing phenomenon while working with fine-tuned models: when attempting to produce longer text using models previously shared on this platform, I observed instability in their performance. The majority of generated outputs contained:
1. noticeable "artifacts,"
2. repetitions or garbled sound following sentences, or
3. sentences that were completely omitted from synthesis.

It's almost as though these systems struggle with generating coherent long-form content without losing clarity.

While this is the case with the fine-tuned models, interestingly the autoregressive model worked flawlessly, without any of the above issues.

What can be done so that the fine-tuned models behave as closely as possible to the autoregressive model and produce outputs without any of the mentioned issues?

Also worth mentioning: all the fine-tuned models I trained have a mel CE loss between 0.2 and 0.8 and were trained for 100 to 300 epochs.

pheonis changed title from All finetuned models are unstable when creting lengthy content to All finetuned models are unstable when creating lengthy content 2023-05-01 07:11:40 +00:00
pheonis changed title from All finetuned models are unstable when creating lengthy content to All finetuned models are unstable when synthesizing lengthy content 2023-05-01 07:12:08 +00:00

Not a solution, but might be a temporary "workaround":

I've noticed that adding the line delimiter character as often as possible (after every period or comma) essentially turns your large request into a lot of small requests. For my use case, this works great.

Author

> Not a solution, but might be a temporary "workaround":
>
> I've noticed that adding the line delimiter character as often as possible (after every period or comma) essentially turns your large request into a lot of small requests. For my use case, this works great.

What are you doing to achieve this? The line delimiter by default is set to "\n" in the repo. Do you change this?


> What are you doing to achieve this? The line delimiter by default is set to "\n" in the repo. Do you change this?

I did not change it; it's still "\n".
But I make sure that after every period or comma, I add a new line (the default line delimiter "\n" means newline).

This way, longer texts get cut up into smaller texts, and the results are combined into one wav automatically.
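The preprocessing step described above can be sketched as a small helper. This is an illustrative sketch, not code from the repo: the function name `split_for_tts` and the regex are my own, and the actual delimiter handling lives in the project's inference code.

```python
import re

def split_for_tts(text: str) -> str:
    """Insert a newline after every period or comma so a long passage
    becomes many short, newline-delimited chunks that the synthesizer
    can process as separate small requests."""
    # Replace the whitespace after each "." or "," with the default
    # line delimiter "\n".
    chunked = re.sub(r'([.,])\s+', r'\1\n', text)
    # Strip stray whitespace and drop any empty lines.
    return "\n".join(line.strip() for line in chunked.splitlines() if line.strip())

long_text = ("This is a long passage, with several clauses. "
             "Each clause becomes its own line, so the model only ever "
             "sees short requests.")
print(split_for_tts(long_text))
```

Each line of the result is then synthesized on its own, and the per-line wavs are concatenated into one output, which is why short, frequent delimiters sidestep the long-form instability.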


Reference: mrq/ai-voice-cloning#224