• Joined on 2023-03-05
nirurin commented on issue ecker/ai-voice-cloning#151 2023-03-17 23:57:33 +00:00
Is Cos.Annealing ever a better option?

It just feels like it's too fast, and so shouldn't be any good haha. No actual evidence to back that up!

This is my latest graph, which is more like what (in my warped mind) I would…
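For comparison, here is a minimal sketch of a standard cosine annealing curve (no warm restarts) next to a step-style multistep decay. It uses the LR of 0.00005 from this issue. The total step count, the milestones and the decay factor `gamma=0.5` are illustrative assumptions, not ai-voice-cloning's actual defaults. The sketch only shows that cosine annealing starts lowering the LR right away and keeps lowering it smoothly, while a multistep schedule holds the LR flat between milestones.

```python
import math


def cosine_annealing_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate at `step` (single cycle, no warm restarts)."""
    # Fraction of the schedule completed, clamped to [0, 1]
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def multistep_lr(step: int, milestones: list[int], base_lr: float, gamma: float = 0.5) -> float:
    """Multistep decay: multiply by `gamma` once for every milestone already passed."""
    passed = sum(1 for m in milestones if step >= m)
    return base_lr * gamma ** passed


if __name__ == "__main__":
    # Hypothetical run: 100 steps total, milestones at steps 50 and 75
    for step in range(0, 101, 25):
        cos_lr = cosine_annealing_lr(step, 100, 5e-5)
        ms_lr = multistep_lr(step, [50, 75], 5e-5)
        print(f"step {step:3d}  cosine {cos_lr:.2e}  multistep {ms_lr:.2e}")
```

Both schedules reach half the base LR at step 50 in this setup. The difference is the shape: cosine annealing is already below the base LR at step 25, while multistep has not decayed at all yet.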

nirurin commented on issue ecker/ai-voice-cloning#151 2023-03-17 23:29:45 +00:00
Is Cos.Annealing ever a better option?

Maybe 50 epochs is enough?..

Hard to say without knowing your batch size and how many steps per epoch you have.

In this case it's a small dataset (100 files), so batches of 100. 1…
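The arithmetic behind the question works like this. Steps per epoch is the dataset size divided by the batch size, so 100 files in batches of 100 gives one step per epoch, and 50 epochs comes to only 50 optimizer steps. Below is a small sketch. Rounding up assumes a trailing partial batch is still used, which may not match how the trainer actually batches data. Gradient accumulation is also ignored here.

```python
import math


def steps_per_epoch(dataset_size: int, batch_size: int) -> int:
    # Assumes a partial final batch still counts as a step (round up)
    return math.ceil(dataset_size / batch_size)


def total_steps(dataset_size: int, batch_size: int, epochs: int) -> int:
    # Total optimizer steps over the whole run
    return steps_per_epoch(dataset_size, batch_size) * epochs


if __name__ == "__main__":
    print(steps_per_epoch(100, 100))   # one step per epoch
    print(total_steps(100, 100, 50))   # 50 epochs -> 50 steps
```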

nirurin commented on issue ecker/ai-voice-cloning#151 2023-03-17 22:17:39 +00:00
Is Cos.Annealing ever a better option?

LR = 0.00005

nirurin opened issue ecker/ai-voice-cloning#151 2023-03-17 22:04:22 +00:00
Is Cos.Annealing ever a better option?
nirurin commented on issue ecker/ai-voice-cloning#133 2023-03-15 01:11:45 +00:00
Mispronouncing certain letters for slavic languages

desu the first finetune test has a much smaller size (dataset size of 4.5k for 11 epochs). Granted, all of the hyperparameters play a role in…

nirurin commented on issue ecker/ai-voice-cloning#133 2023-03-15 00:54:06 +00:00
Mispronouncing certain letters for slavic languages
  • batch one didn't trim clips that exceeded 11.6s (dataset size of ~8k, for ~15 epochs)

Only 15 epochs? Is this a typo? I've been doing 200-1500 for most of my training, and that's just for…
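Epoch counts are only comparable between runs with similar dataset sizes. A rough way to compare them is to count samples seen (dataset size × epochs). The sketch below does that for the two runs quoted above. The 100-clip dataset for the 200 and 1500 epoch cases is borrowed from issue #151 as an illustrative assumption, since the dataset sizes behind those runs aren't stated here.

```python
def samples_seen(dataset_size: int, epochs: int) -> int:
    # Number of clip presentations over the whole run
    return dataset_size * epochs


runs = {
    "#133 first finetune (4.5k clips, 11 epochs)": (4500, 11),
    "#133 batch one (~8k clips, ~15 epochs)": (8000, 15),
    "hypothetical 100 clips, 200 epochs": (100, 200),
    "hypothetical 100 clips, 1500 epochs": (100, 1500),
}

if __name__ == "__main__":
    for name, (size, epochs) in runs.items():
        print(f"{name}: {samples_seen(size, epochs):,} samples seen")
```

Seen this way, ~15 epochs over ~8k clips is in the same range as 1500 epochs over 100 clips.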

nirurin opened issue ecker/ai-voice-cloning#128 2023-03-13 05:14:18 +00:00
Has anyone managed to train a voice to be able to shout?
nirurin closed issue ecker/ai-voice-cloning#126 2023-03-13 03:43:33 +00:00
Line delimited prompts - only output 'combined', maybe add an option to also save the non-combined generations?
nirurin commented on issue ecker/ai-voice-cloning#126 2023-03-13 03:43:30 +00:00
Line delimited prompts - only output 'combined', maybe add an option to also save the non-combined generations?

You can disable the Delete Non-Final Outputs setting under Settings to retain the individual pieces that get combined.

Ahhh, I see. I hadn't noticed that one, thanks.

nirurin opened issue ecker/ai-voice-cloning#126 2023-03-13 01:27:29 +00:00
Line delimited prompts - only output 'combined', maybe add an option to also save the non-combined generations?
nirurin commented on issue ecker/ai-voice-cloning#113 2023-03-12 08:53:24 +00:00
Generated voices from training data always garbled.... but works fine using tortoise-tts-fast ... (?)

As an aside, but it's part of my ongoing journey to clean up my training:

My outputs are now generating some fairly decent speech, even with very small datasets. However the output audio…

nirurin commented on issue ecker/ai-voice-cloning#113 2023-03-11 19:45:56 +00:00
Generated voices from training data always garbled.... but works fine using tortoise-tts-fast ... (?)

https://github.com/openai/whisper/discussions/435

This seems to be the most recent discussion involving a fix for the inaccurate timestamps in Whisper.

Otherwise, as you suggest, I may…

nirurin commented on issue ecker/ai-voice-cloning#113 2023-03-11 18:15:51 +00:00
Generated voices from training data always garbled.... but works fine using tortoise-tts-fast ... (?)

Pushed commit 2424c455cb9614003c072f6cdc25fa80ba2694ba. It seems every passing day I regret more and more adding whisperx.

~~I'm very, very tempted to just remove it. It caused nothing…

nirurin closed issue ecker/ai-voice-cloning#69 2023-03-11 07:27:27 +00:00
Just some questions from a newbie...
nirurin commented on issue ecker/ai-voice-cloning#113 2023-03-11 07:27:11 +00:00
Generated voices from training data always garbled.... but works fine using tortoise-tts-fast ... (?)

So I'm trying again, with a fresh training session, but now I seem to be getting the groany/garbled generated voices on both mrq (using manual voice chunks) AND in fast-tts lol. So this time I…

nirurin commented on issue ecker/ai-voice-cloning#113 2023-03-11 04:51:43 +00:00
Generated voices from training data always garbled.... but works fine using tortoise-tts-fast ... (?)

Yeah, I agree, though that's why I mentioned it would be nice if this could be automated... as I'll have to manually remove ~150 entries from the text file lol.

Just run the…

nirurin commented on issue ecker/ai-voice-cloning#113 2023-03-11 04:27:25 +00:00
Generated voices from training data always garbled.... but works fine using tortoise-tts-fast ... (?)

I can't imagine 0s files being anything other than poorly cut off. If you have enough data then I'd drop the worst part of it.

Yeah, I agree, though that's why I mentioned it would be nice if…

nirurin commented on issue ecker/ai-voice-cloning#113 2023-03-11 04:14:25 +00:00
Generated voices from training data always garbled.... but works fine using tortoise-tts-fast ... (?)

I do think it could only be an improvement if there were an automated way to remove any transcribed clips below a certain length, as most of those are half-words or weirdly cut off. Not…
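Here is a minimal sketch of the automated cleanup suggested above. It assumes the transcription file has one `path|text` entry per line, with paths relative to the file's own folder. It also assumes the clips are WAV files readable by Python's `wave` module. Both are assumptions about the layout, not confirmed ai-voice-cloning behavior. The function names and the 1.0 s lower bound are hypothetical. The 11.6 s upper bound is the clip limit mentioned in #133. Results are written to a separate file so the original transcription file stays untouched.

```python
import wave
from pathlib import Path


def clip_duration(path: Path) -> float:
    """Duration in seconds of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def filter_transcriptions(train_txt, out_txt, min_seconds: float = 1.0, max_seconds: float = 11.6):
    """Hypothetical helper: keep only `path|text` lines whose clip length is within bounds."""
    train_txt, out_txt = Path(train_txt), Path(out_txt)
    kept, dropped = [], []
    for line in train_txt.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        rel_path = line.split("|", 1)[0]
        # Assumption: audio paths are relative to the folder holding the transcription file
        duration = clip_duration(train_txt.parent / rel_path)
        (kept if min_seconds <= duration <= max_seconds else dropped).append(line)
    out_txt.write_text("".join(entry + "\n" for entry in kept), encoding="utf-8")
    return kept, dropped
```

Returning the dropped lines makes it easy to eyeball what was removed before swapping the filtered file in.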

nirurin commented on issue ecker/ai-voice-cloning#113 2023-03-11 03:45:19 +00:00
Generated voices from training data always garbled.... but works fine using tortoise-tts-fast ... (?)

Oh, most of those files seem to be in 'validation.txt', not in 'train.txt'. Not sure what that file does.

nirurin commented on issue ecker/ai-voice-cloning#113 2023-03-11 03:44:26 +00:00
Generated voices from training data always garbled.... but works fine using tortoise-tts-fast ... (?)

Oh, no, that file is in the voices/patrick folder. In the training/patrick/audio folder it's been cut up by Whisper into a bunch of short files.

Ah, then it shouldn't affect it, as the…