Sharing a German fine-tuned model and Latin-1 tokenizer #379

Open
opened 2023-09-12 17:33:54 +00:00 by nanonomad · 1 comment

Hi everyone,
I've stopped making YouTube videos for now, but wanted to share a thing I was working on.

This is a partially finished German fine-tune. The pronunciation could probably be improved with more training, because it was on the right track; I just don't have the resources to finish it (and I have no practical use for any of this, I'm just messing around).

It's one of the more stable multi-speaker models that I've done. There are three voices: two female speakers from LibriVox books and Thorsten Mueller from Thorsten-Voice on YouTube. The latents are already generated and in the voices folder, along with the samples needed to regenerate them.

To fully utilize it, you'll need to edit the MRQ code to change the text cleaners, and upload/set the tokenizer from the model download page. Rough directions are on the Hugging Face page:

https://huggingface.co/AOLCDROM/Tortoise-TTS-de
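Since the tokenizer is Latin-1 based, one practical gotcha is training transcripts that contain characters outside that encoding (curly quotes, ellipses, etc.). A minimal sanity check, using only the standard library (`latin1_safe` is a hypothetical helper, not part of the MRQ codebase):

```python
def latin1_safe(text: str) -> bool:
    """Return True if every character encodes in Latin-1 (ISO 8859-1)."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# German umlauts and eszett are all within Latin-1:
print(latin1_safe("Grüße aus München!"))        # True
# Typographic "smart quotes" are not, and would need cleaning first:
print(latin1_safe("\u201csmart quotes\u201d"))  # False
```

Running a check like this over your dataset before training can catch transcript lines the tokenizer cannot represent.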

It probably performs poorly for cloning, but would be a good base for further German fine-tuning.
You may not want to go over 1e-7 for the LR with a batch size under 16; you can probably go higher if you raise the batch size.
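The advice above can be read as a linear scaling heuristic. Treating 1e-7 at batch 16 as the reference point is my assumption, not something stated explicitly, so take this as a rough sketch:

```python
def suggested_max_lr(batch_size: int,
                     ref_lr: float = 1e-7,
                     ref_batch: int = 16) -> float:
    """Scale the learning-rate cap linearly with batch size.

    Assumes ref_lr is a safe ceiling at ref_batch (linear scaling rule,
    a common heuristic rather than a confirmed recommendation here).
    """
    return ref_lr * max(batch_size, 1) / ref_batch

print(suggested_max_lr(16))  # 1e-07 (the stated ceiling)
print(suggested_max_lr(32))  # 2e-07
print(suggested_max_lr(8))   # 5e-08
```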

Demo from an older checkpoint: https://youtu.be/AvK5jnOizm4


I don't know German, but this sounds fantastic. I am wondering how Whisper translation to German (not sure if that is one of the options) would turn out.

Question, based on the Alice in Wonderland video: can you hear a difference in quality when using 10, 100, 1000, 3000, or 6000 samples? I'm just starting to use this tool, and even training on something like 30 samples gave me what I felt were phenomenal results. Another thing I wonder about: is there an optimum amplification/gain? I was boosting my samples to max gain, but I think that may be introducing a lot of noise, even on high-quality samples. I should probably look at the TorToiSe defaults and see what amplification level (i.e., dB below clipping) he used in his samples.
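On the gain question: normalizing peaks all the way to 0 dBFS leaves no headroom, and any noise gets boosted along with the voice. A common convention (not a confirmed TorToiSe default) is to leave a few dB of headroom. A minimal peak-normalization sketch on raw float samples:

```python
def normalize_peak(samples, target_dbfs: float = -3.0):
    """Scale samples so the peak sits target_dbfs below full scale.

    Full scale (0 dBFS) is taken as amplitude 1.0. The -3 dBFS default
    is a common headroom convention, not a confirmed TorToiSe setting.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)           # silence: nothing to scale
    target_peak = 10 ** (target_dbfs / 20)   # -3 dBFS ~ 0.708
    gain = target_peak / peak
    return [s * gain for s in samples]

quiet = [0.0, 0.1, -0.2, 0.05]
louder = normalize_peak(quiet)
print(round(max(abs(s) for s in louder), 3))  # 0.708
```

In practice you would apply this per clip (or use loudness normalization like EBU R128 instead of peak normalization), but the headroom idea is the same.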

Reference: mrq/ai-voice-cloning#379