Found some bad audio files during the middle of the training. What to do? #198
2 Participants
Reference: mrq/ai-voice-cloning#198
I started the training with a 10-minute mp3 (American English). After 100 epochs I checked the model: it has successfully cloned the voice, but there are some artifacts in the generated voice, along with repetitions and mispronunciations of some words.
I then checked the audio files and found that some of them have music at the start. I want to delete those audio files.
Is this a good plan? What would you do if you found some bad audio files in the middle of training?
I'm planning to delete those audio files and remove the corresponding entries from the train.txt and whisper.json files.
Or should I just train to maybe 200 epochs?
I'd restart the training with a clean dataset, just to be sure.
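For anyone rebuilding the dataset before restarting, here is a minimal sketch of dropping the bad clips from train.txt. It assumes each line has the form audio/<file name>|<transcript>, as described later in this thread. The function name and the file names in the usage note are hypothetical, and whisper.json is not handled because its layout is not described here.

```python
def drop_clips(train_lines, bad_files):
    """Return the train.txt lines whose audio path is not listed in bad_files."""
    bad = set(bad_files)
    kept = []
    for line in train_lines:
        # The audio path is everything before the first "|" (assumed layout).
        path = line.split("|", 1)[0].strip()
        if path not in bad:
            kept.append(line)
    return kept
```

Usage would look like reading train.txt into a list of lines, calling drop_clips(lines, ["audio/clip_with_music.wav"]), and writing the result back out before deleting the matching audio files.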
How do you go about it? I read your comment in #133, where you mentioned that you proofread all the transcriptions and use a smaller dataset. So do you do it manually, e.g. trim the audio files in Audacity and then transcribe, or do you rely fully on this repo for preparing the dataset?
Also, do you recommend using the Trim Silence option? I used it in one of my dataset preparations and didn't like the result: there were harsh cuts at the start and end of the audio files, as if the speaker were missing the initial letters of the first word of a sentence.
Semi-manually. I use whisperx to produce a timestamped transcription and then feed the timestamps into ffmpeg to cut things to size.
As for Trim Silence, I've never used it.
- Okay. I tried to follow your method. Installed yt-dl, whisperx and ffmpeg.
- Downloaded the YouTube file in mp3 format.
- Used whisperx to generate the transcription. It generated transcriptions in several formats (srt, vtt, txt, tsv).
- Now, how do I cut the audio file into segments by transcription using ffmpeg?
And once the audio files are segmented, how do you create the train.txt file?
The most important thing is that when you do the transcription with whisperx you specify --align_model WAV2VEC2_ASR_LARGE_LV60K_960H, or else the timestamps are going to be inaccurate. I have ffmpeg split the file into segments using the segment filter based on the second column of the .tsv file (depending on your OS and version of ffmpeg you may need to truncate the timestamps to 3 digits after the decimal point), then output audio/ + the file name of the segment to train.txt, followed by a | and then the contents of the third column of the .tsv. Once all that's done, create the folder structure for the dataset under training/ and copy everything over. It would probably be faster to use a bash/<your shell of choice> loop to do it all, but it's the kind of small task that I'm too lazy to automate.
I successfully managed to segment the audio file, created the train.txt file, and moved the clips to the folder ai-voice-cloning/training/test/audio. I placed the train.txt file in the training/test folder.
Now, I generated train.yaml under Training > Generate Configuration and started the training.
The training is running smoothly, but I can see these errors in the backend.
It seems there is a problem reading the wav files. I have checked all the wav files, though, and they were good. Is there anything I'm missing here?
I haven't run into that error before. Does running ffprobe on the wav files reveal any errors?
This is the output of ffprobe. I guess the required sample rate is 22050 Hz and these wav files are 16000 Hz, which is causing the errors. I will try changing the sample rate and try again.
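As a rough alternative to reading ffprobe output clip by clip, the sample rates can be checked in bulk. This is a minimal sketch using Python's standard wave module, which only reads plain PCM WAV files. The 22050 Hz default is the value guessed at in this thread, and the function names are hypothetical.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def mismatched(paths, expected=22050):
    """Return (path, rate) pairs for files whose sample rate differs from expected."""
    result = []
    for path in paths:
        rate = wav_sample_rate(path)
        if rate != expected:
            result.append((path, rate))
    return result
```

Passing the list of clips in the dataset's audio folder to mismatched() would list every file that still needs resampling.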
Ahh, I forgot to mention that, sorry. Appending -ar 22050 to your ffmpeg command should fix it.
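The whole workflow described in this thread can be sketched in Python as follows. This is only an illustration under stated assumptions, not the exact commands used above: it assumes the whisperx .tsv has a header row and three columns (start, end, text) with times in seconds, uses ffmpeg's segment muxer with -segment_times, and invents the output name pattern audio/seg_%04d.wav. It only builds the ffmpeg argument list and the train.txt lines; it does not run ffmpeg itself.

```python
import csv
import io


def read_tsv(text):
    """Parse TSV text with a header row and columns start, end, text (assumed layout)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    # Skip the header row and any malformed short rows.
    return [(float(r[0]), float(r[1]), r[2].strip()) for r in rows[1:] if len(r) >= 3]


def ffmpeg_command(source, rows, out_pattern="audio/seg_%04d.wav", sample_rate=22050):
    """Build an ffmpeg argument list that splits source at each row's end time."""
    # Cut at the end time of every row except the last, so N rows give N segments.
    # Timestamps are limited to 3 decimals, as suggested in the thread.
    cut_points = ",".join(f"{end:.3f}" for _, end, _ in rows[:-1])
    return [
        "ffmpeg", "-i", source,
        "-f", "segment", "-segment_times", cut_points,
        "-ar", str(sample_rate),  # resample to the rate the trainer expects
        out_pattern,
    ]


def train_lines(rows, out_pattern="audio/seg_%04d.wav"):
    """Build train.txt lines of the form audio/<file name>|<transcript>."""
    # Segment numbering starts at 0, matching ffmpeg's default for the segment muxer.
    return [f"{out_pattern % i}|{text}" for i, (_, _, text) in enumerate(rows)]
```

The list returned by ffmpeg_command() can be passed to subprocess.run(), and the lines from train_lines() written to train.txt. Because the cuts follow the end column only, any silence between two transcript rows ends up at the start of the following clip, so the clips are still worth spot-checking by ear.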