How strange, I did do some tests with XTTS's tokenizer but I didn't notice anything different. However, I did trigger some assertions when using [{lang}] Text, as it was trying to do redaction…
Make sure you've updated tortoise-tts with:
cd .\modules\tortoise-tts\
git pull
Yep, it updated, and I screwed up by not downloading the tokenizer correctly initially.…
Stored autoregressive model to settings: ./models/tortoise/autoregressive.pth
Loading autoregressive model: ./models/tortoise/autoregressive.pth
Traceback (most recent call last):
File…
Alrighty, I've converted the weights over and did some small tweaks to load it in TorToiSe. When it's uploaded, it'll be under https://huggingface.co/ecker/coqui-xtts.
Samples (using…
I use YouTube all the time to train models. IMO, if there is no background music, don't bother with UVR.
\ai-voice-cloning\venv\lib\site-packages\faster_whisper\utils.py - I edited this file and just added "large" on line 22, and that seems to have worked. Just to check, was this the correct thing to do?
Did you successfully get a transcript from it? If not, do so, so you have a baseline for the output to expect. If so, then simply replace primarySpeaker with your desired speaker when it's looping…
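For what it's worth, picking out one speaker from a diarized transcript could look roughly like the sketch below. This is only an illustration: `segments_for_speaker` is a hypothetical helper, and it assumes each segment is a dict carrying a "speaker" label next to its "start", "end" and "text" fields. Check your own output first, since the labels and keys may differ.

```python
# Hypothetical helper, not part of any library: keep only the segments
# attributed to a single speaker label.
def segments_for_speaker(segments, speaker):
    """Return the segments whose "speaker" field equals `speaker`.

    Assumes each segment is a dict; segments without a "speaker" key
    (for example, where diarization could not assign one) are skipped.
    """
    return [seg for seg in segments if seg.get("speaker") == speaker]


if __name__ == "__main__":
    # Made-up segments purely for demonstration.
    demo = [
        {"start": 0.0, "end": 2.1, "text": "Hello there.", "speaker": "SPEAKER_00"},
        {"start": 2.1, "end": 4.0, "text": "Hi, how are you?", "speaker": "SPEAKER_01"},
        {"start": 4.0, "end": 5.5, "text": "(unassigned noise)"},
    ]
    for seg in segments_for_speaker(demo, "SPEAKER_01"):
        print(f'{seg["start"]:.1f}-{seg["end"]:.1f}: {seg["text"]}')
```

Skipping unlabeled segments rather than raising is a deliberate choice here, so a few unassigned chunks don't abort the whole pass.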
Long story short, I seem to have an issue with diarization not working that I need to sort out first
I've been considering releasing a full dataset preparation pipeline for tortoise but…
Long story short, I seem to have an issue with diarization not working that I need to sort out first
`import whisperx
import torch
import soundfile as sf
import numpy as np

def move_to_device(obj, device):
    if isinstance(obj, dict):
        return {key: move_to_device(value, device)…
It's supported by whisperx, see notes on Speaker Diarization in the README.
Thanks, but as far as I can understand…
Repetitive words usually mean the audio and train.txt don't match, which happens a lot even with WhisperX.
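One cheap way to hunt for mismatched lines is to compare how much text a line has against how long its clip is. The sketch below is a rough heuristic, not a real alignment check. It assumes each train.txt line looks like `path|transcript`, that you already have clip durations in seconds from somewhere, and the characters-per-second thresholds are guesses you'd want to tune for your own data. `parse_line` and `flag_suspicious_lines` are hypothetical names.

```python
# Rough heuristic, assuming a "path|transcript" line format: flag entries whose
# text length looks implausible for the clip duration.
def parse_line(line):
    """Split one dataset line into (path, transcript) at the first '|'."""
    path, text = line.rstrip("\n").split("|", 1)
    return path, text


def flag_suspicious_lines(entries, min_cps=5.0, max_cps=25.0):
    """Return (path, chars_per_second) for entries outside [min_cps, max_cps].

    `entries` is an iterable of (path, transcript, duration_seconds).
    The default thresholds are assumptions, not measured values.
    Entries with a non-positive duration are flagged with a rate of None.
    """
    flagged = []
    for path, text, duration in entries:
        if duration <= 0:
            flagged.append((path, None))
            continue
        cps = len(text.strip()) / duration
        if cps < min_cps or cps > max_cps:
            flagged.append((path, round(cps, 1)))
    return flagged


if __name__ == "__main__":
    # Made-up entries purely for demonstration.
    entries = [
        ("audio/a.wav", "hello there, how are you today", 2.0),
        ("audio/b.wav", "hi", 10.0),
    ]
    for path, cps in flag_suspicious_lines(entries):
        print(path, cps)
```

Anything this flags still needs a listen; it only narrows down which clips to check by hand.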