Have problem cloning voice #331
Reference: mrq/ai-voice-cloning#331
Until a few days ago this used to work perfectly. Now, for some strange reason I can't figure out, when the program tries to prepare the dataset, all the audio generated from the Whisper transcription comes out slowed down, with the voice pitched way down. Did anyone encounter this problem? I tried reinstalling from scratch; same problem.
How strange, I just checked yesterday on a clean install with it leveraging a transcribed dataset. However, that dataset was transcribed forever ago, so I imagine I just need to try again after freshly transcribing it; probably a regression somewhere when I was gutting it.
When I (hopefully) get a chance, I'll take a look and see what went wrong. I'm pretty sure somewhere along the line, the sample rate got mucked up. If you can verify that the audio under `./training/{voice name}/audio/` sounds right, then there's a logic error in my code with loading the audio from there, and you can try to regenerate the clips by clicking `Slice audio` under `Training > Prepare Dataset`. If not, then there's a logic error with preparing/slicing the audio.

But for now, a """hotfix""" is to simply not have your voice under the `./training/` folder and instead keep it under `./voices/`. I'm not too sure how much of an improvement it is to have the script use the sliced audio from `./training/` rather than the unprocessed audio under `./voices/`.

That is the issue: the audio from `./training/{voice name}/audio/` sounds bad, so I'm not sure what got messed up. Probably I just need to uninstall Python, Conda, everything, and do a clean install. Probably some other AI scripts somehow messed up ffmpeg or torch or something.
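A quick way to tell which clips are at the wrong rate is to scan the folder and report mismatches. A stdlib-only sketch (the helper name is mine, and the 22050 Hz target assumes TorToiSe's usual sample rate):

```python
import wave
from pathlib import Path

EXPECTED_SR = 22050  # rate the prepared TorToiSe clips are expected to have

def check_sample_rates(audio_dir, expected=EXPECTED_SR):
    """Return (filename, sample_rate) for every .wav clip whose rate differs."""
    bad = []
    for path in sorted(Path(audio_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as f:
            sr = f.getframerate()
        if sr != expected:
            bad.append((path.name, sr))
    return bad

# Example: check_sample_rates("./training/{voice name}/audio/")
```

Any clip this reports at 44100 Hz (but being played back as 22050 Hz) would sound exactly like the slowed-down, pitched-down audio described above.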
I have the same problem.
It seems that when I process/transcribe my audio file from /voices/{name}/, the resulting sliced audio in /training/{name}/audio comes out messed up. The whole unsliced audio file is also in that same /training/{name}/audio folder, with a lowered sample rate, but that one sounds normal. Only the sliced files are broken.
Same issue here.
I think I've pinpointed and resolved the problem in commit 2060b6f21c.

To be safe, I would rename the `./training/{voice}/audio/` folder to anything else to back it up, check `Skip existing` and `Slice segments`, set your offsets accordingly, then click `Transcribe and process` to properly reslice your audio, but I think clicking `(Re)Slice` should work too (with the above settings).

The problem stemmed from me pushing my messy changes from my VALL-E training system (since I still use this repo to prepare my dataset) in commit 72a38ff2fc. There's a very crucial line that I had carelessly left commented out, thinking "oh, well, the audio clips should already be resampled when the original audio gets copied to the `./training/{voice}/audio/` folder, so the slices should already be at the right sample rate when slicing from the copy". I didn't notice it in my testing the other day, as my audio was already at 22.05KHz; audio at 24KHz sounds only slightly off, while anything at 44.1KHz sounds drastically wrong.

At least it came from an issue that wasn't in the repo very long. I was worried it was something that had been around for a very, very long time and would have drastically mucked up finetunes. Gomen.
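The fix amounts to resampling each slice to the expected rate before saving it. For anyone stuck on an older commit, already-broken 16-bit mono slices could also be repaired manually; a minimal stdlib-only sketch using naive linear interpolation (the helper name and the 22050 Hz target are my assumptions, and a real resampler such as torchaudio's would do proper anti-aliasing):

```python
import wave
from array import array

TARGET_SR = 22050  # assumed TorToiSe sample rate

def resample_wav(src, dst, target_sr=TARGET_SR):
    """Rewrite a 16-bit mono WAV at target_sr via linear interpolation."""
    with wave.open(str(src), "rb") as r:
        assert r.getsampwidth() == 2 and r.getnchannels() == 1
        sr = r.getframerate()
        samples = array("h", r.readframes(r.getnframes()))

    if sr == target_sr:
        out = samples  # already at the right rate; just rewrite
    else:
        ratio = sr / target_sr
        out = array("h")
        for i in range(int(len(samples) / ratio)):
            pos = i * ratio
            j = int(pos)
            frac = pos - j
            a = samples[j]
            b = samples[min(j + 1, len(samples) - 1)]
            out.append(int(a + (b - a) * frac))  # interpolate between neighbors

    with wave.open(str(dst), "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(target_sr)
        w.writeframes(out.tobytes())
```

Reslicing through the UI as described above is still the cleaner route, since it regenerates the clips from the original source audio.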
Thanks man... upgraded and it fixed the problem :P Saved me some days of stress; I was so close to formatting my PC and installing everything from scratch, as I thought my Python setup had messed something up :P
Damn... another issue. I did an update-force, and now the script doesn't start anymore:
```
E:\ai-voice-cloning>start.bat

E:\ai-voice-cloning>call .\venv\Scripts\activate.bat
Traceback (most recent call last):
  File "E:\ai-voice-cloning\src\main.py", line 11, in <module>
    from utils import *
  File "E:\ai-voice-cloning\src\utils.py", line 40, in <module>
    from tortoise.api import TextToSpeech as TorToise_TTS, MODELS, get_model_path, pad_or_truncate
  File "e:\ai-voice-cloning\modules\tortoise-tts\tortoise\api.py", line 25, in <module>
    from tortoise.models.bigvgan import BigVGAN
  File "e:\ai-voice-cloning\modules\tortoise-tts\tortoise\models\bigvgan.py", line 14, in <module>
    from librosa.filters import mel as librosa_mel_fn
  File "E:\ai-voice-cloning\venv\Lib\site-packages\librosa\__init__.py", line 211, in <module>
    from . import core
  File "E:\ai-voice-cloning\venv\Lib\site-packages\librosa\core\__init__.py", line 9, in <module>
    from .constantq import *  # pylint: disable=wildcard-import
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\ai-voice-cloning\venv\Lib\site-packages\librosa\core\constantq.py", line 1059, in <module>
    dtype=np.complex,
          ^^^^^^^^^^
  File "E:\ai-voice-cloning\venv\Lib\site-packages\numpy\__init__.py", line 305, in __getattr__
    raise AttributeError(former_attrs[attr])
AttributeError: module 'numpy' has no attribute 'complex'.
`np.complex` was a deprecated alias for the builtin `complex`. To avoid this error in existing code, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations. Did you mean: 'complex_'?
Press any key to continue . . .
```
did a clean install, training seems to be broken ...
```
[Training] [2023-08-23T12:57:45.241563] 23-08-23 12:57:44.644 - INFO: Random seed: 1587
[Training] [2023-08-23T12:57:46.881388] 23-08-23 12:57:46.881 - INFO: Number of training data elements: 81, iters: 1
[Training] [2023-08-23T12:57:46.886388] 23-08-23 12:57:46.881 - INFO: Total epochs needed: 500 for iters 500
[Training] [2023-08-23T12:57:49.708760] e:\aiclone2\ai-voice-cloning\venv\Lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-08-23T12:57:49.714756]   warnings.warn(
[Training] [2023-08-23T12:58:02.774963] 23-08-23 12:58:02.773 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-08-23T12:58:07.179449] 23-08-23 12:58:07.172 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-08-23T12:58:10.265197] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-08-23T12:58:13.757436] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-08-23T12:58:14.996971] e:\aiclone2\ai-voice-cloning\venv\Lib\site-packages\torch\optim\lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
[Training] [2023-08-23T12:58:14.996971]   warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
[Training] [2023-08-23T12:58:21.514987] Disabled distributed training.
[Training] [2023-08-23T12:58:21.514987] Loading from ./models/tortoise/dvae.pth
[Training] [2023-08-23T12:58:21.518978] Traceback (most recent call last):
[Training] [2023-08-23T12:58:21.518978]   File "e:\aiclone2\ai-voice-cloning\src\train.py", line 64, in <module>
[Training] [2023-08-23T12:58:21.518978]     train(config_path, args.launcher)
[Training] [2023-08-23T12:58:21.518978]   File "e:\aiclone2\ai-voice-cloning\src\train.py", line 31, in train
[Training] [2023-08-23T12:58:21.518978]     trainer.do_training()
[Training] [2023-08-23T12:58:21.519964]   File "e:\aiclone2\ai-voice-cloning\modules\dlas\dlas\train.py", line 408, in do_training
[Training] [2023-08-23T12:58:21.519964]     metric = self.do_step(train_data)
[Training] [2023-08-23T12:58:21.519964]              ^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2023-08-23T12:58:21.519964]   File "e:\aiclone2\ai-voice-cloning\modules\dlas\dlas\train.py", line 271, in do_step
[Training] [2023-08-23T12:58:21.519964]     gradient_norms_dict = self.model.optimize_parameters(
[Training] [2023-08-23T12:58:21.519964]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2023-08-23T12:58:21.519964]   File "e:\aiclone2\ai-voice-cloning\modules\dlas\dlas\trainer\ExtensibleTrainer.py", line 321, in optimize_parameters
[Training] [2023-08-23T12:58:21.519964]     ns = step.do_forward_backward(
[Training] [2023-08-23T12:58:21.519964]          ^^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2023-08-23T12:58:21.519964]   File "e:\aiclone2\ai-voice-cloning\modules\dlas\dlas\trainer\steps.py", line 242, in do_forward_backward
[Training] [2023-08-23T12:58:21.520971]     local_state[k] = v[grad_accum_step]
[Training] [2023-08-23T12:58:21.520971]                      ~^^^^^^^^^^^^^^^^^
[Training] [2023-08-23T12:58:21.520971] IndexError: list index out of range
```
numpy needs to be downgraded by running `pip3 install -U "numpy==1.23.0"`
Your batch size is not evenly divisible by your gradient accumulation factor. The gradient accumulation factor also needs to be no more than half the batch size. DLAS is a bit of a mess in this regard.
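Those two constraints can be sketched as a quick sanity check run before training (the function name is mine, not something DLAS provides):

```python
def validate_batch_settings(batch_size, grad_accum):
    """Check DLAS-style constraints between batch size and gradient
    accumulation factor; return a list of human-readable problems."""
    problems = []
    if batch_size % grad_accum != 0:
        problems.append(
            f"batch size {batch_size} is not evenly divisible by "
            f"gradient accumulation factor {grad_accum}"
        )
    if grad_accum > batch_size // 2:
        problems.append(
            f"gradient accumulation factor {grad_accum} exceeds half "
            f"the batch size ({batch_size // 2})"
        )
    return problems
```

For example, 81 training elements with an accumulation factor of 2 would trip the divisibility check, which matches the `IndexError` on `v[grad_accum_step]` in the log above.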