Have problem cloning voice #331

Open
opened 2023-08-21 15:45:46 +00:00 by nfkmobile · 9 comments

Until a few days ago this used to work perfectly. Now, for some strange reason I can't figure out, when the program tries to prepare the dataset, all the audio generated from the Whisper transcription comes out in slow motion with the voice pitched way down... Did anyone encounter this problem? I tried to reinstall from scratch, same problem.

Owner

How strange, I just checked yesterday on a clean install with it leveraging a transcribed dataset. However, that dataset was transcribed forever ago, so I imagine I just need to try again after freshly transcribing it; probably a regression somewhere when I was gutting it.

When I (hopefully) get a chance, I'll take a look and see what went wrong. I'm pretty sure somewhere along the line, the sample rate got mucked up. If you can verify that the audio under `./training/{voice name}/audio/` sounds right, then there's a logic error in my code with loading the audio from there, and you can try and regenerate them by clicking `Slice audio` under `Training > Prepare Dataset`. If not, then there's a logic error with preparing/slicing the audio.


But for now, a """hotfix""" is to simply not have your voice under the ./training/ folder and instead keep it under ./voices/. I'm not too sure how much of an improvement it is to have the script use the sliced audio from ./training/ rather than the unprocessed audio under ./voices/.

Author

That is the issue, the audio from `./training/{voice name}/audio/` sounds bad... so I'm not sure what got messed up. I probably need to just uninstall Python, conda, everything, and do a clean install... probably some other AI scripts somehow messed up ffmpeg or torch or something...


I have the same problem.
It seems that when I process/transcribe my audio file from `/voices/{name}/`, the resulting sliced audio in `/training/{name}/audio` comes out messed up. There's also the whole unsliced audio file in that same `/training/{name}/audio` folder, with a lowered sample rate, but the audio on that one is normal. Only the sliced ones are broken.


Same issue here.

Owner

I think I've pinpointed and resolved the problem in commit 2060b6f21c.

To be safe, I would rename the `./training/{voice}/audio/` folder to anything else to back it up, check `Skip existing`, `Slice segments`, set your offsets accordingly, then click `Transcribe and process` to properly reslice your audio, but I think clicking `(Re)Slice` should work too (with the above settings).


The problem stemmed from me pushing my messy changes from my VALL-E training system (since I still use this repo to prepare my dataset) in commit 72a38ff2fc. There's a very crucial line that I had carelessly left commented out, thinking that "oh, well the audio clips should already be resampled when the original audio gets copied to the `./training/{voice}/audio/` folder, so the slices should already be at the right sample rate when slicing from the copy". I didn't seem to notice it in my testing the other day, as my audio was already at 22.05KHz, while audio at 24KHz sounds slightly off, and anything at 44.1KHz will sound drastically wrong.
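
For the curious, the missing step boils down to something like the sketch below; this is a paraphrase of the idea rather than the repo's exact code, and the filename is a placeholder:

```python
# Roughly the resample step that got skipped (a paraphrase, not the actual
# line from the repo): each slice needs to be brought down to the 22050 Hz
# the dataset expects before it's written out, otherwise a 44.1KHz source
# ends up playing back slowed down and pitched down.
import torchaudio

TARGET_SR = 22050  # sample rate the sliced dataset audio should end up at

waveform, source_sr = torchaudio.load("slice.wav")  # placeholder filename
if source_sr != TARGET_SR:
    waveform = torchaudio.functional.resample(waveform, source_sr, TARGET_SR)
torchaudio.save("slice.wav", waveform, TARGET_SR)
```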

At least it was from an issue that wasn't in the repo very long; I was worried it was something that had been around for a very, very long time and would have drastically mucked up finetunes. Gomen.

Author

Thanks man... I upgraded and it fixed the problem :P Saved me some days of stress... I was so close to formatting my PC and trying to install everything from scratch, as I thought my Python setup had messed something up :P

Author

Damn... another issue... I did an update-force... and now the script doesn't start anymore...

E:\ai-voice-cloning>start.bat

E:\ai-voice-cloning>call .\venv\Scripts\activate.bat
Traceback (most recent call last):
File "E:\ai-voice-cloning\src\main.py", line 11, in <module>
from utils import *
File "E:\ai-voice-cloning\src\utils.py", line 40, in <module>
from tortoise.api import TextToSpeech as TorToise_TTS, MODELS, get_model_path, pad_or_truncate
File "e:\ai-voice-cloning\modules\tortoise-tts\tortoise\api.py", line 25, in <module>
from tortoise.models.bigvgan import BigVGAN
File "e:\ai-voice-cloning\modules\tortoise-tts\tortoise\models\bigvgan.py", line 14, in <module>
from librosa.filters import mel as librosa_mel_fn
File "E:\ai-voice-cloning\venv\Lib\site-packages\librosa\__init__.py", line 211, in <module>
from . import core
File "E:\ai-voice-cloning\venv\Lib\site-packages\librosa\core\__init__.py", line 9, in <module>
from .constantq import * # pylint: disable=wildcard-import
^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ai-voice-cloning\venv\Lib\site-packages\librosa\core\constantq.py", line 1059, in <module>
dtype=np.complex,
^^^^^^^^^^
File "E:\ai-voice-cloning\venv\Lib\site-packages\numpy\__init__.py", line 305, in __getattr__
raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'complex'.
`np.complex` was a deprecated alias for the builtin `complex`. To avoid this error in existing code, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations. Did you mean: 'complex_'?
Press any key to continue . . .

Author

Did a clean install; training seems to be broken...
[Training] [2023-08-23T12:57:45.241563] 23-08-23 12:57:44.644 - INFO: Random seed: 1587
[Training] [2023-08-23T12:57:46.881388] 23-08-23 12:57:46.881 - INFO: Number of training data elements: 81, iters: 1
[Training] [2023-08-23T12:57:46.886388] 23-08-23 12:57:46.881 - INFO: Total epochs needed: 500 for iters 500
[Training] [2023-08-23T12:57:49.708760] e:\aiclone2\ai-voice-cloning\venv\Lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-08-23T12:57:49.714756] warnings.warn(
[Training] [2023-08-23T12:58:02.774963] 23-08-23 12:58:02.773 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-08-23T12:58:07.179449] 23-08-23 12:58:07.172 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-08-23T12:58:10.265197] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-08-23T12:58:13.757436] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-08-23T12:58:14.996971] e:\aiclone2\ai-voice-cloning\venv\Lib\site-packages\torch\optim\lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
[Training] [2023-08-23T12:58:14.996971] warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
[Training] [2023-08-23T12:58:21.514987] Disabled distributed training.
[Training] [2023-08-23T12:58:21.514987] Loading from ./models/tortoise/dvae.pth
[Training] [2023-08-23T12:58:21.518978] Traceback (most recent call last):
[Training] [2023-08-23T12:58:21.518978] File "e:\aiclone2\ai-voice-cloning\src\train.py", line 64, in <module>
[Training] [2023-08-23T12:58:21.518978] train(config_path, args.launcher)
[Training] [2023-08-23T12:58:21.518978] File "e:\aiclone2\ai-voice-cloning\src\train.py", line 31, in train
[Training] [2023-08-23T12:58:21.518978] trainer.do_training()
[Training] [2023-08-23T12:58:21.519964] File "e:\aiclone2\ai-voice-cloning\modules\dlas\dlas\train.py", line 408, in do_training
[Training] [2023-08-23T12:58:21.519964] metric = self.do_step(train_data)
[Training] [2023-08-23T12:58:21.519964] ^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2023-08-23T12:58:21.519964] File "e:\aiclone2\ai-voice-cloning\modules\dlas\dlas\train.py", line 271, in do_step
[Training] [2023-08-23T12:58:21.519964] gradient_norms_dict = self.model.optimize_parameters(
[Training] [2023-08-23T12:58:21.519964] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2023-08-23T12:58:21.519964] File "e:\aiclone2\ai-voice-cloning\modules\dlas\dlas\trainer\ExtensibleTrainer.py", line 321, in optimize_parameters
[Training] [2023-08-23T12:58:21.519964] ns = step.do_forward_backward(
[Training] [2023-08-23T12:58:21.519964] ^^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2023-08-23T12:58:21.519964] File "e:\aiclone2\ai-voice-cloning\modules\dlas\dlas\trainer\steps.py", line 242, in do_forward_backward
[Training] [2023-08-23T12:58:21.520971] local_state[k] = v[grad_accum_step]
[Training] [2023-08-23T12:58:21.520971] ~^^^^^^^^^^^^^^^^^
[Training] [2023-08-23T12:58:21.520971] IndexError: list index out of range

Owner

> AttributeError: module 'numpy' has no attribute 'complex'.
> `np.complex` was a deprecated alias for the builtin `complex`. To avoid this error in existing code, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
> The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
> https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations. Did you mean: 'complex_'?

numpy needs to be downgraded by running `pip3 install -U "numpy==1.23.0"`.

> [Training] [2023-08-23T12:58:21.520971] local_state[k] = v[grad_accum_step]
> [Training] [2023-08-23T12:58:21.520971] ~^^^^^^^^^^^^^^^^^
> [Training] [2023-08-23T12:58:21.520971] IndexError: list index out of range

Your batch size is not evenly divisible by your gradient accumulation factor. The gradient accumulation factor also needs to be no more than half the batch size. DLAS is a bit of a mess in this regard.
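
If it helps, the constraint boils down to something like this (my own sketch, not DLAS's actual validation code; the function name is made up for illustration):

```python
# Hypothetical check mirroring the rule above (not actual DLAS code): the
# batch size must divide evenly by the gradient accumulation factor, and the
# factor should be at most half the batch size, otherwise DLAS ends up
# indexing past the end of its per-step chunks and throws the IndexError above.
def validate_batch_settings(batch_size: int, grad_accum: int) -> None:
    if batch_size % grad_accum != 0:
        raise ValueError(
            f"batch size {batch_size} is not evenly divisible by "
            f"gradient accumulation factor {grad_accum}"
        )
    if grad_accum > batch_size // 2:
        raise ValueError(
            f"gradient accumulation factor {grad_accum} should be no more "
            f"than half the batch size ({batch_size // 2})"
        )

validate_batch_settings(batch_size=128, grad_accum=16)   # fine: 128 / 16 = 8
# validate_batch_settings(batch_size=81, grad_accum=2)   # would raise: 81 % 2 != 0
```

So double-check those two settings in your training configuration before kicking training off again.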
