Phonemizer expects a list of strs, not one giant str #236

Open
opened 2023-05-12 09:49:07 +00:00 by sobd · 6 comments

On an Ubuntu 22.04 LTS environment, running the latest commit of this repo (74bd0f0cdc), I run into this error when I attempt to phonemize:

Traceback (most recent call last):
  File "/home/.../ai-voice-cloning/venv/lib/python3.10/site-packages/gradio/routes.py", line 414, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/.../ai-voice-cloning/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1320, in process_api
    result = await self.call_function(
  File "/home/.../ai-voice-cloning/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1048, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/.../ai-voice-cloning/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/.../ai-voice-cloning/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/.../ai-voice-cloning/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/.../ai-voice-cloning/venv/lib/python3.10/site-packages/gradio/helpers.py", line 589, in tracked_fn
    response = fn(*args)
  File "/home/.../ai-voice-cloning/src/webui.py", line 252, in prepare_dataset_proxy
    message = prepare_dataset( voice, use_segments=slice_audio, text_length=validation_text_length, audio_length=validation_audio_length, progress=progress )
  File "/home/.../ai-voice-cloning/src/utils.py", line 2458, in prepare_dataset
    phonemes = phonemizer( text, language=lang )
  File "/home/.../ai-voice-cloning/src/utils.py", line 2323, in phonemizer
    tokens = backend.phonemize( text, strip=True )
  File "/home/.../ai-voice-cloning/venv/lib/python3.10/site-packages/phonemizer/backend/base.py", line 181, in phonemize
    raise RuntimeError(
RuntimeError: input text to phonemize() is str but it must be list of str

I think this is simple enough to fix, but I wasn't sure whether there's an easier way (aside from brute-force regex) to strip extra punctuation and whitespace from the Whisper-generated transcript.
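For the whitespace part, plain string methods may be enough. The following is a minimal sketch only: `tidy_transcript` and its `strip_chars` parameter are hypothetical names, not part of this repo, and which characters count as "extra" punctuation is an assumption that would need tuning against real Whisper output.

```python
def tidy_transcript(text: str, strip_chars: str = " -\"'") -> str:
    """Hypothetical helper: clean one transcript line without regex."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines), so re-joining with a single space
    # collapses every run into one space.
    collapsed = " ".join(text.split())
    # Remove stray characters from both ends only. The default set
    # (spaces, dashes, quotes) is an assumption, not a repo convention.
    return collapsed.strip(strip_chars)


print(tidy_transcript('  "Hello,   world.\n"  '))
```

Punctuation inside the sentence is left untouched here, since the phonemizer can be told separately whether to preserve it.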

Owner

Oh right, I forgot that when you instantiate the phonemizer backend yourself, rather than using its API to instantiate one-use backends, the input has to be a list of strings rather than a single string.
I'll see what I can do with whatever environments I have left.

Owner

Fixed in commit cbe21745df. Apologies.

I noticed that in the VALL-E implementation I did hotfix this with text = [ text ], and I suppose I neglected to copy that part when I was porting it over to "fix" the memory leak when using the phonemizer there for IPA-based tokenizers (which I honestly haven't touched in ages now; the phonemizer for VALL-E datasets gets handled through the VALL-E module).
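The `text = [ text ]` hotfix can be sketched as a small wrapper that also unwraps the result again. This is an illustration, not the code from the commit: `phonemize_one_or_many` and `StubBackend` are hypothetical names, and the stub only imitates the `RuntimeError` from the traceback above so the example runs without the phonemizer package installed.

```python
from typing import List, Sequence, Union


def phonemize_one_or_many(backend, text: Union[str, Sequence[str]], **kwargs):
    """Call backend.phonemize() with a list even when given a single string."""
    single = isinstance(text, str)
    batch: List[str] = [text] if single else list(text)
    tokens = backend.phonemize(batch, **kwargs)
    # The backend returns one output string per input string, so unwrap
    # the result when the caller passed a single string.
    return tokens[0] if single else tokens


class StubBackend:
    """Hypothetical stand-in that mimics the error in the traceback."""

    def phonemize(self, text, strip=False):
        if isinstance(text, str):
            raise RuntimeError(
                "input text to phonemize() is str but it must be list of str"
            )
        # A real backend would return phonemes; the stub echoes its input.
        return [t.strip() if strip else t for t in text]


print(phonemize_one_or_many(StubBackend(), " hello ", strip=True))
```

With the real library, the `backend` argument would presumably be a backend instance such as the one created in `src/utils.py`; that wiring is not shown in this thread and is assumed here.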

Author

Thanks, that commit fixed the phonemizer. I am running into a new problem when trying to train on the phonemized Whisper transcript (I had no issues with the vanilla tokens).

[Training] Loading from ./models/tortoise/dvae.pth
[Training] Traceback (most recent call last):
[Training]   File "/home/.../ai-voice-cloning/./src/train.py", line 64, in <module>
[Training]     train(config_path, args.launcher)
[Training]   File "/home/.../ai-voice-cloning/./src/train.py", line 31, in train
[Training]     trainer.do_training()
[Training]   File "/home/.../ai-voice-cloning/modules/dlas/dlas/train.py", line 406, in do_training
[Training]     for train_data in tq_ldr:
[Training]   File "/home/.../ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
[Training]     data = self._next_data()
[Training]   File "/home/.../ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
[Training]     return self._process_data(data)
[Training]   File "/home/.../ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
[Training]     data.reraise()
[Training]   File "/home/.../ai-voice-cloning/venv/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
[Training]     raise exception
[Training] AssertionError: Caught AssertionError in DataLoader worker process 0.
[Training] Original Traceback (most recent call last):
[Training]   File "/home/.../ai-voice-cloning/modules/dlas/dlas/data/audio/paired_voice_audio_dataset.py", line 218, in __getitem__
[Training]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training]   File "/home/.../ai-voice-cloning/modules/dlas/dlas/data/audio/paired_voice_audio_dataset.py", line 201, in get_wav_text_pair
[Training]     text_seq = self.get_text(text)
[Training]   File "/home/.../ai-voice-cloning/modules/dlas/dlas/data/audio/paired_voice_audio_dataset.py", line 210, in get_text
[Training]     assert not torch.any(tokens == 1)
[Training] AssertionError

I am running into the same AssertionError when trying to train on a phonemized script. Are there any updates on that issue?


Ok, found it, the phonemizer also outputs the phonemized text as a list. I just fixed it in my train.txt for now, but the assertion error is gone and training is running without problems so far...
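The exact edit to train.txt is not described in this thread. One plausible reading is that the text column ended up holding the string form of a list, such as `['...']`, instead of the bare phonemes. The sketch below unwraps such lines under two loud assumptions: that each line has the form `path|text`, and that the list was written out with Python's own formatting. `unwrap_list_text` is a hypothetical helper, not part of the repo.

```python
import ast


def unwrap_list_text(line: str, sep: str = "|") -> str:
    """Turn `path|['phonemes']` into `path|phonemes`; leave other lines alone."""
    stripped = line.rstrip("\n")
    path, found, text = stripped.partition(sep)
    if not found:
        # No separator: not a `path|text` line, so return it unchanged.
        return stripped
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        try:
            # literal_eval only parses Python literals; it never runs code.
            value = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            value = None
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            text = " ".join(value)
    return f"{path}{sep}{text}"


print(unwrap_list_text("audio/1.wav|['həloʊ wɜːld']\n"))
```

Applying this to every line of a copy of train.txt and comparing the result with the original would show whether this is actually what went wrong in a given dataset.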


> Ok, found it, the phonemizer also outputs the phonemized text as a list. I just fixed it in my train.txt for now, but the assertion error is gone and training is running without problems so far...

Hi stlohrey, can you explain the solution for this assertion error? In which way did you edit the train.txt?

Reference: mrq/ai-voice-cloning#236