Can't get the model training started
#416
Closed
I have an NVIDIA GPU and Windows 10. Here's all the console output from the launch:
After that, nothing happens (although Task Manager shows Python using RAM and VRAM).
Your gradient accumulation size is either too large or does not divide evenly into your batch size.
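To sanity-check a pair of values before launching training, something like the sketch below may help. It assumes the rule is that the batch size should be a whole multiple of the gradient accumulation size and that the accumulation size should not exceed the batch size; the trainer's exact check is not shown in this thread, and `check_batch_settings` is a hypothetical helper, not part of the web UI.

```python
def check_batch_settings(batch_size, gradient_accumulation_size):
    """Return a list of problems; an empty list means the pair looks consistent.

    Assumption: batch size must be a whole multiple of the gradient
    accumulation size. The trainer's real rule may differ.
    """
    problems = []
    if batch_size <= 0 or gradient_accumulation_size <= 0:
        problems.append("both values must be positive integers")
        return problems
    if gradient_accumulation_size > batch_size:
        problems.append("gradient accumulation size exceeds batch size")
    elif batch_size % gradient_accumulation_size != 0:
        problems.append("batch size is not a multiple of gradient accumulation size")
    return problems


print(check_batch_settings(64, 32))  # → []
```

A non-empty result lists what to adjust; for example, a batch size of 64 with an accumulation size of 48 is flagged because 64 is not a multiple of 48.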
Thanks a lot. I set Batch size to 64 and Gradient Accumulation Size to 32 and everything worked. But I have another question, probably a very stupid one. I liked one of the randomly generated voices; how do I use it again? I tried using the same seed, but the voice came out different: female the first time, male the second.
If you happened to have Embed Output Metadata enabled, you can take the output with the random latents you want and drag and drop it into the Utilities > Import/Analyze tab, and there should be a field that extracts the latents used for that generation. You can then take the cond_latents.pth file and put it under ./voices/{voice name}/, and it should use those latents for subsequent generations.
If you didn't, you should be able to just use the outputted file as a voice input again.
However, neither option is a hard guarantee. I don't recall how consistent reusing randomly generated latents is.
I don't believe the seed gets used when generating random latents, just for the generation step.
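If you would rather place the latents file by script than by hand, a minimal sketch could look like the one below. It assumes only what is described above (a cond_latents.pth file that belongs under ./voices/{voice name}/); `install_latents` and its parameters are hypothetical names for illustration, not part of the web UI.

```python
import shutil
from pathlib import Path


def install_latents(latents_file, voice_name, voices_root="./voices"):
    """Copy an extracted latents file to <voices_root>/<voice_name>/cond_latents.pth.

    The folder layout follows the description in this thread; the function
    itself is a hypothetical helper.
    """
    voice_dir = Path(voices_root) / voice_name
    voice_dir.mkdir(parents=True, exist_ok=True)  # create the voice folder if missing
    target = voice_dir / "cond_latents.pth"
    shutil.copyfile(latents_file, target)
    return target
```

Calling `install_latents("cond_latents.pth", "testvoice")` from the repository root would create ./voices/testvoice/ if needed and copy the file into it.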
Thank you again. I did as you said: I confirmed that Embed Output Metadata is enabled in the settings, dropped the audio of the voice I want into Import/Analyze, imported that voice, and selected it in the Generate tab, but I get an error when generating. Here is the console output after "Loaded TTS, ready for generation.":
Importing latents to b'PK\x03\x04\x00\x00\x08\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x1e\x00\x04\x00cond_latents_d1f79232/data.pklFB\x00\x00\x80\x02N.PK\x07\x08\r\xd2\xb5}\x04\x00\x00\x00\x04\x00\x00\x00PK\x03\x04\x00\x00\x08\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x1d\x001\x00cond_latents_d1f79232/versionFB-\x00ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ3\nPK\x07\x08\xd1\x9egU\x02\x00\x00\x00\x02\x00\x00\x00PK\x01\x02\x00\x00\x00\x00\x08\x08\x00\x00\x00\x00\x00\x00\r\xd2\xb5}\x04\x00\x00\x00\x04\x00\x00\x00\x1e\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00cond_latents_d1f79232/data.pklPK\x01\x02\x00\x00\x00\x00\x08\x08\x00\x00\x00\x00\x00\x00\xd1\x9egU\x02\x00\x00\x00\x02\x00\x00\x00\x1d\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00T\x00\x00\x00cond_latents_d1f79232/versionPK\x06\x06,\x00\x00\x00\x00\x00\x00\x00\x1e\x03-\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x97\x00\x00\x00\x00\x00\x00\x00\xd2\x00\x00\x00\x00\x00\x00\x00PK\x06\x07\x00\x00\x00\x00i\x01\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00PK\x05\x06\x00\x00\x00\x00\x02\x00\x02\x00\x97\x00\x00\x00\xd2\x00\x00\x00\x00\x00'
Imported latents to ./voices/testvoice//cond_latents.pth
[1/1] Generating line: The Great Wall of China Is Not Visible from Space: Despite the common myth, the Great Wall of China is not visible to the naked eye from space.
Loading voice: testvoice with model d1f79232
Loading voice: testvoice
C:\Users\imint\Desktop\voice\ai-voice-cloning\venv\lib\site-packages\torchaudio\functional\functional.py:1458: UserWarning: "kaiser_window" resampling method name is being deprecated and replaced by "sinc_interp_kaiser" in the next release. The default behavior remains unchanged.
  warnings.warn(
Traceback (most recent call last):
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\venv\lib\site-packages\gradio\routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\venv\lib\site-packages\gradio\helpers.py", line 587, in tracked_fn
    response = fn(*args)
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\src\webui.py", line 94, in generate_proxy
    raise e
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\src\webui.py", line 88, in generate_proxy
    sample, outputs, stats = generate(**kwargs)
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\src\utils.py", line 351, in generate
    return generate_tortoise(**kwargs)
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\src\utils.py", line 1211, in generate_tortoise
    gen, additionals = tts.tts(cut_text, **settings )
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\modules\tortoise-tts\tortoise\api.py", line 717, in tts
    auto_conditioning, diffusion_conditioning, auto_conds, _ = self.get_conditioning_latents(voice_samples, return_mels=True, verbose=True)
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\imint\Desktop\voice\ai-voice-cloning\modules\tortoise-tts\tortoise\api.py", line 545, in get_conditioning_latents
    concat = torch.cat(samples, dim=-1)
RuntimeError: torch.cat(): expected a non-empty list of Tensors
I think I figured it out: it looks like I need to have audio of this voice in the folder. Thanks for your help.
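For anyone who hits the same error: in the byte dump printed after "Importing latents to", the data.pkl entry appears to contain only the bytes \x80\x02N., which is how pickle encodes None. That suggests the imported file held no latents at all, which would be consistent with the traceback ending in get_conditioning_latents with an empty list of samples. The sketch below checks a cond_latents.pth file for that case using only the standard library. It assumes the file is a zip archive with a data.pkl entry, as in the dump above; `latents_file_holds_none` is a hypothetical helper, not part of the web UI or Tortoise.

```python
import pickletools
import zipfile


def latents_file_holds_none(path):
    """Return True if the archive's data.pkl entry is just a pickled None.

    Assumes a zip-format .pth file with an entry ending in "data.pkl".
    The pickle is only inspected opcode by opcode, never executed.
    """
    with zipfile.ZipFile(path) as archive:
        pkl_names = [n for n in archive.namelist() if n.endswith("data.pkl")]
        if not pkl_names:
            raise ValueError("no data.pkl entry found; not a zip-format .pth file")
        payload = archive.read(pkl_names[0])
    # Ignore protocol and framing opcodes; a bare None is then NONE followed by STOP.
    names = [op.name for op, _arg, _pos in pickletools.genops(payload)]
    meaningful = [name for name in names if name not in ("PROTO", "FRAME")]
    return meaningful == ["NONE", "STOP"]
```

If this returns True for ./voices/{voice name}/cond_latents.pth, the file carries no usable latents, and putting audio of the voice in the folder, as described above, is the workaround that worked here.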