Commit Graph

264 Commits

Author SHA1 Message Date
mrq
363d0b09b1 added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training) 2023-03-15 00:37:38 +00:00
mrq
07b684c4e7 removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text 2023-03-14 21:51:27 +00:00
mrq
469dd47a44 fixes #131 2023-03-14 18:58:03 +00:00
mrq
84b7383428 fixes #134 2023-03-14 18:52:56 +00:00
mrq
4b952ea52a fixes #132 2023-03-14 18:46:20 +00:00
mrq
fe03ae5839 fixes 2023-03-14 17:42:42 +00:00
mrq
9d2c7fb942 cleanup 2023-03-14 16:23:29 +00:00
mrq
65fe304267 fixed broken graph displaying 2023-03-14 16:04:56 +00:00
mrq
7b16b3e88a ;) 2023-03-14 15:48:09 +00:00
mrq
54036fd780 :) 2023-03-14 05:02:14 +00:00
mrq
92a05d3c4c added PYTHONUTF8 to start/train bats 2023-03-14 02:29:11 +00:00
mrq
dadb1fca6b multichannel audio now report correct duration (surprised it took this long for me to source multichannel audio) 2023-03-13 21:24:51 +00:00
mrq
32d968a8cd (disabled by default until i validate it working) added additional transcription text normalization (something else I'm experimenting with requires it) 2023-03-13 19:07:23 +00:00
mrq
66ac8ba766 added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation 2023-03-13 18:51:53 +00:00
mrq
ee1b048d07 when creating the train/validation datasets, use segments if the main audio's duration is too long, and slice to make the segments if they don't exist 2023-03-13 04:26:00 +00:00
mrq
0cf9db5e69 oops 2023-03-13 01:33:45 +00:00
mrq
050bcefd73 resample to 22.5K when creating training inputs (to avoid redundant downsampling when loaded for training, even though most of my inputs are already at 22.5K), generalized resampler function to cache and reuse them, do not unload whisper when done transcribing since it gets unloaded anyways for any other non-transcription task 2023-03-13 01:20:55 +00:00
mrq
7c9c0dc584 forgot to clean up debug prints 2023-03-13 00:44:37 +00:00
mrq
239c984850 move validating audio to creating the text files instead, consider audio longer than 11 seconds invalid, consider text lengths over 200 invalid 2023-03-12 23:39:00 +00:00
mrq
51ddc205cd update submodules 2023-03-12 18:14:36 +00:00
mrq
ccbf2e6aff blame mrq/ai-voice-cloning#122 2023-03-12 17:51:52 +00:00
mrq
9238df0b03 fixed last generation settings not actually load because brain worms 2023-03-12 15:49:50 +00:00
mrq
9594a960b0 Disable loss ETA for now until I fix it 2023-03-12 15:39:54 +00:00
mrq
be8b290a1a Merge branch 'master' into save_more_user_config 2023-03-12 15:38:08 +00:00
mrq
296129ba9c output fixes, I'm not sure why ETA wasn't working but it works in testing 2023-03-12 15:17:07 +00:00
mrq
098d7ad635 uh I don't remember, small things 2023-03-12 14:47:48 +00:00
233baa4e45 updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested. 2023-03-12 16:08:02 +02:00
29b3d1ae1d Fixed Keep X Previous States 2023-03-12 08:01:08 +02:00
9e320a34c8 Fixed Keep X Previous States 2023-03-12 08:00:03 +02:00
61500107ab Catch OOM and run whisper on cpu automatically. 2023-03-12 06:48:28 +02:00
mrq
ede9804b76 added option to trim silence using torchaudio's VAD 2023-03-11 21:41:35 +00:00
mrq
dea2fa9caf added fields to offset start/end slices to apply in bulk when slicing 2023-03-11 21:34:29 +00:00
mrq
89bb3d4419 rename transcribe button since it does more than transcribe 2023-03-11 21:18:04 +00:00
mrq
382a3e4104 rely on the whisper.json for handling a lot more things 2023-03-11 21:17:11 +00:00
mrq
9b376c381f brain worm 2023-03-11 18:14:32 +00:00
mrq
94551fb9ac split slicing dataset routine so it can be done after the fact 2023-03-11 17:27:01 +00:00
mrq
e3fdb79b49 rocm5.2 works for me desu so I bumped it back up 2023-03-11 17:02:56 +00:00
mrq
cf41492f76 fall back to normal behavior if theres actually no audiofiles loaded from the dataset when using it for computing latents 2023-03-11 16:46:03 +00:00
mrq
b90c164778 Farewell, parasite 2023-03-11 16:40:34 +00:00
mrq
2424c455cb added option to not slice audio when transcribing, added option to prepare validation dataset on audio duration, added a warning if youre using whisperx and you're slicing audio 2023-03-11 16:32:35 +00:00
tigi6346
dcdcf8516c master (#112)
Fixes Gradio bugging out when attempting to load a missing train.json.

Reviewed-on: mrq/ai-voice-cloning#112
Co-authored-by: tigi6346 <tigi6346@noreply.localhost>
Co-committed-by: tigi6346 <tigi6346@noreply.localhost>
2023-03-11 03:28:04 +00:00
mrq
008a1f5f8f simplified spawning the training process by having it spawn the distributed training processes in the train.py script, so it should work on Windows too 2023-03-11 01:37:00 +00:00
mrq
2feb6da0c0 cleanups and fixes, fix DLAS throwing errors from '''too short of sound files''' by just culling them during transcription 2023-03-11 01:19:49 +00:00
mrq
7f2da0f5fb rewrote how AIVC gets training metrics (need to clean up later) 2023-03-10 22:35:32 +00:00
mrq
df0edacc60 fix the cleanup actually only doing 2 despite requesting more than 2, surprised no one has pointed it out 2023-03-10 14:04:07 +00:00
mrq
8e890d3023 forgot to fix reset settings to use the new arg-agnostic way 2023-03-10 13:49:39 +00:00
mrq
c92b006129 I really hate YAML 2023-03-10 03:48:46 +00:00
mrq
eb1551ee92 what I thought was an override and not a ternary 2023-03-09 23:04:02 +00:00
mrq
c3b43d2429 today I learned adamw_zero actually negates ANY LR schemes 2023-03-09 19:42:31 +00:00
mrq
cb273b8428 cleanup 2023-03-09 18:34:52 +00:00
mrq
7c71f7239c expose options for CosineAnnealingLR_Restart (seems to be able to train very quickly due to the restarts 2023-03-09 14:17:01 +00:00
mrq
2f6dd9c076 some cleanup 2023-03-09 06:20:05 +00:00
mrq
5460e191b0 added loss graph, because I'm going to experiment with cosine annealing LR and I need to view my loss 2023-03-09 05:54:08 +00:00
mrq
a182df8f4e is 2023-03-09 04:33:12 +00:00
mrq
a01eb10960 (try to) unload voicefixer if it raises an error during loading voicefixer 2023-03-09 04:28:14 +00:00
mrq
dc1902b91c cleanup block that makes embedding latents for random/microphone happen, remove builtin voice options from voice list to avoid duplicates 2023-03-09 04:23:36 +00:00
mrq
797882336b maybe remedy an issue that crops up if you have a non-wav and non-json file in a results folder (assuming) 2023-03-09 04:06:07 +00:00
mrq
b64948d966 while I'm breaking things, migrating dependencies to modules folder for tidiness 2023-03-09 04:03:57 +00:00
mrq
3b4f4500d1 when you have three separate machines running and you test on one, but you accidentally revert changes because you then test on another 2023-03-09 03:26:18 +00:00
mrq
ef75dba995 I hate commas make tuples 2023-03-09 02:43:05 +00:00
mrq
f795dd5c20 you might be wondering why so many small commits instead of rolling the HEAD back one to just combine them, i don't want to force push and roll back the paperspace i'm testing in 2023-03-09 02:31:32 +00:00
mrq
51339671ec typo 2023-03-09 02:29:08 +00:00
mrq
1b18b3e335 forgot to save the simplified training input json first before touching any of the settings that dump to the yaml 2023-03-09 02:27:20 +00:00
mrq
221ac38b32 forgot to update to finetune subdir 2023-03-09 02:25:32 +00:00
mrq
0e80e311b0 added VRAM validation for a given batch:gradient accumulation size ratio (based empirically off of 6GiB, 16GiB, and 16x2GiB, would be nice to have more data on what's safe) 2023-03-09 02:08:06 +00:00
mrq
ef7b957fff oops 2023-03-09 00:53:00 +00:00
mrq
b0baa1909a forgot template 2023-03-09 00:32:35 +00:00
mrq
3f321fe664 big cleanup to make my life easier when i add more parameters 2023-03-09 00:26:47 +00:00
mrq
0ab091e7ff oops 2023-03-08 16:09:29 +00:00
mrq
34dcb845b5 actually make using adamw_zero optimizer for multi-gpus work 2023-03-08 15:31:33 +00:00
mrq
8494628f3c normalize validation batch size because i oom'd without it getting scaled 2023-03-08 05:27:20 +00:00
mrq
d7e75a51cf I forgot about the changelog and never kept up with it, so I'll just not use a changelog 2023-03-08 05:14:50 +00:00
mrq
ff07f707cb disable validation if validation dataset not found, clamp validation batch size to validation dataset size instead of simply reusing batch size, switch to adamw_zero optimizer when training with multi-gpus (because the yaml comment said to and I think it might be why I'm absolutely having garbage luck training this japanese dataset) 2023-03-08 04:47:05 +00:00
mrq
f1788a5639 lazy wrap around the voicefixer block because sometimes it just an heros itself despite having a specific block to load it beforehand 2023-03-08 04:12:22 +00:00
mrq
83b5125854 fixed notebooks, provided paperspace notebook 2023-03-08 03:29:12 +00:00
mrq
b4098dca73 made validation working (will document later) 2023-03-08 02:58:00 +00:00
mrq
a7e0dc9127 oops 2023-03-08 00:51:51 +00:00
mrq
e862169e7f set validation to save rate and validation file if exists (need to test later) 2023-03-07 20:38:31 +00:00
mrq
fe8bf7a9d1 added helper script to cull short enough lines from training set as a validation set (if it yields good results doing validation during training, i'll add it to the web ui) 2023-03-07 20:16:49 +00:00
mrq
7f89e8058a fixed update checker for dlas+tortoise-tts 2023-03-07 19:33:56 +00:00
mrq
6d7e143f53 added override for large training plots 2023-03-07 19:29:09 +00:00
mrq
3718e9d0fb set NaN alarm to show the iteration it happened it 2023-03-07 19:22:11 +00:00
mrq
c27ee3ce95 added update checking for dlas and tortoise-tts, caching voices (for a given model and voice name) so random latents will remain the same 2023-03-07 17:04:45 +00:00
mrq
166d491a98 fixes 2023-03-07 13:40:41 +00:00
mrq
df5ba634c0 brain dead 2023-03-07 05:43:26 +00:00
mrq
2726d98ee1 fried my brain trying to nail out bugs involving using solely ar model=auto 2023-03-07 05:35:21 +00:00
mrq
d7a5ad9fd9 cleaned up some model loading logic, added 'auto' mode for AR model (deduced by current voice) 2023-03-07 04:34:39 +00:00
mrq
3899f9b4e3 added (yet another) experimental voice latent calculation mode (when chunk size is 0 and theres a dataset generated, itll leverage it by padding to a common size then computing them, should help avoid splitting mid-phoneme) 2023-03-07 03:55:35 +00:00
mrq
5063728bb0 brain worms and headaches 2023-03-07 03:01:02 +00:00
mrq
0f31c34120 download dvae.pth for the people who managed to somehow put the web UI into a state where it never initializes TTS at all somehow 2023-03-07 02:47:10 +00:00
mrq
0f0b394445 moved (actually not working) setting to use BigVGAN to a dropdown to select between vocoders (for when slotting in future ones), and ability to load a new vocoder while TTS is loaded 2023-03-07 02:45:22 +00:00
mrq
e731b9ba84 reworked generating metadata to embed, should now store overrided settings 2023-03-06 23:07:16 +00:00
mrq
7798767fc6 added settings editing (will add a guide on what to do later, and an example) 2023-03-06 21:48:34 +00:00
mrq
119ac50c58 forgot to re-append the existing transcription when skipping existing (have to go back again and do the first 10% of my giant dataset 2023-03-06 16:50:55 +00:00
mrq
12c51b6057 I'm not too sure if manually invoking gc actually closes all the open files from whisperx (or ROCm), but it seems to have gone away alongside setting 'ulimit -Sn' to half the output of 'ulimit -Hn' 2023-03-06 16:39:37 +00:00
mrq
999878d9c6 and it turned out I wasn't even using the aligned segments, kmsing now that I have to *redo* my dataset again 2023-03-06 11:01:33 +00:00
mrq
14779a5020 Added option to skip transcribing if it exists in the output text file, because apparently whisperx will throw a "max files opened" error when using ROCm because it does not close some file descriptors if you're batch-transcribing or something, so poor little me, who's retranscribing his japanese dataset for the 305823042th time woke up to it partially done i am so mad I have to wait another few hours for it to continue when I was hoping to wake up to it done 2023-03-06 10:47:06 +00:00
mrq
0e3bbc55f8 added api_name for generation, added whisperx backend, relocated use whispercpp option to whisper backend list 2023-03-06 05:21:33 +00:00
mrq
788a957f79 stretch loss plot to target iteration just so its not so misleading with the scale 2023-03-06 00:44:29 +00:00
mrq
5be14abc21 UI cleanup, actually fix syncing the epoch counter (i hope), setting auto-suggest voice chunk size whatever to 0 will just split based on the average duration length, signal when a NaN info value is detected (there's some safeties in the training, but it will inevitably fuck the model) 2023-03-05 23:55:27 +00:00