8ed09f9b87 Merge pull request 'Catch OOM and run whisper on cpu automatically.' (#117) from zim33/ai-voice-cloning:vram into master (mrq, 2023-03-12 05:09:53 +0000)
61500107ab Catch OOM and run whisper on cpu automatically. (tigi6346, 2023-03-12 06:48:28 +0200)
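The pattern this commit describes is presumably a try/except around model loading; a minimal sketch, assuming the openai-whisper `load_model()` entry point (the repo's actual wrapper may differ):

```python
import torch
import whisper  # openai-whisper

def load_whisper_model(name="base", device="cuda"):
    """Try to load whisper on the GPU; fall back to CPU on an out-of-memory error."""
    try:
        return whisper.load_model(name, device=device)
    except RuntimeError as e:
        # a CUDA OOM surfaces as a RuntimeError whose message mentions "out of memory"
        if "out of memory" in str(e).lower():
            torch.cuda.empty_cache()
            print("CUDA OOM while loading whisper; falling back to CPU")
            return whisper.load_model(name, device="cpu")
        raise
```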
ede9804b76 added option to trim silence using torchaudio's VAD (mrq, 2023-03-11 21:41:35 +0000)
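For reference, `torchaudio.functional.vad()` only trims silence from the front of a recording, so trimming both ends usually means running it twice with a reversal in between; a minimal sketch of that pattern (not necessarily the repo's exact code):

```python
import torch
import torchaudio

def trim_silence(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """Trim leading and trailing silence with torchaudio's VAD."""
    trimmed = torchaudio.functional.vad(waveform, sample_rate)      # trims the front
    trimmed = torchaudio.functional.vad(trimmed.flip(-1), sample_rate)  # reverse, trim the (former) end
    return trimmed.flip(-1)                                          # restore original order
```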
dea2fa9caf added fields to offset start/end slices, to apply in bulk when slicing (mrq, 2023-03-11 21:34:29 +0000)
89bb3d4419 rename transcribe button since it does more than transcribe (mrq, 2023-03-11 21:18:04 +0000)
382a3e4104 rely on the whisper.json for handling a lot more things (mrq, 2023-03-11 21:17:11 +0000)
94551fb9ac split the dataset slicing routine so it can be done after the fact (mrq, 2023-03-11 17:27:01 +0000)
e3fdb79b49 rocm5.2 works for me desu, so I bumped it back up (mrq, 2023-03-11 17:02:56 +0000)
e680d84a13 removed the hotfix pip installs that whisperx required, now that whisperx is gone (mrq, 2023-03-11 16:55:19 +0000)
cf41492f76 fall back to normal behavior if there are actually no audio files loaded from the dataset when using it for computing latents (mrq, 2023-03-11 16:46:03 +0000)
2424c455cb added option to not slice audio when transcribing, added option to prepare the validation dataset by audio duration, added a warning if you're using whisperx and you're slicing audio (mrq, 2023-03-11 16:32:35 +0000)
6ef5bae46a added cpu option for whisperx only (tigi6346, 2023-03-11 08:23:35 +0200)
008a1f5f8f simplified spawning the training process by having the train.py script spawn the distributed training processes itself, so it should work on Windows too (mrq, 2023-03-11 01:37:00 +0000)
2feb6da0c0 cleanups and fixes; fix DLAS throwing errors from '''too short of sound files''' by just culling them during transcription (mrq, 2023-03-11 01:19:49 +0000)
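The culling step likely amounts to a duration check per clip before it enters the dataset; a hedged sketch, where the function name and threshold are hypothetical:

```python
import torchaudio

MIN_DURATION = 0.6  # seconds; illustrative cutoff for clips too short for DLAS to train on

def keep_segment(path: str) -> bool:
    """Return False for sound files too short to keep in the training set."""
    info = torchaudio.info(path)
    duration = info.num_frames / info.sample_rate
    return duration >= MIN_DURATION
```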
7f2da0f5fb rewrote how AIVC gets training metrics (need to clean up later) (mrq, 2023-03-10 22:35:32 +0000)
df0edacc60 fix the cleanup actually only doing 2 despite requesting more than 2; surprised no one has pointed it out (mrq, 2023-03-10 14:04:07 +0000)
8e890d3023 forgot to fix reset settings to use the new arg-agnostic way (mrq, 2023-03-10 13:49:39 +0000)
0b364b590e maybe don't --force-reinstall to try and force downgrading; it just forces everything to uninstall and then reinstall (mrq, 2023-03-10 04:22:47 +0000)
c231d842aa make the dependencies after the one in this repo force-reinstall to downgrade, I hope; I have other things to do than validate this works (mrq, 2023-03-10 03:53:21 +0000)
c92b006129 I really hate YAML (mrq, 2023-03-10 03:48:46 +0000)
d3184004fd only God knows why the YAML spec lets you specify string values without quotes (mrq, 2023-03-10 01:58:30 +0000)
eb1551ee92 fixed what I thought was an override but was actually a ternary (mrq, 2023-03-09 23:04:02 +0000)
c3b43d2429 today I learned adamw_zero actually negates ANY LR schemes (mrq, 2023-03-09 19:42:31 +0000)
a01eb10960 (try to) unload voicefixer if it raises an error while loading voicefixer (mrq, 2023-03-09 04:28:14 +0000)
dc1902b91c cleaned up the block that makes embedding latents for random/microphone happen; remove builtin voice options from the voice list to avoid duplicates (mrq, 2023-03-09 04:23:36 +0000)
797882336b maybe remedy an issue that crops up if you have a non-wav and non-json file in a results folder (assuming) (mrq, 2023-03-09 04:06:07 +0000)
b64948d966 while I'm breaking things, migrating dependencies to a modules folder for tidiness (mrq, 2023-03-09 04:03:57 +0000)
b8867a5fb0 added the mysterious tortoise_compat flag mentioned in the DLAS repo (mrq, 2023-03-09 03:41:40 +0000)
3b4f4500d1 when you have three separate machines running and you test on one, but you accidentally revert changes because you then test on another (mrq, 2023-03-09 03:26:18 +0000)
ef75dba995 I hate that commas make tuples (mrq, 2023-03-09 02:43:05 +0000)
f795dd5c20 you might be wondering why so many small commits instead of rolling HEAD back one to just combine them: I don't want to force-push and roll back the paperspace I'm testing in (mrq, 2023-03-09 02:31:32 +0000)
1b18b3e335 forgot to save the simplified training input JSON first, before touching any of the settings that dump to the YAML (mrq, 2023-03-09 02:27:20 +0000)
221ac38b32 forgot to update to the finetune subdir (mrq, 2023-03-09 02:25:32 +0000)
0e80e311b0 added VRAM validation for a given batch:gradient-accumulation size ratio (based empirically off of 6GiB, 16GiB, and 16x2GiB; would be nice to have more data on what's safe) (mrq, 2023-03-09 02:08:06 +0000)
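The shape of this check is probably a sanity test on the batch-size-to-accumulation ratio against detected VRAM; a speculative sketch, where the function name and the ratio ceilings are placeholders (the commit says the real ones were derived empirically):

```python
import torch

def validate_batch_ratio(batch_size: int, gradient_accumulation: int) -> None:
    """Reject batch:gradient-accumulation ratios likely to OOM on this GPU."""
    if batch_size % gradient_accumulation != 0:
        raise ValueError("batch size should divide evenly by the gradient accumulation size")
    vram_gib = torch.cuda.get_device_properties(0).total_memory / (1024 ** 3)
    max_ratio = 8 if vram_gib <= 8 else 16  # hypothetical safe ceilings, not the repo's tuned values
    if batch_size // gradient_accumulation > max_ratio:
        raise ValueError(
            f"a ratio of {batch_size // gradient_accumulation} will likely OOM "
            f"on ~{vram_gib:.0f}GiB of VRAM"
        )
```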
d58b67004a colab notebook uses venv and the normal scripts to keep it on parity with a local install (and it literally just works, stop creating issues for something inconsistent with known solutions) (mrq, 2023-03-08 15:51:13 +0000)
34dcb845b5 actually make using the adamw_zero optimizer for multi-GPUs work (mrq, 2023-03-08 15:31:33 +0000)
8494628f3c normalize validation batch size because I OOM'd without it getting scaled (mrq, 2023-03-08 05:27:20 +0000)
d7e75a51cf I forgot about the changelog and never kept up with it, so I'll just not use a changelog (mrq, 2023-03-08 05:14:50 +0000)
ff07f707cb disable validation if the validation dataset is not found; clamp validation batch size to the validation dataset size instead of simply reusing the batch size; switch to the adamw_zero optimizer when training with multi-GPUs (because the YAML comment said to, and I think it might be why I'm absolutely having garbage luck training this Japanese dataset) (mrq, 2023-03-08 04:47:05 +0000)
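The disable-or-clamp logic probably looks something like the sketch below; the function name, return shape, and line-per-entry dataset format are assumptions for illustration:

```python
import os

def configure_validation(val_path: str, batch_size: int) -> dict:
    """Disable validation when the dataset file is missing; otherwise clamp
    the validation batch size to the dataset size instead of reusing batch_size."""
    if not os.path.exists(val_path):
        return {"enabled": False}
    with open(val_path, encoding="utf-8") as f:
        val_size = sum(1 for line in f if line.strip())  # one dataset entry per line (assumed)
    return {"enabled": True, "batch_size": min(batch_size, val_size)}
```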
f1788a5639 lazy wrap around the voicefixer block, because sometimes it just an-heros itself despite having a specific block to load it beforehand (mrq, 2023-03-08 04:12:22 +0000)
e862169e7f set validation to the save rate, and to the validation file if it exists (need to test later) (mrq, 2023-03-07 20:38:31 +0000)
fe8bf7a9d1 added helper script to cull short-enough lines from the training set as a validation set (if it yields good results doing validation during training, I'll add it to the web UI) (mrq, 2023-03-07 20:16:49 +0000)
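A hypothetical standalone version of that helper, assuming an LJSpeech-style pipe-delimited `train.txt` and an illustrative length cutoff (file names and threshold are not taken from the repo):

```python
MAX_VALIDATION_TEXT_LENGTH = 80  # characters; illustrative cutoff

def cull_validation_set(train_path="train.txt", val_path="validation.txt"):
    """Move entries whose transcription is short enough into a validation set."""
    keep, culled = [], []
    with open(train_path, encoding="utf-8") as f:
        for line in f:
            # assumed per-line format: audio/path.wav|transcription
            text = line.split("|", 1)[-1].strip()
            (culled if len(text) < MAX_VALIDATION_TEXT_LENGTH else keep).append(line)
    with open(train_path, "w", encoding="utf-8") as f:
        f.writelines(keep)
    with open(val_path, "w", encoding="utf-8") as f:
        f.writelines(culled)
```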
7f89e8058a fixed update checker for dlas+tortoise-tts (mrq, 2023-03-07 19:33:56 +0000)
6d7e143f53 added override for large training plots (mrq, 2023-03-07 19:29:09 +0000)
3718e9d0fb set NaN alarm to show the iteration it happened in (mrq, 2023-03-07 19:22:11 +0000)
c27ee3ce95 added update checking for dlas and tortoise-tts; cache voices (for a given model and voice name) so random latents will remain the same (mrq, 2023-03-07 17:04:45 +0000)
2726d98ee1 fried my brain trying to nail down bugs involving using solely AR model=auto (mrq, 2023-03-07 05:35:21 +0000)
d7a5ad9fd9 cleaned up some model loading logic; added 'auto' mode for the AR model (deduced by current voice) (mrq, 2023-03-07 04:34:39 +0000)
3899f9b4e3 added (yet another) experimental voice latent calculation mode (when chunk size is 0 and there's a dataset generated, it'll leverage it by padding to a common size and then computing them; should help avoid splitting mid-phoneme) (mrq, 2023-03-07 03:55:35 +0000)
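The padding step presumably looks like the sketch below: pad every dataset clip to the longest one's length so the latent pass sees whole utterances rather than arbitrary chunk boundaries (function name is hypothetical; zero-padding at the end is an assumption):

```python
import torch
import torch.nn.functional as F

def pad_to_common_size(clips: list[torch.Tensor]) -> list[torch.Tensor]:
    """Pad every clip to the length of the longest one before computing latents,
    avoiding the mid-phoneme splits that fixed-size chunking would introduce."""
    longest = max(clip.shape[-1] for clip in clips)
    return [F.pad(clip, (0, longest - clip.shape[-1])) for clip in clips]
```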
5063728bb0 brain worms and headaches (mrq, 2023-03-07 03:01:02 +0000)
0f31c34120 download dvae.pth for the people who somehow managed to put the web UI into a state where it never initializes TTS at all (mrq, 2023-03-07 02:47:10 +0000)
0f0b394445 moved the (actually not working) setting to use BigVGAN to a dropdown for selecting between vocoders (for when slotting in future ones), and added the ability to load a new vocoder while TTS is loaded (mrq, 2023-03-07 02:45:22 +0000)
e731b9ba84 reworked generating metadata to embed; should now store overridden settings (mrq, 2023-03-06 23:07:16 +0000)
7798767fc6 added settings editing (will add a guide on what to do later, and an example) (mrq, 2023-03-06 21:48:34 +0000)
119ac50c58 forgot to re-append the existing transcription when skipping existing (have to go back again and do the first 10% of my giant dataset) (mrq, 2023-03-06 16:50:55 +0000)
11a1f6a00e forgot to reorder the dependency install, because whisperx needs to be installed before DLAS (mrq, 2023-03-06 16:43:17 +0000)
12c51b6057 I'm not too sure if manually invoking gc actually closes all the open files from whisperx (or ROCm), but the problem seems to have gone away alongside setting 'ulimit -Sn' to half the output of 'ulimit -Hn' (mrq, 2023-03-06 16:39:37 +0000)
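That ulimit workaround can also be done from within Python on Unix via the standard `resource` module; a minimal sketch of the same adjustment:

```python
import resource

# raise the soft open-file limit to half the hard limit, mirroring
# `ulimit -Sn $(( $(ulimit -Hn) / 2 ))` (Unix/Linux only, e.g. under ROCm)
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard // 2, hard))
```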
999878d9c6 and it turned out I wasn't even using the aligned segments; kmsing now that I have to *redo* my dataset again (mrq, 2023-03-06 11:01:33 +0000)
14779a5020 added option to skip transcribing if it exists in the output text file, because apparently whisperx will throw a "max files opened" error when using ROCm, since it does not close some file descriptors when batch-transcribing or something; so poor little me, who's retranscribing his Japanese dataset for the 305823042th time, woke up to it partially done; I am so mad I have to wait another few hours for it to continue when I was hoping to wake up to it done (mrq, 2023-03-06 10:47:06 +0000)
0e3bbc55f8 added api_name for generation, added the whisperx backend, relocated the use-whispercpp option to the whisper backend list (mrq, 2023-03-06 05:21:33 +0000)
5be14abc21 UI cleanup; actually fix syncing the epoch counter (I hope); setting the auto-suggested voice chunk size to 0 will just split based on the average duration length; signal when a NaN info value is detected (there are some safeties in the training, but it will inevitably fuck the model) (mrq, 2023-03-05 23:55:27 +0000)
287738a338 (should) fix the reported epoch metric desyncing from the de facto metric; fixed finding the next milestone from the wrong sign because of 2AM brain (mrq, 2023-03-05 20:42:45 +0000)