Commit Graph

  • 0cf9db5e69 oops mrq 2023-03-13 01:33:45 +0000
  • 050bcefd73 resample to 22.5K when creating training inputs (to avoid redundant downsampling when loaded for training, even though most of my inputs are already at 22.5K), generalized the resampler function to cache and reuse resamplers, and do not unload whisper when done transcribing since it gets unloaded anyway for any other non-transcription task mrq 2023-03-13 01:20:55 +0000
  • 7c9c0dc584 forgot to clean up debug prints mrq 2023-03-13 00:44:37 +0000
  • 239c984850 move validating audio to creating the text files instead, consider audio longer than 11 seconds invalid, consider text lengths over 200 invalid mrq 2023-03-12 23:39:00 +0000
  • 51ddc205cd update submodules mrq 2023-03-12 18:14:36 +0000
  • ccbf2e6aff blame mrq/ai-voice-cloning#122 mrq 2023-03-12 17:51:52 +0000
  • 9238df0b03 fixed last generation settings not actually loading because brain worms mrq 2023-03-12 15:49:50 +0000
  • 9594a960b0 Disable loss ETA for now until I fix it mrq 2023-03-12 15:39:54 +0000
  • 51f6c347fe Merge pull request 'updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested.' (#122) from zim33/ai-voice-cloning:save_more_user_config into master mrq 2023-03-12 15:38:34 +0000
  • be8b290a1a Merge branch 'master' into save_more_user_config mrq 2023-03-12 15:38:08 +0000
  • 296129ba9c output fixes, I'm not sure why ETA wasn't working but it works in testing mrq 2023-03-12 15:17:07 +0000
  • 098d7ad635 uh I don't remember, small things mrq 2023-03-12 14:47:48 +0000
  • 233baa4e45 updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested. tigi6346 2023-03-12 16:08:02 +0200
  • 1ac278e885 Merge pull request 'keep_training' (#118) from zim33/ai-voice-cloning:keep_training into master mrq 2023-03-12 06:47:01 +0000
  • 29b3d1ae1d Fixed Keep X Previous States tigi6346 2023-03-12 08:01:08 +0200
  • 9e320a34c8 Fixed Keep X Previous States tigi6346 2023-03-12 08:00:03 +0200
  • 8ed09f9b87 Merge pull request 'Catch OOM and run whisper on cpu automatically.' (#117) from zim33/ai-voice-cloning:vram into master mrq 2023-03-12 05:09:53 +0000
  • 61500107ab Catch OOM and run whisper on cpu automatically. tigi6346 2023-03-12 06:48:28 +0200
  • ede9804b76 added option to trim silence using torchaudio's VAD mrq 2023-03-11 21:41:35 +0000
  • dea2fa9caf added fields to offset start/end slices to apply in bulk when slicing mrq 2023-03-11 21:34:29 +0000
  • 89bb3d4419 rename transcribe button since it does more than transcribe mrq 2023-03-11 21:18:04 +0000
  • 382a3e4104 rely on the whisper.json for handling a lot more things mrq 2023-03-11 21:17:11 +0000
  • 9b376c381f brain worm mrq 2023-03-11 18:14:32 +0000
  • 94551fb9ac split slicing dataset routine so it can be done after the fact mrq 2023-03-11 17:27:01 +0000
  • e3fdb79b49 rocm5.2 works for me desu so I bumped it back up mrq 2023-03-11 17:02:56 +0000
  • e680d84a13 removed the hotfix pip installs that whisperx required, now that whisperx is gone mrq 2023-03-11 16:55:19 +0000
  • cf41492f76 fall back to normal behavior if there are actually no audio files loaded from the dataset when using it for computing latents mrq 2023-03-11 16:46:03 +0000
  • b90c164778 Farewell, parasite mrq 2023-03-11 16:40:34 +0000
  • 2424c455cb added option to not slice audio when transcribing, added option to prepare validation dataset on audio duration, added a warning if you're using whisperx and you're slicing audio mrq 2023-03-11 16:32:35 +0000
  • dcdcf8516c master (#112) tigi6346 2023-03-11 03:28:04 +0000
  • 008a1f5f8f simplified spawning the training process by having it spawn the distributed training processes in the train.py script, so it should work on Windows too mrq 2023-03-11 01:37:00 +0000
  • 2feb6da0c0 cleanups and fixes, fix DLAS throwing errors from '''too short of sound files''' by just culling them during transcription mrq 2023-03-11 01:19:49 +0000
  • 7f2da0f5fb rewrote how AIVC gets training metrics (need to clean up later) mrq 2023-03-10 22:35:32 +0000
  • df0edacc60 fix the cleanup actually only doing 2 despite requesting more than 2, surprised no one has pointed it out mrq 2023-03-10 14:04:07 +0000
  • 8e890d3023 forgot to fix reset settings to use the new arg-agnostic way mrq 2023-03-10 13:49:39 +0000
  • d250e0ec17 brain fried mrq 2023-03-10 04:27:34 +0000
  • 0b364b590e maybe don't --force-reinstall to try and force downgrading, it just forces everything to uninstall then reinstall mrq 2023-03-10 04:22:47 +0000
  • c231d842aa make dependencies after the one in this repo force-reinstall to downgrade, I hope; I have other things to do than validate this works mrq 2023-03-10 03:53:21 +0000
  • c92b006129 I really hate YAML mrq 2023-03-10 03:48:46 +0000
  • d3184004fd only God knows why the YAML spec lets you specify string values without quotes mrq 2023-03-10 01:58:30 +0000
  • eb1551ee92 what I thought was an override and not a ternary mrq 2023-03-09 23:04:02 +0000
  • c3b43d2429 today I learned adamw_zero actually negates ANY LR schemes mrq 2023-03-09 19:42:31 +0000
  • cb273b8428 cleanup mrq 2023-03-09 18:34:52 +0000
  • 7c71f7239c expose options for CosineAnnealingLR_Restart (seems to be able to train very quickly due to the restarts) mrq 2023-03-09 14:17:01 +0000
  • 2f6dd9c076 some cleanup mrq 2023-03-09 06:20:05 +0000
  • 5460e191b0 added loss graph, because I'm going to experiment with cosine annealing LR and I need to view my loss mrq 2023-03-09 05:54:08 +0000
  • a182df8f4e is mrq 2023-03-09 04:33:12 +0000
  • a01eb10960 (try to) unload voicefixer if it raises an error while loading it mrq 2023-03-09 04:28:14 +0000
  • dc1902b91c cleanup block that makes embedding latents for random/microphone happen, remove builtin voice options from voice list to avoid duplicates mrq 2023-03-09 04:23:36 +0000
  • 797882336b maybe remedy an issue that crops up if you have a non-wav and non-json file in a results folder (assuming) mrq 2023-03-09 04:06:07 +0000
  • b64948d966 while I'm breaking things, migrating dependencies to modules folder for tidiness mrq 2023-03-09 04:03:57 +0000
  • b8867a5fb0 added the mysterious tortoise_compat flag mentioned in DLAS repo mrq 2023-03-09 03:41:40 +0000
  • 3b4f4500d1 when you have three separate machines running and you test on one, but you accidentally revert changes because you then test on another mrq 2023-03-09 03:26:18 +0000
  • ef75dba995 I hate that commas make tuples mrq 2023-03-09 02:43:05 +0000
  • f795dd5c20 you might be wondering why so many small commits instead of rolling the HEAD back one to just combine them, i don't want to force push and roll back the paperspace i'm testing in mrq 2023-03-09 02:31:32 +0000
  • 51339671ec typo mrq 2023-03-09 02:29:08 +0000
  • 1b18b3e335 forgot to save the simplified training input json first before touching any of the settings that dump to the yaml mrq 2023-03-09 02:27:20 +0000
  • 221ac38b32 forgot to update to finetune subdir mrq 2023-03-09 02:25:32 +0000
  • 0e80e311b0 added VRAM validation for a given batch:gradient accumulation size ratio (based empirically off of 6GiB, 16GiB, and 16x2GiB, would be nice to have more data on what's safe) mrq 2023-03-09 02:08:06 +0000
  • ef7b957fff oops mrq 2023-03-09 00:53:00 +0000
  • b0baa1909a forgot template mrq 2023-03-09 00:32:35 +0000
  • 3f321fe664 big cleanup to make my life easier when i add more parameters mrq 2023-03-09 00:26:47 +0000
  • 0ab091e7ff oops mrq 2023-03-08 16:09:29 +0000
  • 40e8d0774e share if you mrq 2023-03-08 15:59:16 +0000
  • d58b67004a colab notebook uses venv and normal scripts to keep it on parity with a local install (and it literally just works, stop creating issues for something inconsistent with known solutions) mrq 2023-03-08 15:51:13 +0000
  • 34dcb845b5 actually make using adamw_zero optimizer for multi-gpus work mrq 2023-03-08 15:31:33 +0000
  • 8494628f3c normalize validation batch size because i oom'd without it getting scaled mrq 2023-03-08 05:27:20 +0000
  • d7e75a51cf I forgot about the changelog and never kept up with it, so I'll just not use a changelog mrq 2023-03-08 05:14:50 +0000
  • ff07f707cb disable validation if validation dataset not found, clamp validation batch size to validation dataset size instead of simply reusing batch size, switch to adamw_zero optimizer when training with multi-gpus (because the yaml comment said to and I think it might be why I'm absolutely having garbage luck training this Japanese dataset) mrq 2023-03-08 04:47:05 +0000
  • f1788a5639 lazy wrap around the voicefixer block because sometimes it just an heros itself despite having a specific block to load it beforehand mrq 2023-03-08 04:12:22 +0000
  • 83b5125854 fixed notebooks, provided paperspace notebook mrq 2023-03-08 03:29:12 +0000
  • b4098dca73 made validation work (will document later) mrq 2023-03-08 02:58:00 +0000
  • a7e0dc9127 oops mrq 2023-03-08 00:51:51 +0000
  • e862169e7f set validation to save rate and validation file if it exists (need to test later) mrq 2023-03-07 20:38:31 +0000
  • fe8bf7a9d1 added helper script to cull short-enough lines from the training set into a validation set (if it yields good results doing validation during training, i'll add it to the web ui) mrq 2023-03-07 20:16:49 +0000
  • 7f89e8058a fixed update checker for dlas+tortoise-tts mrq 2023-03-07 19:33:56 +0000
  • 6d7e143f53 added override for large training plots mrq 2023-03-07 19:29:09 +0000
  • 3718e9d0fb set NaN alarm to show the iteration it happened at mrq 2023-03-07 19:22:11 +0000
  • c27ee3ce95 added update checking for dlas and tortoise-tts, caching voices (for a given model and voice name) so random latents will remain the same mrq 2023-03-07 17:04:45 +0000
  • 166d491a98 fixes mrq 2023-03-07 13:40:41 +0000
  • df5ba634c0 brain dead mrq 2023-03-07 05:43:26 +0000
  • 2726d98ee1 fried my brain trying to nail down bugs involving using solely AR model=auto mrq 2023-03-07 05:35:21 +0000
  • d7a5ad9fd9 cleaned up some model loading logic, added 'auto' mode for AR model (deduced by current voice) mrq 2023-03-07 04:34:39 +0000
  • 3899f9b4e3 added (yet another) experimental voice latent calculation mode (when chunk size is 0 and there's a dataset generated, it'll leverage it by padding to a common size then computing them, should help avoid splitting mid-phoneme) mrq 2023-03-07 03:55:35 +0000
  • 5063728bb0 brain worms and headaches mrq 2023-03-07 03:01:02 +0000
  • 0f31c34120 download dvae.pth for the people who somehow managed to put the web UI into a state where it never initializes TTS at all mrq 2023-03-07 02:47:10 +0000
  • 0f0b394445 moved the (actually not working) setting to use BigVGAN to a dropdown to select between vocoders (for when slotting in future ones), and added the ability to load a new vocoder while TTS is loaded mrq 2023-03-07 02:45:22 +0000
  • e731b9ba84 reworked generating metadata to embed, should now store overridden settings mrq 2023-03-06 23:07:16 +0000
  • 7798767fc6 added settings editing (will add a guide on what to do later, and an example) mrq 2023-03-06 21:48:34 +0000
  • 119ac50c58 forgot to re-append the existing transcription when skipping existing (have to go back again and do the first 10% of my giant dataset) mrq 2023-03-06 16:50:55 +0000
  • da0af4c498 one more mrq 2023-03-06 16:47:34 +0000
  • 11a1f6a00e forgot to reorder the dependency install because whisperx needs to be installed before DLAS mrq 2023-03-06 16:43:17 +0000
  • 12c51b6057 I'm not too sure if manually invoking gc actually closes all the open files from whisperx (or ROCm), but it seems to have gone away alongside setting 'ulimit -Sn' to half the output of 'ulimit -Hn' mrq 2023-03-06 16:39:37 +0000
  • 999878d9c6 and it turned out I wasn't even using the aligned segments, kmsing now that I have to *redo* my dataset again mrq 2023-03-06 11:01:33 +0000
  • 14779a5020 Added option to skip transcribing if it exists in the output text file, because apparently whisperx will throw a "max files opened" error when using ROCm, since it does not close some file descriptors if you're batch-transcribing or something; so poor little me, who's retranscribing his Japanese dataset for the 305823042th time, woke up to it partially done. I am so mad I have to wait another few hours for it to continue when I was hoping to wake up to it done mrq 2023-03-06 10:47:06 +0000
  • 0e3bbc55f8 added api_name for generation, added whisperx backend, relocated use whispercpp option to whisper backend list mrq 2023-03-06 05:21:33 +0000
  • 788a957f79 stretch loss plot to target iteration just so it's not so misleading with the scale mrq 2023-03-06 00:44:29 +0000
  • 5be14abc21 UI cleanup, actually fix syncing the epoch counter (i hope), setting auto-suggest voice chunk size to 0 will just split based on the average duration length, signal when a NaN info value is detected (there are some safeties in the training, but it will inevitably fuck the model) mrq 2023-03-05 23:55:27 +0000
  • 287738a338 (should) fix reported epoch metric desyncing from de facto metric, fixed finding next milestone from wrong sign because of 2AM brain mrq 2023-03-05 20:42:45 +0000
  • 206a14fdbe brain worms mrq 2023-03-05 20:30:27 +0000
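
For reference, a minimal sketch of the resampler-caching idea from 050bcefd73, assuming torchaudio handles the resampling; the `resample` helper and the cache layout are illustrative only, not the repo's actual code, and 22050 Hz stands in for the "22.5K" target the commit mentions.

```python
import torch
import torchaudio

# Cache of Resample transforms keyed by (source rate, target rate), so repeated
# conversions between the same pair of rates reuse one transform instead of
# rebuilding the resampling kernel every call.
_resamplers = {}

def resample(waveform: torch.Tensor, orig_sr: int, target_sr: int = 22050) -> torch.Tensor:
    """Resample `waveform` to `target_sr`, reusing a cached Resample transform."""
    if orig_sr == target_sr:
        return waveform  # already at the target rate, skip building a transform
    key = (orig_sr, target_sr)
    if key not in _resamplers:
        _resamplers[key] = torchaudio.transforms.Resample(orig_sr, target_sr)
    return _resamplers[key](waveform)
```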
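The CUDA-OOM fallback merged in 61500107ab could look roughly like the following, assuming the stock openai-whisper API; `load_whisper_model` is a hypothetical helper, and the string match on the error message is just a version-tolerant way to detect an out-of-memory failure.

```python
import torch
import whisper

def load_whisper_model(name: str = "base"):
    """Try to load Whisper on the GPU, falling back to the CPU on CUDA OOM."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    try:
        return whisper.load_model(name, device=device)
    except RuntimeError as e:
        if "out of memory" not in str(e).lower():
            raise
        # Not enough VRAM left (e.g. a TTS model is already resident): retry on CPU.
        torch.cuda.empty_cache()
        return whisper.load_model(name, device="cpu")
```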
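The silence-trimming option from ede9804b76 relies on torchaudio's VAD; a sketch of the usual pattern (with a hypothetical `trim_silence` helper) is to trim the front, then reverse the waveform and trim again to catch trailing silence, since `torchaudio.functional.vad` only trims from the start.

```python
import torch
import torchaudio

def trim_silence(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """Trim leading and trailing silence with torchaudio's SoX-style VAD."""
    trimmed = torchaudio.functional.vad(waveform, sample_rate)  # trims leading silence only
    trimmed = torchaudio.functional.vad(torch.flip(trimmed, [-1]), sample_rate)  # now the tail
    return torch.flip(trimmed, [-1])  # restore original orientation
```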
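fe8bf7a9d1 culls short lines from the training list into a validation set; a sketch under the assumption of LJSpeech-style `audio_path|transcription` lines, with a hypothetical character cutoff (the real script's threshold and file names may differ).

```python
from pathlib import Path

def split_validation(train_path: str, max_chars: int = 12) -> None:
    """Move lines with short transcriptions into a validation.txt beside the training file."""
    train_file = Path(train_path)
    keep, culled = [], []
    for line in train_file.read_text(encoding="utf-8").splitlines():
        text = line.split("|", 1)[-1]  # assumes `audio_path|transcription` lines
        (culled if len(text) <= max_chars else keep).append(line)
    train_file.write_text("\n".join(keep) + "\n", encoding="utf-8")
    (train_file.parent / "validation.txt").write_text("\n".join(culled) + "\n", encoding="utf-8")
```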
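The `ulimit` workaround noted in 12c51b6057 (soft file-descriptor limit set to half the hard limit) has a rough Python equivalent via the POSIX-only `resource` module; this is a sketch of that idea, not something the repo necessarily does in code.

```python
import resource

def relax_open_file_limit() -> None:
    """Raise the soft open-file limit toward half the hard limit, mirroring
    `ulimit -Sn $(( $(ulimit -Hn) / 2 ))` in the shell (POSIX only)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard == resource.RLIM_INFINITY:
        return  # nothing sensible to halve; leave the limits alone
    target = hard // 2
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```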