51ddc205cd
update submodules
2023-03-12 18:14:36 +00:00
ccbf2e6aff
blame mrq/ai-voice-cloning#122
2023-03-12 17:51:52 +00:00
9238df0b03
fixed last generation settings not actually loading because brain worms
2023-03-12 15:49:50 +00:00
9594a960b0
Disable loss ETA for now until I fix it
2023-03-12 15:39:54 +00:00
mrq
be8b290a1a
Merge branch 'master' into save_more_user_config
2023-03-12 15:38:08 +00:00
296129ba9c
output fixes, I'm not sure why ETA wasn't working but it works in testing
2023-03-12 15:17:07 +00:00
098d7ad635
uh I don't remember, small things
2023-03-12 14:47:48 +00:00
233baa4e45
updated several default configurations to not cause null/empty errors. also set default samples/iterations to 16-30 (ultra fast), which is typically suggested.
2023-03-12 16:08:02 +02:00
29b3d1ae1d
Fixed Keep X Previous States
2023-03-12 08:01:08 +02:00
9e320a34c8
Fixed Keep X Previous States
2023-03-12 08:00:03 +02:00
61500107ab
Catch OOM and run whisper on cpu automatically.
2023-03-12 06:48:28 +02:00
ede9804b76
added option to trim silence using torchaudio's VAD
2023-03-11 21:41:35 +00:00
dea2fa9caf
added fields to offset start/end slices to apply in bulk when slicing
2023-03-11 21:34:29 +00:00
89bb3d4419
rename transcribe button since it does more than transcribe
2023-03-11 21:18:04 +00:00
382a3e4104
rely on the whisper.json for handling a lot more things
2023-03-11 21:17:11 +00:00
9b376c381f
brain worm
2023-03-11 18:14:32 +00:00
94551fb9ac
split slicing dataset routine so it can be done after the fact
2023-03-11 17:27:01 +00:00
e3fdb79b49
rocm5.2 works for me desu so I bumped it back up
2023-03-11 17:02:56 +00:00
cf41492f76
fall back to normal behavior if there's actually no audio files loaded from the dataset when using it for computing latents
2023-03-11 16:46:03 +00:00
b90c164778
Farewell, parasite
2023-03-11 16:40:34 +00:00
2424c455cb
added option to not slice audio when transcribing, added option to prepare validation dataset on audio duration, added a warning if you're using whisperx and you're slicing audio
2023-03-11 16:32:35 +00:00
tigi6346
dcdcf8516c
master (#112)
Fixes Gradio bugging out when attempting to load a missing train.json.
Reviewed-on: mrq/ai-voice-cloning#112
Co-authored-by: tigi6346 <tigi6346@noreply.localhost>
Co-committed-by: tigi6346 <tigi6346@noreply.localhost>
2023-03-11 03:28:04 +00:00
008a1f5f8f
simplified spawning the training process by having it spawn the distributed training processes in the train.py script, so it should work on Windows too
2023-03-11 01:37:00 +00:00
2feb6da0c0
cleanups and fixes, fix DLAS throwing errors from '''too short of sound files''' by just culling them during transcription
2023-03-11 01:19:49 +00:00
7f2da0f5fb
rewrote how AIVC gets training metrics (need to clean up later)
2023-03-10 22:35:32 +00:00
df0edacc60
fix the cleanup actually only doing 2 despite requesting more than 2, surprised no one has pointed it out
2023-03-10 14:04:07 +00:00
8e890d3023
forgot to fix reset settings to use the new arg-agnostic way
2023-03-10 13:49:39 +00:00
c92b006129
I really hate YAML
2023-03-10 03:48:46 +00:00
eb1551ee92
what I thought was an override and not a ternary
2023-03-09 23:04:02 +00:00
c3b43d2429
today I learned adamw_zero actually negates ANY LR schemes
2023-03-09 19:42:31 +00:00
cb273b8428
cleanup
2023-03-09 18:34:52 +00:00
7c71f7239c
expose options for CosineAnnealingLR_Restart (seems to be able to train very quickly due to the restarts)
2023-03-09 14:17:01 +00:00
2f6dd9c076
some cleanup
2023-03-09 06:20:05 +00:00
5460e191b0
added loss graph, because I'm going to experiment with cosine annealing LR and I need to view my loss
2023-03-09 05:54:08 +00:00
a182df8f4e
is
2023-03-09 04:33:12 +00:00
a01eb10960
(try to) unload voicefixer if it raises an error while loading
2023-03-09 04:28:14 +00:00
dc1902b91c
cleanup block that makes embedding latents for random/microphone happen, remove builtin voice options from voice list to avoid duplicates
2023-03-09 04:23:36 +00:00
797882336b
maybe remedy an issue that crops up if you have a non-wav and non-json file in a results folder (assuming)
2023-03-09 04:06:07 +00:00
b64948d966
while I'm breaking things, migrating dependencies to modules folder for tidiness
2023-03-09 04:03:57 +00:00
3b4f4500d1
when you have three separate machines running and you test on one, but you accidentally revert changes because you then test on another
2023-03-09 03:26:18 +00:00
ef75dba995
I hate that commas make tuples
2023-03-09 02:43:05 +00:00
f795dd5c20
you might be wondering why so many small commits instead of rolling the HEAD back one to just combine them, i don't want to force push and roll back the paperspace i'm testing in
2023-03-09 02:31:32 +00:00
51339671ec
typo
2023-03-09 02:29:08 +00:00
1b18b3e335
forgot to save the simplified training input json first before touching any of the settings that dump to the yaml
2023-03-09 02:27:20 +00:00
221ac38b32
forgot to update to finetune subdir
2023-03-09 02:25:32 +00:00
0e80e311b0
added VRAM validation for a given batch:gradient accumulation size ratio (based empirically off of 6GiB, 16GiB, and 16x2GiB, would be nice to have more data on what's safe)
2023-03-09 02:08:06 +00:00
ef7b957fff
oops
2023-03-09 00:53:00 +00:00
b0baa1909a
forgot template
2023-03-09 00:32:35 +00:00
3f321fe664
big cleanup to make my life easier when i add more parameters
2023-03-09 00:26:47 +00:00
0ab091e7ff
oops
2023-03-08 16:09:29 +00:00
34dcb845b5
actually make using adamw_zero optimizer for multi-gpus work
2023-03-08 15:31:33 +00:00
8494628f3c
normalize validation batch size because i oom'd without it getting scaled
2023-03-08 05:27:20 +00:00
d7e75a51cf
I forgot about the changelog and never kept up with it, so I'll just not use a changelog
2023-03-08 05:14:50 +00:00
ff07f707cb
disable validation if validation dataset not found, clamp validation batch size to validation dataset size instead of simply reusing batch size, switch to adamw_zero optimizer when training with multi-gpus (because the yaml comment said to and I think it might be why I'm absolutely having garbage luck training this japanese dataset)
2023-03-08 04:47:05 +00:00
f1788a5639
lazy wrap around the voicefixer block because sometimes it just an heros itself despite having a specific block to load it beforehand
2023-03-08 04:12:22 +00:00
83b5125854
fixed notebooks, provided paperspace notebook
2023-03-08 03:29:12 +00:00
b4098dca73
made validation working (will document later)
2023-03-08 02:58:00 +00:00
a7e0dc9127
oops
2023-03-08 00:51:51 +00:00
e862169e7f
set validation to save rate and validation file if exists (need to test later)
2023-03-07 20:38:31 +00:00
fe8bf7a9d1
added helper script to cull short enough lines from training set as a validation set (if it yields good results doing validation during training, i'll add it to the web ui)
2023-03-07 20:16:49 +00:00
7f89e8058a
fixed update checker for dlas+tortoise-tts
2023-03-07 19:33:56 +00:00
6d7e143f53
added override for large training plots
2023-03-07 19:29:09 +00:00
3718e9d0fb
set NaN alarm to show the iteration it happened at
2023-03-07 19:22:11 +00:00
c27ee3ce95
added update checking for dlas and tortoise-tts, caching voices (for a given model and voice name) so random latents will remain the same
2023-03-07 17:04:45 +00:00
166d491a98
fixes
2023-03-07 13:40:41 +00:00
df5ba634c0
brain dead
2023-03-07 05:43:26 +00:00
2726d98ee1
fried my brain trying to nail out bugs involving using solely ar model=auto
2023-03-07 05:35:21 +00:00
d7a5ad9fd9
cleaned up some model loading logic, added 'auto' mode for AR model (deduced by current voice)
2023-03-07 04:34:39 +00:00
3899f9b4e3
added (yet another) experimental voice latent calculation mode (when chunk size is 0 and there's a dataset generated, it'll leverage it by padding to a common size then computing them, should help avoid splitting mid-phoneme)
2023-03-07 03:55:35 +00:00
5063728bb0
brain worms and headaches
2023-03-07 03:01:02 +00:00
0f31c34120
download dvae.pth for the people who somehow managed to put the web UI into a state where it never initializes TTS at all
2023-03-07 02:47:10 +00:00
0f0b394445
moved (actually not working) setting to use BigVGAN to a dropdown to select between vocoders (for when slotting in future ones), and ability to load a new vocoder while TTS is loaded
2023-03-07 02:45:22 +00:00
e731b9ba84
reworked generating metadata to embed, should now store overridden settings
2023-03-06 23:07:16 +00:00
7798767fc6
added settings editing (will add a guide on what to do later, and an example)
2023-03-06 21:48:34 +00:00
119ac50c58
forgot to re-append the existing transcription when skipping existing (have to go back again and do the first 10% of my giant dataset)
2023-03-06 16:50:55 +00:00
12c51b6057
I'm not too sure if manually invoking gc actually closes all the open files from whisperx (or ROCm), but it seems to have gone away alongside setting 'ulimit -Sn' to half the output of 'ulimit -Hn'
2023-03-06 16:39:37 +00:00
999878d9c6
and it turned out I wasn't even using the aligned segments, kmsing now that I have to *redo* my dataset again
2023-03-06 11:01:33 +00:00
14779a5020
Added option to skip transcribing if it exists in the output text file, because apparently whisperx will throw a "max files opened" error when using ROCm because it does not close some file descriptors if you're batch-transcribing or something, so poor little me, who's retranscribing his japanese dataset for the 305823042th time woke up to it partially done i am so mad I have to wait another few hours for it to continue when I was hoping to wake up to it done
2023-03-06 10:47:06 +00:00
0e3bbc55f8
added api_name for generation, added whisperx backend, relocated use whispercpp option to whisper backend list
2023-03-06 05:21:33 +00:00
788a957f79
stretch loss plot to target iteration just so it's not so misleading with the scale
2023-03-06 00:44:29 +00:00
5be14abc21
UI cleanup, actually fix syncing the epoch counter (i hope), setting auto-suggest voice chunk size whatever to 0 will just split based on the average duration length, signal when a NaN info value is detected (there's some safeties in the training, but it will inevitably fuck the model)
2023-03-05 23:55:27 +00:00
287738a338
(should) fix reported epoch metric desyncing from de facto metric, fixed finding next milestone from wrong sign because of 2AM brain
2023-03-05 20:42:45 +00:00
206a14fdbe
brainworms
2023-03-05 20:30:27 +00:00
b82961ba8a
typo
2023-03-05 20:13:39 +00:00
b2e89d8da3
oops
2023-03-05 19:58:15 +00:00
8094401a6d
print in e-notation for LR
2023-03-05 19:48:24 +00:00
8b9c9e1bbf
remove redundant stats, add showing LR
2023-03-05 18:53:12 +00:00
0231550287
forgot to remove a debug print
2023-03-05 18:27:16 +00:00
d97639e138
whispercpp actually works now (language loading was weird, slicing needed to divide time by 100), transcribing audio checks for silence and discards them
2023-03-05 17:54:36 +00:00
b8a620e8d7
actually accumulate derivatives when estimating milestones and final loss by using half of the log
2023-03-05 14:39:24 +00:00
35225a35da
oops v2
2023-03-05 14:19:41 +00:00
b5e9899bbf
5 hour sleep brained
2023-03-05 13:37:05 +00:00
cd8702ab0d
oops
2023-03-05 13:24:07 +00:00
d312019d05
reordered things so it uses fresh data and not last-updated data
2023-03-05 07:37:27 +00:00
ce3866d0cd
added '''estimating''' iterations until milestones (lr=[1, 0.5, 0.1] and final lr, very, very inaccurate because it uses instantaneous delta lr, I'll need to do a Riemann sum later)
2023-03-05 06:45:07 +00:00
1316331be3
forgot to try and have it try and auto-detect for openai/whisper when no language is specified
2023-03-05 05:22:35 +00:00
3e220ed306
added option to set worker size in training config generator (because the default is overkill), for whisper transcriptions, load a specialized language model if it exists (for now, only english), output transcription to web UI when done transcribing
2023-03-05 05:17:19 +00:00
37cab14272
use torchrun instead for multigpu
2023-03-04 20:53:00 +00:00
5026d93ecd
sloppy fix to actually kill children when using multi-GPU distributed training, set GPU training count based on what CUDA exposes automatically so I don't have to keep setting it to 2
2023-03-04 20:42:54 +00:00
1a9d159b2a
forgot to add 'bs / gradient accum < 2' clamp validation logic
2023-03-04 17:37:08 +00:00