|
d58b67004a
|
colab notebook uses venv and normal scripts to keep it on parity with a local install (and it literally just works, stop creating issues for something inconsistent with known solutions)
|
2023-03-08 15:51:13 +00:00 |
|
|
34dcb845b5
|
actually make using adamw_zero optimizer for multi-gpus work
|
2023-03-08 15:31:33 +00:00 |
|
|
8494628f3c
|
normalize validation batch size because i oom'd without it getting scaled
|
2023-03-08 05:27:20 +00:00 |
|
|
d7e75a51cf
|
I forgot about the changelog and never kept up with it, so I'll just not use a changelog
|
2023-03-08 05:14:50 +00:00 |
|
|
ff07f707cb
|
disable validation if validation dataset not found, clamp validation batch size to validation dataset size instead of simply reusing batch size, switch to adamw_zero optimizer when training with multi-gpus (because the yaml comment said to and I think it might be why I'm absolutely having garbage luck training this japanese dataset)
|
2023-03-08 04:47:05 +00:00 |
|
|
f1788a5639
|
lazy wrap around the voicefixer block because sometimes it just fails to load despite having a specific block to load it beforehand
|
2023-03-08 04:12:22 +00:00 |
|
|
83b5125854
|
fixed notebooks, provided paperspace notebook
|
2023-03-08 03:29:12 +00:00 |
|
|
b4098dca73
|
made validation work (will document later)
|
2023-03-08 02:58:00 +00:00 |
|
|
a7e0dc9127
|
oops
|
2023-03-08 00:51:51 +00:00 |
|
|
e862169e7f
|
set validation to save rate and validation file if exists (need to test later)
|
2023-03-07 20:38:31 +00:00 |
|
|
fe8bf7a9d1
|
added helper script to cull short enough lines from training set as a validation set (if it yields good results doing validation during training, i'll add it to the web ui)
|
2023-03-07 20:16:49 +00:00 |
|
|
7f89e8058a
|
fixed update checker for dlas+tortoise-tts
|
2023-03-07 19:33:56 +00:00 |
|
|
6d7e143f53
|
added override for large training plots
|
2023-03-07 19:29:09 +00:00 |
|
|
3718e9d0fb
|
set NaN alarm to show the iteration it happened at
|
2023-03-07 19:22:11 +00:00 |
|
|
c27ee3ce95
|
added update checking for dlas and tortoise-tts, caching voices (for a given model and voice name) so random latents will remain the same
|
2023-03-07 17:04:45 +00:00 |
|
|
166d491a98
|
fixes
|
2023-03-07 13:40:41 +00:00 |
|
|
df5ba634c0
|
brain dead
|
2023-03-07 05:43:26 +00:00 |
|
|
2726d98ee1
|
fried my brain trying to nail out bugs involving using solely ar model=auto
|
2023-03-07 05:35:21 +00:00 |
|
|
d7a5ad9fd9
|
cleaned up some model loading logic, added 'auto' mode for AR model (deduced by current voice)
|
2023-03-07 04:34:39 +00:00 |
|
|
3899f9b4e3
|
added (yet another) experimental voice latent calculation mode (when chunk size is 0 and there's a dataset generated, it'll leverage it by padding to a common size then computing them, should help avoid splitting mid-phoneme)
|
2023-03-07 03:55:35 +00:00 |
|
|
5063728bb0
|
brain worms and headaches
|
2023-03-07 03:01:02 +00:00 |
|
|
0f31c34120
|
download dvae.pth for the people who somehow managed to put the web UI into a state where it never initializes TTS at all
|
2023-03-07 02:47:10 +00:00 |
|
|
0f0b394445
|
moved (actually not working) setting to use BigVGAN to a dropdown to select between vocoders (for when slotting in future ones), and ability to load a new vocoder while TTS is loaded
|
2023-03-07 02:45:22 +00:00 |
|
|
e731b9ba84
|
reworked generating metadata to embed, should now store overridden settings
|
2023-03-06 23:07:16 +00:00 |
|
|
7798767fc6
|
added settings editing (will add a guide on what to do later, and an example)
|
2023-03-06 21:48:34 +00:00 |
|
|
119ac50c58
|
forgot to re-append the existing transcription when skipping existing (have to go back again and do the first 10% of my giant dataset)
|
2023-03-06 16:50:55 +00:00 |
|
|
da0af4c498
|
one more
|
2023-03-06 16:47:34 +00:00 |
|
|
11a1f6a00e
|
forgot to reorder the dependency install because whisperx needs to be installed before DLAS
|
2023-03-06 16:43:17 +00:00 |
|
|
12c51b6057
|
I'm not too sure if manually invoking gc actually closes all the open files from whisperx (or ROCm), but it seems to have gone away alongside setting 'ulimit -Sn' to half the output of 'ulimit -Hn'
|
2023-03-06 16:39:37 +00:00 |
|
|
999878d9c6
|
and it turned out I wasn't even using the aligned segments, kmsing now that I have to *redo* my dataset again
|
2023-03-06 11:01:33 +00:00 |
|
|
14779a5020
|
Added option to skip transcribing if it exists in the output text file, because apparently whisperx will throw a "max files opened" error when using ROCm because it does not close some file descriptors if you're batch-transcribing or something, so poor little me, who's retranscribing his japanese dataset for the 305823042nd time, woke up to it partially done. I am so mad I have to wait another few hours for it to continue when I was hoping to wake up to it done
|
2023-03-06 10:47:06 +00:00 |
|
|
0e3bbc55f8
|
added api_name for generation, added whisperx backend, relocated use whispercpp option to whisper backend list
|
2023-03-06 05:21:33 +00:00 |
|
|
788a957f79
|
stretch loss plot to target iteration just so it's not so misleading with the scale
|
2023-03-06 00:44:29 +00:00 |
|
|
5be14abc21
|
UI cleanup, actually fix syncing the epoch counter (i hope), setting auto-suggest voice chunk size whatever to 0 will just split based on the average duration length, signal when a NaN info value is detected (there's some safeties in the training, but it will inevitably fuck the model)
|
2023-03-05 23:55:27 +00:00 |
|
|
287738a338
|
(should) fix reported epoch metric desyncing from de facto metric, fixed finding next milestone from wrong sign because of 2AM brain
|
2023-03-05 20:42:45 +00:00 |
|
|
206a14fdbe
|
brainworms
|
2023-03-05 20:30:27 +00:00 |
|
|
b82961ba8a
|
typo
|
2023-03-05 20:13:39 +00:00 |
|
|
b2e89d8da3
|
oops
|
2023-03-05 19:58:15 +00:00 |
|
|
8094401a6d
|
print in e-notation for LR
|
2023-03-05 19:48:24 +00:00 |
|
|
8b9c9e1bbf
|
remove redundant stats, add showing LR
|
2023-03-05 18:53:12 +00:00 |
|
|
0231550287
|
forgot to remove a debug print
|
2023-03-05 18:27:16 +00:00 |
|
|
d97639e138
|
whispercpp actually works now (language loading was weird, slicing needed to divide time by 100), transcribing audio checks for silent segments and discards them
|
2023-03-05 17:54:36 +00:00 |
|
|
b8a620e8d7
|
actually accumulate derivatives when estimating milestones and final loss by using half of the log
|
2023-03-05 14:39:24 +00:00 |
|
|
35225a35da
|
oops v2
|
2023-03-05 14:19:41 +00:00 |
|
|
b5e9899bbf
|
5 hour sleep brained
|
2023-03-05 13:37:05 +00:00 |
|
|
cd8702ab0d
|
oops
|
2023-03-05 13:24:07 +00:00 |
|
|
d312019d05
|
reordered things so it uses fresh data and not last-updated data
|
2023-03-05 07:37:27 +00:00 |
|
|
ce3866d0cd
|
added '''estimating''' iterations until milestones (lr=[1, 0.5, 0.1] and final lr), very, very inaccurate because it uses instantaneous delta lr, I'll need to do a Riemann sum later
|
2023-03-05 06:45:07 +00:00 |
|
|
1316331be3
|
forgot to have it try to auto-detect the language for openai/whisper when no language is specified
|
2023-03-05 05:22:35 +00:00 |
|
|
3e220ed306
|
added option to set worker size in training config generator (because the default is overkill), for whisper transcriptions, load a specialized language model if it exists (for now, only english), output transcription to web UI when done transcribing
|
2023-03-05 05:17:19 +00:00 |
|