|
7b80f7a42f
|
fixed not cleaning up states while training (oops)
|
2023-03-15 02:48:05 +00:00 |
|
|
b31bf1206e
|
oops
|
2023-03-15 01:51:04 +00:00 |
|
|
d752a22331
|
print a warning if automatically deduced batch size returns 1
|
2023-03-15 01:20:15 +00:00 |
|
|
f6d34e1dd3
|
and maybe I should have actually tested with ./models/tokenizers/ made
|
2023-03-15 01:09:20 +00:00 |
|
|
5e4f6808ce
|
I guess I didn't test on a blank-ish slate
|
2023-03-15 00:54:27 +00:00 |
|
|
363d0b09b1
|
added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training)
|
2023-03-15 00:37:38 +00:00 |
|
|
07b684c4e7
|
removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text
|
2023-03-14 21:51:27 +00:00 |
|
|
4b952ea52a
|
fixes #132
|
2023-03-14 18:46:20 +00:00 |
|
|
fe03ae5839
|
fixes
|
2023-03-14 17:42:42 +00:00 |
|
|
9d2c7fb942
|
cleanup
|
2023-03-14 16:23:29 +00:00 |
|
|
65fe304267
|
fixed broken graph displaying
|
2023-03-14 16:04:56 +00:00 |
|
|
7b16b3e88a
|
;)
|
2023-03-14 15:48:09 +00:00 |
|
|
54036fd780
|
:)
|
2023-03-14 05:02:14 +00:00 |
|
|
92a05d3c4c
|
added PYTHONUTF8 to start/train bats
|
2023-03-14 02:29:11 +00:00 |
|
|
dadb1fca6b
|
multichannel audio now report correct duration (surprised it took this long for me to source multichannel audio)
|
2023-03-13 21:24:51 +00:00 |
|
|
32d968a8cd
|
(disabled by default until i validate it working) added additional transcription text normalization (something else I'm experimenting with requires it)
|
2023-03-13 19:07:23 +00:00 |
|
|
66ac8ba766
|
added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation
|
2023-03-13 18:51:53 +00:00 |
|
|
ee1b048d07
|
when creating the train/validation datasets, use segments if the main audio's duration is too long, and slice to make the segments if they don't exist
|
2023-03-13 04:26:00 +00:00 |
|
|
0cf9db5e69
|
oops
|
2023-03-13 01:33:45 +00:00 |
|
|
050bcefd73
|
resample to 22.05K when creating training inputs (to avoid redundant downsampling when loaded for training, even though most of my inputs are already at 22.05K), generalized resampler function to cache and reuse them, do not unload whisper when done transcribing since it gets unloaded anyways for any other non-transcription task
|
2023-03-13 01:20:55 +00:00 |
|
|
7c9c0dc584
|
forgot to clean up debug prints
|
2023-03-13 00:44:37 +00:00 |
|
|
239c984850
|
move validating audio to creating the text files instead, consider audio longer than 11 seconds invalid, consider text lengths over 200 invalid
|
2023-03-12 23:39:00 +00:00 |
|
|
51ddc205cd
|
update submodules
|
2023-03-12 18:14:36 +00:00 |
|
|
ccbf2e6aff
|
blame mrq/ai-voice-cloning#122
|
2023-03-12 17:51:52 +00:00 |
|
|
9238df0b03
|
fixed last generation settings not actually loading because brain worms
|
2023-03-12 15:49:50 +00:00 |
|
|
9594a960b0
|
Disable loss ETA for now until I fix it
|
2023-03-12 15:39:54 +00:00 |
|
|
296129ba9c
|
output fixes, I'm not sure why ETA wasn't working but it works in testing
|
2023-03-12 15:17:07 +00:00 |
|
|
098d7ad635
|
uh I don't remember, small things
|
2023-03-12 14:47:48 +00:00 |
|
|
29b3d1ae1d
|
Fixed Keep X Previous States
|
2023-03-12 08:01:08 +02:00 |
|
|
61500107ab
|
Catch OOM and run whisper on cpu automatically.
|
2023-03-12 06:48:28 +02:00 |
|
|
ede9804b76
|
added option to trim silence using torchaudio's VAD
|
2023-03-11 21:41:35 +00:00 |
|
|
dea2fa9caf
|
added fields to offset start/end slices to apply in bulk when slicing
|
2023-03-11 21:34:29 +00:00 |
|
|
382a3e4104
|
rely on the whisper.json for handling a lot more things
|
2023-03-11 21:17:11 +00:00 |
|
|
9b376c381f
|
brain worm
|
2023-03-11 18:14:32 +00:00 |
|
|
94551fb9ac
|
split slicing dataset routine so it can be done after the fact
|
2023-03-11 17:27:01 +00:00 |
|
|
e3fdb79b49
|
rocm5.2 works for me desu so I bumped it back up
|
2023-03-11 17:02:56 +00:00 |
|
|
cf41492f76
|
fall back to normal behavior if there's actually no audio files loaded from the dataset when using it for computing latents
|
2023-03-11 16:46:03 +00:00 |
|
|
b90c164778
|
Farewell, parasite
|
2023-03-11 16:40:34 +00:00 |
|
|
2424c455cb
|
added option to not slice audio when transcribing, added option to prepare validation dataset on audio duration, added a warning if you're using whisperx and you're slicing audio
|
2023-03-11 16:32:35 +00:00 |
|
|
008a1f5f8f
|
simplified spawning the training process by having it spawn the distributed training processes in the train.py script, so it should work on Windows too
|
2023-03-11 01:37:00 +00:00 |
|
|
2feb6da0c0
|
cleanups and fixes, fix DLAS throwing errors from '''too short of sound files''' by just culling them during transcription
|
2023-03-11 01:19:49 +00:00 |
|
|
7f2da0f5fb
|
rewrote how AIVC gets training metrics (need to clean up later)
|
2023-03-10 22:35:32 +00:00 |
|
|
df0edacc60
|
fix the cleanup actually only doing 2 despite requesting more than 2, surprised no one has pointed it out
|
2023-03-10 14:04:07 +00:00 |
|
|
8e890d3023
|
forgot to fix reset settings to use the new arg-agnostic way
|
2023-03-10 13:49:39 +00:00 |
|
|
c92b006129
|
I really hate YAML
|
2023-03-10 03:48:46 +00:00 |
|
|
eb1551ee92
|
what I thought was an override and not a ternary
|
2023-03-09 23:04:02 +00:00 |
|
|
c3b43d2429
|
today I learned adamw_zero actually negates ANY LR schemes
|
2023-03-09 19:42:31 +00:00 |
|
|
cb273b8428
|
cleanup
|
2023-03-09 18:34:52 +00:00 |
|
|
7c71f7239c
|
expose options for CosineAnnealingLR_Restart (seems to be able to train very quickly due to the restarts)
|
2023-03-09 14:17:01 +00:00 |
|
|
5460e191b0
|
added loss graph, because I'm going to experiment with cosine annealing LR and I need to view my loss
|
2023-03-09 05:54:08 +00:00 |
|
|
a182df8f4e
|
is
|
2023-03-09 04:33:12 +00:00 |
|
|
a01eb10960
|
(try to) unload voicefixer if it raises an error during loading voicefixer
|
2023-03-09 04:28:14 +00:00 |
|
|
dc1902b91c
|
cleanup block that makes embedding latents for random/microphone happen, remove builtin voice options from voice list to avoid duplicates
|
2023-03-09 04:23:36 +00:00 |
|
|
797882336b
|
maybe remedy an issue that crops up if you have a non-wav and non-json file in a results folder (assuming)
|
2023-03-09 04:06:07 +00:00 |
|
|
3b4f4500d1
|
when you have three separate machines running and you test on one, but you accidentally revert changes because you then test on another
|
2023-03-09 03:26:18 +00:00 |
|
|
ef75dba995
|
I hate that commas make tuples
|
2023-03-09 02:43:05 +00:00 |
|
|
f795dd5c20
|
you might be wondering why so many small commits instead of rolling the HEAD back one to just combine them, i don't want to force push and roll back the paperspace i'm testing in
|
2023-03-09 02:31:32 +00:00 |
|
|
51339671ec
|
typo
|
2023-03-09 02:29:08 +00:00 |
|
|
1b18b3e335
|
forgot to save the simplified training input json first before touching any of the settings that dump to the yaml
|
2023-03-09 02:27:20 +00:00 |
|
|
0e80e311b0
|
added VRAM validation for a given batch:gradient accumulation size ratio (based empirically off of 6GiB, 16GiB, and 16x2GiB, would be nice to have more data on what's safe)
|
2023-03-09 02:08:06 +00:00 |
|
|
ef7b957fff
|
oops
|
2023-03-09 00:53:00 +00:00 |
|
|
b0baa1909a
|
forgot template
|
2023-03-09 00:32:35 +00:00 |
|
|
3f321fe664
|
big cleanup to make my life easier when i add more parameters
|
2023-03-09 00:26:47 +00:00 |
|
|
0ab091e7ff
|
oops
|
2023-03-08 16:09:29 +00:00 |
|
|
34dcb845b5
|
actually make using adamw_zero optimizer for multi-gpus work
|
2023-03-08 15:31:33 +00:00 |
|
|
ff07f707cb
|
disable validation if validation dataset not found, clamp validation batch size to validation dataset size instead of simply reusing batch size, switch to adamw_zero optimizer when training with multi-gpus (because the yaml comment said to and I think it might be why I'm absolutely having garbage luck training this japanese dataset)
|
2023-03-08 04:47:05 +00:00 |
|
|
f1788a5639
|
lazy wrap around the voicefixer block because sometimes it just an heros itself despite having a specific block to load it beforehand
|
2023-03-08 04:12:22 +00:00 |
|
|
83b5125854
|
fixed notebooks, provided paperspace notebook
|
2023-03-08 03:29:12 +00:00 |
|
|
b4098dca73
|
made validation working (will document later)
|
2023-03-08 02:58:00 +00:00 |
|
|
a7e0dc9127
|
oops
|
2023-03-08 00:51:51 +00:00 |
|
|
e862169e7f
|
set validation to save rate and validation file if exists (need to test later)
|
2023-03-07 20:38:31 +00:00 |
|
|
fe8bf7a9d1
|
added helper script to cull short enough lines from training set as a validation set (if it yields good results doing validation during training, i'll add it to the web ui)
|
2023-03-07 20:16:49 +00:00 |
|
|
7f89e8058a
|
fixed update checker for dlas+tortoise-tts
|
2023-03-07 19:33:56 +00:00 |
|
|
6d7e143f53
|
added override for large training plots
|
2023-03-07 19:29:09 +00:00 |
|
|
3718e9d0fb
|
set NaN alarm to show the iteration it happened in
|
2023-03-07 19:22:11 +00:00 |
|
|
c27ee3ce95
|
added update checking for dlas and tortoise-tts, caching voices (for a given model and voice name) so random latents will remain the same
|
2023-03-07 17:04:45 +00:00 |
|
|
166d491a98
|
fixes
|
2023-03-07 13:40:41 +00:00 |
|
|
df5ba634c0
|
brain dead
|
2023-03-07 05:43:26 +00:00 |
|
|
2726d98ee1
|
fried my brain trying to nail out bugs involving using solely ar model=auto
|
2023-03-07 05:35:21 +00:00 |
|
|
d7a5ad9fd9
|
cleaned up some model loading logic, added 'auto' mode for AR model (deduced by current voice)
|
2023-03-07 04:34:39 +00:00 |
|
|
3899f9b4e3
|
added (yet another) experimental voice latent calculation mode (when chunk size is 0 and there's a dataset generated, it'll leverage it by padding to a common size then computing them, should help avoid splitting mid-phoneme)
|
2023-03-07 03:55:35 +00:00 |
|
|
5063728bb0
|
brain worms and headaches
|
2023-03-07 03:01:02 +00:00 |
|
|
0f31c34120
|
download dvae.pth for the people who managed to somehow put the web UI into a state where it never initializes TTS at all somehow
|
2023-03-07 02:47:10 +00:00 |
|
|
0f0b394445
|
moved (actually not working) setting to use BigVGAN to a dropdown to select between vocoders (for when slotting in future ones), and ability to load a new vocoder while TTS is loaded
|
2023-03-07 02:45:22 +00:00 |
|
|
e731b9ba84
|
reworked generating metadata to embed, should now store overridden settings
|
2023-03-06 23:07:16 +00:00 |
|
|
7798767fc6
|
added settings editing (will add a guide on what to do later, and an example)
|
2023-03-06 21:48:34 +00:00 |
|
|
119ac50c58
|
forgot to re-append the existing transcription when skipping existing (have to go back again and do the first 10% of my giant dataset)
|
2023-03-06 16:50:55 +00:00 |
|
|
12c51b6057
|
I'm not too sure if manually invoking gc actually closes all the open files from whisperx (or ROCm), but it seems to have gone away alongside setting 'ulimit -Sn' to half the output of 'ulimit -Hn'
|
2023-03-06 16:39:37 +00:00 |
|
|
999878d9c6
|
and it turned out I wasn't even using the aligned segments, kmsing now that I have to *redo* my dataset again
|
2023-03-06 11:01:33 +00:00 |
|
|
14779a5020
|
Added option to skip transcribing if it exists in the output text file, because apparently whisperx will throw a "max files opened" error when using ROCm because it does not close some file descriptors if you're batch-transcribing or something, so poor little me, who's retranscribing his japanese dataset for the 305823042th time woke up to it partially done i am so mad I have to wait another few hours for it to continue when I was hoping to wake up to it done
|
2023-03-06 10:47:06 +00:00 |
|
|
0e3bbc55f8
|
added api_name for generation, added whisperx backend, relocated use whispercpp option to whisper backend list
|
2023-03-06 05:21:33 +00:00 |
|
|
788a957f79
|
stretch loss plot to target iteration just so it's not so misleading with the scale
|
2023-03-06 00:44:29 +00:00 |
|
|
5be14abc21
|
UI cleanup, actually fix syncing the epoch counter (i hope), setting auto-suggest voice chunk size whatever to 0 will just split based on the average duration length, signal when a NaN info value is detected (there's some safeties in the training, but it will inevitably fuck the model)
|
2023-03-05 23:55:27 +00:00 |
|
|
287738a338
|
(should) fix reported epoch metric desyncing from de facto metric, fixed finding next milestone from wrong sign because of 2AM brain
|
2023-03-05 20:42:45 +00:00 |
|
|
206a14fdbe
|
brainworms
|
2023-03-05 20:30:27 +00:00 |
|
|
b82961ba8a
|
typo
|
2023-03-05 20:13:39 +00:00 |
|
|
b2e89d8da3
|
oops
|
2023-03-05 19:58:15 +00:00 |
|
|
8094401a6d
|
print in e-notation for LR
|
2023-03-05 19:48:24 +00:00 |
|
|
8b9c9e1bbf
|
remove redundant stats, add showing LR
|
2023-03-05 18:53:12 +00:00 |
|
|
0231550287
|
forgot to remove a debug print
|
2023-03-05 18:27:16 +00:00 |
|