323 Commits

Author SHA1 Message Date
mrq
02beb1dd8e should fix #203 2023-04-13 03:14:06 +00:00
mrq
8f3e9447ba disable diarize button 2023-04-12 20:03:54 +00:00
mrq
d8b996911c a bunch of shit i had uncommitted over the past while pertaining to VALL-E 2023-04-12 20:02:46 +00:00
mrq
0440eac2bc #185 2023-03-31 06:55:52 +00:00
mrq
9f64153a28 fixes #185 2023-03-31 06:03:56 +00:00
mrq
4744120be2 added VALL-E inference support (very rudimentary, gimped, but it will load a model trained on a config generated through the web UI) 2023-03-31 03:26:00 +00:00
mrq
9b01377667 only include auto in the list of models under setting, nothing else 2023-03-29 19:53:23 +00:00
mrq
f66281f10c added mixing models (shamelessly inspired from voldy's web ui) 2023-03-29 19:29:13 +00:00
mrq
c89c648b4a fixes #176 2023-03-26 11:05:50 +00:00
mrq
41d47c7c2a for real this time show those new vall-e metrics 2023-03-26 04:31:50 +00:00
mrq
c4ca04cc92 added showing reported training accuracy and eval/validation metrics to graph 2023-03-26 04:08:45 +00:00
mrq
8c647c889d now there should be feature parity between trainers 2023-03-25 04:12:03 +00:00
mrq
fd9b2e082c x_lim and y_lim for graph 2023-03-25 02:34:14 +00:00
mrq
9856db5900 actually make parsing VALL-E metrics work 2023-03-23 15:42:51 +00:00
mrq
69d84bb9e0 I forget 2023-03-23 04:53:31 +00:00
mrq
444bcdaf62 my sanitizer actually did work, it was just batch sizes leading to problems when transcribing 2023-03-23 04:41:56 +00:00
mrq
a6daf289bc when the sanitizer thingy works in testing but it doesn't outside of testing, and you have to retranscribe for the fourth time today 2023-03-23 02:37:44 +00:00
mrq
86589fff91 why does this keep happening to me 2023-03-23 01:55:16 +00:00
mrq
0ea93a7f40 more cleanup, use 24kHz for preparing for VALL-E (encodec will resample to 24kHz anyways, makes audio a little nicer), some other things 2023-03-23 01:52:26 +00:00
mrq
d2a9ab9e41 remove redundant phonemize for vall-e (oops), quantize all files and then phonemize all files for cope optimization, load alignment model once instead of for every transcription (speedup with whisperx) 2023-03-23 00:22:25 +00:00
mrq
19c0854e6a do not write current whisper.json if there's no changes 2023-03-22 22:24:07 +00:00
mrq
932eaccdf5 added whisper transcription 'sanitizing' (collapse very short transcriptions to the previous segment) (I really have to stop having several copies spanning several machines for AIVC, I keep reverting shit) 2023-03-22 22:10:01 +00:00
mrq
736cdc8926 disable diarization for whisperx as it's just a useless performance hit (I don't have anything that's multispeaker within the same audio file at the moment) 2023-03-22 20:38:58 +00:00
mrq
aa5bdafb06 ugh 2023-03-22 20:26:28 +00:00
mrq
13605f980c now whisperx should output json that aligns with what's expected 2023-03-22 20:01:30 +00:00
mrq
8877960062 fixes for whisperx batching 2023-03-22 19:53:42 +00:00
mrq
4056a27bcb begrudgingly added back whisperx integration (VAD/Diarization testing, I really, really need accurate timestamps before dumping mondo amounts of time on training a dataset) 2023-03-22 19:24:53 +00:00
mrq
b8c3c4cfe2 Fixed #167 2023-03-22 18:21:37 +00:00
mrq
f822c87344 cleanups, realigning vall-e training 2023-03-22 17:47:23 +00:00
mrq
909325bb5a ugh 2023-03-21 22:18:57 +00:00
mrq
5a5fd9ca87 Added option to unsqueeze sample batches after sampling 2023-03-21 21:34:26 +00:00
mrq
9657c1d4ce oops 2023-03-21 20:31:01 +00:00
mrq
0c2a9168f8 DLAS is PIPified (but I'm still cloning it as a submodule to make updating it easier) 2023-03-21 15:46:53 +00:00
mrq
34ef0467b9 VALL-E config edits 2023-03-20 01:22:53 +00:00
mrq
2e33bf071a forgot to not require it to be relative 2023-03-19 22:05:33 +00:00
mrq
5cb86106ce option to set results folder location 2023-03-19 22:03:41 +00:00
mrq
da9b4b5fb5 tweaks 2023-03-18 15:14:22 +00:00
mrq
f44895978d brain worms 2023-03-17 20:08:08 +00:00
mrq
f34cc382c5 yammed 2023-03-17 18:57:36 +00:00
mrq
96b7f9d2cc yammed 2023-03-17 13:08:34 +00:00
mrq
249c6019af cleanup, metrics are grabbed for vall-e trainer 2023-03-17 05:33:49 +00:00
mrq
1b72d0bba0 forgot to separate phonemes by spaces for [redacted] 2023-03-17 02:08:07 +00:00
mrq
d4c50967a6 cleaned up some prepare dataset code 2023-03-17 01:24:02 +00:00
mrq
0b62ccc112 setup bnb on windows as needed 2023-03-16 20:48:48 +00:00
mrq
1a8c5de517 unk hunting 2023-03-16 14:59:12 +00:00
mrq
46ff3c476a fixes v2 2023-03-16 14:41:40 +00:00
mrq
0408d44602 fixed reload tts being broken due to being as untouched as I am 2023-03-16 14:24:44 +00:00
mrq
aeb904a800 yammed 2023-03-16 14:23:47 +00:00
mrq
f9154c4db1 fixes 2023-03-16 14:19:56 +00:00
mrq
54f2fc792a ops 2023-03-16 05:14:15 +00:00
mrq
0a7d6f02a7 ops 2023-03-16 04:54:17 +00:00
mrq
4ac43fa3a3 I forgot I undid the thing in DLAS 2023-03-16 04:51:35 +00:00
mrq
da4f92681e oops 2023-03-16 04:35:12 +00:00
mrq
ee8270bdfb preparations for training an IPA-based finetune 2023-03-16 04:25:33 +00:00
mrq
7b80f7a42f fixed not cleaning up states while training (oops) 2023-03-15 02:48:05 +00:00
mrq
b31bf1206e oops 2023-03-15 01:51:04 +00:00
mrq
d752a22331 print a warning if automatically deduced batch size returns 1 2023-03-15 01:20:15 +00:00
mrq
f6d34e1dd3 and maybe I should have actually tested with ./models/tokenizers/ made 2023-03-15 01:09:20 +00:00
mrq
5e4f6808ce I guess I didn't test on a blank-ish slate 2023-03-15 00:54:27 +00:00
mrq
363d0b09b1 added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training) 2023-03-15 00:37:38 +00:00
mrq
07b684c4e7 removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text 2023-03-14 21:51:27 +00:00
mrq
469dd47a44 fixes #131 2023-03-14 18:58:03 +00:00
mrq
84b7383428 fixes #134 2023-03-14 18:52:56 +00:00
mrq
4b952ea52a fixes #132 2023-03-14 18:46:20 +00:00
mrq
fe03ae5839 fixes 2023-03-14 17:42:42 +00:00
mrq
9d2c7fb942 cleanup 2023-03-14 16:23:29 +00:00
mrq
65fe304267 fixed broken graph displaying 2023-03-14 16:04:56 +00:00
mrq
7b16b3e88a ;) 2023-03-14 15:48:09 +00:00
mrq
54036fd780 :) 2023-03-14 05:02:14 +00:00
mrq
92a05d3c4c added PYTHONUTF8 to start/train bats 2023-03-14 02:29:11 +00:00
mrq
dadb1fca6b multichannel audio now reports correct duration (surprised it took this long for me to source multichannel audio) 2023-03-13 21:24:51 +00:00
mrq
32d968a8cd (disabled by default until i validate it working) added additional transcription text normalization (something else I'm experimenting with requires it) 2023-03-13 19:07:23 +00:00
mrq
66ac8ba766 added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation 2023-03-13 18:51:53 +00:00
mrq
ee1b048d07 when creating the train/validation datasets, use segments if the main audio's duration is too long, and slice to make the segments if they don't exist 2023-03-13 04:26:00 +00:00
mrq
0cf9db5e69 oops 2023-03-13 01:33:45 +00:00
mrq
050bcefd73 resample to 22.5K when creating training inputs (to avoid redundant downsampling when loaded for training, even though most of my inputs are already at 22.5K), generalized resampler function to cache and reuse them, do not unload whisper when done transcribing since it gets unloaded anyways for any other non-transcription task 2023-03-13 01:20:55 +00:00
mrq
7c9c0dc584 forgot to clean up debug prints 2023-03-13 00:44:37 +00:00
mrq
239c984850 move validating audio to creating the text files instead, consider audio longer than 11 seconds invalid, consider text lengths over 200 invalid 2023-03-12 23:39:00 +00:00
mrq
51ddc205cd update submodules 2023-03-12 18:14:36 +00:00
mrq
ccbf2e6aff blame mrq/ai-voice-cloning#122 2023-03-12 17:51:52 +00:00
mrq
9238df0b03 fixed last generation settings not actually loading because brain worms 2023-03-12 15:49:50 +00:00
mrq
9594a960b0 Disable loss ETA for now until I fix it 2023-03-12 15:39:54 +00:00
mrq
be8b290a1a Merge branch 'master' into save_more_user_config 2023-03-12 15:38:08 +00:00
mrq
296129ba9c output fixes, I'm not sure why ETA wasn't working but it works in testing 2023-03-12 15:17:07 +00:00
mrq
098d7ad635 uh I don't remember, small things 2023-03-12 14:47:48 +00:00
233baa4e45 updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested. 2023-03-12 16:08:02 +02:00
29b3d1ae1d Fixed Keep X Previous States 2023-03-12 08:01:08 +02:00
9e320a34c8 Fixed Keep X Previous States 2023-03-12 08:00:03 +02:00
61500107ab Catch OOM and run whisper on cpu automatically. 2023-03-12 06:48:28 +02:00
mrq
ede9804b76 added option to trim silence using torchaudio's VAD 2023-03-11 21:41:35 +00:00
mrq
dea2fa9caf added fields to offset start/end slices to apply in bulk when slicing 2023-03-11 21:34:29 +00:00
mrq
89bb3d4419 rename transcribe button since it does more than transcribe 2023-03-11 21:18:04 +00:00
mrq
382a3e4104 rely on the whisper.json for handling a lot more things 2023-03-11 21:17:11 +00:00
mrq
9b376c381f brain worm 2023-03-11 18:14:32 +00:00
mrq
94551fb9ac split slicing dataset routine so it can be done after the fact 2023-03-11 17:27:01 +00:00
mrq
e3fdb79b49 rocm5.2 works for me desu so I bumped it back up 2023-03-11 17:02:56 +00:00
mrq
cf41492f76 fall back to normal behavior if there's actually no audio files loaded from the dataset when using it for computing latents 2023-03-11 16:46:03 +00:00
mrq
b90c164778 Farewell, parasite 2023-03-11 16:40:34 +00:00
mrq
2424c455cb added option to not slice audio when transcribing, added option to prepare validation dataset on audio duration, added a warning if you're using whisperx and you're slicing audio 2023-03-11 16:32:35 +00:00
tigi6346
dcdcf8516c master (#112)
Fixes Gradio bugging out when attempting to load a missing train.json.

Reviewed-on: mrq/ai-voice-cloning#112
Co-authored-by: tigi6346 <tigi6346@noreply.localhost>
Co-committed-by: tigi6346 <tigi6346@noreply.localhost>
2023-03-11 03:28:04 +00:00