Commit Graph

283 Commits (master)

Author SHA1 Message Date
mrq 444bcdaf62 my sanitizer actually did work, it was just batch sizes leading to problems when transcribing 2023-03-23 04:41:56 +07:00
mrq a6daf289bc when the sanitizer thingy works in testing but it doesn't outside of testing, and you have to retranscribe for the fourth time today 2023-03-23 02:37:44 +07:00
mrq 86589fff91 why does this keep happening to me 2023-03-23 01:55:16 +07:00
mrq 0ea93a7f40 more cleanup, use 24kHz for preparing for VALL-E (encodec will resample to 24kHz anyways, makes audio a little nicer), some other things 2023-03-23 01:52:26 +07:00
mrq d2a9ab9e41 remove redundant phonemize for vall-e (oops), quantize all files and then phonemize all files for cope optimization, load alignment model once instead of for every transcription (speedup with whisperx) 2023-03-23 00:22:25 +07:00
mrq 19c0854e6a do not write current whisper.json if there's no changes 2023-03-22 22:24:07 +07:00
mrq 932eaccdf5 added whisper transcription 'sanitizing' (collapse very short transcriptions to the previous segment) (I really have to stop having several copies spanning several machines for AIVC, I keep reverting shit) 2023-03-22 22:10:01 +07:00
mrq 736cdc8926 disable diarization for whisperx as it's just a useless performance hit (I don't have anything that's multispeaker within the same audio file at the moment) 2023-03-22 20:38:58 +07:00
mrq aa5bdafb06 ugh 2023-03-22 20:26:28 +07:00
mrq 13605f980c now whisperx should output json that aligns with what's expected 2023-03-22 20:01:30 +07:00
mrq 8877960062 fixes for whisperx batching 2023-03-22 19:53:42 +07:00
mrq 4056a27bcb begrudgingly added back whisperx integration (VAD/Diarization testing, I really, really need accurate timestamps before dumping mondo amounts of time on training a dataset) 2023-03-22 19:24:53 +07:00
mrq b8c3c4cfe2 Fixed #167 2023-03-22 18:21:37 +07:00
mrq f822c87344 cleanups, realigning vall-e training 2023-03-22 17:47:23 +07:00
mrq 909325bb5a ugh 2023-03-21 22:18:57 +07:00
mrq 5a5fd9ca87 Added option to unsqueeze sample batches after sampling 2023-03-21 21:34:26 +07:00
mrq 2e33bf071a forgot to not require it to be relative 2023-03-19 22:05:33 +07:00
mrq 5cb86106ce option to set results folder location 2023-03-19 22:03:41 +07:00
mrq da9b4b5fb5 tweaks 2023-03-18 15:14:22 +07:00
mrq f44895978d brain worms 2023-03-17 20:08:08 +07:00
mrq f34cc382c5 yammed 2023-03-17 18:57:36 +07:00
mrq 96b7f9d2cc yammed 2023-03-17 13:08:34 +07:00
mrq 249c6019af cleanup, metrics are grabbed for vall-e trainer 2023-03-17 05:33:49 +07:00
mrq 1b72d0bba0 forgot to separate phonemes by spaces for [redacted] 2023-03-17 02:08:07 +07:00
mrq d4c50967a6 cleaned up some prepare dataset code 2023-03-17 01:24:02 +07:00
mrq 0b62ccc112 setup bnb on windows as needed 2023-03-16 20:48:48 +07:00
mrq 1a8c5de517 unk hunting 2023-03-16 14:59:12 +07:00
mrq 46ff3c476a fixes v2 2023-03-16 14:41:40 +07:00
mrq 0408d44602 fixed reload tts being broken due to being as untouched as I am 2023-03-16 14:24:44 +07:00
mrq aeb904a800 yammed 2023-03-16 14:23:47 +07:00
mrq f9154c4db1 fixes 2023-03-16 14:19:56 +07:00
mrq 54f2fc792a ops 2023-03-16 05:14:15 +07:00
mrq 0a7d6f02a7 ops 2023-03-16 04:54:17 +07:00
mrq 4ac43fa3a3 I forgot I undid the thing in DLAS 2023-03-16 04:51:35 +07:00
mrq da4f92681e oops 2023-03-16 04:35:12 +07:00
mrq ee8270bdfb preparations for training an IPA-based finetune 2023-03-16 04:25:33 +07:00
mrq 7b80f7a42f fixed not cleaning up states while training (oops) 2023-03-15 02:48:05 +07:00
mrq b31bf1206e oops 2023-03-15 01:51:04 +07:00
mrq d752a22331 print a warning if automatically deduced batch size returns 1 2023-03-15 01:20:15 +07:00
mrq f6d34e1dd3 and maybe I should have actually tested with ./models/tokenizers/ made 2023-03-15 01:09:20 +07:00
mrq 5e4f6808ce I guess I didn't test on a blank-ish slate 2023-03-15 00:54:27 +07:00
mrq 363d0b09b1 added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training) 2023-03-15 00:37:38 +07:00
mrq 07b684c4e7 removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text 2023-03-14 21:51:27 +07:00
mrq 4b952ea52a fixes #132 2023-03-14 18:46:20 +07:00
mrq fe03ae5839 fixes 2023-03-14 17:42:42 +07:00
mrq 9d2c7fb942 cleanup 2023-03-14 16:23:29 +07:00
mrq 65fe304267 fixed broken graph displaying 2023-03-14 16:04:56 +07:00
mrq 7b16b3e88a ;) 2023-03-14 15:48:09 +07:00
mrq 54036fd780 :) 2023-03-14 05:02:14 +07:00
mrq 92a05d3c4c added PYTHONUTF8 to start/train bats 2023-03-14 02:29:11 +07:00
mrq dadb1fca6b multichannel audio now reports correct duration (surprised it took this long for me to source multichannel audio) 2023-03-13 21:24:51 +07:00
mrq 32d968a8cd (disabled by default until i validate it working) added additional transcription text normalization (something else I'm experimenting with requires it) 2023-03-13 19:07:23 +07:00
mrq 66ac8ba766 added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation 2023-03-13 18:51:53 +07:00
mrq ee1b048d07 when creating the train/validation datasets, use segments if the main audio's duration is too long, and slice to make the segments if they don't exist 2023-03-13 04:26:00 +07:00
mrq 0cf9db5e69 oops 2023-03-13 01:33:45 +07:00
mrq 050bcefd73 resample to 22.5K when creating training inputs (to avoid redundant downsampling when loaded for training, even though most of my inputs are already at 22.5K), generalized resampler function to cache and reuse them, do not unload whisper when done transcribing since it gets unloaded anyways for any other non-transcription task 2023-03-13 01:20:55 +07:00
mrq 7c9c0dc584 forgot to clean up debug prints 2023-03-13 00:44:37 +07:00
mrq 239c984850 move validating audio to creating the text files instead, consider audio longer than 11 seconds invalid, consider text lengths over 200 invalid 2023-03-12 23:39:00 +07:00
mrq 51ddc205cd update submodules 2023-03-12 18:14:36 +07:00
mrq ccbf2e6aff blame #122 2023-03-12 17:51:52 +07:00
mrq 9238df0b03 fixed last generation settings not actually loading because brain worms 2023-03-12 15:49:50 +07:00
mrq 9594a960b0 Disable loss ETA for now until I fix it 2023-03-12 15:39:54 +07:00
mrq 296129ba9c output fixes, I'm not sure why ETA wasn't working but it works in testing 2023-03-12 15:17:07 +07:00
mrq 098d7ad635 uh I don't remember, small things 2023-03-12 14:47:48 +07:00
tigi6346 29b3d1ae1d Fixed Keep X Previous States 2023-03-12 08:01:08 +07:00
tigi6346 61500107ab Catch OOM and run whisper on cpu automatically. 2023-03-12 06:48:28 +07:00
mrq ede9804b76 added option to trim silence using torchaudio's VAD 2023-03-11 21:41:35 +07:00
mrq dea2fa9caf added fields to offset start/end slices to apply in bulk when slicing 2023-03-11 21:34:29 +07:00
mrq 382a3e4104 rely on the whisper.json for handling a lot more things 2023-03-11 21:17:11 +07:00
mrq 9b376c381f brain worm 2023-03-11 18:14:32 +07:00
mrq 94551fb9ac split slicing dataset routine so it can be done after the fact 2023-03-11 17:27:01 +07:00
mrq e3fdb79b49 rocm5.2 works for me desu so I bumped it back up 2023-03-11 17:02:56 +07:00
mrq cf41492f76 fall back to normal behavior if there's actually no audio files loaded from the dataset when using it for computing latents 2023-03-11 16:46:03 +07:00
mrq b90c164778 Farewell, parasite 2023-03-11 16:40:34 +07:00
mrq 2424c455cb added option to not slice audio when transcribing, added option to prepare validation dataset on audio duration, added a warning if you're using whisperx and you're slicing audio 2023-03-11 16:32:35 +07:00
mrq 008a1f5f8f simplified spawning the training process by having it spawn the distributed training processes in the train.py script, so it should work on Windows too 2023-03-11 01:37:00 +07:00
mrq 2feb6da0c0 cleanups and fixes, fix DLAS throwing errors from '''too short of sound files''' by just culling them during transcription 2023-03-11 01:19:49 +07:00
mrq 7f2da0f5fb rewrote how AIVC gets training metrics (need to clean up later) 2023-03-10 22:35:32 +07:00
mrq df0edacc60 fix the cleanup actually only doing 2 despite requesting more than 2, surprised no one has pointed it out 2023-03-10 14:04:07 +07:00
mrq 8e890d3023 forgot to fix reset settings to use the new arg-agnostic way 2023-03-10 13:49:39 +07:00
mrq c92b006129 I really hate YAML 2023-03-10 03:48:46 +07:00
mrq eb1551ee92 what I thought was an override and not a ternary 2023-03-09 23:04:02 +07:00
mrq c3b43d2429 today I learned adamw_zero actually negates ANY LR schemes 2023-03-09 19:42:31 +07:00
mrq cb273b8428 cleanup 2023-03-09 18:34:52 +07:00
mrq 7c71f7239c expose options for CosineAnnealingLR_Restart (seems to be able to train very quickly due to the restarts) 2023-03-09 14:17:01 +07:00
mrq 5460e191b0 added loss graph, because I'm going to experiment with cosine annealing LR and I need to view my loss 2023-03-09 05:54:08 +07:00
mrq a182df8f4e is 2023-03-09 04:33:12 +07:00
mrq a01eb10960 (try to) unload voicefixer if it raises an error while loading voicefixer 2023-03-09 04:28:14 +07:00
mrq dc1902b91c cleanup block that makes embedding latents for random/microphone happen, remove builtin voice options from voice list to avoid duplicates 2023-03-09 04:23:36 +07:00
mrq 797882336b maybe remedy an issue that crops up if you have a non-wav and non-json file in a results folder (assuming) 2023-03-09 04:06:07 +07:00
mrq 3b4f4500d1 when you have three separate machines running and you test on one, but you accidentally revert changes because you then test on another 2023-03-09 03:26:18 +07:00
mrq ef75dba995 I hate commas make tuples 2023-03-09 02:43:05 +07:00
mrq f795dd5c20 you might be wondering why so many small commits instead of rolling the HEAD back one to just combine them, i don't want to force push and roll back the paperspace i'm testing in 2023-03-09 02:31:32 +07:00
mrq 51339671ec typo 2023-03-09 02:29:08 +07:00
mrq 1b18b3e335 forgot to save the simplified training input json first before touching any of the settings that dump to the yaml 2023-03-09 02:27:20 +07:00
mrq 0e80e311b0 added VRAM validation for a given batch:gradient accumulation size ratio (based empirically off of 6GiB, 16GiB, and 16x2GiB, would be nice to have more data on what's safe) 2023-03-09 02:08:06 +07:00
mrq ef7b957fff oops 2023-03-09 00:53:00 +07:00
mrq b0baa1909a forgot template 2023-03-09 00:32:35 +07:00
mrq 3f321fe664 big cleanup to make my life easier when i add more parameters 2023-03-09 00:26:47 +07:00
mrq 0ab091e7ff oops 2023-03-08 16:09:29 +07:00