Commit Graph

38 Commits (master)

Author SHA1 Message Date
mrq a657623cbc updated vall-e training template to use path-based speakers because it would just have a batch/epoch size of 1 otherwise; reverted the hardcoded 'spit processed dataset to this path' from my training rig so it spits it out in a sane spot 2023-08-24 21:45:50 +07:00
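For context on the path-based speaker change above: if every utterance is filed under one speaker, the dataset collapses into a single group. A minimal sketch of grouping by parent directory (assumed layout, not the repo's actual code):

```python
# Hypothetical sketch: derive the speaker from each file's parent directory, so a
# dataset with many voice folders yields many speakers instead of one giant one.
from collections import defaultdict
from pathlib import Path

def group_by_speaker(dataset_root: str) -> dict:
    speakers = defaultdict(list)
    for wav in Path(dataset_root).rglob("*.wav"):
        speakers[wav.parent.name].append(wav)  # speaker name = containing folder
    return dict(speakers)
```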
mrq 0a5483e57a updated valle yaml template 2023-08-23 21:42:32 +07:00
mrq d2a9ab9e41 remove redundant phonemize for vall-e (oops), quantize all files and then phonemize all files for cope optimization, load alignment model once instead of for every transcription (speedup with whisperx) 2023-03-23 00:22:25 +07:00
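The alignment-model change above is the classic load-once-and-reuse pattern. A sketch against whisperx's `load_align_model`/`align` calls (the cache itself is an assumption, not the repo's code):

```python
import whisperx

_align_models = {}  # language code -> (model, metadata), loaded at most once

def get_align_model(language: str, device: str = "cuda"):
    if language not in _align_models:
        _align_models[language] = whisperx.load_align_model(
            language_code=language, device=device
        )
    return _align_models[language]

def align(segments, audio, language: str, device: str = "cuda"):
    # reuse the cached model instead of reloading it for every transcription
    model, metadata = get_align_model(language, device)
    return whisperx.align(segments, model, metadata, audio, device)
```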
mrq da96161aaa oops 2023-03-22 18:07:46 +07:00
mrq f822c87344 cleanups, realigning vall-e training 2023-03-22 17:47:23 +07:00
mrq 34ef0467b9 VALL-E config edits 2023-03-20 01:22:53 +07:00
mrq b17260cddf added japanese tokenizer (experimental) 2023-03-17 20:04:40 +07:00
mrq 249c6019af cleanup, metrics are grabbed for vall-e trainer 2023-03-17 05:33:49 +07:00
mrq 1b72d0bba0 forgot to separate phonemes by spaces for [redacted] 2023-03-17 02:08:07 +07:00
mrq d4c50967a6 cleaned up some prepare dataset code 2023-03-17 01:24:02 +07:00
mrq 1a8c5de517 unk hunting 2023-03-16 14:59:12 +07:00
mrq da4f92681e oops 2023-03-16 04:35:12 +07:00
mrq ee8270bdfb preparations for training an IPA-based finetune 2023-03-16 04:25:33 +07:00
mrq 363d0b09b1 added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training) 2023-03-15 00:37:38 +07:00
mrq 07b684c4e7 removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text 2023-03-14 21:51:27 +07:00
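Both entries above touch tokenization: one picks a tokenizer JSON, the other adds a view-tokenized-text utility. A sketch with the Hugging Face `tokenizers` library (the file path is hypothetical):

```python
from tokenizers import Tokenizer

# hypothetical path to the selected tokenizer JSON
tokenizer = Tokenizer.from_file("models/tokenizers/en_tokenizer.json")

def view_tokenized(text: str):
    enc = tokenizer.encode(text)
    return list(zip(enc.tokens, enc.ids))  # token/id pairs for inspection
```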
mrq 7b16b3e88a ;) 2023-03-14 15:48:09 +07:00
mrq c85e32ff53 (: 2023-03-14 14:08:35 +07:00
mrq 54036fd780 :) 2023-03-14 05:02:14 +07:00
mrq 66ac8ba766 added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation 2023-03-13 18:51:53 +07:00
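A plausible shape for the text-validation step mentioned above: flag dataset lines that tokenize to unknown tokens, so they surface at dataset-creation time rather than mid-training. The `[UNK]` symbol is an assumption; it depends on the tokenizer JSON:

```python
from tokenizers import Tokenizer

def validate_text(tokenizer: Tokenizer, text: str) -> bool:
    # True if the line tokenizes cleanly; "[UNK]" is tokenizer-specific
    return "[UNK]" not in tokenizer.encode(text).tokens
```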
mrq 2feb6da0c0 cleanups and fixes, fix DLAS throwing errors from "too short" sound files by just culling them during transcription 2023-03-11 01:19:49 +07:00
mrq d3184004fd only God knows why the YAML spec lets you specify string values without quotes 2023-03-10 01:58:30 +07:00
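The YAML gripe above is real: PyYAML (YAML 1.1) type-coerces bare scalars, so unquoted values can silently stop being strings:

```python
import yaml

print(yaml.safe_load("name: no"))       # {'name': False}  -- not the string 'no'
print(yaml.safe_load("version: 1.20"))  # {'version': 1.2} -- float, trailing zero lost
print(yaml.safe_load("name: 'no'"))     # {'name': 'no'}   -- quoting keeps the string
```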
mrq b8867a5fb0 added the mysterious tortoise_compat flag mentioned in DLAS repo 2023-03-09 03:41:40 +07:00
mrq b0baa1909a forgot template 2023-03-09 00:32:35 +07:00
mrq 3f321fe664 big cleanup to make my life easier when i add more parameters 2023-03-09 00:26:47 +07:00
mrq 34dcb845b5 actually make using adamw_zero optimizer for multi-gpus work 2023-03-08 15:31:33 +07:00
mrq ff07f707cb disable validation if validation dataset not found, clamp validation batch size to validation dataset size instead of simply reusing batch size, switch to adamw_zero optimizer when training with multi-gpus (because the yaml comment said to and I think it might be why I'm absolutely having garbage luck training this Japanese dataset) 2023-03-08 04:47:05 +07:00
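The validation safeties above amount to two checks: skip validation entirely when no validation set exists, and never request a larger batch than the set can fill. A sketch with illustrative names:

```python
import os

def resolve_validation(val_path: str, batch_size: int):
    if not os.path.exists(val_path):
        return None  # no validation dataset found: disable validation outright
    with open(val_path, encoding="utf-8") as f:
        val_size = sum(1 for _ in f)  # one dataset entry per line (assumed format)
    return min(batch_size, val_size)  # clamp to the dataset size
```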
mrq b4098dca73 made validation work (will document later) 2023-03-08 02:58:00 +07:00
mrq e862169e7f set validation to save rate and validation file if exists (need to test later) 2023-03-07 20:38:31 +07:00
mrq 3e220ed306 added option to set worker size in training config generator (because the default is overkill), for whisper transcriptions, load a specialized language model if it exists (for now, only english), output transcription to web UI when done transcribing 2023-03-05 05:17:19 +07:00
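On the specialized-language-model note above: openai-whisper ships English-only ".en" variants of its smaller checkpoints, which tend to transcribe English better. A sketch of preferring them (the name-mangling logic is an assumption):

```python
import whisper

def load_transcriber(model_name: str, language: str):
    # tiny/base/small/medium have ".en" variants; "large" does not
    if language == "en" and model_name in ("tiny", "base", "small", "medium"):
        model_name += ".en"  # e.g. "base" -> "base.en"
    return whisper.load_model(model_name)
```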
mrq df24827b9a renamed mega batch factor to an actual real term: gradient accumulation factor, fixed halting training not actually killing the training process and freeing up resources, some logic cleanup for gradient accumulation (so many brain worms and wrong assumptions from testing on low batch sizes) (read the training section in the wiki for more details) 2023-03-04 15:55:06 +07:00
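Since the rename above is the whole point: a gradient accumulation factor of N means stepping the optimizer once every N micro-batches, for an effective batch of batch_size × N. A generic PyTorch sketch (not DLAS's implementation):

```python
def train_epoch(model, loader, optimizer, accumulation_factor: int):
    optimizer.zero_grad()
    for i, (x, y) in enumerate(loader):
        loss = model(x, y)
        # divide so the accumulated gradient is the average over micro-batches
        (loss / accumulation_factor).backward()
        if (i + 1) % accumulation_factor == 0:
            optimizer.step()
            optimizer.zero_grad()
```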
mrq c2726fa0d4 added new training tunable: loss_text_ce_loss weight, added option to specify source model in case you want to finetune a finetuned model (for example, train a Japanese finetune on a large dataset, then finetune for a specific voice, need to truly validate if it produces usable output), some bug fixes that came up for some reason now and not earlier 2023-03-01 01:17:38 +07:00
mrq 225dee22d4 huge success 2023-02-23 06:24:54 +07:00
mrq 8a1a48f31e Added very experimental float16 training for cards without enough VRAM (10GiB and below, maybe) !NOTE! this is VERY EXPERIMENTAL, I have zero free time to validate it right now, I'll do it later 2023-02-21 19:31:57 +07:00
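For reference, the standard way to do half-precision training in PyTorch is automatic mixed precision with a gradient scaler; the repo's float16 path may differ, but the mechanics look like this sketch:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def amp_step(model, x, y, optimizer):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # run the forward pass in float16 where safe
        loss = model(x, y)
    scaler.scale(loss).backward()     # scale the loss to avoid float16 underflow
    scaler.step(optimizer)            # unscales gradients, skips step on inf/nan
    scaler.update()
```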
mrq 092dd7b2d7 added more safeties and parameters to training yaml generator, I think I tested it extensively enough 2023-02-19 16:16:44 +07:00
mrq cf758f4732 oops 2023-02-18 15:50:51 +07:00
mrq 2615cafd75 added dropdown to select autoregressive model for TTS, fixed a bug where the settings saver constantly fires (I hate gradio so much, why is dropdown.change broken to continuously fire and send an empty array) 2023-02-18 14:10:26 +07:00
mrq d5c1433268 a bit of UI cleanup, import multiple audio files at once, actually shows progress when importing voices, hides audio metadata / latents if no generated settings are detected, preparing datasets shows its progress, saving a training YAML shows a message when done, training now works within the web UI, training output shows to web UI, provided notebook is cleaned up and uses a venv, etc. 2023-02-18 02:07:22 +07:00
mrq 229be0bdb8 almost 2023-02-17 15:53:50 +07:00