Commit Graph

38 Commits

Author SHA1 Message Date
mrq
a657623cbc updated vall-e training template to use path-based speakers because it would just have a batch/epoch size of 1 otherwise; reverted the hardcoded 'spit processed dataset to this path' from my training rig so it spits it out in a sane spot 2023-08-24 21:45:50 +00:00
mrq
0a5483e57a updated valle yaml template 2023-08-23 21:42:32 +00:00
mrq
d2a9ab9e41 remove redundant phonemize for vall-e (oops), quantize all files and then phonemize all files for cope optimization, load alignment model once instead of for every transcription (speedup with whisperx) 2023-03-23 00:22:25 +00:00
mrq
da96161aaa oops 2023-03-22 18:07:46 +00:00
mrq
f822c87344 cleanups, realigning vall-e training 2023-03-22 17:47:23 +00:00
mrq
34ef0467b9 VALL-E config edits 2023-03-20 01:22:53 +00:00
mrq
b17260cddf added japanese tokenizer (experimental) 2023-03-17 20:04:40 +00:00
mrq
249c6019af cleanup, metrics are grabbed for vall-e trainer 2023-03-17 05:33:49 +00:00
mrq
1b72d0bba0 forgot to separate phonemes by spaces for [redacted] 2023-03-17 02:08:07 +00:00
mrq
d4c50967a6 cleaned up some prepare dataset code 2023-03-17 01:24:02 +00:00
mrq
1a8c5de517 unk hunting 2023-03-16 14:59:12 +00:00
mrq
da4f92681e oops 2023-03-16 04:35:12 +00:00
mrq
ee8270bdfb preparations for training an IPA-based finetune 2023-03-16 04:25:33 +00:00
mrq
363d0b09b1 added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training) 2023-03-15 00:37:38 +00:00
mrq
07b684c4e7 removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text 2023-03-14 21:51:27 +00:00
mrq
7b16b3e88a ;) 2023-03-14 15:48:09 +00:00
mrq
c85e32ff53 (: 2023-03-14 14:08:35 +00:00
mrq
54036fd780 :) 2023-03-14 05:02:14 +00:00
mrq
66ac8ba766 added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation 2023-03-13 18:51:53 +00:00
mrq
2feb6da0c0 cleanups and fixes, fix DLAS throwing errors from '''too short of sound files''' by just culling them during transcription 2023-03-11 01:19:49 +00:00
mrq
d3184004fd only God knows why the YAML spec lets you specify string values without quotes (a demonstration follows after this list) 2023-03-10 01:58:30 +00:00
mrq
b8867a5fb0 added the mysterious tortoise_compat flag mentioned in DLAS repo 2023-03-09 03:41:40 +00:00
mrq
b0baa1909a forgot template 2023-03-09 00:32:35 +00:00
mrq
3f321fe664 big cleanup to make my life easier when I add more parameters 2023-03-09 00:26:47 +00:00
mrq
34dcb845b5 actually make using adamw_zero optimizer for multi-gpus work 2023-03-08 15:31:33 +00:00
mrq
ff07f707cb disable validation if validation dataset not found, clamp validation batch size to validation dataset size instead of simply reusing batch size (a sketch of this logic follows after this list), switch to adamw_zero optimizer when training with multi-gpus (because the yaml comment said to and I think it might be why I'm absolutely having garbage luck training this japanese dataset) 2023-03-08 04:47:05 +00:00
mrq
b4098dca73 got validation working (will document later) 2023-03-08 02:58:00 +00:00
mrq
e862169e7f set validation to save rate and validation file if exists (need to test later) 2023-03-07 20:38:31 +00:00
mrq
3e220ed306 added option to set worker size in training config generator (because the default is overkill), for whisper transcriptions, load a specialized language model if it exists (for now, only english), output transcription to web UI when done transcribing 2023-03-05 05:17:19 +00:00
mrq
df24827b9a renamed mega batch factor to an actual real term: gradient accumulation factor (a general sketch follows after this list), fixed halting training not actually killing the training process and freeing up resources, some logic cleanup for gradient accumulation (so many brain worms and wrong assumptions from testing on low batch sizes) (read the training section in the wiki for more details) 2023-03-04 15:55:06 +00:00
mrq
c2726fa0d4 added new training tunable: loss_text_ce_loss weight, added option to specify source model in case you want to finetune a finetuned model (for example, train a Japanese finetune on a large dataset, then finetune for a specific voice, need to truly validate if it produces usable output), some bug fixes that came up for some reason now and not earlier 2023-03-01 01:17:38 +00:00
mrq
225dee22d4 huge success 2023-02-23 06:24:54 +00:00
mrq
8a1a48f31e Added very experimental float16 training for cards without enough VRAM (10GiB and below, maybe) !NOTE! this is VERY EXPERIMENTAL, I have zero free time to validate it right now, I'll do it later (a mixed-precision sketch follows after this list) 2023-02-21 19:31:57 +00:00
mrq
092dd7b2d7 added more safeties and parameters to training yaml generator, I think I tested it extensively enough 2023-02-19 16:16:44 +00:00
mrq
cf758f4732 oops 2023-02-18 15:50:51 +00:00
mrq
2615cafd75 added dropdown to select autoregressive model for TTS, fixed a bug where the settings saver constantly fires. I hate gradio so much, why is dropdown.change broken to continuously fire and send an empty array 2023-02-18 14:10:26 +00:00
mrq
d5c1433268 a bit of UI cleanup, import multiple audio files at once, actually shows progress when importing voices, hides audio metadata / latents if no generated settings are detected, preparing datasets shows its progress, saving a training YAML shows a message when done, training now works within the web UI, training output shows to web UI, provided notebook is cleaned up and uses a venv, etc. 2023-02-18 02:07:22 +00:00
mrq
229be0bdb8 almost 2023-02-17 15:53:50 +00:00
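
For context on commit d3184004fd: YAML 1.1 loaders such as PyYAML coerce unquoted scalars like `no`, `yes`, `on`, and `off` into booleans, which is how an unquoted string value in a training YAML can silently change type. A minimal demonstration (the keys here are hypothetical, not the project's actual config):

```python
# YAML 1.1 scalar coercion: bare `no`/`off` parse as booleans in PyYAML,
# so unquoted string values silently change type. Keys are hypothetical.
import yaml

unquoted = yaml.safe_load("language: no\nswitch: off\n")
quoted = yaml.safe_load("language: 'no'\nswitch: 'off'\n")

print(unquoted)  # {'language': False, 'switch': False}  -- not what was meant
print(quoted)    # {'language': 'no', 'switch': 'off'}
```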
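Commit ff07f707cb describes two guards: skip validation entirely when no validation dataset exists, and clamp the validation batch size to the dataset size rather than reusing the training batch size. A minimal sketch of that logic, with hypothetical names and paths (the web UI's actual code differs):

```python
import os

# Hypothetical values/paths for illustration only.
batch_size = 128
validation_path = "./training/validation.txt"

if not os.path.exists(validation_path):
    # no validation dataset found: disable validation outright
    validation_enabled = False
else:
    with open(validation_path, encoding="utf-8") as f:
        validation_lines = f.read().splitlines()
    # clamp so a validation set smaller than the training batch size
    # still forms at least one full batch
    validation_batch_size = min(batch_size, len(validation_lines))
    validation_enabled = True
```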
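Commit df24827b9a renames "mega batch factor" to the standard term, gradient accumulation factor. As a self-contained sketch of the general PyTorch pattern (not DLAS's actual implementation): losses from several small batches are scaled and back-propagated before a single optimizer step, so the effective batch size becomes the per-step batch size times the accumulation factor, at no extra VRAM cost.

```python
import torch
from torch import nn

# Toy model and synthetic data; only the update pattern matters here.
model = nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

accumulation_factor = 4  # effective batch = per-step batch * this factor
batches = [(torch.randn(2, 8), torch.randn(2, 1)) for _ in range(8)]

for step, (inputs, targets) in enumerate(batches):
    loss = loss_fn(model(inputs), targets)
    # scale so the accumulated gradients average over the "mega batch"
    (loss / accumulation_factor).backward()
    if (step + 1) % accumulation_factor == 0:
        optimizer.step()       # one weight update per accumulated group
        optimizer.zero_grad()
```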
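Commit 8a1a48f31e adds experimental float16 training for low-VRAM cards. The standard PyTorch mixed-precision recipe it gestures at looks roughly like this (illustrative only; the repo drives training through DLAS and its YAML, and this assumes a CUDA device):

```python
import torch
from torch import nn

model = nn.Linear(8, 1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # rescales loss to avoid fp16 underflow

inputs = torch.randn(4, 8, device="cuda")
targets = torch.randn(4, 1, device="cuda")

with torch.cuda.amp.autocast():  # forward pass runs in float16 where safe
    loss = nn.functional.mse_loss(model(inputs), targets)

scaler.scale(loss).backward()  # backward on the scaled loss
scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
scaler.update()
optimizer.zero_grad()
```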