Commit Graph

38 Commits (master)

Author SHA1 Message Date
mrq a657623cbc updated vall-e training template to use path-based speakers because it would just have a batch/epoch size of 1 otherwise; reverted the hardcoded 'spit processed dataset to this path' from my training rig so it spits it out in a sane spot 2023-08-24 21:45:50 +07:00
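For context on the path-based speaker change above: if every utterance is filed under one speaker, the dataset collapses into a single group. A minimal sketch of grouping by parent directory (assumed layout, not the repo's actual code):

```python
# Hypothetical sketch: derive the speaker from each file's parent directory, so a
# dataset with many voice folders yields many speakers instead of one giant one.
from collections import defaultdict
from pathlib import Path

def group_by_speaker(dataset_root: str) -> dict:
    speakers = defaultdict(list)
    for wav in Path(dataset_root).rglob("*.wav"):
        speakers[wav.parent.name].append(wav)  # speaker name = containing folder
    return dict(speakers)
```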
mrq 0a5483e57a updated valle yaml template 2023-08-23 21:42:32 +07:00
mrq d2a9ab9e41 remove redundant phonemize for vall-e (oops), quantize all files and then phonemize all files for cope optimization, load alignment model once instead of for every transcription (speedup with whisperx) 2023-03-23 00:22:25 +07:00
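The alignment-model change above is the classic load-once-and-reuse pattern. A sketch against whisperx's `load_align_model`/`align` calls (the cache itself is an assumption, not the repo's code):

```python
import whisperx

_align_models = {}  # language code -> (model, metadata), loaded at most once

def get_align_model(language: str, device: str = "cuda"):
    if language not in _align_models:
        _align_models[language] = whisperx.load_align_model(
            language_code=language, device=device
        )
    return _align_models[language]

def align(segments, audio, language: str, device: str = "cuda"):
    # reuse the cached model instead of reloading it for every transcription
    model, metadata = get_align_model(language, device)
    return whisperx.align(segments, model, metadata, audio, device)
```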
mrq da96161aaa oops 2023-03-22 18:07:46 +07:00
mrq f822c87344 cleanups, realigning vall-e training 2023-03-22 17:47:23 +07:00
mrq 34ef0467b9 VALL-E config edits 2023-03-20 01:22:53 +07:00
mrq b17260cddf added japanese tokenizer (experimental) 2023-03-17 20:04:40 +07:00
mrq 249c6019af cleanup, metrics are grabbed for vall-e trainer 2023-03-17 05:33:49 +07:00
mrq 1b72d0bba0 forgot to separate phonemes by spaces for [redacted] 2023-03-17 02:08:07 +07:00
mrq d4c50967a6 cleaned up some prepare dataset code 2023-03-17 01:24:02 +07:00
mrq 1a8c5de517 unk hunting 2023-03-16 14:59:12 +07:00
mrq da4f92681e oops 2023-03-16 04:35:12 +07:00
mrq ee8270bdfb preparations for training an IPA-based finetune 2023-03-16 04:25:33 +07:00
mrq 363d0b09b1 added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training) 2023-03-15 00:37:38 +07:00
mrq 07b684c4e7 removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text 2023-03-14 21:51:27 +07:00
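Both entries above touch tokenization: one picks a tokenizer JSON, the other adds a view-tokenized-text utility. A sketch with the Hugging Face `tokenizers` library (the file path is hypothetical):

```python
from tokenizers import Tokenizer

# hypothetical path to the selected tokenizer JSON
tokenizer = Tokenizer.from_file("models/tokenizers/en_tokenizer.json")

def view_tokenized(text: str):
    enc = tokenizer.encode(text)
    return list(zip(enc.tokens, enc.ids))  # token/id pairs for inspection
```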
mrq 7b16b3e88a ;) 2023-03-14 15:48:09 +07:00
mrq c85e32ff53 (: 2023-03-14 14:08:35 +07:00
mrq 54036fd780 :) 2023-03-14 05:02:14 +07:00
mrq 66ac8ba766 added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation 2023-03-13 18:51:53 +07:00
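A plausible shape for the text-validation step mentioned above: flag dataset lines that tokenize to unknown tokens, so they surface at dataset-creation time rather than mid-training. The `[UNK]` symbol is an assumption; it depends on the tokenizer JSON:

```python
from tokenizers import Tokenizer

def validate_text(tokenizer: Tokenizer, text: str) -> bool:
    # True if the line tokenizes cleanly; "[UNK]" is tokenizer-specific
    return "[UNK]" not in tokenizer.encode(text).tokens
```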
mrq 2feb6da0c0 cleanups and fixes, fix DLAS throwing errors from "too short" sound files by just culling them during transcription 2023-03-11 01:19:49 +07:00
mrq d3184004fd only God knows why the YAML spec lets you specify string values without quotes 2023-03-10 01:58:30 +07:00
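The YAML gripe above is real: PyYAML (YAML 1.1) type-coerces bare scalars, so unquoted values can silently stop being strings:

```python
import yaml

print(yaml.safe_load("name: no"))       # {'name': False}  -- not the string 'no'
print(yaml.safe_load("version: 1.20"))  # {'version': 1.2} -- float, trailing zero lost
print(yaml.safe_load("name: 'no'"))     # {'name': 'no'}   -- quoting keeps the string
```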
mrq b8867a5fb0 added the mysterious tortoise_compat flag mentioned in DLAS repo 2023-03-09 03:41:40 +07:00
mrq b0baa1909a forgot template 2023-03-09 00:32:35 +07:00
mrq 3f321fe664 big cleanup to make my life easier when i add more parameters 2023-03-09 00:26:47 +07:00
mrq 34dcb845b5 actually make using adamw_zero optimizer for multi-gpus work 2023-03-08 15:31:33 +07:00
mrq ff07f707cb disable validation if validation dataset not found, clamp validation batch size to validation dataset size instead of simply reusing batch size, switch to adamw_zero optimizer when training with multi-gpus (because the yaml comment said to and I think it might be why I'm absolutely having garbage luck training this Japanese dataset) 2023-03-08 04:47:05 +07:00
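The validation safeties above amount to two checks: skip validation entirely when no validation set exists, and never request a larger batch than the set can fill. A sketch with illustrative names:

```python
import os

def resolve_validation(val_path: str, batch_size: int):
    if not os.path.exists(val_path):
        return None  # no validation dataset found: disable validation outright
    with open(val_path, encoding="utf-8") as f:
        val_size = sum(1 for _ in f)  # one dataset entry per line (assumed format)
    return min(batch_size, val_size)  # clamp to the dataset size
```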
mrq b4098dca73 made validation work (will document later) 2023-03-08 02:58:00 +07:00
mrq e862169e7f set validation to save rate and validation file if exists (need to test later) 2023-03-07 20:38:31 +07:00
mrq 3e220ed306 added option to set worker size in training config generator (because the default is overkill), for whisper transcriptions, load a specialized language model if it exists (for now, only english), output transcription to web UI when done transcribing 2023-03-05 05:17:19 +07:00
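On the specialized-language-model note above: openai-whisper ships English-only ".en" variants of its smaller checkpoints, which tend to transcribe English better. A sketch of preferring them (the name-mangling logic is an assumption):

```python
import whisper

def load_transcriber(model_name: str, language: str):
    # tiny/base/small/medium have ".en" variants; "large" does not
    if language == "en" and model_name in ("tiny", "base", "small", "medium"):
        model_name += ".en"  # e.g. "base" -> "base.en"
    return whisper.load_model(model_name)
```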
mrq df24827b9a renamed mega batch factor to an actual real term: gradient accumulation factor, fixed halting training not actually killing the training process and freeing up resources, some logic cleanup for gradient accumulation (so many brain worms and wrong assumptions from testing on low batch sizes) (read the training section in the wiki for more details) 2023-03-04 15:55:06 +07:00
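Since the rename above is the whole point: a gradient accumulation factor of N means stepping the optimizer once every N micro-batches, for an effective batch of batch_size × N. A generic PyTorch sketch (not DLAS's implementation):

```python
def train_epoch(model, loader, optimizer, accumulation_factor: int):
    optimizer.zero_grad()
    for i, (x, y) in enumerate(loader):
        loss = model(x, y)
        # divide so the accumulated gradient is the average over micro-batches
        (loss / accumulation_factor).backward()
        if (i + 1) % accumulation_factor == 0:
            optimizer.step()
            optimizer.zero_grad()
```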
mrq c2726fa0d4 added new training tunable: loss_text_ce_loss weight, added option to specify source model in case you want to finetune a finetuned model (for example, train a Japanese finetune on a large dataset, then finetune for a specific voice, need to truly validate if it produces usable output), some bug fixes that came up for some reason now and not earlier 2023-03-01 01:17:38 +07:00
mrq 225dee22d4 huge success 2023-02-23 06:24:54 +07:00
mrq 8a1a48f31e Added very experimental float16 training for cards without enough VRAM (10GiB and below, maybe) !NOTE! this is VERY EXPERIMENTAL, I have zero free time to validate it right now, I'll do it later 2023-02-21 19:31:57 +07:00
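For reference, the standard way to do half-precision training in PyTorch is automatic mixed precision with a gradient scaler; the repo's float16 path may differ, but the mechanics look like this sketch:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def amp_step(model, x, y, optimizer):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # run the forward pass in float16 where safe
        loss = model(x, y)
    scaler.scale(loss).backward()     # scale the loss to avoid float16 underflow
    scaler.step(optimizer)            # unscales gradients, skips step on inf/nan
    scaler.update()
```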
mrq 092dd7b2d7 added more safeties and parameters to training yaml generator, I think I tested it extensively enough 2023-02-19 16:16:44 +07:00
mrq cf758f4732 oops 2023-02-18 15:50:51 +07:00
mrq 2615cafd75 added dropdown to select autoregressive model for TTS, fixed a bug where the settings saver constantly fires (I hate gradio so much, why is dropdown.change broken to continuously fire and send an empty array) 2023-02-18 14:10:26 +07:00
mrq d5c1433268 a bit of UI cleanup, import multiple audio files at once, actually shows progress when importing voices, hides audio metadata / latents if no generated settings are detected, preparing datasets shows its progress, saving a training YAML shows a message when done, training now works within the web UI, training output shows to web UI, provided notebook is cleaned up and uses a venv, etc. 2023-02-18 02:07:22 +07:00
mrq 229be0bdb8 almost 2023-02-17 15:53:50 +07:00