Commit Graph

28 Commits

Author SHA1 Message Date
mrq
4800e7179a remove NaN checks because they cause problems in distributed training since I'm not syncing between GPUs (and NaN losses get ignored anyway with loss scaling) 2024-12-15 09:42:54 -06:00
mrq
23d402bf01 added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match...) 2024-12-05 23:05:52 -06:00
mrq
ef1c17430f skip step on NaN loss (ironically I have not had a NaN loss after adding this), throw an exception on an invalid cfg.dataset.sample_type and sample_order combination (because I was tricked by this in my YAML and had inconsistent VRAM usage) 2024-11-01 20:54:53 -05:00
mrq
32287710a2 moved prints to use the logger, edited the README (fused_attn doesn't seem stable for training) 2024-08-29 13:27:16 -05:00
mrq
75b04686f8 added prom-less training / inferencing, some other things 2024-07-22 19:36:07 -05:00
mrq
1a392b69f6 local training backend should be a bit more aware of variable batch sizes, maybe 2024-06-28 22:39:05 -05:00
mrq
7cfb78fa64 enable LoRA for targeted RVQ levels (to experiment with, seems to help) 2024-06-17 21:45:03 -05:00
mrq
7047fcc6e2 actually make DeepSpeed work with LoRAs 2024-06-17 13:55:37 -05:00
mrq
726a4b613f naive, rudimentary DeepSpeed support (just live with the LoRA weights living with the original weights, they can be split later) 2024-06-17 13:17:24 -05:00
mrq
45a39fb79f very rudimentary LoRA support (no DeepSpeed support, tested training and saving but not loading yet) 2024-06-17 00:09:16 -05:00
mrq
4ade2b60ee ugh 2024-06-06 21:57:11 -05:00
mrq
fcac9503e2 cleanup 2024-06-06 13:08:02 -05:00
mrq
934672252b feverish cleanup 2024-06-03 21:28:49 -05:00
mrq
856545f8bb NaN loss detection (should have added it earlier), loss scaling for the local backend + fp16 2024-05-11 22:23:29 -05:00
mrq
9d97eb5104 added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale) 2024-04-08 20:14:51 -05:00
mrq
3da1518ace added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work) 2024-01-31 21:48:36 -06:00
mrq
4abd6564d1 fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml 2023-09-23 19:59:00 -05:00
mrq
e7da1eb90d edge case 2023-09-20 19:20:17 -05:00
mrq
c0b25541e3 restructured some things with the model to remove dead weights 2023-09-20 19:10:59 -05:00
mrq
8837bc34d7 added option to specify parameters to freeze per-model in YAML (because I need to see about committing atrocities with converting an AR into an AR+NAR) 2023-09-07 18:19:51 -05:00
mrq
57db3ccfa8 shuffled VALL-E continuous into a tts-c task instead, with logic fixes for it 2023-09-02 12:23:40 -05:00
mrq
e40c0d34a0 somewhat got recurrent forward working (it's as accurate as chunkwise forward: it's not accurate at all), added option to use AMP instead of blanket setting the weight's dtype 2023-09-01 20:58:29 -05:00
mrq
7f4388e591 added total samples processed and tokens processed (len of text tokens + len of target response tokens) 2023-08-28 11:02:45 -05:00
mrq
87c4bfedba added ability to mark models as disabled for training and hotload them for eval/validation (useful if training only one model, or training a model per GPU) 2023-08-27 12:26:12 -05:00
mrq
2d1a9f10c0 nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trims off excess levels according to what each model receives, some other things I already forgot (I really hope no one else has weights being baked right now) 2023-08-19 15:06:33 -05:00
mrq
d89568a96e some fixes for the local framework 2023-08-05 03:22:15 +00:00
mrq
012f54b7f1 another classic commit so I can copy it to another machine to gut out things and use the trainer bits for a side project that I should really get around to working on sooner rather than later 2023-08-04 14:21:30 -05:00
mrq
c85101403f big cleanup 2023-08-03 20:26:36 -05:00