mrq/vall-e

Author	SHA1	Message	Date
mrq	4800e7179a	remove nan checks because they cause problems in distributed training because I'm not syncing between GPUs (and nan losses get ignored anyways with loss scaling)	2024-12-15 09:42:54 -06:00
mrq	09804ecc16	APOLLO tweaks to make it work with deepspeed	2024-12-13 23:03:52 -06:00
mrq	64c67160a3	tweaks	2024-12-13 19:00:35 -06:00
mrq	0fbfb8bbe8	actually save the optimizer for the local engine backend because safetensors doesn't save it	2024-12-12 17:12:59 -06:00
mrq	f41251f648	more fixes for local engine backend	2024-12-12 14:38:42 -06:00
mrq	6b237ae5e3	tweaks for the local engine orchestrator (that I never caught since I always used the deepspeed backend)	2024-12-12 13:37:38 -06:00
mrq	23d402bf01	added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......)	2024-12-05 23:05:52 -06:00
mrq	dfdba3f190	oops	2024-11-20 19:21:03 -06:00
mrq	cd6e9ba2f2	oops	2024-11-20 16:27:51 -06:00
mrq	1a73ac6a20	I cannot believe it's not actually called Wand DB (added wandb logging support since I think it would have been a much better way to look at my metrics)	2024-11-20 16:10:47 -06:00
mrq	190a917b3e	I did it.	2024-11-19 12:24:33 -06:00
mrq	c83670c38c	Windows specific fixes (to-do: find libespeak-ng.dll automatically because it cannot be trusted to do it by default)	2024-11-03 19:19:15 -06:00
mrq	62fe5b0943	ughh	2024-11-01 22:36:48 -05:00
mrq	75b90be325	cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified	2024-10-17 17:06:48 -05:00
mrq	31e8b7edb8	tweaks and fixes for lora stuffs	2024-09-08 18:05:21 -05:00
mrq	d319d33368	haha	2024-09-04 14:52:26 -05:00
mrq	619369236b	ugh	2024-08-30 21:10:57 -05:00
mrq	32287710a2	moved prints to use logger, edited readme (fused_attn doesn't seem stable for training)	2024-08-29 13:27:16 -05:00
mrq	3a65cc4b22	fix issue with sft and shared tensors...	2024-08-04 19:56:21 -05:00
mrq	c09133d00f	added safetensors support (with metadata) and feed whatever torch.load/torch.save into it	2024-08-03 23:15:20 -05:00
mrq	06e948aec1	suppress warning on exit about distributed not being cleaned up (because I updated my system)	2024-07-25 16:50:47 -05:00
mrq	188d116222	some weird fixes for an equally weird regression with LoRA loading	2024-07-22 20:47:24 -05:00
mrq	75b04686f8	added prom-less training / inferencing, some other things	2024-07-22 19:36:07 -05:00
mrq	d87b492295	added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)	2024-07-19 20:49:40 -05:00
mrq	fe0f235335	mechanism to store the model config inside the weights and load them, some other things to allow LoRA training on the RetNet (gradient checkpointing will gripe about inputs not having require_grad and nothing seems to remedy it)	2024-07-16 18:23:13 -05:00
mrq	1a392b69f6	local training backend should be a bit more aware of variable batch sizes, maybe	2024-06-28 22:39:05 -05:00
mrq	7cfb78fa64	enable LoRA for targeted RVQ levels (to experiment with, seems to help)	2024-06-17 21:45:03 -05:00
mrq	7047fcc6e2	actually make deepspeed work with LoRAs	2024-06-17 13:55:37 -05:00
mrq	1d159b1476	updated export routine to split LoRA weights from the state dict (should work with deepspeed)	2024-06-17 13:28:18 -05:00
mrq	726a4b613f	naive, rudimentary DeepSpeed support (just live with the LoRA weights living with the original weights, they can be split later)	2024-06-17 13:17:24 -05:00
mrq	bd0bc10ec0	added LoRA policy to decide what layer of the model gets adapted based on simple inclusion/exclusion terms	2024-06-17 13:05:06 -05:00
mrq	45a39fb79f	very rudimentary lora support (no deepspeed support, tested training and saving but not loading yet)	2024-06-17 00:09:16 -05:00
mrq	a7a6e0ac76	validated that inferencing works, changed some defaults (NAR benefits from greedy sampling)	2024-06-09 17:11:38 -05:00
mrq	4ade2b60ee	ugh	2024-06-06 21:57:11 -05:00
mrq	fcac9503e2	cleanup	2024-06-06 13:08:02 -05:00
mrq	e50edc3b48	added a flag to convert to a HF compatible model on export by stitching things	2024-06-03 22:34:47 -05:00
mrq	934672252b	feverish cleanup	2024-06-03 21:28:49 -05:00
mrq	c2a436d368	somehow between training sessions grad_norm = None even though it worked before	2024-06-02 08:29:27 -05:00
mrq	827cf632e7	report current loss scale and adjust grad norm by loss scale (for deepspeed)	2024-06-01 10:44:32 -05:00
mrq	856545f8bb	nan loss detection (should have added it earlier), loss scaling for local backend + fp16	2024-05-11 22:23:29 -05:00
mrq	88e9b9caff	local ddp fix	2024-05-11 17:29:01 -05:00
mrq	71e373064f	remove redundant loss, tweak readme	2024-05-11 15:02:47 -05:00
mrq	8aa1b2dabf	documentation update	2024-05-04 21:03:46 -05:00
mrq	9d97eb5104	added FP8 support through `NVIDIA/TransformerEngine`, added RetNet_HF through `syncdoth/RetNet` (as an alternative to branch away from torchscale)	2024-04-08 20:14:51 -05:00
mrq	91062361af	tweaks	2024-03-01 20:38:06 -06:00
mrq	f3c59c3e7e	cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed)	2024-03-01 20:18:43 -06:00
mrq	3da1518ace	added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work)	2024-01-31 21:48:36 -06:00
mrq	9c198eb75a	added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works)	2023-12-20 18:45:58 -06:00
mrq	6c51a629cc	resetting step count resets the samples processed and other metrics	2023-10-29 12:11:19 -05:00
mrq	09cda7d3f9	added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup	2023-10-16 19:30:38 -05:00

72 Commits