mrq/vall-e

Author	SHA1	Message	Date
mrq	6634d07576	added muon optimizer through kludge hacks because it necessitates a second optimizer in tandum that seems to only sometimes work with deepspeed	2025-02-23 11:22:13 -06:00
mrq	ab0abd2b12	fixes fixes fixes (a quarter of my recently processed audio returned zero'd tensors......)	2025-02-22 09:07:33 -06:00
mrq	a65c8144f4	with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already......	2025-02-13 18:38:40 -06:00
mrq	e8f182b634	cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors)	2025-02-13 09:35:27 -06:00
mrq	b52c5c5d80	this seems to work in testing	2025-02-12 16:16:04 -06:00
mrq	e029a8804d	ironically none of this cruft gets the loss lower than the original way	2025-02-12 11:17:00 -06:00
mrq	e5916ea519	for my sanity it seems having extraneous tokens in the embedding/classifier has the loss/acc a little higher than it should	2025-02-11 14:47:35 -06:00
mrq	497bdfc67b	more work (the wall is non-causal decoding......)	2024-12-22 20:11:31 -06:00
mrq	5f289db275	ugh	2024-12-22 16:15:24 -06:00
mrq	353e478e68	agony	2024-12-21 22:52:10 -06:00
mrq	4800e7179a	remove nan checks because it causes problems in distributed training because I'm not syncing between GPUs (and nan losses gets ignored anyways with loss scaling)	2024-12-15 09:42:54 -06:00
mrq	3dd31e74d1	finally figured out a clean way to handle "resuming" the tqdm bar	2024-12-14 18:44:43 -06:00
mrq	09804ecc16	APOLLO tweaks to make it work with deepspeed	2024-12-13 23:03:52 -06:00
mrq	64c67160a3	tweaks	2024-12-13 19:00:35 -06:00
mrq	0fbfb8bbe8	actually save the optimizer for the local engine backend because safetensors doesn't save it	2024-12-12 17:12:59 -06:00
mrq	f41251f648	more fixes for local engine backend	2024-12-12 14:38:42 -06:00
mrq	6b237ae5e3	tweaks for the local engine orchestrator (that I never caught since I always used the deepspeed backend)	2024-12-12 13:37:38 -06:00
mrq	9a62e3b824	APOLLO cringe (doesn't want to work with deepspeed)	2024-12-12 00:31:58 -06:00
mrq	8568a93dad	added WER/SIM-O metrics, added APOLLO but I need to test it	2024-12-10 20:13:21 -06:00
mrq	61ed662856	ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode)	2024-12-07 12:31:54 -06:00
mrq	23d402bf01	added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......)	2024-12-05 23:05:52 -06:00
mrq	3fc0540f49	m	2024-11-21 15:07:46 -06:00
mrq	dfdba3f190	oops	2024-11-20 19:21:03 -06:00
mrq	cd6e9ba2f2	oops	2024-11-20 16:27:51 -06:00
mrq	1a73ac6a20	I cannot believe it's not actually called Wand DB (added wandb logging support since I think it would have been a much better way to look at my metrics)	2024-11-20 16:10:47 -06:00
mrq	190a917b3e	I did it.	2024-11-19 12:24:33 -06:00
mrq	e412e98125	ugh	2024-11-14 07:34:22 -06:00
mrq	269648605e	move NAR-len rvq level 0 to separate embedding	2024-11-13 11:38:58 -06:00
mrq	48490757da	fixes	2024-11-10 20:37:50 -06:00
mrq	9cb0b6901b	unified nar.py into ar_nar.py	2024-11-10 12:19:48 -06:00
mrq	e108c54daf	new NAR-len training paradigm......	2024-11-07 11:32:11 -06:00
mrq	c83670c38c	Windows specific fixes (to-do: find libespeak-ng.dll automatically because it cannot be trusted to do it by default)	2024-11-03 19:19:15 -06:00
mrq	62fe5b0943	ughh	2024-11-01 22:36:48 -05:00
mrq	ef1c17430f	skip step on nan loss (ironically I have not had a nan loss after adding this), throw exception with invalid cfg.dataset.sample_type and sample_order combination (because I was tricked by this in my yaml and had inconsistent vram usage)	2024-11-01 20:54:53 -05:00
mrq	4049f51ba9	added option to load lora directly from the model file itself with --lora	2024-10-26 00:13:10 -05:00
mrq	ccf71dc1b6	added option to load from a model state dict directly instead of a yaml (to-do: do this for LoRAs too), automatically download the default model if none is provided	2024-10-25 22:15:15 -05:00
mrq	75b90be325	cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified	2024-10-17 17:06:48 -05:00
mrq	c8d4716a9f	ugh	2024-09-18 21:40:57 -05:00
mrq	31e8b7edb8	tweaks and fixes for lora stuffs	2024-09-08 18:05:21 -05:00
mrq	413097f5f7	fixes	2024-09-05 21:42:59 -05:00
mrq	d319d33368	haha	2024-09-04 14:52:26 -05:00
mrq	619369236b	ugh	2024-08-30 21:10:57 -05:00
mrq	685f4faec0	ugh	2024-08-30 10:46:26 -05:00
mrq	32287710a2	moved prints to use logger, edited readme (fused_attn doesnt seem stable for training)	2024-08-29 13:27:16 -05:00
mrq	b7b99a25f1	added ability to specify attention backend for CLI and webui (because im tired of editing the yaml)	2024-08-26 19:33:51 -05:00
mrq	3a65cc4b22	fix issue with sft and shared tensors...	2024-08-04 19:56:21 -05:00
mrq	d19f93a2c0	documentation update	2024-08-04 00:14:49 -05:00
mrq	2cb465018b	implicitly load either normal pickled weights or safetensors on loading the model	2024-08-03 23:34:18 -05:00
mrq	c09133d00f	added safetensors support (with metadata) and feed whatever torch.load/torch.save into it	2024-08-03 23:15:20 -05:00
mrq	6a733eb2ed	changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype, some 'optimization' to compile the model but it doesnt seem to do anything useful	2024-08-03 22:10:21 -05:00

135 Commits