vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	4800e7179a	remove nan checks because it causes problems in distributed training because I'm not syncing between GPUs (and nan losses gets ignored anyways with loss scaling)	2024-12-15 09:42:54 -06:00
mrq	218d0e29fd	ugh (batchmean actually expects batch=seq_len, and not the actual batch)	2024-12-07 12:39:01 -06:00
mrq	61ed662856	ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode)	2024-12-07 12:31:54 -06:00
mrq	f97e8b0c7f	ACTUALLY do KD-loss because of an oversight with masked_select outputting 1D tensors that get softmax'd in total	2024-12-07 09:52:51 -06:00
mrq	34a66e1052	agnostified KD	2024-12-06 23:53:46 -06:00
mrq	23d402bf01	added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......)	2024-12-05 23:05:52 -06:00
mrq	88d840218d	default set cfg strength to 3.0 since the reference model is updated	2024-11-17 10:23:40 -06:00
mrq	23fdba0c98	tweaks and changes	2024-11-16 15:49:06 -06:00
mrq	f7b8b1e825	dropped subtrain dataloader since it's useless to duplicate	2024-11-11 17:00:49 -06:00
mrq	48490757da	fixes	2024-11-10 20:37:50 -06:00
mrq	9def34cd66	lol	2024-11-10 12:48:41 -06:00
mrq	a9d2faf2d7	all I can do now until I wait for the model to (re)train for pure NAR	2024-11-09 22:57:34 -06:00
mrq	d606a693ff	eval fix for nar-len	2024-11-06 23:14:16 -06:00
mrq	aee08b7307	changed layerskip float16 training warning (since it didn't seem to fry on my 4xV100 system)	2024-11-03 09:58:29 -06:00
mrq	62fe5b0943	ughh	2024-11-01 22:36:48 -05:00
mrq	fb8faa295b	actually float16(+AMP) and layerskip is bad and will kill the model......	2024-11-01 18:36:44 -05:00
mrq	a96f5aee32	adjusted how i want to pass eval kwargs	2024-10-25 20:38:09 -05:00
mrq	8920e5e86b	actually have beam_width in the webUI work	2024-10-22 22:06:22 -05:00
mrq	2ea978f318	added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval	2024-10-10 13:40:25 -05:00
mrq	039482a48e	don't do eval on stt because it's so slow and I don't even bother doing any metrics against it anyways (to-do: make this a flag)	2024-09-26 18:56:57 -05:00
mrq	fa93061b3e	more fixes, moved sampler state dict to a better place, eval works again	2024-09-06 16:59:56 -05:00
mrq	d33a906119	cleanup for AR_NAR inferencing to allow both TTS and STT tasks simultaneously (need to have training eval do this too though)	2024-09-06 14:30:12 -05:00
mrq	32287710a2	moved prints to use logger, edited readme (fused_attn doesn't seem stable for training)	2024-08-29 13:27:16 -05:00
mrq	ab673e0426	add cap for NAR-len training, to avoid any weird cases in early training where it'll just mess up and generate long lengths	2024-08-03 21:00:32 -05:00
mrq	ce8bb1e4f7	sanity cleanups with weird off-by-one-ness, cleaned up and validated vall_e.models.experimental works again	2024-07-27 15:36:05 -05:00
mrq	75b04686f8	added prom-less training / inferencing, some other things	2024-07-22 19:36:07 -05:00
mrq	692d09f9c1	eval/validation fix for SpeechX tasks	2024-07-19 09:16:37 -05:00
mrq	97e768601c	re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways)	2024-07-18 16:16:14 -05:00
mrq	f770467eb3	stuff	2024-07-01 18:13:29 -05:00
mrq	bc2a6fa756	sanity cleanup: moved experimental features under its own thing	2024-06-30 10:37:33 -05:00
mrq	a8718d35a4	nasty bandaid because some of my DAC dataset only has 8 RVQ levels instead of the full 9	2024-06-29 10:16:37 -05:00
mrq	dd40463803	limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid)	2024-06-29 09:11:28 -05:00
mrq	1a392b69f6	local training backend should be a bit more aware of variable batch sizes, maybe	2024-06-28 22:39:05 -05:00
mrq	234f9efc6e	ugh	2024-06-09 11:39:43 -05:00
mrq	132a02c48b	sanity cleanup, backup config yaml for each log file	2024-06-09 11:22:52 -05:00
mrq	ead3e2f0cb	ugh	2024-06-08 16:14:57 -05:00
mrq	b072f9b96b	fixes	2024-06-08 16:01:34 -05:00
mrq	7d6fff24f9	un-tensor'd quant_level marker since it doesn't need to be one (I forgot why I had it as one but nothing seems to need it as a tensor that didn't already make it one)	2024-06-07 20:46:22 -05:00
mrq	4ade2b60ee	ugh	2024-06-06 21:57:11 -05:00
mrq	880b4ecd1b	cleanup, putting some thoughts in comments before I forget about them	2024-06-05 19:50:06 -05:00
mrq	3cfc8a96bb	oops	2024-06-05 10:30:04 -05:00
mrq	48cd1054f9	madness	2024-06-04 23:48:51 -05:00
mrq	0f7f3ae754	added loss calc split and acc for experimental model	2024-06-04 22:04:40 -05:00
mrq	6d5bd0156a	fixes	2024-06-04 18:50:48 -05:00
mrq	ed3aeaf3a1	copy pasted from test to actual trainer	2024-06-04 18:40:30 -05:00
mrq	c93d5863fd	fixes	2024-06-04 00:07:00 -05:00
mrq	186b93a77e	oops	2024-06-03 22:35:55 -05:00
mrq	e50edc3b48	added a flag to convert to a HF compatible model on export by stitching things	2024-06-03 22:34:47 -05:00
mrq	934672252b	feverish cleanup	2024-06-03 21:28:49 -05:00
mrq	05cd8b797e	nevermind it breaks training	2024-05-25 18:03:43 -05:00

82 Commits