Commit Graph

874 Commits

Author SHA1 Message Date
mrq 0cca4eb943 disable this cringe precheck for now since it causes problems 2025-05-22 13:21:36 -05:00
mrq f12746b091 allow defining the default model name through env var, register nemo-larger in the model name list thing 2025-05-21 16:50:59 -05:00
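(For context, the env-var override the commit above describes usually looks like the minimal sketch below; the variable name `VALLE_DEFAULT_MODEL` and the registry contents are illustrative assumptions, not the repo's actual identifiers.)

```python
import os

# hypothetical registry; "nemo-larger" is registered per the commit above
MODEL_NAMES = ["ar+nar-len-llama-8", "nemo-larger"]

def default_model_name() -> str:
    # VALLE_DEFAULT_MODEL is an assumed env var name, for illustration only
    name = os.environ.get("VALLE_DEFAULT_MODEL", MODEL_NAMES[0])
    # fall back to the first registered model if the override is unknown
    return name if name in MODEL_NAMES else MODEL_NAMES[0]
```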
mrq e46d7ef2cb warn and ignore export when lora training because the state dict exported during training is wrong 2025-05-20 23:38:10 -05:00
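(A rough sketch of the guard this commit describes, under assumed placeholder names; `maybe_export`, `training_lora`, and `export_fn` are not the repo's actual identifiers.)

```python
import logging

logger = logging.getLogger(__name__)

def maybe_export(model, export_fn, training_lora: bool = False):
    # per the commit above: the state dict captured mid-LoRA-training is
    # wrong, so the export is warned about and skipped rather than attempted
    if training_lora:
        logger.warning("LoRA training is active; ignoring export request")
        return None
    return export_fn(model)
```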
mrq fee02f4153 added option to explicitly load a lora without having to lobotomize yourself with creating a yaml just to do so 2025-05-20 23:28:29 -05:00
mrq 5018ddb107 i don't know why this managed to escape my attention 2025-05-20 15:13:21 -05:00
mrq b2b243e7e7 addresses #9 2025-05-05 13:03:44 -05:00
mrq 5fe01ffc6c more notes / re-enabled top-k/p samplers for new implementation 2025-04-19 14:04:34 -05:00
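(For reference, a generic PyTorch top-k/top-p filter of the kind being re-enabled here; this is the standard technique, not vall_e's exact sampler code.)

```python
import torch

def sample_top_k_top_p(logits: torch.Tensor, top_k: int = 0, top_p: float = 1.0) -> torch.Tensor:
    # logits: (batch, vocab). top_k=0 and top_p=1.0 disable the respective filter.
    if top_k > 0:
        # mask out everything below the k-th largest logit
        kth = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    if top_p < 1.0:
        # keep the smallest prefix of sorted tokens whose mass reaches top_p
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        remove = cum_probs > top_p
        remove[..., 1:] = remove[..., :-1].clone()  # always keep the top token
        remove[..., 0] = False
        logits = logits.masked_fill(remove.scatter(-1, sorted_idx, remove), float("-inf"))
    return torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1)
```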
mrq f8e1d110dc when you, for once, use your main rig to test, forget to, and then port things back over 2025-04-18 20:49:00 -05:00
mrq d9e18037cc new implementation tweaks and fixes to make it actually better (there were a lot of badwrong things being done that harmed the output quality, will evaluate the model further) 2025-04-18 20:36:44 -05:00
mrq 98d1d8cb1e added some more notes, tweaks (RIP DAC, it's over) 2025-04-17 20:24:40 -05:00
mrq 9e27d2e02e huggingface zerogpu cringe 2025-04-16 15:25:45 -05:00
mrq 814146a5e0 more settings bloat because there seems to be instability with the encoder as-is 2025-04-12 12:53:44 -05:00
mrq f144389920 initializing the level_weights was the culprit for killing newly trained models............. 2025-04-10 23:06:16 -05:00
mrq 6c6a34dd21 i can't be assed to test if the prior commit works so being explicit like this should help until i can be bothered to halt training just to test this 2025-04-07 23:13:35 -05:00
mrq 6d42c9ae23 how foolish of me, not having a softmax as float32 (maybe addresses an emergent regression where bfloat16 training shits the bed where float16+loss scaling doesn't) 2025-04-07 22:51:52 -05:00
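(The fix described above, in sketch form: computing the softmax in float32 regardless of the activation dtype. bfloat16's short mantissa can visibly degrade softmax, which would line up with bf16 runs misbehaving while fp16+loss-scaling runs don't.)

```python
import torch

def softmax_fp32(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # upcast to float32 for the softmax itself, then cast back, so bf16/fp16
    # activations don't lose precision inside the normalization
    return torch.softmax(scores, dim=dim, dtype=torch.float32).to(scores.dtype)
```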
mrq d6cd848c32 goodbye nvidia/audio-codec-44khz, crossed fingers for DAC again 2025-04-06 21:05:29 -05:00
mrq 1e22519d94 diagnosed both hf/llama.cpp versions to probably just being a faulty export method (to-do: migrate vall_e.models.base to vall_e.export --hf) 2025-04-05 22:05:39 -05:00
mrq c34763769a ugh 2025-04-05 18:58:25 -05:00
mrq b6692ce3de ugh 2025-04-05 18:20:46 -05:00
mrq 4a909ceff8 temp fix for vall_e.cpp demask scoring regression 2025-04-05 11:04:26 -05:00
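("Demask scoring" refers to the confidence scoring used by iterative masked (NAR-len style) decoding; a generic single-iteration sketch follows. Names and shapes are illustrative, not vall_e.cpp's actual code.)

```python
import torch

def demask_step(logits, tokens, mask, n_unmask):
    # one iteration of confidence-based demasking: sample every position,
    # score each masked slot by its sampled token's probability, commit the
    # n_unmask most confident, and leave the rest masked for the next pass
    probs = torch.softmax(logits, dim=-1)                     # (T, vocab)
    sampled = torch.multinomial(probs, 1).squeeze(-1)         # (T,)
    scores = probs.gather(-1, sampled[:, None]).squeeze(-1)   # (T,)
    scores = scores.masked_fill(~mask, float("-inf"))         # rank only masked slots
    keep = scores.topk(min(n_unmask, int(mask.sum()))).indices
    tokens, mask = tokens.clone(), mask.clone()
    tokens[keep] = sampled[keep]
    mask[keep] = False
    return tokens, mask
```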
mrq 44260f7445 tweaks 2025-04-05 10:27:07 -05:00
mrq 0ede3bfc12 updated vall_e.cpp, but i could have sworn it worked much better than this...... 2025-04-05 01:22:51 -05:00
mrq 28d39ef962 should not be working late 2025-04-03 23:32:58 -05:00
mrq bfe70e9d56 ugh 2025-04-03 23:26:00 -05:00
mrq 2e93438867 reintroduced sampler_type = speaker because I think this might salvage the nemo model to have better speaker similarities 2025-04-03 19:01:10 -05:00
mrq caad99ab78 fix for bsz>1 because I forgot the old implementation implicitly handles this 2025-04-02 17:17:37 -05:00
mrq 068dbdb785 ugh 2025-04-02 17:05:16 -05:00
mrq 0e995dbf2c is this my last cope (falling back to explicit duration prediction, as this regression just won't go away) (also the smaller model was lobotomized because of my ROCm setup having a botched SDPA for who knows why) 2025-04-02 17:01:24 -05:00
mrq 7a0956863d oops 2025-03-31 21:11:43 -05:00
mrq a1184586ef should never have trusted mse_loss, it never works 2025-03-31 20:59:13 -05:00
mrq 99f251c768 slight tweaks to condition-less NS/SR 2025-03-30 10:37:40 -05:00
mrq 478aea0e8c tweaks 2025-03-28 19:49:54 -05:00
mrq 6ae282e090 re-added noise dataloader sampler whatever for the old implementation's other tasks that require it 2025-03-28 15:07:06 -05:00
mrq 90b3509404 I'll just cope and say I cannot apply segmented attention masks to the smaller model as it's too trained on not doing it, and the regression came from dumb python aliasing rules 2025-03-27 13:27:51 -05:00
mrq 2fd82a7a22 cannot get segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...) 2025-03-27 00:51:41 -05:00
mrq 4d777b5618 add remark that segmented attention actually might be broken (for some reason this only emerged recently, need to investigate) 2025-03-26 12:08:47 -05:00
mrq 09e9438941 ugh 2025-03-25 23:24:01 -05:00
mrq 8641c87611 nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression) 2025-03-25 23:06:16 -05:00
mrq aa8b32d97e added more notes (although I could have sworn I had more notes that I can't recall) 2025-03-25 18:53:06 -05:00
mrq df5b870908 added remark about not using sliding attention 2025-03-22 12:44:34 -05:00
mrq 02a8bcbe29 fixed errant index error (although it makes me wonder if my segmented masking is still flawed) 2025-03-21 23:41:34 -05:00
mrq d1d91295b3 add segmented sliding attention, also found a bug with prom-less segments in the attention mask generation......... 2025-03-21 19:05:49 -05:00
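(To illustrate what a segmented sliding mask looks like, and where a prom-less segment could bite: a toy sketch, not the repo's implementation. If mask construction assumes a segment that a sample doesn't have, a row can end up with nothing attendable, and softmax over an all-masked row yields NaNs.)

```python
import torch

def segmented_sliding_mask(segment_ids: torch.Tensor, window: int = 0) -> torch.Tensor:
    # (T, T) boolean mask: position i may attend to position j only when both
    # share a segment id and j <= i; window > 0 further restricts attention to
    # a sliding window of recent positions within the segment
    T = segment_ids.numel()
    i = torch.arange(T)[:, None]
    j = torch.arange(T)[None, :]
    allow = (segment_ids[:, None] == segment_ids[None, :]) & (j <= i)
    if window > 0:
        allow &= (i - j) < window
    # guard against the prom-less hazard described in the commit above
    assert bool(allow.any(dim=-1).all()), "some position can attend to nothing"
    return allow
```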
mrq 589cfb0e18 yuge speedup because of a dumb oversight 2025-03-20 17:39:41 -05:00
mrq 8068f24e35 cleaned up parallel nar, i think it's slightly faster but even the smallest model is still slower than ar+nar-len-llama-8... 2025-03-20 15:56:15 -05:00
mrq 9a7458cf17 fixed inferencing since I did delete the len_emb, some more notes on the model since it seems I just had bad experimental settings 2025-03-19 22:41:48 -05:00
mrq 61de653ad9 now causal training should work again 2025-03-19 14:20:19 -05:00
mrq 85b9dd47c1 ugh 2025-03-19 13:31:50 -05:00
mrq 81acd565b3 re-enable these 2025-03-18 20:59:33 -05:00
mrq 5479d2eacc more tweaks to the new implementation (properly trim the len stuff to save some params, decoder to d_ffn expansion to 2 to maybe also make it faster, etc.) 2025-03-18 19:34:37 -05:00
mrq 9a8a8e3195 off by one bateman 2025-03-18 08:40:43 -05:00