mrq/vall-e

Author	SHA1	Message	Date
mrq	5fe01ffc6c	more notes / re-enabled top-k/p samplers for new implementation	2025-04-19 14:04:34 -05:00
mrq	d9e18037cc	new implementation tweaks and fixes to make it actually better (there were a lot of badwrong things being done that harmed the output quality, will evaluate the model further)	2025-04-18 20:36:44 -05:00
mrq	98d1d8cb1e	added some more notes, tweaks (RIP DAC, it's over)	2025-04-17 20:24:40 -05:00
mrq	6d42c9ae23	how foolish of me, not having a softmax as float32 (maybe addresses an emergent regression where bfloat16 training shits the bed while float16 + loss scaling doesn't)	2025-04-07 22:51:52 -05:00
mrq	d6cd848c32	goodbye nvidia/audio-codec-44khz, crossed fingers for DAC again	2025-04-06 21:05:29 -05:00
mrq	2e93438867	reintroduced sampler_type = speaker because I think this might salvage the nemo model to have better speaker similarities	2025-04-03 19:01:10 -05:00
mrq	0e995dbf2c	is this my last cope (falling back to explicit duration prediction, as this regression just won't go away) (also the smaller model was lobotomized because of my ROCm setup having a botched SDPA for who knows why)	2025-04-02 17:01:24 -05:00
mrq	6ae282e090	re-added noise dataloader sampler whatever for the old implementation's other tasks that require it	2025-03-28 15:07:06 -05:00
mrq	90b3509404	I'll just cope and say I cannot apply segmented attention masks to the smaller model as it's too trained on not doing it, and the regression came from dumb python aliasing rules	2025-03-27 13:27:51 -05:00
mrq	2fd82a7a22	cannot get segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...)	2025-03-27 00:51:41 -05:00
mrq	4d777b5618	add remark that segmented attention actually might be broken (for some reason this only emerged recently, need to investigate)	2025-03-26 12:08:47 -05:00
mrq	8641c87611	nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression)	2025-03-25 23:06:16 -05:00
mrq	aa8b32d97e	added more notes (although I could have sworn I had more notes that I can't recall)	2025-03-25 18:53:06 -05:00
mrq	df5b870908	added remark about not using sliding attention	2025-03-22 12:44:34 -05:00
mrq	9a7458cf17	fixed inferencing since I did delete the len_emb, some more notes on the model since it seems I just had bad experimental settings	2025-03-19 22:41:48 -05:00
mrq	81acd565b3	re-enable these	2025-03-18 20:59:33 -05:00
mrq	b0dba9db07	this may bite me in the ass	2025-03-17 21:46:50 -05:00
mrq	2dfef693c4	comments for clarity	2025-03-16 11:30:23 -05:00
mrq	9cfbf94b1c	config-ify the len_loss_factor	2025-03-14 20:30:48 -05:00
mrq	ba5f3d19b4	use the FSQ-targeted encoder/decoder wholly, as it works for EnCodec too while the RVQ-targeted encoder/decoder doesn't (and some notes)	2025-03-12 22:47:19 -05:00
mrq	5c512717a6	len prediction for new model (and remove logit normalization since it kills inferencing)	2025-03-11 20:33:09 -05:00

21 Commits