vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	f770467eb3	stuff	2024-07-01 18:13:29 -05:00
mrq	dced595391	more cleanup	2024-06-30 11:00:12 -05:00
mrq	bc2a6fa756	sanity cleanup: moved experimental features under its own thing	2024-06-30 10:37:33 -05:00
mrq	b21f74a5c5	added summing of external embeddings (at this point I don't think any amount of cope bandaids will get DAC to train nicely; I think the RVQ levels the NAR handles add too much noise if they're not accurate)	2024-06-29 23:42:30 -05:00
mrq	793ccb16fb	ugh	2024-06-29 22:14:35 -05:00
mrq	2808f881c8	cleaned up subjugated audio embedding into a flag, flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive)	2024-06-29 21:46:35 -05:00
mrq	ec5eaebcbc	experimental method of using DAC's quantizer "embeddings" to see if it helps with model quality	2024-06-29 19:46:11 -05:00
mrq	a8718d35a4	nasty bandaid because some of my DAC dataset only has 8 RVQ levels instead of the full 9	2024-06-29 10:16:37 -05:00
mrq	591d3ac848	have eval dataloader use eval batch size for batchedordersampler	2024-06-28 22:44:00 -05:00
mrq	83075c1505	sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because I didn't know that python dicts can have non-strings as keys), added batching samples based on total duration to ensure best training throughput	2024-06-28 22:28:54 -05:00
mrq	8a986eb480	load exported LoRA weights if exists (to-do: make a better LoRA loading mechanism)	2024-06-18 21:45:46 -05:00
mrq	2bfe786ebd	ban stop token for NAR levels (because sometimes it gets sampled and causes problems)	2024-06-17 22:14:43 -05:00
mrq	7cfb78fa64	enable LoRA for targeted RVQ levels (to experiment with, seems to help)	2024-06-17 21:45:03 -05:00
mrq	7047fcc6e2	actually make deepspeed work with LoRAs	2024-06-17 13:55:37 -05:00
mrq	1d159b1476	updated export routine to split LoRA weights from the state dict (should work with deepspeed)	2024-06-17 13:28:18 -05:00
mrq	bd0bc10ec0	added LoRA policy to decide what layer of the model gets adapted based on simple inclusion/exclusion terms	2024-06-17 13:05:06 -05:00
mrq	be051d9544	added other LoRA method using parametrization rather than linear injection	2024-06-17 09:58:34 -05:00
mrq	45a39fb79f	very rudimentary lora support (no deepspeed support, tested training and saving but not loading yet)	2024-06-17 00:09:16 -05:00
mrq	19410a919e	ugh	2024-06-15 12:29:03 -05:00
mrq	d343bde09b	residual_in_fp32=False for mamba arch backends because it breaks the classifier (output projection / lm head / what-have-you) under AMP	2024-06-15 12:08:03 -05:00
mrq	ccb14c06ef	mamba2-hf using `vasqu/mamba2-torch` because it lets me use mamba2 without triton ops (my 4xV100s are not happy training mamba2 because of triton)	2024-06-14 19:42:17 -05:00
mrq	83eab4fa59	actually going for the suggested "2x layers, no intermediate scaling" is wrong for VALL-E, directly copying the normal transformer structure fixes mamba2 performance in the test trainer	2024-06-13 20:08:22 -05:00
mrq	26da24fd8d	mamba updated to fix that pesky NaN error during training	2024-06-13 12:38:33 -05:00
mrq	bcf3910a17	the NAR only dream is dead (it just won't work)	2024-06-12 19:49:47 -05:00
mrq	a9353cf9fa	ugh	2024-06-12 00:14:29 -05:00
mrq	cca542a4c0	ugh	2024-06-11 23:59:28 -05:00
mrq	65a8960305	option to split classifier per-level instead of sharing one (at this point I'm just scrambling to try and cope with training a DAC model, the NAR is being a pain)	2024-06-11 22:28:59 -05:00
mrq	a7a6e0ac76	validated that inferencing works, changed some defaults (NAR benefits from greedy sampling)	2024-06-09 17:11:38 -05:00
mrq	132a02c48b	sanity cleanup, backup config yaml for each log file	2024-06-09 11:22:52 -05:00
mrq	80f9530840	ugh	2024-06-09 01:43:44 -05:00
mrq	5c732b72ee	ugh	2024-06-08 20:34:00 -05:00
mrq	8d068fa3f9	reticulating splines	2024-06-08 20:30:15 -05:00
mrq	b072f9b96b	fixes	2024-06-08 16:01:34 -05:00
mrq	58fb0a84db	added experimental NAR only model (inferences text length, need more experimenting), AudioEmbedding logic cleanup (I still think it's being done wrong)	2024-06-08 15:42:02 -05:00
mrq	7d6fff24f9	un-tensor'd quant_level marker since it doesn't need to be one (I forgot why I had it as one but nothing seems to need it as a tensor that didn't already make it one)	2024-06-07 20:46:22 -05:00
mrq	b0158a61d5	fixed some logic errors with training (grabbing wrong quant level...)	2024-06-07 20:34:36 -05:00
mrq	eafa622be2	I forgot the actual reason I was cleaning things up was to re-include prom loss calculation (I realized the reason I did this was because of a prom embedding oversight, it seems to work now)	2024-06-07 20:29:25 -05:00
mrq	f9f309281a	ugh	2024-06-06 20:55:27 -05:00
mrq	a5c90348d9	head hurt	2024-06-06 20:51:31 -05:00
mrq	516b0894d7	m	2024-06-06 19:41:26 -05:00
mrq	ee25d2e62e	removed the need to supply targ_list + different AudioEmbedding + other things	2024-06-06 18:52:41 -05:00
mrq	fcac9503e2	cleanup	2024-06-06 13:08:02 -05:00
mrq	b2194b859a	re-added loading multiple models because I'm now entertaining having split AR/NAR models again (and need a way to load both at once)	2024-06-06 09:48:43 -05:00
mrq	b05a905b95	ugh	2024-06-05 21:02:05 -05:00
mrq	4073656293	oops	2024-06-05 20:53:10 -05:00
mrq	ff6fe6f1bc	cleanup	2024-06-05 20:30:43 -05:00
mrq	880b4ecd1b	cleanup, putting some thoughts in comments before I forget about them	2024-06-05 19:50:06 -05:00
mrq	3cfc8a96bb	oops	2024-06-05 10:30:04 -05:00
mrq	48cd1054f9	madness	2024-06-04 23:48:51 -05:00
mrq	9e3f2e300f	experimental "just have a token for what rvq level we're on" that seems to help all models (mamba almost works, but it might just have to be relegated as a pure AR model)	2024-06-04 23:23:31 -05:00

183 Commits