vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	9e3f2e300f	experimental "just have a token for what rvq level we're on" that seems to help all models (mamba almost works, but it might just have to be relegated as a pure AR model)	2024-06-04 23:23:31 -05:00
mrq	e0886c5a78	re-added mamba as a possible non-experimental arch backend (test trainer will set it as AR only, doing any NAR tasks lobotomizes it)	2024-06-04 22:41:22 -05:00
mrq	c93d5863fd	fixes	2024-06-04 00:07:00 -05:00
mrq	7feeb944a0	probably insane with even entertaining going this route	2024-06-03 20:26:27 -05:00
mrq	e15c6c74c3	correctness	2024-05-30 20:50:45 -05:00
mrq	ddbacde0d1	DAC just doesn't work well enough......	2024-05-25 11:07:52 -05:00
mrq	e3ef89f5aa	100x better for subtrain/eval to be by group instead	2024-05-19 16:40:14 -05:00
mrq	458b95d196	added option to split between text loss and audio loss (to-do: document this better), because it may or may not be a problem with LLaMA-backed models because my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment	2024-05-19 11:23:56 -05:00
mrq	3337c69e5a	leverage between xformers and `torch.backends.cuda.sdp_kernel` for attention	2024-05-11 17:14:05 -05:00
mrq	2109712e5b	resolve deprecation warning that doesn't show on my old training rig but does on my new one	2024-05-09 23:25:44 -05:00
mrq	0d5d545a40	crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc.	2024-05-09 20:28:20 -05:00
mrq	33b7f81b94	small cleanups	2024-05-04 22:37:22 -05:00
mrq	ffa200eec7	added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources))	2024-05-04 12:05:41 -05:00
mrq	c494894261	simple DDP wrapper (for my NVlink test)	2024-05-04 11:48:26 -05:00
mrq	a7b43b98b5	renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes)	2024-05-02 20:08:59 -05:00
mrq	b5d1456a09	backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not))	2024-04-29 22:14:01 -05:00
mrq	caad7ee3c9	final tweaks, hopefully	2024-04-28 22:28:29 -05:00
mrq	b251669536	forgot to fix up the test trainer	2024-04-21 14:58:04 -05:00
mrq	4f5c9e518a	actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess	2024-04-18 13:32:41 -05:00
mrq	5ff2b4aab5	finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset)	2024-04-17 20:39:35 -05:00
mrq	b0bd88833c	refactor cleanup, had a revelation on how I can handle a batch of varying tasks	2024-04-16 21:04:48 -05:00
mrq	aa1e25fbf5	backwards compat for old YAMLs with `models`, option to set flash attention 2 for Llama (and derivatives), included `syncdoth/RetNet`s torchscale retnet for shits and grins, etc.	2024-04-16 10:02:31 -05:00
mrq	545162195b	deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things	2024-04-15 19:54:32 -05:00
mrq	d69a00e389	Properly pass retention_mask for retnet-HF, attempt to fix recurrent forward for retnet (doesn't work still)	2024-04-14 13:12:50 -05:00
mrq	f0c4baeb25	added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it)	2024-04-09 22:04:01 -05:00
mrq	4d75ee066c	actually do the Linear replacement with TE's Linear	2024-04-09 14:41:13 -05:00
mrq	9d97eb5104	added FP8 support through `NVIDIA/TransformerEngine`, added RetNet_HF through `syncdoth/RetNet` (as an alternative to branch away from torchscale)	2024-04-08 20:14:51 -05:00
mrq	7075c2a5f0	added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml)	2024-04-04 19:11:49 -05:00
mrq	91062361af	tweaks	2024-03-01 20:38:06 -06:00
mrq	f3c59c3e7e	cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed)	2024-03-01 20:18:43 -06:00
mrq	47435207f7	Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model	2024-03-01 19:20:10 -06:00
mrq	35d78a2bb0	Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares)	2024-02-29 20:29:17 -06:00
mrq	3da1518ace	added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work)	2024-01-31 21:48:36 -06:00
mrq	e799665759	experimental weighting of prom/resp embeds	2024-01-25 12:18:48 -06:00
mrq	c690aa509d	fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...)	2023-12-25 21:20:32 -06:00
mrq	0db3203b21	added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go)	2023-12-22 19:27:36 -06:00
mrq	9c198eb75a	added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works)	2023-12-20 18:45:58 -06:00
mrq	ed54f4ebec	un 'experimental' the better target sequence preparation	2023-10-22 09:06:59 -05:00
mrq	09cda7d3f9	added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup	2023-10-16 19:30:38 -05:00
mrq	a539f6889f	mucked around with the loss calculation, this seems better?	2023-10-13 18:22:21 -05:00
mrq	08bae355eb	actually use langs from the dataloader	2023-10-11 21:21:50 -05:00
mrq	8740cdefc6	added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested)	2023-10-11 20:38:40 -05:00
mrq	7facacf7c9	separated samplers into their own file, don't bother copying the logits back to the GPU after sampling, it's not necessary	2023-10-11 12:25:31 -05:00
mrq	e727b6e5c1	changed dynamic temperature trigger to be a min-(n)ar-temp value between [0,(n)ar-temp), flags to set min temp, checkbox in web UI to request it	2023-10-10 17:02:33 -05:00
mrq	87db03dd93	trim the input prompt to 3 seconds when training NAR tasks (marked as experimental; the paper mentions doing so, but I don't know how much this would harm the retention heads)	2023-10-09 22:03:58 -05:00
mrq	27483e56f0	disabled preparing of SpeechX tasks, added dynamic temperature testing (to-do: test it, credited in the function)	2023-10-09 13:01:40 -05:00
mrq	777ba43305	oops	2023-10-03 15:01:37 -05:00
mrq	d12877ee09	added option to set probability of selecting the AR during training under a monolithic AR+NAR, added some more to-dos while I have them in mind	2023-10-02 16:52:42 -05:00
mrq	c0b25541e3	restructured some things with the model to remove dead weights	2023-09-20 19:10:59 -05:00
mrq	a6bfe43590	added mirostat sampling (given a partially trained model, it got far more decent output than I expected, need to test on a better trained model)	2023-09-18 18:55:41 -05:00


68 Commits