vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	5120ffdda7	god it would be nice to know the best way to handle audio embeddings, because I genuinely don't know without skimming through papers or devoting X amount of GPU hours in training	2024-04-29 18:24:05 -05:00
mrq	6a11bc9cb6	update tokenizer because, for some reason, it had the wrong order for the special tokens to where eos = unk	2024-04-29 09:09:26 -05:00
mrq	57810e4ba4	metadata-only path (might drop HDF5 since it's giving file sizes twice as large as my actual unpacked dataset)	2024-04-28 23:03:09 -05:00
mrq	caad7ee3c9	final tweaks, hopefully	2024-04-28 22:28:29 -05:00
mrq	ffc334cf58	added dataset transcription helper script (now I don't ever have to touch ai-voice-cloning) (to-do: unify scripts into the module)	2024-04-21 17:43:20 -05:00
mrq	b251669536	forgot to fix up the test trainer	2024-04-21 14:58:04 -05:00
mrq	071fb97777	dataset preparation script updates, caved and am using HF tokenizer now	2024-04-21 14:49:18 -05:00
mrq	a8ffa88844	it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior	2024-04-19 18:36:54 -05:00
mrq	00804a47e9	Forgot to copy intermediary dataset conversion script	2024-04-18 21:34:28 -05:00
mrq	8214aa23d7	converting over to a different intermediary dataset format	2024-04-18 21:24:06 -05:00
mrq	4f5c9e518a	actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess	2024-04-18 13:32:41 -05:00
mrq	2e9e6e68f7	Forgot I need to use the DAC's 44K model because 24K model has 32 codebooks instead of 9.	2024-04-17 20:59:25 -05:00
mrq	5ff2b4aab5	finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset)	2024-04-17 20:39:35 -05:00
mrq	b0bd88833c	refactor cleanup, had a revelation on how I can handle a batch of varying tasks	2024-04-16 21:04:48 -05:00
mrq	467fa1c5ee	wrapper fixes	2024-04-16 10:19:02 -05:00
mrq	aa1e25fbf5	backwards compat for old YAMLs with `models`, option to set flash attention 2 for Llama (and derivatives), included `syncdoth/RetNet`'s torchscale RetNet for shits and grins, etc.	2024-04-16 10:02:31 -05:00
mrq	545162195b	deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things	2024-04-15 19:54:32 -05:00
mrq	d69a00e389	Properly pass retention_mask for retnet-HF, attempt to fix recurrent forward for retnet (doesn't work still)	2024-04-14 13:12:50 -05:00
mrq	789bb5d11b	add an optional label override for model loading (used for easy testing between 12/16/20/24 layered model)	2024-04-13 12:43:35 -05:00
mrq	f0c4baeb25	added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it)	2024-04-09 22:04:01 -05:00
mrq	4d75ee066c	actually do the Linear replacement with TE's Linear	2024-04-09 14:41:13 -05:00
mrq	9d97eb5104	added FP8 support through `NVIDIA/TransformerEngine`, added RetNet_HF through `syncdoth/RetNet` (as an alternative to branch away from torchscale)	2024-04-08 20:14:51 -05:00
mrq	7075c2a5f0	added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml)	2024-04-04 19:11:49 -05:00
mrq	91062361af	tweaks	2024-03-01 20:38:06 -06:00
mrq	f3c59c3e7e	cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed)	2024-03-01 20:18:43 -06:00
mrq	47435207f7	Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model	2024-03-01 19:20:10 -06:00
mrq	0427d8d076	logger broke for some reason, added flag to just tqdm.write instead, make cfg.bitsandbytes.bitnet==True yamls denoted since I'm sure they're not interoperable	2024-03-01 10:32:35 -06:00
mrq	35d78a2bb0	Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares)	2024-02-29 20:29:17 -06:00
mrq	3da1518ace	added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work)	2024-01-31 21:48:36 -06:00
mrq	cce929e136	nasty hotfix for transformer's Mixtral throwing an error when batch sizes > 1	2024-01-26 19:41:12 -06:00
mrq	e799665759	experimental weighting of prom/resp embeds	2024-01-25 12:18:48 -06:00
mrq	c690aa509d	fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...)	2023-12-25 21:20:32 -06:00
mrq	e513d2ef19	experts weren't forwarded into constructor (wasted a few days of training garbage)	2023-12-23 16:08:17 -06:00
mrq	0db3203b21	added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go)	2023-12-22 19:27:36 -06:00
mrq	9c198eb75a	added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works)	2023-12-20 18:45:58 -06:00
mrq	6c51a629cc	resetting step count resets the samples processed and other metrics	2023-10-29 12:11:19 -05:00
mrq	0aa2a3cc07	evaluation/validation passes language ID during training (oops)	2023-10-29 12:00:40 -05:00
mrq	ed54f4ebec	un 'experimental' the better target sequence preparation	2023-10-22 09:06:59 -05:00
mrq	9a6040383e	make validation samplers ignore sampler type	2023-10-22 09:01:47 -05:00
mrq	32d4271ca8	fixed issue with training from scratch (oops)	2023-10-21 09:55:38 -05:00
mrq	3195026dba	fixed issue with the 'add another target audio to artificially create longer sequences' for HDF5 just duplicating the utterance initially sampled	2023-10-18 20:38:33 -05:00
mrq	09cda7d3f9	added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup	2023-10-16 19:30:38 -05:00
mrq	a539f6889f	mucked around with the loss calculation, this seems better?	2023-10-13 18:22:21 -05:00
mrq	fb467b19ba	exposed rolling resp context to the web UI, added passing in language to inferencing command line	2023-10-12 23:21:01 -05:00
mrq	298fd9a5f9	fixed issue with webui	2023-10-12 22:49:25 -05:00
mrq	65f500083d	tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work	2023-10-12 22:21:43 -05:00
mrq	08bae355eb	actually use langs from the dataloader	2023-10-11 21:21:50 -05:00
mrq	3af19d79fd	oops	2023-10-11 20:49:54 -05:00
mrq	8740cdefc6	added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested)	2023-10-11 20:38:40 -05:00
mrq	6045cbce94	added experimental option to append utterances for training target (emphasis on experimental)	2023-10-11 17:32:45 -05:00


289 Commits