mrq/vall-e

Author	SHA1	Message	Date
mrq	a5c90348d9	head hurt	2024-06-06 20:51:31 -05:00
mrq	516b0894d7	m	2024-06-06 19:41:26 -05:00
mrq	ee25d2e62e	removed the need to supply targ_list + different AudioEmbedding + other things	2024-06-06 18:52:41 -05:00
mrq	fcac9503e2	cleanup	2024-06-06 13:08:02 -05:00
mrq	b2194b859a	re-added loading multiple models because I'm now entertaining having split AR/NAR models again (and need a way to load both at once)	2024-06-06 09:48:43 -05:00
mrq	b05a905b95	ugh	2024-06-05 21:02:05 -05:00
mrq	4073656293	oops	2024-06-05 20:53:10 -05:00
mrq	ff6fe6f1bc	cleanup	2024-06-05 20:30:43 -05:00
mrq	880b4ecd1b	cleanup, putting some thoughts in comments before I forget about them	2024-06-05 19:50:06 -05:00
mrq	3cfc8a96bb	oops	2024-06-05 10:30:04 -05:00
mrq	48cd1054f9	madness	2024-06-04 23:48:51 -05:00
mrq	9e3f2e300f	experimental "just have a token for what rvq level we're on" that seems to help all models (mamba almost works, but it might just have to be relegated as a pure AR model)	2024-06-04 23:23:31 -05:00
mrq	e0886c5a78	re-added mamba as a possible non-experimental arch backend (test trainer will set it as AR only, doing any NAR tasks lobotomizes it)	2024-06-04 22:41:22 -05:00
mrq	687c71e028	disable accuracy calc because it breaks with actual batched training even though it shouldn't	2024-06-04 22:13:44 -05:00
mrq	d005e24953	oops	2024-06-04 22:10:04 -05:00
mrq	0f7f3ae754	added loss calc split and acc for experimental model	2024-06-04 22:04:40 -05:00
mrq	014e565c4b	tweaks	2024-06-04 20:41:13 -05:00
mrq	6d5bd0156a	fixes	2024-06-04 18:50:48 -05:00
mrq	ed3aeaf3a1	copy pasted from test to actual trainer	2024-06-04 18:40:30 -05:00
mrq	0aa01ba31a	forgot one crucial detail (you need the previous RVQ level to keep coherence between all RVQ levels) (experimental deinterleaved is a bit crusty though)	2024-06-04 18:30:30 -05:00
mrq	2ffad5cb6f	typo	2024-06-04 14:20:57 -05:00
mrq	406ff7bbe1	re-implemented config.model.interleave for the HF-compat experimental method	2024-06-04 14:19:52 -05:00
mrq	c93d5863fd	fixes	2024-06-04 00:07:00 -05:00
mrq	186b93a77e	oops	2024-06-03 22:35:55 -05:00
mrq	e50edc3b48	added a flag to convert to a HF compatible model on export by stitching things	2024-06-03 22:34:47 -05:00
mrq	934672252b	feverish cleanup	2024-06-03 21:28:49 -05:00
mrq	7feeb944a0	probably insane with even entertaining going this route	2024-06-03 20:26:27 -05:00
mrq	c2a436d368	somehow between training sessions grad_norm = None even though it worked before	2024-06-02 08:29:27 -05:00
mrq	c1fcd889d5	reverted automatically disabling split loss calc, since it seems that it's actually calculating loss on prom that causes the oddities, maybe	2024-06-01 12:34:59 -05:00
mrq	8cf176ab46	ugh	2024-06-01 10:46:42 -05:00
mrq	827cf632e7	report current loss scale and adjust grad norm by loss scale (for deepspeed)	2024-06-01 10:44:32 -05:00
mrq	d0ebce6bac	ugh	2024-06-01 10:30:13 -05:00
mrq	39bc019142	actually save per-rank sampler states	2024-06-01 09:46:32 -05:00
mrq	74df2f5332	split sampler dict by global_rank, also handle splitting dataset paths by global_rank if sampler_type == path (because I do not trust DistributedSampler) (need to test)	2024-06-01 09:29:49 -05:00
mrq	31785f4eeb	actually don't default to compute split losses, test bitnet model doesn't seem to be doing things right (despite debug printouts showing they're roughly the same logit/loss sequences, could just be bitnet linears being not up to par on actual models)	2024-06-01 09:12:51 -05:00
mrq	e9c87060df	oops	2024-05-31 22:22:28 -05:00
mrq	b482ca19ff	added model config option to set KV head count for MQA/GQA instead of MHA for llama-based models (I think it's very negligible both ways on such a small model size)	2024-05-31 19:32:37 -05:00
mrq	e15c6c74c3	correctness	2024-05-30 20:50:45 -05:00
mrq	da473295b7	better way to compute per-segment losses	2024-05-28 19:29:54 -05:00
mrq	6c49ad06a3	forgot to reinclude mult by loss factors	2024-05-27 20:40:21 -05:00
mrq	b82f0d5c0c	finally nailed the issue that caused logging to break on one machine but not another (bitnet includes zetascale which is a parasite that will break logging)	2024-05-27 19:47:58 -05:00
mrq	c0ac84c795	uh	2024-05-27 19:05:56 -05:00
mrq	197d517181	ugh	2024-05-27 17:09:35 -05:00
mrq	5af6f41c94	added loss calcs against prom (requires the right settings for not shit results, disabled by default)	2024-05-27 08:43:00 -05:00
mrq	05cd8b797e	nevermind it breaks training	2024-05-25 18:03:43 -05:00
mrq	85f9684720	some cleanup	2024-05-25 17:46:52 -05:00
mrq	d760924719	added kludgy eval only so I don't have to start training, type eval, stop training, then delete the logs for that session	2024-05-25 17:39:51 -05:00
mrq	ddbacde0d1	DAC just doesn't work well enough......	2024-05-25 11:07:52 -05:00
mrq	e3ef89f5aa	100x better for subtrain/eval to be by group instead	2024-05-19 16:40:14 -05:00
mrq	458b95d196	added option to split between text loss and audio loss (to-do: document this better), because it may or may not be a problem with LLaMA-backed models because my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment	2024-05-19 11:23:56 -05:00
mrq	74e531d391	ugh	2024-05-18 12:02:56 -05:00
mrq	4bc7e5a6d1	fix loading without needing an hdf5 dataset already prepped (and some other incidental speedups during dataloader prep)	2024-05-18 07:14:26 -05:00
mrq	d88a5ca183	ugh	2024-05-16 07:25:33 -05:00
mrq	d9aabfa3ae	final tweaks, hopefully, again	2024-05-15 23:04:19 -05:00
mrq	8d79f78e0a	god I need to replace omegaconf	2024-05-12 14:01:52 -05:00
mrq	5eb5db7f7f	just don't use DAC 24Khz, it's bad	2024-05-12 13:41:17 -05:00
mrq	230da8b559	should be the final things to scramble around for, DAC's 24KHz model is unusable for this, but both encodec's 24KHz and DAC's 44KHz work	2024-05-12 13:22:08 -05:00
mrq	2437a86efa	ugh	2024-05-12 13:02:15 -05:00
mrq	4f1593c8db	a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge	2024-05-12 10:17:29 -05:00
mrq	917eeb40d2	ughhh	2024-05-12 08:22:39 -05:00
mrq	9910c75d5a	checkpointing for bitnet impl	2024-05-12 07:52:54 -05:00
mrq	14709ac67f	ughh	2024-05-12 07:30:59 -05:00
mrq	3774fcbdee	ugh	2024-05-11 22:58:38 -05:00
mrq	856545f8bb	nan loss detection (should have added it earlier), loss scaling for local backend + fp16	2024-05-11 22:23:29 -05:00
mrq	a755eb3c62	ugh	2024-05-11 17:34:45 -05:00
mrq	88e9b9caff	local ddp fix	2024-05-11 17:29:01 -05:00
mrq	3337c69e5a	leverage between xformers and `torch.backends.cuda.sdp_kernel` for attention	2024-05-11 17:14:05 -05:00
mrq	d33c7bb7cf	ugh	2024-05-11 16:47:19 -05:00
mrq	0b6499601b	sanitizing	2024-05-11 16:31:05 -05:00
mrq	71e373064f	remove redundant loss, tweak readme	2024-05-11 15:02:47 -05:00
mrq	04a80d6b55	maybe it's better to be more explicit in deepspeed configs	2024-05-11 13:57:43 -05:00
mrq	4d93a16ef7	might just be better to explicitly define prompt duration ranges, especially under a "train small contexts then increase it" training paradigm	2024-05-11 09:50:54 -05:00
mrq	bd0a36ba8d	I swear I keep seeing tqdm flicker back a number	2024-05-10 18:36:01 -05:00
mrq	2109712e5b	resolve deprecation warning that doesn't show on my old training rig but does on my new one	2024-05-09 23:25:44 -05:00
mrq	1547de5020	haha...	2024-05-09 23:15:52 -05:00
mrq	b7bd885651	some possible sanity with deepspeed config	2024-05-09 22:48:42 -05:00
mrq	c4b696ebeb	oops	2024-05-09 22:33:40 -05:00
mrq	c22a177cf8	forgot to pass warmup to schedule free	2024-05-09 22:18:49 -05:00
mrq	b6131565ad	autotune?	2024-05-09 21:25:40 -05:00
mrq	6ed6ab8c03	a bit more cleanup for deepspeed ds_cfg creation	2024-05-09 21:00:26 -05:00
mrq	0d5d545a40	crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc.	2024-05-09 20:28:20 -05:00
mrq	c6e0f905b5	final tweaks (again) before training restarts	2024-05-08 02:11:38 -05:00
mrq	215800484d	correcting my wrong of assuming I could just use raw 24Khz audio in the 44Khz DAC without too much of an issue (there are issues)	2024-05-04 23:49:15 -05:00
mrq	9f738fbd5b	seems I actually don't need RVQ bins 9-32 with the 24Khz DAC model........ (time to requantize my audio...)	2024-05-04 23:09:18 -05:00
mrq	33b7f81b94	small cleanups	2024-05-04 22:37:22 -05:00
mrq	8aa1b2dabf	documentation update	2024-05-04 21:03:46 -05:00
mrq	253441b750	forgot to disable verbose flag	2024-05-04 13:13:52 -05:00
mrq	3dca1125f5	implemented xformers in HF's Llama (because there's no flash attention for Volta cards)	2024-05-04 13:07:45 -05:00
mrq	277dcec484	apparently I got an error for trying to serialize an errant tensor that made its way into the json, this could be remedied easily with recursively traversing the dict and coercing any objects to primitives, but I'm tired and I just want to start training and nap	2024-05-04 12:33:43 -05:00
mrq	ffa200eec7	added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources))	2024-05-04 12:05:41 -05:00
mrq	c494894261	simple DDP wrapper (for my NVlink test)	2024-05-04 11:48:26 -05:00
mrq	a7b43b98b5	renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes)	2024-05-02 20:08:59 -05:00
mrq	b5d1456a09	backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not))	2024-04-29 22:14:01 -05:00
mrq	5120ffdda7	god it would be nice to know the best way to handle audio embeddings, because I genuinely don't know without skimming through papers or devoting X amount of GPU hours in training	2024-04-29 18:24:05 -05:00
mrq	6a11bc9cb6	update tokenizer because, for some reason, it had the wrong order for the special tokens to where eos = unk	2024-04-29 09:09:26 -05:00
mrq	57810e4ba4	metadata only path (might drop HDF5 since it's giving file sizes twice as large as my actual unpacked dataset)	2024-04-28 23:03:09 -05:00
mrq	caad7ee3c9	final tweaks, hopefully	2024-04-28 22:28:29 -05:00
mrq	ffc334cf58	added dataset transcription helper script (now I don't ever have to touch ai-voice-cloning) (to-do: unify scripts into the module)	2024-04-21 17:43:20 -05:00
mrq	b251669536	forgot to fix up the test trainer	2024-04-21 14:58:04 -05:00
mrq	071fb97777	dataset preparation script updates, caved and am using HF tokenizer now	2024-04-21 14:49:18 -05:00
mrq	a8ffa88844	it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior	2024-04-19 18:36:54 -05:00
mrq	8214aa23d7	converting over to a different intermediary dataset format	2024-04-18 21:24:06 -05:00
mrq	4f5c9e518a	actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess	2024-04-18 13:32:41 -05:00
mrq	2e9e6e68f7	Forgot I need to use the DAC's 44K model because 24K model has 32 codebooks instead of 9.	2024-04-17 20:59:25 -05:00
mrq	5ff2b4aab5	finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset)	2024-04-17 20:39:35 -05:00
mrq	b0bd88833c	refactor cleanup, had a revelation on how I can handle a batch of varying tasks	2024-04-16 21:04:48 -05:00
mrq	467fa1c5ee	wrapper fixes	2024-04-16 10:19:02 -05:00
mrq	aa1e25fbf5	backwards compat for old YAMLs with `models`, option to set flash attention 2 for Llama (and derivatives), included `syncdoth/RetNet`s torchscale retnet for shits and grins, etc.	2024-04-16 10:02:31 -05:00
mrq	545162195b	deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things	2024-04-15 19:54:32 -05:00
mrq	d69a00e389	Properly pass retention_mask for retnet-HF, attempt to fix recurrent forward for retnet (doesn't work still)	2024-04-14 13:12:50 -05:00
mrq	789bb5d11b	add an optional label override for model loading (used for easy testing between 12/16/20/24 layered model)	2024-04-13 12:43:35 -05:00
mrq	f0c4baeb25	added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it)	2024-04-09 22:04:01 -05:00
mrq	4d75ee066c	actually do the Linear replacement with TE's Linear	2024-04-09 14:41:13 -05:00
mrq	9d97eb5104	added FP8 support through `NVIDIA/TransformerEngine`, added RetNet_HF through `syncdoth/RetNet` (as an alternative to branch away from torchscale)	2024-04-08 20:14:51 -05:00
mrq	7075c2a5f0	added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml)	2024-04-04 19:11:49 -05:00
mrq	91062361af	tweaks	2024-03-01 20:38:06 -06:00
mrq	f3c59c3e7e	cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed)	2024-03-01 20:18:43 -06:00
mrq	47435207f7	Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model	2024-03-01 19:20:10 -06:00
mrq	0427d8d076	logger broke for some reason, added flag to just tqdm.write instead, make cfg.bitsandbytes.bitnet==True yamls denoted since I'm sure they're not interoperable	2024-03-01 10:32:35 -06:00
mrq	35d78a2bb0	Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares)	2024-02-29 20:29:17 -06:00
mrq	3da1518ace	added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work)	2024-01-31 21:48:36 -06:00
mrq	cce929e136	nasty hotfix for transformer's Mixtral throwing an error when batch sizes > 1	2024-01-26 19:41:12 -06:00
mrq	e799665759	experimental weighting of prom/resp embeds	2024-01-25 12:18:48 -06:00
mrq	c690aa509d	fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...)	2023-12-25 21:20:32 -06:00
mrq	e513d2ef19	experts weren't forwarded into constructor (wasted a few days of training garbage)	2023-12-23 16:08:17 -06:00
mrq	0db3203b21	added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go)	2023-12-22 19:27:36 -06:00
mrq	9c198eb75a	added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works)	2023-12-20 18:45:58 -06:00
mrq	6c51a629cc	resetting step count resets the samples processed and other metrics	2023-10-29 12:11:19 -05:00
mrq	0aa2a3cc07	evaluation/validation passes language ID during training (oops)	2023-10-29 12:00:40 -05:00
mrq	ed54f4ebec	un 'experimental' the better target sequence preparation	2023-10-22 09:06:59 -05:00
mrq	9a6040383e	make validation samplers ignore sampler type	2023-10-22 09:01:47 -05:00
mrq	32d4271ca8	fixed issue with training from scratch (oops)	2023-10-21 09:55:38 -05:00
mrq	3195026dba	fixed issue with the 'add another target audio to artificially create longer sequences' for HDF5 just duplicating the utterance initially sampled	2023-10-18 20:38:33 -05:00
mrq	09cda7d3f9	added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup	2023-10-16 19:30:38 -05:00
mrq	a539f6889f	mucked around with the loss calculation, this seems better?	2023-10-13 18:22:21 -05:00
mrq	fb467b19ba	exposed rolling resp context to the web UI, added passing in language to inferencing command line	2023-10-12 23:21:01 -05:00
mrq	298fd9a5f9	fixed issue with webui	2023-10-12 22:49:25 -05:00
mrq	65f500083d	tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work	2023-10-12 22:21:43 -05:00
mrq	08bae355eb	actually use langs from the dataloader	2023-10-11 21:21:50 -05:00
mrq	3af19d79fd	oops	2023-10-11 20:49:54 -05:00
mrq	8740cdefc6	added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested)	2023-10-11 20:38:40 -05:00
mrq	6045cbce94	added experimental option to append utterances for training target (emphasis on experimental)	2023-10-11 17:32:45 -05:00
mrq	7facacf7c9	separated samplers into their own file, don't bother copying the logits back to the GPU after sampling, it's not necessary	2023-10-11 12:25:31 -05:00
mrq	100dd164e6	apply phoneme cleanup in inferencing as well	2023-10-10 19:21:19 -05:00
mrq	b4405c98ea	remove double spaces in the text phonemes (might have caused problems.........)	2023-10-10 19:18:24 -05:00
mrq	47b3077415	fixed mirostat issue	2023-10-10 18:09:49 -05:00
mrq	99e980d323	documentation and more better-er attribution	2023-10-10 17:15:16 -05:00
mrq	e727b6e5c1	changed dynamic temperature trigger to be a min-(n)ar-temp value between [0,(n)ar-temp), flags to set min temp, checkbox in web UI to request it	2023-10-10 17:02:33 -05:00
mrq	ec25f56bd9	using torch.max fixes things, somehow, for dynamic temp sampling	2023-10-10 16:42:24 -05:00
mrq	87db03dd93	trim the input prompt to 3 seconds when training NAR tasks (marked as experimental; the paper mentions doing so, but I don't know how much this would harm the retention heads)	2023-10-09 22:03:58 -05:00

372 Commits