mrq/vall-e

Author	SHA1	Message	Date
mrq	d04f6911b4	oops	2024-08-08 19:38:55 -05:00
mrq	949339a3fa	do not include SDPA attention if there's no available SDPA backends	2024-08-06 20:42:39 -05:00
mrq	7cdfa3dc0c	updated process_datasets.py, added argparsing so I can mostly stop manually editing things, and some other cleanup	2024-08-05 15:59:25 -05:00
mrq	debcc93e7e	add adapted MixtralAttention for when I make a bad decision to actually train a MoE	2024-08-04 22:03:22 -05:00
mrq	10aaf840e7	added export option to convert Llama to MixtralMoE for another dumb experiment	2024-08-04 20:25:06 -05:00
mrq	3a65cc4b22	fix issue with sft and shared tensors...	2024-08-04 19:56:21 -05:00
mrq	23f3b56fda	oops	2024-08-04 08:18:57 -05:00
mrq	6a733eb2ed	changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype, some 'optimization' to compile the model but it doesn't seem to do anything useful	2024-08-03 22:10:21 -05:00
mrq	d0a5c7eca2	more coping with the NAR len	2024-08-03 20:23:36 -05:00
mrq	11fa3da665	some cleanup, fixed the wrapper attention to explicitly use other sdpa backends	2024-08-03 19:51:00 -05:00
mrq	9564ecda43	wrapper attention class for other sdpa backends + xformers seems to have broke...	2024-08-03 15:12:11 -05:00
mrq	9e1989be1b	tweaked initial NAR pass's initial token embeddings to use a different value, or something	2024-08-03 09:01:37 -05:00
mrq	26f74c5739	somehow fixed non-unified position IDs for the NAR-len	2024-08-03 08:43:42 -05:00
mrq	66407e5bdb	tweaks for the NAR-len model, maybe	2024-08-03 08:40:39 -05:00
mrq	97c5241bef	fixes, throw an exception when using NAR only model with non-unified position IDs, since for some reason it outputs garbage for the NAR	2024-08-02 22:25:49 -05:00
mrq	443422ecb5	ugh, finally got some form of offloading working (need to test if it works on different GPUs, but GPU and CPU offloading seems to work in the test trainer)	2024-08-01 22:43:39 -05:00
mrq	c9ec6b28ef	it actually wasn't working because Engines.__init__() automatically moves the entire module to the requested device, which was being called after offloading the model in the test trainer (and it seems I can't do it without injecting a bunch of shit in modeling_llama.py)	2024-08-01 20:56:28 -05:00
mrq	b4c895114c	naive model offloading support (handles automatically splitting parts of the model to requested device per memory constraints, either inferred or requested in the yaml, input tensors are automatically migrated to the right device, it SEEMS to work for training under the test trainer when split between GPU and CPU) (this was specifically only because that Flux imagegen model released so I can test it there)	2024-08-01 20:12:06 -05:00
mrq	387358bc8a	fixes for the NAR-len model, documented some config options, and a better way to handle resizing modules on state_dict load	2024-07-31 20:35:09 -05:00
mrq	07f8e2ad06	added option to set the causal size (how many tokens to sample per AR step), but requires the model to be trained for this (which explains why recurrent chunk sampling just doesn't work for the retnet tests, obvious in hindsight)	2024-07-30 20:53:51 -05:00
mrq	ebf848d249	possible speedup for samplers that require a list of previous tokens (the DRY sampler made me realize that I should copy the tolist() thing from the rep pen sampler for everything else)	2024-07-29 20:23:26 -05:00
mrq	55b0121b1a	trying (and failing) to nail a weird regression in fancier attentions	2024-07-29 19:53:37 -05:00
mrq	c2f5b916fc	added what I think is DRY sampling	2024-07-29 19:15:07 -05:00
mrq	ce8bb1e4f7	sanity cleanups with weird off-by-one-ness, cleaned up and validated vall_e.models.experimental works again	2024-07-27 15:36:05 -05:00
mrq	06e948aec1	suppress warning on exit about distributed not being cleaned up (because I updated my system)	2024-07-25 16:50:47 -05:00
mrq	1acb0e9c84	added experimental training setting to perform token dropout to MAYBE compensate for errors from the preceding RVQ level (two types: token error offset, token dropout embedding replace)	2024-07-24 19:35:17 -05:00
mrq	188d116222	some weird fixes for an equally weird regression with LoRA loading	2024-07-22 20:47:24 -05:00
mrq	75b04686f8	added prom-less training / inferencing, some other things	2024-07-22 19:36:07 -05:00
mrq	e19aa643a6	cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training	2024-07-21 19:12:03 -05:00
mrq	d87b492295	added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)	2024-07-19 20:49:40 -05:00
mrq	d53038a9e4	actually have split classifiers working	2024-07-19 15:33:31 -05:00
mrq	28a674e0f1	fixes...	2024-07-18 23:25:32 -05:00
mrq	39f961abcd	test trainer (vall_e.models.ar_nar) tests some SpeechX features	2024-07-18 18:46:45 -05:00
mrq	83a0954f85	fixes for re-introducing SpeechX tasks (need to actually validate if these all do the right things)	2024-07-18 17:16:32 -05:00
mrq	97e768601c	re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways)	2024-07-18 16:16:14 -05:00
mrq	c2b8035e74	oops, kept forgetting to actually pass in lang/tone tokens (despite not really using these at the moment)	2024-07-18 14:18:34 -05:00
mrq	22fe53508c	added experimental disjointed position IDs (because I think this might help because technically a sequence is made up of several parts, and the position embeddings shouldn't be unified)	2024-07-16 19:52:41 -05:00
mrq	fe0f235335	mechanism to store the model config inside the weights and load them, some other things to allow LoRA training on the RetNet (gradient checkpointing will gripe about inputs not having require_grad and nothing seems to remedy it)	2024-07-16 18:23:13 -05:00
mrq	3acc54df22	allow loading a different model within the web ui (apparently I did not have the web UI in the documentation)	2024-07-15 19:59:48 -05:00
mrq	7b210d9738	sanity cleanup	2024-07-04 15:58:08 -05:00
mrq	f770467eb3	stuff	2024-07-01 18:13:29 -05:00
mrq	dced595391	more cleanup	2024-06-30 11:00:12 -05:00
mrq	bc2a6fa756	sanity cleanup: moved experimental features under its own thing	2024-06-30 10:37:33 -05:00
mrq	b21f74a5c5	added summing of external embeddings (at this point I don't think any amount of cope bandaids will get DAC to train nicely, I think the RVQ levels from the NAR tend to add too much noise if they're not accurate)	2024-06-29 23:42:30 -05:00
mrq	793ccb16fb	ugh	2024-06-29 22:14:35 -05:00
mrq	2808f881c8	cleaned up subjugated audio embedding into a flag, flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive)	2024-06-29 21:46:35 -05:00
mrq	ec5eaebcbc	experimental method of using DAC's quantizer "embeddings" to see if it helps with model quality	2024-06-29 19:46:11 -05:00
mrq	a8718d35a4	nasty bandaid because some of my DAC dataset only has 8 RVQ levels instead of the full 9	2024-06-29 10:16:37 -05:00
mrq	591d3ac848	have eval dataloader use eval batch size for batchedordersampler	2024-06-28 22:44:00 -05:00
mrq	83075c1505	sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because I didn't know that python dicts can have non-strings as keys), added batching samples based on total duration to ensure best training throughput	2024-06-28 22:28:54 -05:00
mrq	8a986eb480	load exported LoRA weights if exists (to-do: make a better LoRA loading mechanism)	2024-06-18 21:45:46 -05:00
mrq	2bfe786ebd	ban stop token for NAR levels (because sometimes it gets sampled and causes problems)	2024-06-17 22:14:43 -05:00
mrq	7cfb78fa64	enable LoRA for targeted RVQ levels (to experiment with, seems to help)	2024-06-17 21:45:03 -05:00
mrq	7047fcc6e2	actually make deepspeed work with LoRAs	2024-06-17 13:55:37 -05:00
mrq	1d159b1476	updated export routine to split LoRA weights from the state dict (should work with deepspeed)	2024-06-17 13:28:18 -05:00
mrq	bd0bc10ec0	added LoRA policy to decide what layer of the model gets adapted based on simple inclusion/exclusion terms	2024-06-17 13:05:06 -05:00
mrq	be051d9544	added other LoRA method using parametrization rather than linear injection	2024-06-17 09:58:34 -05:00
mrq	45a39fb79f	very rudimentary lora support (no deepspeed support, tested training and saving but not loading yet)	2024-06-17 00:09:16 -05:00
mrq	19410a919e	ugh	2024-06-15 12:29:03 -05:00
mrq	d343bde09b	residual_in_fp32=False for mamba arch backends because it breaks the classifier (output projection / lm head / what-have-you) under AMP	2024-06-15 12:08:03 -05:00
mrq	ccb14c06ef	mamba2-hf using `vasqu/mamba2-torch` because it lets me use mamba2 without triton ops (training with my 4xV100s are not happy with mamba2 because of triton)	2024-06-14 19:42:17 -05:00
mrq	83eab4fa59	actually going for the suggested "2x layers, no intermediate scaling" is wrong for VALL-E, directly copying the normal transformer structure fixes mamba2 performance in the test trainer	2024-06-13 20:08:22 -05:00
mrq	26da24fd8d	mamba updated to fix that pesky NaN error during training	2024-06-13 12:38:33 -05:00
mrq	bcf3910a17	the NAR only dream is dead (it just won't work)	2024-06-12 19:49:47 -05:00
mrq	a9353cf9fa	ugh	2024-06-12 00:14:29 -05:00
mrq	cca542a4c0	ugh	2024-06-11 23:59:28 -05:00
mrq	65a8960305	option to split classifier per-level instead of sharing one (at this point I'm just scrambling to try and cope with training a DAC model, the NAR is being a pain)	2024-06-11 22:28:59 -05:00
mrq	a7a6e0ac76	validated that inferencing works, changed some defaults (NAR benefits from greedy sampling)	2024-06-09 17:11:38 -05:00
mrq	132a02c48b	sanity cleanup, backup config yaml for each log file	2024-06-09 11:22:52 -05:00
mrq	80f9530840	ugh	2024-06-09 01:43:44 -05:00
mrq	5c732b72ee	ugh	2024-06-08 20:34:00 -05:00
mrq	8d068fa3f9	reticulating splines	2024-06-08 20:30:15 -05:00
mrq	b072f9b96b	fixes	2024-06-08 16:01:34 -05:00
mrq	58fb0a84db	added experimental NAR only model (inferences text length, need more experimenting), AudioEmbedding logic cleanup (I still think it's being done wrong)	2024-06-08 15:42:02 -05:00
mrq	7d6fff24f9	un-tensor'd quant_level marker since it doesn't need to be one (I forgot why I had it as one but nothing seems to need it as a tensor that didn't already make it one)	2024-06-07 20:46:22 -05:00
mrq	b0158a61d5	fixed some logic errors with training (grabbing wrong quant level...)	2024-06-07 20:34:36 -05:00
mrq	eafa622be2	I forgot the actual reason I was cleaning things up was to re-include prom loss calculation (I realized the reason I did this was because of an prom embedding oversight, it seems to work now)	2024-06-07 20:29:25 -05:00
mrq	f9f309281a	ugh	2024-06-06 20:55:27 -05:00
mrq	a5c90348d9	head hurt	2024-06-06 20:51:31 -05:00
mrq	516b0894d7	m	2024-06-06 19:41:26 -05:00
mrq	ee25d2e62e	removed the need to supply targ_list + different AudioEmbedding + other things	2024-06-06 18:52:41 -05:00
mrq	fcac9503e2	cleanup	2024-06-06 13:08:02 -05:00
mrq	b2194b859a	re-added loading multiple models because I'm now entertaining having split AR/NAR models again (and need a way to load both at once)	2024-06-06 09:48:43 -05:00
mrq	b05a905b95	ugh	2024-06-05 21:02:05 -05:00
mrq	4073656293	oops	2024-06-05 20:53:10 -05:00
mrq	ff6fe6f1bc	cleanup	2024-06-05 20:30:43 -05:00
mrq	880b4ecd1b	cleanup, putting some thoughts in comments before I forget about them	2024-06-05 19:50:06 -05:00
mrq	3cfc8a96bb	oops	2024-06-05 10:30:04 -05:00
mrq	48cd1054f9	madness	2024-06-04 23:48:51 -05:00
mrq	9e3f2e300f	experimental "just have a token for what rvq level we're on" that seems to help all models (mamba almost works, but it might just have to be relegated as a pure AR model)	2024-06-04 23:23:31 -05:00
mrq	e0886c5a78	re-added mamba as a possible non-experimental arch backend (test trainer will set it as AR only, doing any NAR tasks lobotomizes it)	2024-06-04 22:41:22 -05:00
mrq	687c71e028	disable accuracy calc because it breaks with actual batched training even though it shouldn't	2024-06-04 22:13:44 -05:00
mrq	d005e24953	oops	2024-06-04 22:10:04 -05:00
mrq	0f7f3ae754	added loss calc split and acc for experimental model	2024-06-04 22:04:40 -05:00
mrq	014e565c4b	tweaks	2024-06-04 20:41:13 -05:00
mrq	6d5bd0156a	fixes	2024-06-04 18:50:48 -05:00
mrq	ed3aeaf3a1	copy pasted from test to actual trainer	2024-06-04 18:40:30 -05:00
mrq	0aa01ba31a	forgot one crucial detail (you need the previous RVQ level to keep coherence between all RVQ levels) (experimental deinterleaved is a bit crusty though)	2024-06-04 18:30:30 -05:00
mrq	2ffad5cb6f	typo	2024-06-04 14:20:57 -05:00
mrq	406ff7bbe1	re-implemented config.model.interleave for the HF-compat experimental method	2024-06-04 14:19:52 -05:00
mrq	c93d5863fd	fixes	2024-06-04 00:07:00 -05:00
mrq	934672252b	feverish cleanup	2024-06-03 21:28:49 -05:00
mrq	7feeb944a0	probably insane with even entertaining going this route	2024-06-03 20:26:27 -05:00
mrq	b482ca19ff	added model config option to set KV head count for MQA/GQA instead of MHA for llama-based models (I think it's very negligible both ways on such a small model size)	2024-05-31 19:32:37 -05:00
mrq	e15c6c74c3	correctness	2024-05-30 20:50:45 -05:00
mrq	da473295b7	better way to compute per-segment losses	2024-05-28 19:29:54 -05:00
mrq	6c49ad06a3	forgot to reinclude mult by loss factors	2024-05-27 20:40:21 -05:00
mrq	b82f0d5c0c	finally nailed the issue that caused logging to break on one machine but not another (bitnet includes zetascale which is a parasite that will break logging)	2024-05-27 19:47:58 -05:00
mrq	c0ac84c795	uh	2024-05-27 19:05:56 -05:00
mrq	197d517181	ugh	2024-05-27 17:09:35 -05:00
mrq	5af6f41c94	added loss calcs against prom (requires the right settings for not shit results, disabled by default)	2024-05-27 08:43:00 -05:00
mrq	ddbacde0d1	DAC just doesn't work well enough......	2024-05-25 11:07:52 -05:00
mrq	e3ef89f5aa	100x better for subtrain/eval to be by group instead	2024-05-19 16:40:14 -05:00
mrq	458b95d196	added option to split between text loss and audio loss (to-do: document this better), because it may or may not be a problem with LLaMA-backed models because my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment	2024-05-19 11:23:56 -05:00
mrq	917eeb40d2	ughhh	2024-05-12 08:22:39 -05:00
mrq	9910c75d5a	checkpointing for bitnet impl	2024-05-12 07:52:54 -05:00
mrq	14709ac67f	ughh	2024-05-12 07:30:59 -05:00
mrq	a755eb3c62	ugh	2024-05-11 17:34:45 -05:00
mrq	88e9b9caff	local ddp fix	2024-05-11 17:29:01 -05:00
mrq	3337c69e5a	leverage between xformers and `torch.backends.cuda.sdp_kernel` for attention	2024-05-11 17:14:05 -05:00
mrq	d33c7bb7cf	ugh	2024-05-11 16:47:19 -05:00
mrq	0b6499601b	sanitizing	2024-05-11 16:31:05 -05:00
mrq	2109712e5b	resolve deprecation warning that doesn't show on my old training rig but does on my new one	2024-05-09 23:25:44 -05:00
mrq	1547de5020	haha...	2024-05-09 23:15:52 -05:00
mrq	0d5d545a40	crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc.	2024-05-09 20:28:20 -05:00
mrq	33b7f81b94	small cleanups	2024-05-04 22:37:22 -05:00
mrq	253441b750	forgot to disable verbose flag	2024-05-04 13:13:52 -05:00
mrq	3dca1125f5	implemented xformers in HF's Llama (because there's no flash attention for Volta cards)	2024-05-04 13:07:45 -05:00
mrq	ffa200eec7	added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources))	2024-05-04 12:05:41 -05:00
mrq	c494894261	simple DDP wrapper (for my NVlink test)	2024-05-04 11:48:26 -05:00
mrq	a7b43b98b5	renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes)	2024-05-02 20:08:59 -05:00
mrq	b5d1456a09	backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not))	2024-04-29 22:14:01 -05:00
mrq	5120ffdda7	god it would be nice to know the best way to handle audio embeddings, because I genuinely don't know without skimming through papers or devoting X amount of GPU hours in training	2024-04-29 18:24:05 -05:00
mrq	caad7ee3c9	final tweaks, hopefully	2024-04-28 22:28:29 -05:00
mrq	b251669536	forgot to fix up the test trainer	2024-04-21 14:58:04 -05:00
mrq	4f5c9e518a	actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess	2024-04-18 13:32:41 -05:00
mrq	5ff2b4aab5	finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset)	2024-04-17 20:39:35 -05:00
mrq	b0bd88833c	refactor cleanup, had a revelation on how I can handle a batch of varying tasks	2024-04-16 21:04:48 -05:00
mrq	467fa1c5ee	wrapper fixes	2024-04-16 10:19:02 -05:00
mrq	aa1e25fbf5	backwards compat for old YAMLs with `models`, option to set flash attention 2 for Llama (and derivatives), included `syncdoth/RetNet`s torchscale retnet for shits and grins, etc.	2024-04-16 10:02:31 -05:00
mrq	545162195b	deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things	2024-04-15 19:54:32 -05:00
mrq	d69a00e389	Properly pass retention_mask for retnet-HF, attempt to fix recurrent forward for retnet (doesn't work still)	2024-04-14 13:12:50 -05:00
mrq	f0c4baeb25	added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it)	2024-04-09 22:04:01 -05:00
mrq	4d75ee066c	actually do the Linear replacement with TE's Linear	2024-04-09 14:41:13 -05:00
mrq	9d97eb5104	added FP8 support through `NVIDIA/TransformerEngine`, added RetNet_HF through `syncdoth/RetNet` (as an alternative to branch away from torchscale)	2024-04-08 20:14:51 -05:00
mrq	7075c2a5f0	added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml)	2024-04-04 19:11:49 -05:00
mrq	91062361af	tweaks	2024-03-01 20:38:06 -06:00
mrq	f3c59c3e7e	cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed)	2024-03-01 20:18:43 -06:00
mrq	47435207f7	Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model	2024-03-01 19:20:10 -06:00
mrq	35d78a2bb0	Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares)	2024-02-29 20:29:17 -06:00
mrq	3da1518ace	added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work)	2024-01-31 21:48:36 -06:00
mrq	cce929e136	nasty hotfix for transformer's Mixtral throwing an error when batch sizes > 1	2024-01-26 19:41:12 -06:00
mrq	e799665759	experimental weighting of prom/resp embeds	2024-01-25 12:18:48 -06:00
mrq	c690aa509d	fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...)	2023-12-25 21:20:32 -06:00
mrq	e513d2ef19	experts weren't forwarded into constructor (wasted a few days of training garbage)	2023-12-23 16:08:17 -06:00
mrq	0db3203b21	added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go)	2023-12-22 19:27:36 -06:00
mrq	9c198eb75a	added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works)	2023-12-20 18:45:58 -06:00
mrq	ed54f4ebec	un 'experimental' the better target sequence preparation	2023-10-22 09:06:59 -05:00
mrq	9a6040383e	make validation samplers ignore sampler type	2023-10-22 09:01:47 -05:00
mrq	09cda7d3f9	added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup	2023-10-16 19:30:38 -05:00
mrq	a539f6889f	mucked around with the loss calculation, this seems better?	2023-10-13 18:22:21 -05:00
mrq	65f500083d	tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work	2023-10-12 22:21:43 -05:00
mrq	08bae355eb	actually use langs from the dataloader	2023-10-11 21:21:50 -05:00
mrq	3af19d79fd	oops	2023-10-11 20:49:54 -05:00
mrq	8740cdefc6	added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested)	2023-10-11 20:38:40 -05:00
mrq	7facacf7c9	separated samplers into its own file, don't bother copying the logits back to the GPU after sampling, it's not necessary	2023-10-11 12:25:31 -05:00
mrq	47b3077415	fixed mirostat issue	2023-10-10 18:09:49 -05:00
mrq	99e980d323	documentation and more better-er attribution	2023-10-10 17:15:16 -05:00
mrq	e727b6e5c1	changed dynamic temperature trigger to be a min-(n)ar-temp value between [0,(n)ar-temp), flags to set min temp, checkbox in web UI to request it	2023-10-10 17:02:33 -05:00
mrq	ec25f56bd9	used torch.max fixes things, somehow, for dynamic temp sampling	2023-10-10 16:42:24 -05:00
mrq	87db03dd93	trim the input prompt to 3 seconds when training NAR tasks (marked as experimental; the paper mentions doing so, but I don't know how much this would harm the retention heads)	2023-10-09 22:03:58 -05:00
mrq	26fbb92ec6	reduced dynamic temperature threshold to > 1.0, as it seems to not quite be useful for audio LMs, sped up any sampling that touches logits by copying them to CPU first (as accessing tensors on the GPU is slow as balls)	2023-10-09 14:46:17 -05:00
mrq	27483e56f0	disabled preparing of SpeechX tasks, added dynamic temperature testing (to-do: test it, credited in the function)	2023-10-09 13:01:40 -05:00
mrq	2deb995cc9	updated setup script	2023-10-06 20:08:28 -05:00
mrq	63cc9cf37a	added compat flags for torchscale because the maintainer for torchscale broke compat for existing models	2023-10-05 16:39:46 -05:00
mrq	777ba43305	oops	2023-10-03 15:01:37 -05:00
mrq	d12877ee09	added option to set probability of selecting the AR during training under a monolithic AR+NAR, added some more to-dos while I have them in mind	2023-10-02 16:52:42 -05:00
mrq	c0b25541e3	restructured some things with the model to remove dead weights	2023-09-20 19:10:59 -05:00
mrq	a6bfe43590	added mirostat sampling (given a partially trained model, it got far more decent output than I expected, need to test on a better trained model)	2023-09-18 18:55:41 -05:00
mrq	2567e082b5	UGH	2023-09-16 00:26:13 -05:00
mrq	22ffaf3a33	have loss for the NAR not-ignore the text prompt, I imagine this should help the NAR and explain why it's always had a bit of an issue with training	2023-09-15 19:08:44 -05:00
mrq	4aef798135	added picking final candidate based on sum of score instead of first candidate (this changes nothing).	2023-09-13 13:19:11 -05:00
mrq	23a5fdd645	implemented a naive beam search (I really should be taking a break)	2023-09-12 21:28:07 -05:00
mrq	a6ae344e5b	some comments	2023-09-12 16:04:45 -05:00
mrq	d07c63b9d8	unified more things with training the AR+NAR monolithic model	2023-09-12 15:54:41 -05:00
mrq	40ef34e1ca	this embedding class definitely works, and migrating from the previous embedding weights seems to work.	2023-09-11 14:13:42 -05:00
mrq	a1f250ffac	set default max_levels for NAR to 0 and implicitly set it to max resps levels because the previous way was implicitly assuming all models were outputting at 1+7 RVQ bins.	2023-09-10 20:33:33 -05:00
mrq	671dca88ee	throw error when no reference audio is provided in the web UI because someone keeps doing that in the HF space	2023-09-10 15:50:50 -05:00
mrq	ba71020318	added option to limit (or exceed) inferenced RVQ-bin levels through the NAR	2023-09-10 13:50:13 -05:00
mrq	10c34c5b98	added a length-based decay factor for repetition penalty	2023-09-08 21:02:00 -05:00
mrq	b922f35b6b	added documentation on how these new sampling parameters are very iffy and you really need to know what you are doing to use them because this is audio generation and not text generation	2023-09-08 20:43:36 -05:00
mrq	14c78bae39	added lots of sampling options (top-k/top-p, repetition penalty, length penalty)	2023-09-08 20:30:54 -05:00
mrq	f69aad9c65	some day I'll get it right	2023-09-08 15:36:26 -05:00
mrq	b2907ae7e0	seems that my PromEmbedding/RespEmbedding doesn't actually work all that well, naively using dedicated MultiEmbeddings for AR/NAR in the monolithic model is the best way to go	2023-09-08 01:03:24 -05:00
mrq	c47fc3274e	added backwards compat flag	2023-09-07 17:12:17 -05:00
mrq	ab5134f385	tweaks and fixes	2023-09-07 17:08:38 -05:00
mrq	b2c2dec291	added homebrewed per-RVQ-bin embedding solutions	2023-09-07 16:48:02 -05:00
mrq	e7a67410d1	oops	2023-09-07 09:14:03 -05:00
mrq	712808494f	added support for optional prodigy optimizer (https://github.com/konstmish/prodigy) although it consumes a lot more VRAM per parameter	2023-09-06 20:33:16 -05:00
mrq	7ce06432fd	fixed the AR+NAR dual model, the resp_emb has to be split up (classifier might too)	2023-09-06 19:33:39 -05:00
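
Note on commit 83075c1505 (2024-06-28): the message mentions keeping duration buckets sorted by actual duration and batching samples by total duration. A minimal sketch of that idea follows. It is not the repository's code: the function names, the whole-second bucket keys, and the `max_duration` parameter are all hypothetical and only illustrate why numeric dict keys sort differently from string keys.

```python
from collections import defaultdict


def bucket_by_duration(samples):
    """Group (path, duration_seconds) pairs into buckets keyed by whole seconds.

    Hypothetical helper: keys are ints, so sorting them is numeric.
    """
    buckets = defaultdict(list)
    for path, duration in samples:
        buckets[int(duration)].append((path, duration))
    return buckets


def batches_by_total_duration(samples, max_duration):
    """Walk buckets from shortest to longest and start a new batch whenever
    adding the next sample would push the batch past max_duration seconds."""
    buckets = bucket_by_duration(samples)
    batches, current, total = [], [], 0.0
    for key in sorted(buckets):  # int keys: 2 < 3 < 9 < 10
        for path, duration in buckets[key]:
            if current and total + duration > max_duration:
                batches.append(current)
                current, total = [], 0.0
            current.append(path)
            total += duration
    if current:
        batches.append(current)
    return batches


samples = [("a.wav", 10.2), ("b.wav", 2.5), ("c.wav", 3.1), ("d.wav", 9.0)]
batches = batches_by_total_duration(samples, max_duration=12.0)

# With string keys the same buckets would sort lexicographically,
# which puts "10" ahead of "2" and breaks the by-duration ordering.
string_order = sorted(str(k) for k in bucket_by_duration(samples))
```

With string keys the 10-second bucket would be visited first; with integer keys the short clips are grouped together before the long ones, which keeps padding within a batch small.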

373 Commits