vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	54547b74d8	experimental implementation of STT (need to actually test on a model, test trainer seems to work)	2024-09-05 20:43:20 -05:00
mrq	32287710a2	moved prints to use logger, edited readme (fused_attn doesn't seem stable for training)	2024-08-29 13:27:16 -05:00
mrq	6a733eb2ed	changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype, some 'optimization' to compile the model but it doesn't seem to do anything useful	2024-08-03 22:10:21 -05:00
mrq	11fa3da665	some cleanup, fixed the wrapper attention to explicitly use other sdpa backends	2024-08-03 19:51:00 -05:00
mrq	66407e5bdb	tweaks for the NAR-len model, maybe	2024-08-03 08:40:39 -05:00
mrq	97c5241bef	fixes, throw an exception when using NAR only model with non-unified position IDs, since for some reason it outputs garbage for the NAR	2024-08-02 22:25:49 -05:00
mrq	387358bc8a	fixes for the NAR-len model, documented some config options, and a better way to handle resizing modules on state_dict load	2024-07-31 20:35:09 -05:00
mrq	ce8bb1e4f7	sanity cleanups with weird off-by-one-ness, cleaned up and validated vall_e.models.experimental works again	2024-07-27 15:36:05 -05:00
mrq	e19aa643a6	cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training	2024-07-21 19:12:03 -05:00
mrq	d87b492295	added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)	2024-07-19 20:49:40 -05:00
mrq	97e768601c	re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways)	2024-07-18 16:16:14 -05:00
mrq	3acc54df22	allow loading a different model within the web ui (apparently I did not have the web UI in the documentation)	2024-07-15 19:59:48 -05:00
mrq	7b210d9738	sanity cleanup	2024-07-04 15:58:08 -05:00
mrq	dced595391	more cleanup	2024-06-30 11:00:12 -05:00
mrq	bc2a6fa756	sanity cleanup: moved experimental features under its own thing	2024-06-30 10:37:33 -05:00
mrq	bcf3910a17	the NAR only dream is dead (it just won't work)	2024-06-12 19:49:47 -05:00
mrq	a9353cf9fa	ugh	2024-06-12 00:14:29 -05:00
mrq	cca542a4c0	ugh	2024-06-11 23:59:28 -05:00
mrq	132a02c48b	sanity cleanup, backup config yaml for each log file	2024-06-09 11:22:52 -05:00
mrq	8d068fa3f9	reticulating splines	2024-06-08 20:30:15 -05:00
mrq	545162195b	deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things	2024-04-15 19:54:32 -05:00
mrq	08bae355eb	actually use langs from the dataloader	2023-10-11 21:21:50 -05:00
mrq	8740cdefc6	added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested)	2023-10-11 20:38:40 -05:00
mrq	7facacf7c9	separated samplers into its own file, don't bother copying the logits back to the GPU after sampling, it's not necessary	2023-10-11 12:25:31 -05:00
mrq	e727b6e5c1	changed dynamic temperature trigger to be a min-(n)ar-temp value between [0,(n)ar-temp), flags to set min temp, checkbox in web UI to request it	2023-10-10 17:02:33 -05:00
mrq	87db03dd93	trim the input prompt to 3 seconds when training NAR tasks (marked as experimental; the paper mentions doing so, but I don't know how much this would harm the retention heads)	2023-10-09 22:03:58 -05:00
mrq	c0b25541e3	restructured some things with the model to remove dead weights	2023-09-20 19:10:59 -05:00
mrq	a6bfe43590	added mirostat sampling (given a partially trained model, it got far more decent output than I expected, need to test on a better trained model)	2023-09-18 18:55:41 -05:00
mrq	2567e082b5	UGH	2023-09-16 00:26:13 -05:00
mrq	23a5fdd645	implemented a naive beam search (I really should be taking a break)	2023-09-12 21:28:07 -05:00
mrq	40ef34e1ca	this embedding class definitely works, and migrating from the previous embedding weights seems to work.	2023-09-11 14:13:42 -05:00
mrq	a1f250ffac	set default max_levels for NAR to 0 and implicitly set it to max resps levels because the previous way was implicitly assuming all models were outputting at 1+7 RVQ bins.	2023-09-10 20:33:33 -05:00
mrq	ba71020318	added option to limit (or exceed) inferenced RVQ-bin levels through the NAR	2023-09-10 13:50:13 -05:00
mrq	10c34c5b98	added a length-based decay factor for repetition penalty	2023-09-08 21:02:00 -05:00
mrq	14c78bae39	added lots of sampling options (top-k/top-p, repetition penalty, length penalty)	2023-09-08 20:30:54 -05:00
mrq	ab5134f385	tweaks and fixes	2023-09-07 17:08:38 -05:00
mrq	b2c2dec291	added homebrewed per-RVQ-bin embedding solutions	2023-09-07 16:48:02 -05:00
mrq	e7a67410d1	oops	2023-09-07 09:14:03 -05:00
mrq	7ce06432fd	fixed the AR+NAR dual model, the resp_emb has to be split up (classifier might too)	2023-09-06 19:33:39 -05:00
mrq	100ca6b7d0	added option to use SGD optimizer through the YAML, added option to pass in additional optimizer parameters through the YAML, added experimental unified AR+NAR model (does not seem fruitful in testing)	2023-09-06 18:58:35 -05:00
mrq	2f9cd0842f	merged dedicated interleaved AR code with the normal AR code	2023-09-03 22:46:08 -05:00
mrq	8a6c203277	added per-speaker samplers	2023-09-03 21:27:13 -05:00
mrq	2f06166ddd	cleanups	2023-09-01 21:33:51 -05:00
mrq	e40c0d34a0	somewhat got recurrent forward working (it's as accurate as chunkwise forward: it's not accurate at all), added option to use AMP instead of blanket setting the weight's dtype	2023-09-01 20:58:29 -05:00
mrq	2bc2d08b09	(need to verify) added modifying model size and config bool to align with VALL-E continuous' methodology	2023-09-01 17:19:34 -05:00
mrq	165a1154e0	Undo naive=False test flag, this shouldn't have made its way in	2023-08-26 22:00:43 -05:00
mrq	2d1a9f10c0	nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trim off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now)	2023-08-19 15:06:33 -05:00
mrq	2a71486cb6	preparing for SpeechX extensions	2023-08-18 20:58:07 -05:00
mrq	c85101403f	big cleanup	2023-08-03 20:26:36 -05:00
mrq	7a06b27a9c	Tweaks	2023-08-02 22:06:39 +00:00

50 Commits