vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	71731ed785	added prefixing with silence (was to test something, currently hidden under cfg.experimental=True)	2024-10-18 17:19:52 -05:00
mrq	6b04c13c56	print warning if audio promptless inferencing with low AR temp (it really doesn't like low temps / greedy sampling)	2024-10-18 17:01:40 -05:00
mrq	c8f31db1de	default to greedy sample AR (i should probably test this more but it seems to pass my harvard sentences and tongue twisters)	2024-10-18 16:58:56 -05:00
mrq	fc8dfd8617	made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar)	2024-10-18 16:55:00 -05:00
mrq	8b6095f681	saner defaults, maybe	2024-10-17 14:37:21 -05:00
mrq	48461833c2	ugh	2024-10-15 19:30:43 -05:00
mrq	eea70f5698	kludge fix for an oversight in the model when trying to train for longer input prompt durations......	2024-10-15 19:25:03 -05:00
mrq	04e983b86b	modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now	2024-10-12 11:27:55 -05:00
mrq	d0ab7d755a	added min-p (really does not seem useful since it's very sensitive), more tweaks to entropix	2024-10-11 22:36:06 -05:00
mrq	75a4c866d6	more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: setup arg flags to handle this, and checkbox in web UI)	2024-10-10 19:04:12 -05:00
mrq	2ea978f318	added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval	2024-10-10 13:40:25 -05:00
mrq	4a8e3ccf06	README tweaks, added --input-prompt-prefix as an experiment (its literally better to just not do this, but i'll retain it in case i have a revelation on how to improve it)	2024-10-04 18:57:19 -05:00
mrq	4f3c7a37c8	also do text similarities (dont know what use I'll have for this)	2024-09-10 16:45:59 -05:00
mrq	1c615a0f52	helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (iunno how to go about it desu)	2024-09-10 16:34:23 -05:00
mrq	54203c059d	validated rep pen for STT (sometimes needed to wrangle the model)	2024-09-08 08:30:30 -05:00
mrq	a6ad0577b8	cleanup the resultant text from STT	2024-09-06 18:44:25 -05:00
mrq	4bd9bb39c8	webui for STT (still need to bake the model to handle it better, a few hours so far has it generate what looks like a normal transcription but does not correlate to the audio right now)	2024-09-06 15:13:04 -05:00
mrq	94cf81d38c	tweak	2024-09-05 23:21:18 -05:00
mrq	32287710a2	moved prints to use logger, edited readme (fused_attn doesnt seem stable for training)	2024-08-29 13:27:16 -05:00
mrq	b7b99a25f1	added ability to specify attention backend for CLI and webui (because im tired of editing the yaml)	2024-08-26 19:33:51 -05:00
mrq	d7c6be6f78	fix weird regression in handling checkpoints when backend is local, but deepspeed checkpoints are in (it was handled with LoRA loading but not real loading...)	2024-07-30 22:15:56 -05:00
mrq	c2f5b916fc	added what I think is DRY sampling	2024-07-29 19:15:07 -05:00
mrq	75b04686f8	added prom-less training / inferencing, some other things	2024-07-22 19:36:07 -05:00
mrq	d87b492295	added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)	2024-07-19 20:49:40 -05:00
mrq	3acc54df22	allow loading a different model within the web ui (apparently I did not have the web UI in the documentation)	2024-07-15 19:59:48 -05:00
mrq	bc2a6fa756	sanity cleanup: moved experimental features under its own thing	2024-06-30 10:37:33 -05:00
mrq	8fffb94964	backport fix from tortoise_tts with local trainer + loading state when training lora	2024-06-25 13:41:29 -05:00
mrq	bcf3910a17	the NAR only dream is dead (it just won't work)	2024-06-12 19:49:47 -05:00
mrq	a7a6e0ac76	validated that inferencing works, changed some defaults (NAR benefits from greedy sampling)	2024-06-09 17:11:38 -05:00
mrq	da8242d086	finally got around to removing omegaconf	2024-06-07 20:23:53 -05:00
mrq	b2194b859a	re-added loading multiple models because I'm now entertaining having split AR/NAR models again (and need a way to load both at once)	2024-06-06 09:48:43 -05:00
mrq	ddbacde0d1	DAC just doesn't work well enough......	2024-05-25 11:07:52 -05:00
mrq	ffa200eec7	added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources))	2024-05-04 12:05:41 -05:00
mrq	b5d1456a09	backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not))	2024-04-29 22:14:01 -05:00
mrq	071fb97777	dataset preparation script updates, caved and am using HF tokenizer now	2024-04-21 14:49:18 -05:00
mrq	545162195b	deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things	2024-04-15 19:54:32 -05:00
mrq	3da1518ace	added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work)	2024-01-31 21:48:36 -06:00
mrq	c690aa509d	fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...)	2023-12-25 21:20:32 -06:00
mrq	fb467b19ba	exposed rolling resp context to the web UI, added passing in language to inferencing command line	2023-10-12 23:21:01 -05:00
mrq	65f500083d	tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work	2023-10-12 22:21:43 -05:00
mrq	8740cdefc6	added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested)	2023-10-11 20:38:40 -05:00
mrq	100dd164e6	apply phoneme cleanup in inferencing as well	2023-10-10 19:21:19 -05:00
mrq	e727b6e5c1	changed dynamic temperature trigger to be a min-(n)ar-temp value between [0,(n)ar-temp), flags to set min temp, checkbox in web UI to request it	2023-10-10 17:02:33 -05:00
mrq	893a610fad	cleanup, use deepspeed inferencing pathway if requested	2023-10-09 15:24:04 -05:00
mrq	26fbb92ec6	reduced dynamic temperature threshold to > 1.0, as it seems to not quite be useful for audio LMs, sped up any sampling that touches logits by copying them to CPU first (as accessing tensors on the GPU is slow as balls)	2023-10-09 14:46:17 -05:00
mrq	c0b25541e3	restructured some things with the model to remove dead weights	2023-09-20 19:10:59 -05:00
mrq	a6bfe43590	added mirostat sampling (given a partially trained model, it got far more decent output than I expected, need to test on a better trained model)	2023-09-18 18:55:41 -05:00
mrq	23a5fdd645	implemented a naive beam search (I really should be taking a break)	2023-09-12 21:28:07 -05:00
mrq	ba71020318	added option to limit (or exceed) inferenced RVQ-bin levels through the NAR	2023-09-10 13:50:13 -05:00
mrq	4f61f5c889	added option to set the trim length for an input prompt	2023-09-09 18:04:44 -05:00

69 Commits