mrq/vall-e

Author	SHA1	Message	Date
mrq	682e4387dc	oops (fixed proms being erased from a config oversight)	2024-07-25 12:39:57 -05:00
mrq	1acb0e9c84	added experimental training setting to perform token dropout to MAYBE compensate for errors from the preceding RVQ level (two types: token error offset, token dropout embedding replace)	2024-07-24 19:35:17 -05:00
mrq	611a1c4bdc	might help	2024-07-22 20:57:01 -05:00
mrq	188d116222	some weird fixes for an equally weird regression with LoRA loading	2024-07-22 20:47:24 -05:00
mrq	e33c4b0cb1	oops	2024-07-22 19:38:39 -05:00
mrq	75b04686f8	added prom-less training / inferencing, some other things	2024-07-22 19:36:07 -05:00
mrq	491ae2a684	some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...)	2024-07-22 00:30:40 -05:00
mrq	ad024f400f	actually pass language into dataset process script, fix coercing japanese into hiragana because espeak does not like kanji	2024-07-21 23:21:37 -05:00
mrq	3e5ca3a201	more demo page tweaks	2024-07-21 19:31:13 -05:00
mrq	7366f36f81	oops	2024-07-21 19:17:25 -05:00
mrq	e19aa643a6	cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training	2024-07-21 19:12:03 -05:00
mrq	ba7ee8c0ee	added demo link to readme	2024-07-19 21:22:30 -05:00
mrq	9ec88d9444	validated passing URI path for assets instead of base64 encoding them	2024-07-19 21:07:17 -05:00
mrq	d87b492295	added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)	2024-07-19 20:49:40 -05:00
mrq	d53038a9e4	actually have split classifiers working	2024-07-19 15:33:31 -05:00
mrq	692d09f9c1	eval/validation fix for SpeechX tasks	2024-07-19 09:16:37 -05:00
mrq	28a674e0f1	fixes...	2024-07-18 23:25:32 -05:00
mrq	39f961abcd	test trainer (vall_e.models.ar_nar) tests some SpeechX features	2024-07-18 18:46:45 -05:00
mrq	83a0954f85	fixes for re-introducing SpeechX tasks (need to actually validate if these all do the right things)	2024-07-18 17:16:32 -05:00
mrq	bccbb77a1a	added option to either naively concat codes to concat audio waveforms (prior behavior) or to decode => concat => encode instead (although this only currently happens for prom sampling if an utterance is too small)	2024-07-18 16:48:41 -05:00
mrq	97e768601c	re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways)	2024-07-18 16:16:14 -05:00
mrq	c2b8035e74	oops, kept forgetting to actually pass in lang/tone tokens (despite not really using these at the moment)	2024-07-18 14:18:34 -05:00
mrq	22fe53508c	added experimental disjointed position IDs (because I think this might help because technically a sequence is made up of several parts, and the position embeddings shouldn't be unified)	2024-07-16 19:52:41 -05:00
mrq	fe0f235335	mechanism to store the model config inside the weights and load them, some other things to allow LoRA training on the RetNet (gradient checkpointing will gripe about inputs not having requires_grad and nothing seems to remedy it)	2024-07-16 18:23:13 -05:00
mrq	3acc54df22	allow loading a different model within the web ui (apparently I did not have the web UI in the documentation)	2024-07-15 19:59:48 -05:00
mrq	7b210d9738	sanity cleanup	2024-07-04 15:58:08 -05:00
mrq	1ecf2793f4	(commented-out) support for facebookresearch/AudioDec, but support really didn't wow me (so I commented it out until I figure out why my output audio is super crusty with AudioDec)	2024-07-04 15:40:51 -05:00
mrq	db62e55a38	oops, I forgot to use the new thing for audio_backend	2024-07-04 14:54:11 -05:00
mrq	f770467eb3	stuff	2024-07-01 18:13:29 -05:00
mrq	312a8e3ead	add shuffle to samplers that can support it	2024-06-30 11:36:46 -05:00
mrq	396af541c5	ugh	2024-06-30 11:11:58 -05:00
mrq	dced595391	more cleanup	2024-06-30 11:00:12 -05:00
mrq	bc2a6fa756	sanity cleanup: moved experimental features under its own thing	2024-06-30 10:37:33 -05:00
mrq	b21f74a5c5	added summing of external embeddings (at this point i dont think any amount of cope bandaids will get DAC to train nicely, I think the RVQ levels the NAR handles tend to add too much noise if they're not accurate)	2024-06-29 23:42:30 -05:00
mrq	793ccb16fb	ugh	2024-06-29 22:14:35 -05:00
mrq	2808f881c8	cleaned up subjugated audio embedding into a flag, flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive)	2024-06-29 21:46:35 -05:00
mrq	ec5eaebcbc	experimental method of using DACs quantizer "embeddings" to see if it helps with model quality	2024-06-29 19:46:11 -05:00
mrq	a8718d35a4	nasty bandaid because some of my DAC dataset only has 8 RVQ levels instead of the full 9	2024-06-29 10:16:37 -05:00
mrq	c4dd523b6f	change from chunk-slicing paths for distributed dataloader to instead interleave	2024-06-29 10:10:35 -05:00
mrq	dd40463803	limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid)	2024-06-29 09:11:28 -05:00
mrq	591d3ac848	have eval dataloader use eval batch size for batchedordersampler	2024-06-28 22:44:00 -05:00
mrq	1a392b69f6	local training backend should be a bit more aware of variable batch sizes, maybe	2024-06-28 22:39:05 -05:00
mrq	83075c1505	sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because i didnt know that python dicts can have non-strings as keys), added batching samples based on total duration to ensure best training throughput	2024-06-28 22:28:54 -05:00
mrq	5176ced35f	readme tweaks	2024-06-28 21:02:54 -05:00
mrq	8fffb94964	backport fix from tortoise_tts with local trainer + loading state when training lora	2024-06-25 13:41:29 -05:00
mrq	62a53eed64	fixed deducing tokenizer path, added option to default to naive tokenizer (for old models, like ar+nar-retnet-8)	2024-06-18 22:11:14 -05:00
mrq	8a986eb480	load exported LoRA weights if exists (to-do: make a better LoRA loading mechanism)	2024-06-18 21:45:46 -05:00
mrq	2bfe786ebd	ban stop token for NAR levels (because sometimes it gets sampled and causes problems)	2024-06-17 22:14:43 -05:00
mrq	7cfb78fa64	enable LoRA for targeted RVQ levels (to experiment with, seems to help)	2024-06-17 21:45:03 -05:00
mrq	7047fcc6e2	actually make deepspeed work with LoRAs	2024-06-17 13:55:37 -05:00

468 Commits