vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	ce8bb1e4f7	sanity cleanups with weird off-by-one-ness, cleaned up and validated vall_e.models.experimental works again	2024-07-27 15:36:05 -05:00
mrq	682e4387dc	oops (fixed proms being erased from a config oversight)	2024-07-25 12:39:57 -05:00
mrq	75b04686f8	added prom-less training / inferencing, some other things	2024-07-22 19:36:07 -05:00
mrq	491ae2a684	some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...)	2024-07-22 00:30:40 -05:00
mrq	e19aa643a6	cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training	2024-07-21 19:12:03 -05:00
mrq	d87b492295	added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)	2024-07-19 20:49:40 -05:00
mrq	28a674e0f1	fixes...	2024-07-18 23:25:32 -05:00
mrq	39f961abcd	test trainer (vall_e.models.ar_nar) tests some SpeechX features	2024-07-18 18:46:45 -05:00
mrq	83a0954f85	fixes for re-introducing SpeechX tasks (need to actually validate if these all do the right things)	2024-07-18 17:16:32 -05:00
mrq	bccbb77a1a	added option to either naively concat codes to concat audio waveforms (prior behavior) or to decode => concat => encode instead (although this only currently happens for prom sampling if an utterance is too small)	2024-07-18 16:48:41 -05:00
mrq	97e768601c	re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways)	2024-07-18 16:16:14 -05:00
mrq	3acc54df22	allow loading a different model within the web ui (apparently I did not have the web UI in the documentation)	2024-07-15 19:59:48 -05:00
mrq	312a8e3ead	add shuffle to samplers that can support it	2024-06-30 11:36:46 -05:00
mrq	bc2a6fa756	sanity cleanup: moved experimental features under its own thing	2024-06-30 10:37:33 -05:00
mrq	793ccb16fb	ugh	2024-06-29 22:14:35 -05:00
mrq	c4dd523b6f	change from chunk-slicing paths for distributed dataloader to instead interleave	2024-06-29 10:10:35 -05:00
mrq	dd40463803	limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid)	2024-06-29 09:11:28 -05:00
mrq	591d3ac848	have eval dataloader use eval batch size for batchedordersampler	2024-06-28 22:44:00 -05:00
mrq	83075c1505	sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because I didn't know that python dicts can have non-strings as keys), added batching samples based on total duration to ensure best training throughput	2024-06-28 22:28:54 -05:00
mrq	8fffb94964	backport fix from tortoise_tts with local trainer + loading state when training lora	2024-06-25 13:41:29 -05:00
mrq	19410a919e	ugh	2024-06-15 12:29:03 -05:00
mrq	d343bde09b	residual_in_fp32=False for mamba arch backends because it breaks the classifier (output projection / lm head / what-have-you) under AMP	2024-06-15 12:08:03 -05:00
mrq	31f71fa134	sampler update (some brainworm just never actually had a sampler for sample_type=path)	2024-06-14 16:55:40 -05:00
mrq	b3b67f34ac	added option to sort paths by durations to better group equally lengthed sequences together (and there was maybe a logic error from creating the samplers and then interleave-reordering paths, desyncing them, maybe)	2024-06-13 22:37:34 -05:00
mrq	cca542a4c0	ugh	2024-06-11 23:59:28 -05:00
mrq	65a8960305	option to split classifier per-level instead of sharing one (at this point I'm just scrambling to try and cope with training a DAC model, the NAR is being a pain)	2024-06-11 22:28:59 -05:00
mrq	234f9efc6e	ugh	2024-06-09 11:39:43 -05:00
mrq	132a02c48b	sanity cleanup, backup config yaml for each log file	2024-06-09 11:22:52 -05:00
mrq	4ade2b60ee	ugh	2024-06-06 21:57:11 -05:00
mrq	014e565c4b	tweaks	2024-06-04 20:41:13 -05:00
mrq	6d5bd0156a	fixes	2024-06-04 18:50:48 -05:00
mrq	ed3aeaf3a1	copy pasted from test to actual trainer	2024-06-04 18:40:30 -05:00
mrq	0aa01ba31a	forgot one crucial detail (you need the previous RVQ level to keep coherence between all RVQ levels) (experimental deinterleaved is a bit crusty though)	2024-06-04 18:30:30 -05:00
mrq	406ff7bbe1	re-implemented config.model.interleave for the HF-compat experimental method	2024-06-04 14:19:52 -05:00
mrq	c93d5863fd	fixes	2024-06-04 00:07:00 -05:00
mrq	934672252b	feverish cleanup	2024-06-03 21:28:49 -05:00
mrq	8cf176ab46	ugh	2024-06-01 10:46:42 -05:00
mrq	d0ebce6bac	ugh	2024-06-01 10:30:13 -05:00
mrq	74df2f5332	split sampler dict by global_rank, also handle splitting dataset paths by global_rank if sampler_type == path (because I do not trust DistributedSampler) (need to test)	2024-06-01 09:29:49 -05:00
mrq	ddbacde0d1	DAC just doesn't work well enough......	2024-05-25 11:07:52 -05:00
mrq	e3ef89f5aa	100x better for subtrain/eval to be by group instead	2024-05-19 16:40:14 -05:00
mrq	4bc7e5a6d1	fix loading without needing an hdf5 dataset already prepped (and some other incidental speedups during dataloader prep)	2024-05-18 07:14:26 -05:00
mrq	d88a5ca183	ugh	2024-05-16 07:25:33 -05:00
mrq	d9aabfa3ae	final tweaks, hopefully, again	2024-05-15 23:04:19 -05:00
mrq	2437a86efa	ugh	2024-05-12 13:02:15 -05:00
mrq	4f1593c8db	a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge	2024-05-12 10:17:29 -05:00
mrq	14709ac67f	ughh	2024-05-12 07:30:59 -05:00
mrq	3774fcbdee	ugh	2024-05-11 22:58:38 -05:00
mrq	4d93a16ef7	might just be better to explicitly define prompt duration ranges, especially under a "train small contexts then increase it" training paradigm	2024-05-11 09:50:54 -05:00
mrq	0d5d545a40	crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc.	2024-05-09 20:28:20 -05:00

117 Commits