mrq/vall-e

Author	SHA1	Message	Date
mrq	230da8b559	should be the final things to scramble around for, DAC's 24KHz model is unusable for this, but both encodec's 24KHz and DAC's 44KHz work	2024-05-12 13:22:08 -05:00
mrq	2437a86efa	ugh	2024-05-12 13:02:15 -05:00
mrq	4f1593c8db	a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge	2024-05-12 10:17:29 -05:00
mrq	14709ac67f	ughh	2024-05-12 07:30:59 -05:00
mrq	c4b696ebeb	oops	2024-05-09 22:33:40 -05:00
mrq	0d5d545a40	crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc.	2024-05-09 20:28:20 -05:00
mrq	c6e0f905b5	final tweaks (again) before training restarts	2024-05-08 02:11:38 -05:00
mrq	215800484d	correcting my wrong of assuming I could just use raw 24Khz audio in the 44Khz DAC without too much of an issue (there are issues)	2024-05-04 23:49:15 -05:00
mrq	9f738fbd5b	seems I actually don't need RVQ bins 9-32 with the 24Khz DAC model........ (time to requantize my audio...)	2024-05-04 23:09:18 -05:00
mrq	a8ffa88844	it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior	2024-04-19 18:36:54 -05:00
mrq	8214aa23d7	converting over to a different intermediary dataset format	2024-04-18 21:24:06 -05:00
mrq	4f5c9e518a	actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess	2024-04-18 13:32:41 -05:00
mrq	2e9e6e68f7	Forgot I need to use the DAC's 44K model because 24K model has 32 codebooks instead of 9.	2024-04-17 20:59:25 -05:00
mrq	5ff2b4aab5	finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset)	2024-04-17 20:39:35 -05:00
mrq	545162195b	deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things	2024-04-15 19:54:32 -05:00
mrq	09cda7d3f9	added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup	2023-10-16 19:30:38 -05:00
mrq	2bc2d08b09	(need to verify) added modifying model size and config bool to align with VALL-E continuous' methodology	2023-09-01 17:19:34 -05:00
mrq	78378ed1ce	overhauled dataloading code to be marginally faster, mostly cleaned up, and can leverage a metadata json to help things out	2023-08-26 19:53:23 -05:00
mrq	22904a8639	more oversights fixed because I've been using a cached dataloader forever now and didn't catch these problems	2023-08-24 10:25:33 -05:00
mrq	4585824cd3	tweaks, including exporting on save/quit	2023-08-23 16:43:03 -05:00
mrq	7b1b82e0e5	inferencing cleanup	2023-08-20 21:36:02 -05:00
mrq	2d1a9f10c0	nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trim off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now)	2023-08-19 15:06:33 -05:00
mrq	77292c42f9	tested the training preparation for tasks ns, sr, and tse (I don't expect it to go well with only 2 RVQ bins)	2023-08-18 23:55:40 -05:00
mrq	bbb0563b3d	pseudocode polyfill stub some other flavor of working on adding the tasks	2023-08-18 22:22:13 -05:00
mrq	fb4e816823	oops	2023-08-18 21:11:19 -05:00
mrq	d7deaf6def	distributed training works now (hopefully)	2023-08-13 22:07:45 -05:00
mrq	608c1970eb	ops	2023-08-03 20:36:19 -05:00
mrq	f6597e2dfe	adjustments	2023-08-02 18:36:26 -05:00
mrq	bf8cedc9dd	Rewrite init	2023-08-02 21:53:35 +00:00

29 Commits