vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	d31f27119a	regex replace out the (lang) markers in espeak, updated tokenizer vocab as lazily as possible to not have unk tokens	2024-09-21 12:29:28 -05:00
mrq	491ae2a684	some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...)	2024-07-22 00:30:40 -05:00
mrq	3e5ca3a201	more demo page tweaks	2024-07-21 19:31:13 -05:00
mrq	e19aa643a6	cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training	2024-07-21 19:12:03 -05:00
mrq	d87b492295	added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)	2024-07-19 20:49:40 -05:00
mrq	39f961abcd	test trainer (vall_e.models.ar_nar) tests some SpeechX features	2024-07-18 18:46:45 -05:00
mrq	f770467eb3	stuff	2024-07-01 18:13:29 -05:00
mrq	396af541c5	ugh	2024-06-30 11:11:58 -05:00
mrq	dced595391	more cleanup	2024-06-30 11:00:12 -05:00
mrq	dd40463803	limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid)	2024-06-29 09:11:28 -05:00
mrq	934672252b	feverish cleanup	2024-06-03 21:28:49 -05:00
mrq	7feeb944a0	probably insane with even entertaining going this route	2024-06-03 20:26:27 -05:00
mrq	458b95d196	added option to split between text loss and audio loss (to-do: document this better), because it may or may not be a problem with LLaMA-backed models because my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment	2024-05-19 11:23:56 -05:00
mrq	d9aabfa3ae	final tweaks, hopefully, again	2024-05-15 23:04:19 -05:00
mrq	8d79f78e0a	god I need to replace omegaconf	2024-05-12 14:01:52 -05:00
mrq	215800484d	correcting my wrong of assuming I could just use raw 24kHz audio in the 44kHz DAC without too much of an issue (there are issues)	2024-05-04 23:49:15 -05:00
mrq	783db3d2c5	forgot to commit the DAC test utterance	2024-05-04 09:46:51 -05:00
mrq	6a11bc9cb6	update tokenizer because, for some reason, it had the wrong order for the special tokens, such that eos = unk	2024-04-29 09:09:26 -05:00
mrq	071fb97777	dataset preparation script updates, caved and am using HF tokenizer now	2024-04-21 14:49:18 -05:00
mrq	2e9e6e68f7	Forgot I need to use the DAC's 44K model because 24K model has 32 codebooks instead of 9.	2024-04-17 20:59:25 -05:00
mrq	2deb995cc9	updated setup script	2023-10-06 20:08:28 -05:00
mrq	4abd6564d1	fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml	2023-09-23 19:59:00 -05:00
mrq	2d1a9f10c0	nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trim off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now)	2023-08-19 15:06:33 -05:00
mrq	f7f6d3bf6d	validated that SpeechX tasks cse and nse work, added a method to test each task by invoking `python3 -m vall_e.data --action=tasks --tasks='sr,se,cse,nse'`	2023-08-19 09:50:07 -05:00
mrq	8f42c578c9	setting up for allowing training for a partial amount of the SpeechX tasks (do NOT try this at home yet without a proper model, as performance is predicated on having a solid base vall-e model for the tasks)	2023-08-19 00:16:08 -05:00
mrq	508677fcd5	repaired auraloss loss calc during eval/val	2023-08-18 21:19:47 -05:00
mrq	fb4e816823	oops	2023-08-18 21:11:19 -05:00
mrq	5fa86182b5	oops	2023-08-14 10:50:40 -05:00
mrq	d7deaf6def	distributed training works now (hopefully)	2023-08-13 22:07:45 -05:00
mrq	c85101403f	big cleanup	2023-08-03 20:26:36 -05:00
mrq	7a06b27a9c	Tweaks	2023-08-02 22:06:39 +00:00
mrq	d88e43800b	adjustments	2023-08-02 22:01:49 +00:00
mrq	bf8cedc9dd	Rewrite init	2023-08-02 21:53:35 +00:00

33 Commits