vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	cddf8ca814	sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them)	2024-12-11 22:45:38 -06:00
mrq	0c69e798f7	template cleanup	2024-12-11 20:06:55 -06:00
mrq	8568a93dad	added WER/SIM-O metrics, added APOLLO but I need to test it	2024-12-10 20:13:21 -06:00
mrq	a6c745bafb	chinese (mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), korean validated, vocab adjusted	2024-12-09 14:26:19 -06:00
mrq	a032ff588f	doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O)	2024-12-07 22:34:25 -06:00
mrq	6845c447c9	added more harvard sentences to load from a text file	2024-11-21 13:18:11 -06:00
mrq	a9d2faf2d7	all I can do now until I wait for the model to (re)train for pure NAR	2024-11-09 22:57:34 -06:00
mrq	a96f5aee32	adjusted how i want to pass eval kwargs	2024-10-25 20:38:09 -05:00
mrq	1a02cd5bce	modify demo template to say F5 instead of YourTTS, swap LoRA comparison around to make the lora'd the base file, and the no-lora the suffix'd file	2024-10-21 19:52:02 -05:00
mrq	75a4c866d6	more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: set up arg flags to handle this, and checkbox in web UI)	2024-10-10 19:04:12 -05:00
mrq	96d05be73c	demo page tweaks	2024-10-10 13:52:37 -05:00
mrq	9da630f73a	swap order of demo entries, as the model prioritizes adhering to the speaker prompt more (instead of trying to match the ground truth magically)	2024-09-25 23:31:24 -05:00
mrq	2266d34818	oops	2024-09-21 16:06:01 -05:00
mrq	d31f27119a	regex replace out the (lang) markers in espeak, updated tokenizer vocab as lazily as possible to not have unk tokens	2024-09-21 12:29:28 -05:00
mrq	491ae2a684	some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...)	2024-07-22 00:30:40 -05:00
mrq	3e5ca3a201	more demo page tweaks	2024-07-21 19:31:13 -05:00
mrq	e19aa643a6	cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training	2024-07-21 19:12:03 -05:00
mrq	d87b492295	added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)	2024-07-19 20:49:40 -05:00
mrq	39f961abcd	test trainer (vall_e.models.ar_nar) tests some SpeechX features	2024-07-18 18:46:45 -05:00
mrq	f770467eb3	stuff	2024-07-01 18:13:29 -05:00
mrq	396af541c5	ugh	2024-06-30 11:11:58 -05:00
mrq	dced595391	more cleanup	2024-06-30 11:00:12 -05:00
mrq	dd40463803	limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid)	2024-06-29 09:11:28 -05:00
mrq	934672252b	feverish cleanup	2024-06-03 21:28:49 -05:00
mrq	7feeb944a0	probably insane with even entertaining going this route	2024-06-03 20:26:27 -05:00
mrq	458b95d196	added option to split between text loss and audio loss (to-do: document this better), because it may or may not be a problem with LLaMA-backed models because my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment	2024-05-19 11:23:56 -05:00
mrq	d9aabfa3ae	final tweaks, hopefully, again	2024-05-15 23:04:19 -05:00
mrq	8d79f78e0a	god I need to replace omegaconf	2024-05-12 14:01:52 -05:00
mrq	215800484d	correcting my wrong of assuming I could just use raw 24Khz audio in the 44Khz DAC without too much of an issue (there are issues)	2024-05-04 23:49:15 -05:00
mrq	783db3d2c5	forgot to commit the DAC test utterance	2024-05-04 09:46:51 -05:00
mrq	6a11bc9cb6	update tokenizer because, for some reason, it had the wrong order for the special tokens to where eos = unk	2024-04-29 09:09:26 -05:00
mrq	071fb97777	dataset preparation script updates, caved and am using HF tokenizer now	2024-04-21 14:49:18 -05:00
mrq	2e9e6e68f7	Forgot I need to use the DAC's 44K model because 24K model has 32 codebooks instead of 9.	2024-04-17 20:59:25 -05:00
mrq	2deb995cc9	updated setup script	2023-10-06 20:08:28 -05:00
mrq	4abd6564d1	fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml	2023-09-23 19:59:00 -05:00
mrq	2d1a9f10c0	nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trim off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now)	2023-08-19 15:06:33 -05:00
mrq	f7f6d3bf6d	validated that SpeechX tasks cse and nse work, added a method to test each task by invoking `python3 -m vall_e.data --action=tasks --tasks='sr,se,cse,nse'`	2023-08-19 09:50:07 -05:00
mrq	8f42c578c9	setting up for allowing training for a partial amount of the speechx tasks (do NOT try this at home yet without a proper model, as performance is predicated on having a solid base vall-e model for the tasks)	2023-08-19 00:16:08 -05:00
mrq	508677fcd5	repaired auraloss loss calc during eval/val	2023-08-18 21:19:47 -05:00
mrq	fb4e816823	oops	2023-08-18 21:11:19 -05:00
mrq	5fa86182b5	oops	2023-08-14 10:50:40 -05:00
mrq	d7deaf6def	distributed training works now (hopefully)	2023-08-13 22:07:45 -05:00
mrq	c85101403f	big cleanup	2023-08-03 20:26:36 -05:00
mrq	7a06b27a9c	Tweaks	2023-08-02 22:06:39 +00:00
mrq	d88e43800b	adjustments	2023-08-02 22:01:49 +00:00
mrq	bf8cedc9dd	Rewrite init	2023-08-02 21:53:35 +00:00

46 Commits