6ae282e090
re-added the noise dataloader sampler for the old implementation's other tasks that require it
2025-03-28 15:07:06 -05:00

8641c87611
nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression)
2025-03-25 23:06:16 -05:00

c5475ebc91
another dataloader optimization
2025-03-15 20:18:58 -05:00

bee2688dea
ugh
2025-03-15 16:50:21 -05:00

2053580838
updated dataloader to hopefully reduce RAM usage
2025-03-15 13:14:37 -05:00

00d1fed217
another optimization within the dataloader (the similar-utterance sampler was extremely slow)
2025-03-08 17:10:50 -06:00

dbd34b6430
add a specialized calc_loss because I'm being paranoid
2025-03-07 18:44:11 -06:00

3f1070f575
tweaks
2025-03-02 22:36:25 -06:00

b97faa8173
fixes...
2025-02-28 18:53:07 -06:00

06ef3daf3c
require a minimum duration of 1 second for training, because of my sloppy auto-transposing code that I don't want to fix right now
2025-02-26 22:00:33 -06:00

2ea387c08a
segregated the experimental changes into their own streamlined file to avoid breaking the existing model; it can pivot to the cleaned-up code if it actually works (nothing is working)
2025-02-26 21:26:13 -06:00

d33ccd188a
ugh
2025-02-23 12:31:07 -06:00

67a6009555
(finally) added parallel AR for cfg.model.version >= 7 (nvidia/audio-codec-44khz is being a pain and it might require training purely AR first...)
2025-02-23 08:31:03 -06:00

15b3c20e19
also throw an exception for zeroed-out tensors during training (I am very paranoid now)
2025-02-22 14:09:41 -06:00
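A minimal sketch of the zeroed-tensor guard this commit describes, using a NumPy array as a stand-in for the training tensors (the function name `assert_not_zeroed` is hypothetical, not the project's actual code):

```python
import numpy as np

def assert_not_zeroed(array, name="tensor"):
    """Raise if a training input is entirely zero, which here signals
    corrupted preprocessed audio rather than legitimate data."""
    if not np.any(array):
        raise ValueError(f"{name} is entirely zeroed out; refusing to train on it")
    return array
```

Failing loudly at load time is cheaper than silently training on corrupted samples and discovering it from the loss curve later.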

ab0abd2b12
fixes, fixes, fixes (a quarter of my recently processed audio returned zeroed tensors...)
2025-02-22 09:07:33 -06:00

e8f182b634
cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors)
2025-02-13 09:35:27 -06:00

e029a8804d
ironically, none of this cruft gets the loss lower than the original way
2025-02-12 11:17:00 -06:00

e5916ea519
for my sanity: it seems having extraneous tokens in the embedding/classifier leaves the loss/accuracy a little higher than it should be
2025-02-11 14:47:35 -06:00

d4a6709fb4
stopgap cringe to get this training session working (it does not seem fruitful)
2025-02-11 13:45:09 -06:00

d6a679ca5c
tweaks
2025-02-10 20:53:08 -06:00

b3f9b76fd9
invalidate a path if loading via metadata and the entry is not in the HDF5 (to avoid reparsing my metadata, since I'm using a partial copy of my dataset at the moment)
2025-02-10 14:43:15 -06:00
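The invalidation above amounts to a membership filter over the HDF5 store. A sketch, using a plain set of keys as a stand-in for the open h5py file (the helper name `filter_valid_paths` is hypothetical):

```python
def filter_valid_paths(paths, hdf5_keys):
    """Drop any dataset path whose entry is missing from the HDF5 store,
    instead of reparsing all metadata from scratch.

    `hdf5_keys` is the set of keys present in the store; with h5py this
    would be membership tests against the open File object."""
    return [p for p in paths if p in hdf5_keys]
```

This keeps the cheap metadata index usable even when only a partial copy of the underlying dataset is present.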

47eb498046
more tweaks
2025-02-06 23:26:26 -06:00

3ab11bdc7b
oops
2025-01-05 23:53:17 -06:00

2e6a7625e4
experimental
2025-01-05 12:47:03 -06:00

9b0d2ccbe1
2024-12-26 21:42:17 -06:00

d85273609e
corrected export.py's --hf
2024-12-20 15:17:13 -06:00

53230efd74
changed prompt_inject_noise to prompt_inject_noise_p so I can have another reason to do this post-training
2024-12-19 19:28:50 -06:00
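The rename suggests `prompt_inject_noise_p` is now a per-sample probability rather than a hard on/off flag. A sketch of what that implies; the element-wise mixing and the function name are simplified stand-ins, not the project's actual code:

```python
import random

def maybe_inject_noise(prompt, noise, p, rng=random):
    """With probability `p` (prompt_inject_noise_p), mix noise into the
    acoustic prompt; otherwise return the prompt untouched."""
    if rng.random() < p:
        # Naive element-wise mix as a placeholder for real noise injection.
        return [a + b for a, b in zip(prompt, noise)]
    return prompt
```

A probability knob lets the same trained behavior be dialed up or down at inference time, which is presumably the "another reason to do this post-training".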

8838babcba
sanity checks (and I realized that the model actually had langs set to 4 in the yaml for KO/ZH, so...)
2024-12-19 19:08:57 -06:00

7617b6485f
instead, just compute a bunch of stuff on the transcriptions and store it under different names, so I can retrieve just what I want later; also added tongue twisters for nefarious reasons
2024-12-18 23:43:11 -06:00

4775edaa41
added text cleaning/normalization for WER purposes, but it amounts to nothing desu
2024-12-18 19:58:53 -06:00

ed152f78df
tweaks to prompt duration to let me divorce how I use it for training from how I'm using it for the demo page, plus demo page tweaks to make my life easier
2024-12-17 19:33:04 -06:00

9a62e3b824
APOLLO cringe (doesn't want to work with deepspeed)
2024-12-12 00:31:58 -06:00

20b87bfbd0
store metrics and only recalculate them if the output file is newer than the metrics file
2024-12-11 20:55:43 -06:00
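The newer-than check above can be sketched with file modification times (`needs_recalculation` is a hypothetical helper name, not the project's actual code):

```python
import os

def needs_recalculation(output_path, metrics_path):
    """Recompute metrics only when the output file is newer than the
    stored metrics file, or when no metrics file exists yet."""
    if not os.path.exists(metrics_path):
        return True
    return os.path.getmtime(output_path) > os.path.getmtime(metrics_path)
```

This is the classic make-style staleness check: cheap, stateless, and it survives interrupted runs since the cache is just a file on disk.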

6468e5d124
lol
2024-12-11 19:10:32 -06:00

8568a93dad
added WER/SIM-O metrics, added APOLLO but I need to test it
2024-12-10 20:13:21 -06:00

a6c745bafb
Chinese (Mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), Korean validated, vocab adjusted
2024-12-09 14:26:19 -06:00

1d460b9fe3
logic fixes; I feel like the output is better? (also, the NAR can have a temperature now; I imagine it couldn't before because it was having a causal mask passed to it for the longest time, before I caught it a month ago)
2024-12-08 14:52:47 -06:00

4e21df8092
oops
2024-12-04 21:24:22 -06:00

93d27be539
rolling context, finally (use the last N utterances as the prefix for the next gen), plus an option to split the input text prompt by sentences instead of lines (or no splitting)
2024-12-04 20:31:44 -06:00
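Both features lend themselves to small helpers. The sketch below uses hypothetical names (`rolling_context`, `split_prompt`) and a naive punctuation regex for sentence splitting; it is not the project's actual implementation:

```python
import re

def split_prompt(text, mode="sentences"):
    """Split the input text prompt by sentences, by lines, or not at all."""
    if mode == "sentences":
        # Naive split on sentence-ending punctuation followed by whitespace.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if mode == "lines":
        return [line.strip() for line in text.splitlines() if line.strip()]
    return [text]

def rolling_context(history, n):
    """Use the last N generated utterances as the prefix for the next gen."""
    return history[-n:] if n > 0 else []
```

Feeding the last few outputs back in as the prefix keeps long generations acoustically consistent across sentence boundaries.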

dcaf38b359
fixed training tqdm being stubborn
2024-11-23 09:45:23 -06:00

24d888c47c
temporarily dropping support for xformers because it breaks when using an attention mask (which I don't remember commenting out when it's passed); default to not using wandb because it's a pain when doing tests rather than actual sessions
2024-11-22 11:29:12 -06:00

6845c447c9
added more Harvard sentences, loaded from a text file
2024-11-21 13:18:11 -06:00

2b29790173
oops
2024-11-18 14:12:26 -06:00

4a71981456
normalize the sampler index by batch size (if not using the batched sampler), add an option to cap utterances per speaker, some other things
2024-11-18 12:46:50 -06:00

39096f8ff3
redid loss calculation and position ID generation to be cleaner, among other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint...)
2024-11-14 22:17:47 -06:00

e412e98125
ugh
2024-11-14 07:34:22 -06:00

c00fc18b62
actually use the right embedding for nar-len
2024-11-13 18:04:04 -06:00

976ee87f6f
resume iteration step in tqdm trainer, warn to logger if the sampler state dict was invalidated
2024-11-13 09:09:28 -06:00
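A sketch of that resume-plus-warn behavior, with hypothetical names throughout; with tqdm itself, resuming the counter maps to passing `initial=` alongside `total=` when constructing the bar:

```python
import logging

logger = logging.getLogger(__name__)

def resume_progress(total_steps, resume_step, sampler_state=None, expected_epoch=None):
    """Resume the iteration counter where training left off, and warn
    (rather than crash) if the saved sampler state no longer matches,
    e.g. after the dataset changed underneath it."""
    if (sampler_state is not None and expected_epoch is not None
            and sampler_state.get("epoch") != expected_epoch):
        logger.warning("sampler state dict was invalidated; starting sampler fresh")
        sampler_state = None
    # With tqdm this would be: tqdm(total=total_steps, initial=resume_step)
    return range(resume_step, total_steps), sampler_state
```

Warning instead of raising keeps a long training run alive at the cost of a slightly perturbed sampling order, which is usually the right trade here.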

0f2584eba7
new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?)
2024-11-12 22:30:09 -06:00

2495a7ef67
Fixed STT in the web UI
2024-11-12 12:49:53 -06:00