mrq/vall-e

Author	SHA1	Message	Date
mrq	59f56ad099	cleanup	2024-12-24 23:14:32 -06:00
mrq	82e8592f2a	working vall_e.cpp	2024-12-24 17:54:48 -06:00
mrq	497bdfc67b	more work (the wall is non-causal decoding......)	2024-12-22 20:11:31 -06:00
mrq	5f289db275	ugh	2024-12-22 16:15:24 -06:00
mrq	0d4329d2e3	sanity cleanup	2024-12-22 15:05:45 -06:00
mrq	353e478e68	agony	2024-12-21 22:52:10 -06:00
mrq	5788db849b	added extremely barebones vall_e.cpp so I can stop having to juggle this file around so much	2024-12-21 10:57:02 -06:00
mrq	91caf00212	ugh	2024-12-20 17:13:37 -06:00
mrq	d85273609e	corrected export.py's --hf	2024-12-20 15:17:13 -06:00
mrq	59bf6b8b33	exposed additional task (ns, sr, vc) (vc is experimental)	2024-12-20 11:15:29 -06:00
mrq	53230efd74	changed prompt_inject_noise to prompt_inject_noise_p so I can have another reason to do this post-training	2024-12-19 19:28:50 -06:00
mrq	e7e7f48043	livid	2024-12-19 19:25:27 -06:00
mrq	8838babcba	sanity checks (and I realized that the model actually had langs set to 4 in the yaml for KO/ZH so................)	2024-12-19 19:08:57 -06:00
mrq	7617b6485f	instead just compute a bunch of stuff on the transcriptions to store later in different names so I can just retrieve what I want, also added tongue twisters for nefarious reasons	2024-12-18 23:43:11 -06:00
mrq	4775edaa41	added text cleaning/normalization for wer purposes but it amounts to nothing desu	2024-12-18 19:58:53 -06:00
mrq	9090c34f10	cringe script to process seed-tts-eval's eval dataset into something i can easily use	2024-12-17 22:47:12 -06:00
mrq	ed152f78df	tweaks to prompt duration to allow me to divorce how i use it for training with how I'm using it for the demo page, and demo page tweaks to make my life easier	2024-12-17 19:33:04 -06:00
mrq	7129582303	actually do proper wer/cer calculation by un-normalizing the scores	2024-12-17 14:22:30 -06:00
mrq	c2c6d912ac	actually do speaker verification	2024-12-17 10:11:14 -06:00
mrq	c2e17e287b	really shoddy voice conversion implementation (it sort of works...)	2024-12-16 22:54:53 -06:00
mrq	8515038968	imagine my disappointment when the epoch finished just for it to throw an exception	2024-12-16 18:28:01 -06:00
mrq	4a65ac9eb7	oops	2024-12-15 17:21:51 -06:00
mrq	cd4a5f427c	KO/ZH model soon	2024-12-15 17:01:14 -06:00
mrq	4800e7179a	remove nan checks because it causes problems in distributed training because I'm not syncing between GPUs (and nan losses gets ignored anyways with loss scaling)	2024-12-15 09:42:54 -06:00
mrq	2ba6b483dc	ugh	2024-12-14 22:43:51 -06:00
mrq	3dd31e74d1	finally figured out a clean way to handle "resuming" the tqdm bar	2024-12-14 18:44:43 -06:00
mrq	35389481ee	move lazy-stored ortho matrix to the grad device for apollo because agony	2024-12-13 23:22:26 -06:00
mrq	09804ecc16	APOLLO tweaks to make it work with deepspeed	2024-12-13 23:03:52 -06:00
mrq	64c67160a3	tweaks	2024-12-13 19:00:35 -06:00
mrq	0fbfb8bbe8	actually save the optimizer for the local engine backend because safetensors doesn't save it	2024-12-12 17:12:59 -06:00
mrq	f41251f648	more fixes for local engine backend	2024-12-12 14:38:42 -06:00
mrq	6b237ae5e3	tweaks for the local engine orchestrator (that I never caught since I always used the deepspeed backend)	2024-12-12 13:37:38 -06:00
mrq	9a62e3b824	APOLLO cringe (doesn't want to work with deepspeed)	2024-12-12 00:31:58 -06:00
mrq	cddf8ca814	sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them)	2024-12-11 22:45:38 -06:00
mrq	20b87bfbd0	store metrics and only recalculate them if the output file is newer than the metrics file	2024-12-11 20:55:43 -06:00
mrq	0c69e798f7	template cleanup	2024-12-11 20:06:55 -06:00
mrq	7e54e897f7	also shifted to transformer's pipeline for transcribing	2024-12-11 19:57:53 -06:00
mrq	b81a98799b	uplifting transformer's WavLM stuff to do speaker verification instead	2024-12-11 19:30:05 -06:00
mrq	6468e5d124	lol	2024-12-11 19:10:32 -06:00
mrq	6f1ee0c6fa	Added CER, transcription/similarity model args in demo	2024-12-10 21:00:51 -06:00
mrq	8568a93dad	added WER/SIM-O metrics, added APOLLO but I need to test it	2024-12-10 20:13:21 -06:00
mrq	a6c745bafb	chinese (mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), korean validated, vocab adjusted	2024-12-09 14:26:19 -06:00
mrq	3ef8894290	oops	2024-12-08 15:24:21 -06:00
mrq	1d460b9fe3	logic fixes, I feel like output is better? (also NAR can have a temperature, I imagine it couldn't because it was having a causal mask passed to it for the longest time before I caught it a month ago)	2024-12-08 14:52:47 -06:00
mrq	0c5a458b00	deduce language per line to allow for a cheap way to allow for cross-lingual switching, kinda	2024-12-07 22:57:29 -06:00
mrq	a032ff588f	doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O)	2024-12-07 22:34:25 -06:00
mrq	5d80a2d0d4	fixed NAR-len issues with non-english maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now	2024-12-07 19:21:05 -06:00
mrq	1f54bf5b40	revert sageattn back to optional dependency because it's not on windows, force resize_modules on by default because I broke something	2024-12-07 17:09:39 -06:00
mrq	218d0e29fd	ugh (batchmean actually expects batch=seq_len, and not the actual batch)	2024-12-07 12:39:01 -06:00
mrq	61ed662856	ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode)	2024-12-07 12:31:54 -06:00

664 Commits