7617b6485f  instead just compute a bunch of stuff on the transcriptions to store later under different names so I can just retrieve what I want, also added tongue twisters for nefarious reasons (mrq, 2024-12-18 23:43:11 -0600)
4775edaa41  added text cleaning/normalization for WER purposes but it amounts to nothing desu (mrq, 2024-12-18 19:58:53 -0600)
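The text cleaning/normalization pass mentioned above might look like the following minimal sketch. The function name and the exact rules (lowercasing, punctuation stripping, whitespace collapsing) are assumptions for illustration, not the repo's actual code:

```python
import re
import string

def normalize_text(text: str) -> str:
    """Normalize a transcription before WER/CER scoring:
    lowercase, strip punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()
```

Normalizing both the reference and the hypothesis the same way keeps the error rate from penalizing cosmetic differences like casing or punctuation.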
9090c34f10  cringe script to process seed-tts-eval's eval dataset into something I can easily use (mrq, 2024-12-17 22:47:12 -0600)
ed152f78df  tweaks to prompt duration to let me divorce how I use it for training from how I use it for the demo page, and demo page tweaks to make my life easier (mrq, 2024-12-17 19:33:04 -0600)
7129582303  actually do proper WER/CER calculation by un-normalizing the scores (mrq, 2024-12-17 14:22:30 -0600)
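"Un-normalizing the scores" plausibly means aggregating raw edit distances over the whole corpus (sum of distances divided by sum of reference lengths) instead of averaging per-utterance WERs, which weights short and long utterances unequally. A sketch under that assumption, with a plain Levenshtein distance:

```python
def edit_distance(ref, hyp):
    """Classic single-row Levenshtein distance over token lists."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            # deletion, insertion, substitution (or match)
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (ref[i - 1] != hyp[j - 1]))
            prev = cur
    return dp[n]

def corpus_wer(pairs):
    """Corpus-level WER: sum raw (un-normalized) distances, then divide
    once by the total reference length, instead of averaging per-utterance rates."""
    dist = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    total = sum(len(r.split()) for r, _ in pairs)
    return dist / total
```

CER is the same computation over characters instead of whitespace-split tokens.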
c2c6d912ac  actually do speaker verification (mrq, 2024-12-17 10:11:14 -0600)
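Speaker verification for SIM-O-style metrics typically boils down to cosine similarity between speaker embeddings extracted from the reference and generated audio by some verification model (which model the repo uses is not stated here). The comparison itself is just:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two speaker-embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```

A score near 1.0 means the generated audio's speaker embedding closely matches the reference speaker's.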
cd4a5f427c  KO/ZH model soon (mrq, 2024-12-15 17:01:14 -0600)
4800e7179a  remove NaN checks, because they cause problems in distributed training since I'm not syncing between GPUs (and NaN losses get ignored anyway with loss scaling) (mrq, 2024-12-15 09:42:54 -0600)
0fbfb8bbe8  actually save the optimizer for the local engine backend, because safetensors doesn't save it (mrq, 2024-12-12 17:12:59 -0600)
f41251f648  more fixes for the local engine backend (mrq, 2024-12-12 14:38:42 -0600)
6b237ae5e3  tweaks for the local engine orchestrator (which I never caught since I always used the deepspeed backend) (mrq, 2024-12-12 13:37:38 -0600)
9a62e3b824  APOLLO cringe (doesn't want to work with deepspeed) (mrq, 2024-12-12 00:31:58 -0600)
cddf8ca814  sort batches to try and reduce the number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page, because I would have to regenerate them) (mrq, 2024-12-11 22:45:38 -0600)
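Sorting by length before batching groups similarly-sized samples together, so each batch pads only to the length of its own longest member rather than the global maximum. A minimal sketch (function name is illustrative):

```python
def sort_into_batches(samples, batch_size):
    """Bucket samples into batches of similar length to minimize pad tokens."""
    ordered = sorted(samples, key=len)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

The trade-off is that outputs come back in length order, so they have to be mapped back to their original indices afterwards.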
20b87bfbd0  store metrics and only recalculate them if the output file is newer than the metrics file (mrq, 2024-12-11 20:55:43 -0600)
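The "only recalculate if newer" check is a classic mtime comparison, the same staleness rule `make` uses. A sketch (the function name is an assumption):

```python
import os

def needs_recalc(output_path: str, metrics_path: str) -> bool:
    """Recompute metrics only when no cached metrics exist,
    or when the output file is newer than the cached metrics."""
    if not os.path.exists(metrics_path):
        return True
    return os.path.getmtime(output_path) > os.path.getmtime(metrics_path)
```

This makes re-running the evaluation script cheap: unchanged outputs skip transcription and scoring entirely.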
6f1ee0c6fa  added CER, transcription/similarity model args in demo (mrq, 2024-12-10 21:00:51 -0600)
8568a93dad  added WER/SIM-O metrics, added APOLLO but I need to test it (mrq, 2024-12-10 20:13:21 -0600)
fc5e6d8599  fixes to the process_emilia.py script (mrq, 2024-12-09 14:38:09 -0600)
a6c745bafb  Chinese (Mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), Korean validated, vocab adjusted (mrq, 2024-12-09 14:26:19 -0600)
1d460b9fe3  logic fixes, I feel like output is better? (also the NAR can have a temperature now; I imagine it couldn't before because it was having a causal mask passed to it for the longest time, before I caught it a month ago) (mrq, 2024-12-08 14:52:47 -0600)
0c5a458b00  deduce language per line to allow for a cheap way to do cross-lingual switching, kinda (mrq, 2024-12-07 22:57:29 -0600)
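Per-line language deduction for the EN/ZH/KO set can be done cheaply from Unicode script ranges, without a language-ID model. This is a crude illustrative heuristic, not necessarily what the repo does:

```python
def deduce_language(line: str) -> str:
    """Guess a line's language from character script ranges:
    Hangul syllables -> ko, CJK unified ideographs -> zh, else en."""
    for ch in line:
        cp = ord(ch)
        if 0xAC00 <= cp <= 0xD7A3:   # Hangul syllables
            return "ko"
        if 0x4E00 <= cp <= 0x9FFF:   # CJK unified ideographs
            return "zh"
    return "en"
```

Deciding per line (rather than per prompt) is what enables the cheap cross-lingual switching: each line gets phonemized and conditioned with its own language token.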
a032ff588f  doc update; added automatically deducing language from a given text, and checking if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O) (mrq, 2024-12-07 22:34:25 -0600)
5d80a2d0d4  fixed NAR-len issues with non-English, maybe (langs weren't being passed); added an interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that); the demo page uses batched inferencing now (mrq, 2024-12-07 19:21:05 -0600)
1f54bf5b40  revert sageattn back to an optional dependency because it's not on Windows; force resize_modules on by default because I broke something (mrq, 2024-12-07 17:09:39 -0600)
218d0e29fd  ugh (batchmean actually expects batch=seq_len, and not the actual batch) (mrq, 2024-12-07 12:39:01 -0600)
61ed662856  ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringe code) (mrq, 2024-12-07 12:31:54 -0600)
f97e8b0c7f  ACTUALLY do KD-loss, because of an oversight with masked_select outputting 1D tensors that get softmax'd in total (mrq, 2024-12-07 09:52:51 -0600)
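The masked_select oversight above is a shape bug worth spelling out: selecting logits with a boolean mask the flattening way returns a 1D tensor, so a softmax afterwards normalizes over every kept token *and* every vocab entry at once instead of per token. A pure-Python illustration of the same mistake (in the repo this was with torch tensors):

```python
import math

def softmax(xs):
    """Numerically-stable softmax over a flat list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# logits for 2 kept tokens over a 3-entry vocab
kept_logits = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]

# WRONG (the masked_select oversight): flattening to 1D first means the
# softmax produces ONE distribution over all 6 values
flat = [x for row in kept_logits for x in row]
wrong = softmax(flat)

# RIGHT: keep the vocab dimension and softmax per token,
# yielding one distribution per kept token
right = [softmax(row) for row in kept_logits]
```

For the KD loss itself the same shape matters: a KL divergence with `batchmean`-style reduction divides by the size of the first dimension, so with `(num_tokens, vocab)` inputs "batch" is really the token count, which is what the "batchmean actually expects batch=seq_len" commit is getting at.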
42fafbaaca  actually fixed knowledge distillation, because of errant -inf logits causing problems and needing to be filtered (also split text language / output audio language, because it helps) (mrq, 2024-12-06 21:55:20 -0600)
23d402bf01  added knowledge distillation in the trainer (sadly it is not agnostic, because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......) (mrq, 2024-12-05 23:05:52 -0600)
c66a53492c  forgot to add NLTK as a dependency; promoted sageattn to a default dependency, since it works fine enough and seems agnostic (mrq, 2024-12-04 20:33:25 -0600)
93d27be539  rolling context finally (use the last N utterances as the prefix for the next gen); option to split the input text prompt by sentences instead of lines (or no splitting) (mrq, 2024-12-04 20:31:44 -0600)
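The rolling-context idea (condition each generation on the last N outputs) can be sketched with a bounded history buffer. The function name is illustrative, not the repo's API:

```python
from collections import deque

def make_rolling_prefixes(lines, n):
    """For each line to generate, return the previous N generated lines
    to use as the conditioning prefix (empty for the first line)."""
    history = deque(maxlen=n)   # automatically evicts the oldest entry
    prefixes = []
    for line in lines:
        prefixes.append(list(history))
        history.append(line)
    return prefixes
```

Keeping only the last N utterances bounds the prompt length while still carrying recent prosody/speaker context from line to line.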
9dff68c0c5  NAR-len tweaks (remask a small amount of tokens per step, which seems to help reduce the number of steps needed some of the time?; disable CFG for the first half to speed things up) (mrq, 2024-12-04 09:30:29 -0600)
cf97560e70  minimum CFG of 3 for NAR-len, because it seems the model will auto-default to NAR-len now (mrq, 2024-12-03 19:40:05 -0600)
ca31da0a95  sageattn (forgot to bother with testing this the other day; seems fine) (mrq, 2024-12-03 15:14:57 -0600)
31ab90d84a  cringe code to convert to LlamaForCausalLM-happy weights + tokenizer dict (still need to write logic to actually use these weights for proper inferencing) (mrq, 2024-12-03 10:18:58 -0600)
84a05acb6d  touch-ups in docs (mrq, 2024-12-02 19:10:42 -0600)
dcaf38b359  fixed training tqdm being stubborn (mrq, 2024-11-23 09:45:23 -0600)
41d7c30ea5  added much cleaner non-causal mask generation (mrq, 2024-11-22 19:43:32 -0600)
c99a74e834  actually generate a causal mask, because it seems one is sometimes not actually generated due to assumptions being made (mrq, 2024-11-22 18:30:24 -0600)
ccee5fc11c  that was actually all pointless, since sdpa always had an attention mask fed to it and does not need is_causal to implicitly generate one (mrq, 2024-11-22 16:51:50 -0600)
4aa685e749  what has science done (mrq, 2024-11-22 16:45:40 -0600)
147219a5e0  huge oversight in the attention masking......... (I realized I have not been providing a non-causal mask to non-causal tasks) (mrq, 2024-11-22 13:44:43 -0600)
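The masking saga above comes down to two mask shapes: causal (each position attends only to itself and earlier positions, for AR tasks) and non-causal (full attention, for NAR tasks). A boolean-matrix sketch, with True meaning "attention allowed":

```python
def causal_mask(n):
    """Lower-triangular mask: position i may attend to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def non_causal_mask(n):
    """Full mask: every position attends to every other position."""
    return [[True] * n for _ in range(n)]
```

The "pointless" commit's point is that once an explicit attention mask is always passed to scaled dot-product attention, the is_causal shortcut (which implicitly builds the triangular mask) is never consulted; the bug was that the explicitly passed mask was the causal one even for non-causal tasks.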
24d888c47c  temporarily dropping support for xformers, because it's breaking when using an attention mask (which I don't remember commenting out when being passed); default to not use wandb, because it's being a pain when doing tests and not actual sessions (mrq, 2024-11-22 11:29:12 -0600)
8aafae91fd  don't use the time embedding (mrq, 2024-11-21 23:14:52 -0600)
6845c447c9  added more Harvard sentences, loaded from a text file (mrq, 2024-11-21 13:18:11 -0600)
2a084544e8  moved duration padding for NAR-len to be a scalar instead (since it seems longer utterances need it much more than shorter utterances do) (mrq, 2024-11-21 13:04:07 -0600)
6aee08f9c0  moved stuff in the web UI around (un-experimented the max NAR-len steps, because it's kind of important to adjust this value for better-sounding audio / quicker generated audio) (mrq, 2024-11-20 20:37:33 -0600)
1a73ac6a20  I cannot believe it's not actually called Wand DB (added wandb logging support, since I think it would have been a much better way to look at my metrics) (mrq, 2024-11-20 16:10:47 -0600)
67f7bad168  added mixed-modality AR+NAR-len: generate a short prefix through the AR, then inference with said prefix through the NAR-len (need to experiment with it more to ensure that the masked-off tokens are the only tokens getting updated) (mrq, 2024-11-20 14:22:12 -0600)
db64e6cb59  dependency updates (gradio 5.x now works on my machine) (mrq, 2024-11-20 12:33:01 -0600)
b1369e7824  better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model); lowered the default CFG, because it makes the AR+NAR output sped up (but it can't be too low, since it's required for the NAR-len) (mrq, 2024-11-19 18:51:17 -0600)
4a71981456  normalize sampler index by batch size (if not using the batched sampler); add an option to cap out utterances for a speaker; some other things (mrq, 2024-11-18 12:46:50 -0600)
6cfdf94bf9  swap priority to use nar-len if available; added notes (mrq, 2024-11-18 09:40:04 -0600)
069b27570f  set an option to set the training masking ratio (I don't think a fixed masking ratio is beneficial for TTS, since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things) (mrq, 2024-11-17 17:04:07 -0600)
88d840218d  default cfg strength set to 3.0, since the reference model is updated (mrq, 2024-11-17 10:23:40 -0600)
39096f8ff3  redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........) (mrq, 2024-11-14 22:17:47 -0600)
ef05c951ff  adjust fp16 loss scaling, since I fried a model overnight when it hit 8K scale (mrq, 2024-11-14 09:23:52 -0600)
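The fp16 incident above is the classic failure mode of dynamic loss scaling: the scaler keeps doubling on every overflow-free window until it destabilizes training. A common mitigation is capping the growth, sketched below; the specific cap (4096) and growth/backoff factors are illustrative assumptions, not the repo's settings:

```python
def update_loss_scale(scale, overflowed, growth=2.0, backoff=0.5, max_scale=4096.0):
    """Dynamic fp16 loss-scale update with a ceiling, so the scale can't
    grow unbounded (as it did when it hit 8K and fried the model)."""
    if overflowed:
        # gradients produced inf/nan: back off, but never below 1
        return max(scale * backoff, 1.0)
    # clean step: grow toward (but not past) the cap
    return min(scale * growth, max_scale)
```

In practice the "grow" branch would only fire after some window of consecutive clean steps rather than every step.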
ad7cfffc00  NAR-len RVQ-0 was being trained causally............. (mrq, 2024-11-13 09:43:50 -0600)
976ee87f6f  resume iteration step in the tqdm trainer; warn to the logger if the sampler state dict was invalidated (mrq, 2024-11-13 09:09:28 -0600)
8286aa54c8  do not pass the timestep token/embedding, since it doesn't seem to matter at all after all; fixed the training masking rate to 80%, because a paper said so (mrq, 2024-11-13 09:07:10 -0600)
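A fixed 80% training masking rate (as opposed to sampling the ratio per step) can be sketched as follows; the function shape is illustrative:

```python
import random

def make_mask(seq_len, ratio=0.8):
    """Mask a fixed fraction of positions uniformly at random;
    True marks a masked (to-be-predicted) position."""
    k = int(seq_len * ratio)
    masked = set(random.sample(range(seq_len), k))
    return [i in masked for i in range(seq_len)]
```

During NAR-len training the model then only receives a loss on the masked positions, with the unmasked tokens serving as context.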
caf721c67b  set it to zero, because otherwise it'll make the stop token hide more often than not (mrq, 2024-11-12 22:30:50 -0600)
0f2584eba7  new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?) (mrq, 2024-11-12 22:30:09 -0600)
663f07038d  haha... (do not create a token dropout/noise mask when not training (this sadly didn't fix NAR-len output)) (mrq, 2024-11-12 16:41:58 -0600)