Commit Graph

742 Commits

Author SHA1 Message Date
mrq
04fef5dad5 agony 2025-02-12 00:18:24 -06:00
mrq
e5916ea519 for my sanity it seems having extraneous tokens in the embedding/classifier has the loss/acc a little higher than it should 2025-02-11 14:47:35 -06:00
mrq
d4a6709fb4 stopgap cringe to get this training session working (it does not seem fruitful) 2025-02-11 13:45:09 -06:00
mrq
c0b46b82eb tweaks 2025-02-10 21:48:29 -06:00
mrq
d6a679ca5c tweaks 2025-02-10 20:53:08 -06:00
mrq
276a2342a4 tweaks to processing script 2025-02-10 19:18:13 -06:00
mrq
b3f9b76fd9 invalidate a path if loading via metadata and entry is not in hdf5 (to avoid reparsing my metadata since I'm using a partial copy of my dataset at the moment) 2025-02-10 14:43:15 -06:00
mrq
075ffef68a ugh 2025-02-09 13:02:51 -06:00
mrq
953015748f ugh 2025-02-07 20:49:28 -06:00
mrq
ed94b261dc could have sworn i had 'vall_e.emb.process --dtype' working, also possible RAM optimization so I can stop locking up my server when firing four encoding processes 2025-02-07 18:52:19 -06:00
mrq
47eb498046 more tweaks 2025-02-06 23:26:26 -06:00
mrq
67a9401cce oops 2025-02-06 15:14:14 -06:00
mrq
712ce4af5d maybe fixed errors with DAC backend, added option to limit by duration in emb.process (because I only really need short utternaces right now and I'm not ready to spend a week on processing everything again) 2025-02-06 12:37:18 -06:00
mrq
299cc88821 re-added amp encoding/decoding for audio, possible bad idea to ignore using amp instead if requested 2025-02-05 21:55:06 -06:00
mrq
7592befc53 updated vall_e.emb.process to allow for batched processing, some typo fixes (it's painfully slow on my 7900XTX...) 2025-02-05 21:13:20 -06:00
mrq
79c504c278 cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when i fall for the latest meme codec) 2025-02-05 20:54:31 -06:00
mrq
84174c1c1b oops 2025-02-05 10:25:03 -06:00
mrq
bb2ebe1ca2 fixed issues that may rise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies 2025-02-04 20:30:07 -06:00
mrq
0841f366e8 I should really just grab modelling_llama wholesale (fix for the adapted attention class) 2025-01-28 21:55:05 -06:00
mrq
e5f9da2221 oops 2025-01-21 11:59:24 -06:00
mrq
69c1d2991f updated mixtral backend (need this for something else) 2025-01-20 21:50:56 -06:00
mrq
1a26f789a5 added option to playback audio directly, removed no-phonemize option since I swear it worked in testing but it doesn't actually work 2025-01-12 21:52:49 -06:00
mrq
9fa87c417a added option to use raw text rather than the IPA phonemes (it requires a model trained on raw text) 2025-01-06 00:10:43 -06:00
mrq
3ab11bdc7b oops 2025-01-05 23:53:17 -06:00
mrq
b445f4abb6 experimental 2025-01-05 19:05:00 -06:00
mrq
2e6a7625e4 experimental 2025-01-05 12:47:03 -06:00
mrq
31cfef59c4 when you do more training thinking the original model that can do NS/SR got deleted but it was actually a string not having its quotes in the right place....... 2024-12-27 18:16:57 -06:00
mrq
9b0d2ccbe1 2024-12-26 21:42:17 -06:00
mrq
59f56ad099 cleaup 2024-12-24 23:14:32 -06:00
mrq
82e8592f2a working vall_e.cpp 2024-12-24 17:54:48 -06:00
mrq
497bdfc67b more work (the wall is non-causal decoding......) 2024-12-22 20:11:31 -06:00
mrq
5f289db275 ugh 2024-12-22 16:15:24 -06:00
mrq
0d4329d2e3 sanity cleanup 2024-12-22 15:05:45 -06:00
mrq
353e478e68 agony 2024-12-21 22:52:10 -06:00
mrq
5788db849b added extremely barebones vall_e.cpp so I can stop having to juggle this file around so much 2024-12-21 10:57:02 -06:00
mrq
91caf00212 ugh 2024-12-20 17:13:37 -06:00
mrq
d85273609e corrected export.py's --hf 2024-12-20 15:17:13 -06:00
mrq
59bf6b8b33 exposed additional task (ns, sr, vc) (vc is experimental) 2024-12-20 11:15:29 -06:00
mrq
53230efd74 changed prompt_inject_noise to prompt_inject_noise_p so I can have another reason to do this post-training 2024-12-19 19:28:50 -06:00
mrq
e7e7f48043 livid 2024-12-19 19:25:27 -06:00
mrq
8838babcba sanity checks (and I realized that the model actually had langs set to 4 in the yaml for KO/ZH so................ 2024-12-19 19:08:57 -06:00
mrq
7617b6485f instead just compute a bunch of stuff on the transcriptions to store later in different names so I can just retrieve what I want, also added tongue twisters for nefarious reasons 2024-12-18 23:43:11 -06:00
mrq
4775edaa41 added text cleaning/normalization for wer purposes but it amounts to nothing desu 2024-12-18 19:58:53 -06:00
mrq
9090c34f10 cringe script to process seed-tts-eval's eval dataset into something i can easily use 2024-12-17 22:47:12 -06:00
mrq
ed152f78df tweaks to prompt duration to allow me to divorce how i use it for training with how I'm using it for the demo page, and demo page tweaks to make my life easier 2024-12-17 19:33:04 -06:00
mrq
7129582303 actually do proper wer/cer calculation by un-normalizing the scores 2024-12-17 14:22:30 -06:00
mrq
c2c6d912ac actually do speaker verification 2024-12-17 10:11:14 -06:00
mrq
c2e17e287b really shoddy voice conversion implementation (it sort of works...) 2024-12-16 22:54:53 -06:00
mrq
8515038968 imagine my disappointment when the epoch finished just for it to throw an exception 2024-12-16 18:28:01 -06:00
mrq
4a65ac9eb7 oops 2024-12-15 17:21:51 -06:00
mrq
cd4a5f427c KO/ZH model soon 2024-12-15 17:01:14 -06:00
mrq
4800e7179a remove nan checks because it causes problems in distributed training because I'm not syncing between GPUs (and nan losses gets ignored anyways with loss scaling) 2024-12-15 09:42:54 -06:00
mrq
2ba6b483dc ugh 2024-12-14 22:43:51 -06:00
mrq
3dd31e74d1 finally figured out a clean way to handle "resuming" the tqdm bar 2024-12-14 18:44:43 -06:00
mrq
35389481ee move lazy-stored ortho matrix to the grad device for apollo because agony 2024-12-13 23:22:26 -06:00
mrq
09804ecc16 APOLLO tweaks to make it work with deepspeed 2024-12-13 23:03:52 -06:00
mrq
64c67160a3 tweaks 2024-12-13 19:00:35 -06:00
mrq
0fbfb8bbe8 actually save the optimizer for the local engine backend because safetensors doesn't save it 2024-12-12 17:12:59 -06:00
mrq
f41251f648 more fixes for local engine backend 2024-12-12 14:38:42 -06:00
mrq
6b237ae5e3 tweaks for the local engine orchestrator (that I never caught since I always used the deepspeed backend) 2024-12-12 13:37:38 -06:00
mrq
9a62e3b824 APOLLO cringe (doesn't want to work with deepspeed) 2024-12-12 00:31:58 -06:00
mrq
cddf8ca814 sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them) 2024-12-11 22:45:38 -06:00
mrq
20b87bfbd0 store metrics and only recalculate them if the output file is newer than the metrics file 2024-12-11 20:55:43 -06:00
mrq
0c69e798f7 template cleanup 2024-12-11 20:06:55 -06:00
mrq
7e54e897f7 also shifted to transformer's pipeline for transcribing 2024-12-11 19:57:53 -06:00
mrq
b81a98799b uplifting transformer's WavLM stuff to do speaker verification instead 2024-12-11 19:30:05 -06:00
mrq
6468e5d124 lol 2024-12-11 19:10:32 -06:00
mrq
6f1ee0c6fa Added CER, transcription/similarity model args in demo 2024-12-10 21:00:51 -06:00
mrq
8568a93dad added WER/SIM-O metrics, added APOLLO but I need to test it 2024-12-10 20:13:21 -06:00
mrq
a6c745bafb chinese (mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), korean validated, vocab adjusted 2024-12-09 14:26:19 -06:00
mrq
3ef8894290 oops 2024-12-08 15:24:21 -06:00
mrq
1d460b9fe3 logic fixes, I feel like output is better? (also NAR can have a temperature, I imagine it couldn't because it was having a causal masked passed to it for the longest time before I caught it a month ago) 2024-12-08 14:52:47 -06:00
mrq
0c5a458b00 deduce language per line to allow for a cheap way to allow for cross-lingual switching, kinda 2024-12-07 22:57:29 -06:00
mrq
a032ff588f doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O) 2024-12-07 22:34:25 -06:00
mrq
5d80a2d0d4 fixed NAR-len issues with non-english maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now 2024-12-07 19:21:05 -06:00
mrq
1f54bf5b40 revert sageattn back to optional dependency because it's not on windows, force resize_modules on by default because I broke something 2024-12-07 17:09:39 -06:00
mrq
218d0e29fd ugh (batchmean actually expects batch=seq_len, and not the actual batch) 2024-12-07 12:39:01 -06:00
mrq
61ed662856 ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode) 2024-12-07 12:31:54 -06:00
mrq
f97e8b0c7f ACTUALLY do KD-loss because of an oversight with masked_select outputting 1D tensors that get softmax'd in total 2024-12-07 09:52:51 -06:00
mrq
34a66e1052 agnostified KD 2024-12-06 23:53:46 -06:00
mrq
953d3eb030 ugh 2024-12-06 22:35:30 -06:00
mrq
42fafbaaca actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps) 2024-12-06 21:55:20 -06:00
mrq
23d402bf01 added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......) 2024-12-05 23:05:52 -06:00
mrq
4e21df8092 oops 2024-12-04 21:24:22 -06:00
mrq
93d27be539 rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting) 2024-12-04 20:31:44 -06:00
mrq
9dff68c0c5 NAR-len tweaks (remasks a small amount of tokens per step, it seems to help with reducing the number of steps needed some of the time?, disable CFG for the first half to speed things up) 2024-12-04 09:30:29 -06:00
mrq
cf97560e70 minimum CFG of 3 for NAR-len because it seems the model will auto-default to NAR-len now 2024-12-03 19:40:05 -06:00
mrq
ca31da0a95 sageattn (forgot to bother with testing this the other day, seems ifne) 2024-12-03 15:14:57 -06:00
mrq
31ab90d84a cringe code to convert to LlamaForCausalLM-happy weights + tokenizer dict (still need to write logic to actually use these weights for proper inferencing) 2024-12-03 10:18:58 -06:00
mrq
84a05acb6d touch ups in docs 2024-12-02 19:10:42 -06:00
mrq
dcaf38b359 fixed training tqdm being stubborn 2024-11-23 09:45:23 -06:00
mrq
41d7c30ea5 added much cleaner non-causal mask generation 2024-11-22 19:43:32 -06:00
mrq
c99a74e834 actually generate a causal mask because it seems sometimes it does not actually generate one because it makes assumptions 2024-11-22 18:30:24 -06:00
mrq
ccee5fc11c that was actually all pointless since sdpa always had an attention mask fed to it and does not need is_causal to implicitly generate one 2024-11-22 16:51:50 -06:00
mrq
4aa685e749 what has science done 2024-11-22 16:45:40 -06:00
mrq
147219a5e0 huge oversight in the attention masking......... (i realized I have not been providing a non-causal mask to non-causal tasks) 2024-11-22 13:44:43 -06:00
mrq
24d888c47c temporarily dropping support for xformers because it's breaking when using an attention mask (which i dont remember commenting it out when being passed), default to not use wandb because it's being a pain when doing tests and not actual sessionsS) 2024-11-22 11:29:12 -06:00
mrq
8aafae91fd dont use timeembedding 2024-11-21 23:14:52 -06:00
mrq
2cef97e43f cleanup 2024-11-21 23:08:43 -06:00
mrq
3fc0540f49 m 2024-11-21 15:07:46 -06:00