Commit Graph

  • bb2ebe1ca2 fixed issues that may arise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary, because I do NOT want to install more dependencies) mrq 2025-02-04 20:30:07 -0600
  • 0841f366e8 I should really just grab modelling_llama wholesale (fix for the adapted attention class) mrq 2025-01-28 21:55:05 -0600
  • e5f9da2221 oops mrq 2025-01-21 11:59:24 -0600
  • 69c1d2991f updated mixtral backend (need this for something else) mrq 2025-01-20 21:50:56 -0600
  • 1a26f789a5 added option to play back audio directly, removed the no-phonemize option since I swear it worked in testing but it doesn't actually work mrq 2025-01-12 21:52:49 -0600
  • 9fa87c417a added option to use raw text rather than the IPA phonemes (it requires a model trained on raw text) mrq 2025-01-06 00:10:43 -0600
  • 3ab11bdc7b oops mrq 2025-01-05 23:53:17 -0600
  • b445f4abb6 experimental mrq 2025-01-05 19:05:00 -0600
  • 2e6a7625e4 experimental mrq 2025-01-05 12:47:03 -0600
  • 31cfef59c4 when you do more training thinking the original model that can do NS/SR got deleted, but it was actually a string not having its quotes in the right place....... mrq 2024-12-27 18:16:57 -0600
  • 9b0d2ccbe1 mrq 2024-12-26 21:42:17 -0600
  • 25a02f2c3f oops mrq 2024-12-25 00:36:19 -0600
  • b9d2cd5513 vall_e.cpp cli mrq 2024-12-25 00:28:34 -0600
  • 59f56ad099 cleanup mrq 2024-12-24 23:14:32 -0600
  • 6bf59bbd8b vall_e.cpp phonemizing and tokenizing mrq 2024-12-24 22:39:32 -0600
  • 8516bab15c cleanup mrq 2024-12-24 20:29:03 -0600
  • 82e8592f2a working vall_e.cpp mrq 2024-12-24 17:54:48 -0600
  • 2b4d783299 ugh mrq 2024-12-23 23:42:44 -0600
  • 532200de2a nvm fixed mrq 2024-12-23 22:23:43 -0600
  • f62f99b8de more work on vall_e.cpp (need to resolve why the embeddings (and maybe the weights as a whole) are different from the base model) mrq 2024-12-23 20:36:40 -0600
  • 6ecdb715b6 more work on vall_e.cpp (some more cleanup, NAR-len demasking, but still need to iron out some kinks) mrq 2024-12-23 17:20:04 -0600
  • a6945f981d vall_e.cpp cleanup (having to keep a map of something that can work without touching llama.cpp AND something minimally invasive, AND adhere to a C++ style that isn't mine, is making me bipolar) mrq 2024-12-23 14:16:16 -0600
  • 497bdfc67b more work (the wall is non-causal decoding......) mrq 2024-12-22 20:11:31 -0600
  • 5f289db275 ugh mrq 2024-12-22 16:15:24 -0600
  • 0d4329d2e3 sanity cleanup mrq 2024-12-22 15:05:45 -0600
  • 353e478e68 agony mrq 2024-12-21 22:52:10 -0600
  • 2542ed067d ugh mrq 2024-12-21 19:59:56 -0600
  • 70a0f5724b I hate learning APIs so much mrq 2024-12-21 19:40:19 -0600
  • 1b4a69ce29 more updates to vall_e.cpp mrq 2024-12-21 19:16:44 -0600
  • 503124d0d3 crammed encodec.cpp in mrq 2024-12-21 15:48:12 -0600
  • 979c1f797c quant mrq 2024-12-21 11:56:22 -0600
  • 5788db849b added extremely barebones vall_e.cpp so I can stop having to juggle this file around so much mrq 2024-12-21 10:57:02 -0600
  • 91caf00212 ugh mrq 2024-12-20 17:13:37 -0600
  • d85273609e corrected export.py's --hf mrq 2024-12-20 15:17:13 -0600
  • 59bf6b8b33 exposed additional tasks (ns, sr, vc) (vc is experimental) mrq 2024-12-20 11:15:29 -0600
  • 53230efd74 changed prompt_inject_noise to prompt_inject_noise_p so I can have another reason to do this post-training mrq 2024-12-19 19:28:50 -0600
  • e7e7f48043 livid mrq 2024-12-19 19:25:27 -0600
  • 8838babcba sanity checks (and I realized that the model actually had langs set to 4 in the yaml for KO/ZH so................) mrq 2024-12-19 19:08:57 -0600
  • 7617b6485f instead just compute a bunch of stuff on the transcriptions to store later in different names so I can just retrieve what I want, also added tongue twisters for nefarious reasons mrq 2024-12-18 23:43:11 -0600
  • 4775edaa41 added text cleaning/normalization for wer purposes but it amounts to nothing desu mrq 2024-12-18 19:58:53 -0600
  • 9f2bd7f6e4 ugh mrq 2024-12-17 23:17:12 -0600
  • 9090c34f10 cringe script to process seed-tts-eval's eval dataset into something I can easily use mrq 2024-12-17 22:47:12 -0600
  • ed152f78df tweaks to prompt duration to allow me to divorce how I use it for training from how I'm using it for the demo page, and demo page tweaks to make my life easier mrq 2024-12-17 19:33:04 -0600
  • 7129582303 actually do proper WER/CER calculation by un-normalizing the scores mrq 2024-12-17 14:22:30 -0600
  • c2c6d912ac actually do speaker verification mrq 2024-12-17 10:11:14 -0600
  • c2e17e287b really shoddy voice conversion implementation (it sort of works...) mrq 2024-12-16 22:54:53 -0600
  • 8515038968 imagine my disappointment when the epoch finished just for it to throw an exception mrq 2024-12-16 18:28:01 -0600
  • 4a65ac9eb7 oops mrq 2024-12-15 17:21:51 -0600
  • cd4a5f427c KO/ZH model soon mrq 2024-12-15 17:01:14 -0600
  • 4800e7179a remove NaN checks because they cause problems in distributed training since I'm not syncing between GPUs (and NaN losses get ignored anyway with loss scaling) mrq 2024-12-15 09:42:54 -0600
  • 2ba6b483dc ugh mrq 2024-12-14 22:43:51 -0600
  • 3dd31e74d1 finally figured out a clean way to handle "resuming" the tqdm bar mrq 2024-12-14 18:44:43 -0600
  • 35389481ee move lazy-stored ortho matrix to the grad device for apollo because agony mrq 2024-12-13 23:22:26 -0600
  • 09804ecc16 APOLLO tweaks to make it work with deepspeed mrq 2024-12-13 23:03:52 -0600
  • 64c67160a3 tweaks mrq 2024-12-13 19:00:35 -0600
  • 0fbfb8bbe8 actually save the optimizer for the local engine backend because safetensors doesn't save it mrq 2024-12-12 17:12:59 -0600
  • f41251f648 more fixes for local engine backend mrq 2024-12-12 14:38:42 -0600
  • 6b237ae5e3 tweaks for the local engine orchestrator (that I never caught since I always used the deepspeed backend) mrq 2024-12-12 13:37:38 -0600
  • 9a62e3b824 APOLLO cringe (doesn't want to work with deepspeed) mrq 2024-12-12 00:31:58 -0600
  • cddf8ca814 sort batches to try and reduce the number of padded tokens in batched inference (see the batch-sorting sketch after this graph) (also commented out F5 samples getting added to the demo page because I would have to regenerate them) mrq 2024-12-11 22:45:38 -0600
  • 20b87bfbd0 store metrics and only recalculate them if the output file is newer than the metrics file (see the caching sketch after this graph) mrq 2024-12-11 20:55:43 -0600
  • 0c69e798f7 template cleanup mrq 2024-12-11 20:06:55 -0600
  • 7e54e897f7 also shifted to transformers' pipeline for transcribing mrq 2024-12-11 19:57:53 -0600
  • b81a98799b uplifting transformers' WavLM stuff to do speaker verification instead mrq 2024-12-11 19:30:05 -0600
  • 6468e5d124 lol mrq 2024-12-11 19:10:32 -0600
  • 6f1ee0c6fa Added CER, transcription/similarity model args in demo mrq 2024-12-10 21:00:51 -0600
  • 8568a93dad added WER/SIM-O metrics, added APOLLO but I need to test it mrq 2024-12-10 20:13:21 -0600
  • fc5e6d8599 fixes to process_emilia.py script mrq 2024-12-09 14:38:09 -0600
  • a6c745bafb Chinese (Mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), Korean validated, vocab adjusted mrq 2024-12-09 14:26:19 -0600
  • 3ef8894290 oops mrq 2024-12-08 15:24:21 -0600
  • 1d460b9fe3 logic fixes, I feel like output is better? (also NAR can have a temperature; I imagine it couldn't because it was having a causal mask passed to it for the longest time before I caught it a month ago) mrq 2024-12-08 14:52:47 -0600
  • 0c5a458b00 deduce language per line as a cheap way to allow cross-lingual switching, kinda (see the language-detection sketch after this graph) mrq 2024-12-07 22:57:29 -0600
  • a032ff588f doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O) mrq 2024-12-07 22:34:25 -0600
  • 5d80a2d0d4 fixed NAR-len issues with non-english maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now mrq 2024-12-07 19:21:05 -0600
  • 1f54bf5b40 revert sageattn back to optional dependency because it's not on Windows, force resize_modules on by default because I broke something mrq 2024-12-07 17:09:39 -0600
  • 218d0e29fd ugh (batchmean actually expects batch=seq_len, and not the actual batch; see the KD-loss sketch after this graph) mrq 2024-12-07 12:39:01 -0600
  • 61ed662856 ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode) mrq 2024-12-07 12:31:54 -0600
  • f97e8b0c7f ACTUALLY do KD-loss because of an oversight with masked_select outputting 1D tensors that get softmax'd in total mrq 2024-12-07 09:52:51 -0600
  • 34a66e1052 agnostified KD mrq 2024-12-06 23:53:46 -0600
  • 953d3eb030 ugh mrq 2024-12-06 22:35:30 -0600
  • 42fafbaaca actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps) mrq 2024-12-06 21:55:20 -0600
  • 23d402bf01 added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......) mrq 2024-12-05 23:05:52 -0600
  • 4e21df8092 oops mrq 2024-12-04 21:24:22 -0600
  • c66a53492c forgot to add NLTK as a dependency, promoted sageattn to a default dependency since it works fine enough and seems agnostic mrq 2024-12-04 20:33:25 -0600
  • 93d27be539 rolling context finally (use last N utterances as the prefix for the next gen; see the rolling-context sketch after this graph), option to split input text prompt by sentences instead of lines (or no splitting) mrq 2024-12-04 20:31:44 -0600
  • 9dff68c0c5 NAR-len tweaks (remasks a small number of tokens per step, which seems to help with reducing the number of steps needed some of the time; disable CFG for the first half to speed things up) mrq 2024-12-04 09:30:29 -0600
  • cf97560e70 minimum CFG of 3 for NAR-len because it seems the model will auto-default to NAR-len now mrq 2024-12-03 19:40:05 -0600
  • ca31da0a95 sageattn (forgot to bother with testing this the other day, seems fine) mrq 2024-12-03 15:14:57 -0600
  • 31ab90d84a cringe code to convert to LlamaForCausalLM-happy weights + tokenizer dict (still need to write logic to actually use these weights for proper inferencing) mrq 2024-12-03 10:18:58 -0600
  • 84a05acb6d touch ups in docs mrq 2024-12-02 19:10:42 -0600
  • dcaf38b359 fixed training tqdm being stubborn mrq 2024-11-23 09:45:23 -0600
  • 41d7c30ea5 added much cleaner non-causal mask generation mrq 2024-11-22 19:43:32 -0600
  • c99a74e834 actually generate a causal mask, because it seems one is sometimes not generated implicitly due to assumptions being made mrq 2024-11-22 18:30:24 -0600
  • ccee5fc11c that was actually all pointless since sdpa always had an attention mask fed to it and does not need is_causal to implicitly generate one (see the sdpa sketch after this graph) mrq 2024-11-22 16:51:50 -0600
  • 4aa685e749 what has science done mrq 2024-11-22 16:45:40 -0600
  • 147219a5e0 huge oversight in the attention masking......... (I realized I have not been providing a non-causal mask to non-causal tasks) mrq 2024-11-22 13:44:43 -0600
  • 24d888c47c temporarily dropping support for xformers because it's breaking when using an attention mask (which I don't remember commenting out when being passed), default to not use wandb because it's being a pain when doing tests and not actual sessions mrq 2024-11-22 11:29:12 -0600
  • 8aafae91fd don't use timeembedding mrq 2024-11-21 23:14:52 -0600
  • 2cef97e43f cleanup mrq 2024-11-21 23:08:43 -0600
  • 3fc0540f49 m mrq 2024-11-21 15:07:46 -0600
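
Code sketches

The batch sorting in cddf8ca814 amounts to ordering requests by input length so each batch packs similarly-sized sequences and pads fewer tokens. A minimal sketch, not the repo's actual API; sort_into_batches and length_fn are hypothetical names:

```python
def sort_into_batches(samples, batch_size, length_fn=len):
    # Sort indices by sample length so each batch holds similar lengths,
    # then pad only to each batch's own maximum instead of the global one.
    order = sorted(range(len(samples)), key=lambda i: length_fn(samples[i]))
    # Return index batches so outputs can be scattered back to input order.
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

texts = ["short", "a somewhat longer line", "mid length", "x"]
for batch in sort_into_batches(texts, batch_size=2):
    padded_len = max(len(texts[i]) for i in batch)  # pad only to the batch max
```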
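
The metric caching in 20b87bfbd0 is a timestamp comparison: recompute only when the output file is newer than its stored metrics. A sketch under assumed names; compute_fn and the JSON layout are illustrative:

```python
import json
import os

def load_or_compute_metrics(output_path, metrics_path, compute_fn):
    # Reuse cached metrics unless the output file is newer than the cache.
    if os.path.exists(metrics_path) and \
            os.path.getmtime(metrics_path) >= os.path.getmtime(output_path):
        with open(metrics_path) as f:
            return json.load(f)
    metrics = compute_fn(output_path)  # e.g. WER/CER/SIM-O calculation
    with open(metrics_path, "w") as f:
        json.dump(metrics, f)
    return metrics
```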
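
Per-line language deduction (0c5a458b00) only needs a detector run on each line so the phonemizer can switch per line. A sketch assuming the langdetect package; the repo may well deduce language by other means:

```python
from langdetect import detect  # assumption: any per-line detector would do

def lines_with_language(text):
    # Tag each non-empty line with a detected language code so the
    # phonemizer can be switched per line for cheap cross-lingual output.
    return [(line, detect(line)) for line in text.splitlines() if line.strip()]
```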
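
The two KD-loss pitfalls in f97e8b0c7f and 218d0e29fd are easy to reproduce. A minimal sketch, assuming logits shaped (batch, seq_len, vocab); all shapes and names here are illustrative:

```python
import torch
import torch.nn.functional as F

batch, seq_len, vocab = 4, 128, 1024
student = torch.randn(batch, seq_len, vocab)
teacher = torch.randn(batch, seq_len, vocab)
mask = torch.rand(batch, seq_len) > 0.1  # True where the loss applies

# Pitfall 1: masked_select flattens to 1D, so a softmax over its output
# normalizes across ALL selected logits at once instead of per position.
wrong = student.masked_select(mask.unsqueeze(-1)).log_softmax(dim=-1)

# Boolean indexing keeps the vocab dim, (num_selected, vocab), so the
# softmax stays per-position.
s = student[mask].log_softmax(dim=-1)
t = teacher[mask].softmax(dim=-1)

# Pitfall 2: reduction="batchmean" divides the summed KL by size(0); once
# positions are flattened into dim 0, that is the token count, not the batch.
kd_loss = F.kl_div(s, t, reduction="batchmean")
```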
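
Rolling context (93d27be539) keeps the last N generated utterances as the prefix for the next sentence. A sketch with a hypothetical tts_fn; the actual inference interface differs:

```python
def generate_long_form(sentences, tts_fn, context_size=2):
    # Each sentence is generated with the last N utterances as its prefix,
    # which keeps long-form output acoustically consistent across splits.
    history, outputs = [], []
    for sentence in sentences:
        audio = tts_fn(sentence, prefix=history[-context_size:])
        history.append(audio)
        outputs.append(audio)
    return outputs
```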
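
The sdpa behavior behind ccee5fc11c: torch.nn.functional.scaled_dot_product_attention only builds a causal mask implicitly when is_causal=True and no mask is given; once attn_mask is supplied it is used as-is. A minimal sketch using the boolean convention (True = may attend):

```python
import torch
import torch.nn.functional as F

heads, seq_len, dim = 4, 16, 64
q = k = v = torch.randn(1, heads, seq_len, dim)

# Explicit masks: True marks positions that may be attended to.
causal = torch.ones(seq_len, seq_len, dtype=torch.bool).tril()
non_causal = torch.ones(seq_len, seq_len, dtype=torch.bool)

# With attn_mask supplied, sdpa never generates a causal mask implicitly,
# so feeding the right mask per task covers both AR and NAR passes.
out_ar = F.scaled_dot_product_attention(q, k, v, attn_mask=causal)
out_nar = F.scaled_dot_product_attention(q, k, v, attn_mask=non_causal)
```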