bb2ebe1ca2 | fixed issues that may arise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies) | mrq | 2025-02-04 20:30:07 -0600
0841f366e8 | I should really just grab modelling_llama wholesale (fix for the adapted attention class) | mrq | 2025-01-28 21:55:05 -0600
69c1d2991f | updated mixtral backend (need this for something else) | mrq | 2025-01-20 21:50:56 -0600
1a26f789a5 | added option to playback audio directly, removed no-phonemize option since I swear it worked in testing but it doesn't actually work | mrq | 2025-01-12 21:52:49 -0600
9fa87c417a | added option to use raw text rather than the IPA phonemes (it requires a model trained on raw text) | mrq | 2025-01-06 00:10:43 -0600
31cfef59c4 | when you do more training thinking the original model that can do NS/SR got deleted, but it was actually just a string not having its quotes in the right place | mrq | 2024-12-27 18:16:57 -0600
f62f99b8de | more work on vall_e.cpp (need to resolve why the embeddings (and maybe the weights as a whole) are different from the base model) | mrq | 2024-12-23 20:36:40 -0600
6ecdb715b6 | more work on vall_e.cpp (some more cleanup, NAR-len demasking, but still need to iron out some kinks) | mrq | 2024-12-23 17:20:04 -0600
a6945f981d | vall_e.cpp cleanup (having to keep a map of something that can work without touching llama.cpp AND something minimally invasive, AND adhere to a C++ style that isn't mine, is making me bipolar) | mrq | 2024-12-23 14:16:16 -0600
497bdfc67b | more work (the wall is non-causal decoding...) | mrq | 2024-12-22 20:11:31 -0600
8838babcba | sanity checks (and I realized that the model actually had langs set to 4 in the yaml for KO/ZH, so...) | mrq | 2024-12-19 19:08:57 -0600
7617b6485f | instead just compute a bunch of stuff on the transcriptions to store later under different names so I can just retrieve what I want, also added tongue twisters for nefarious reasons | mrq | 2024-12-18 23:43:11 -0600
4775edaa41 | added text cleaning/normalization for wer purposes but it amounts to nothing desu | mrq | 2024-12-18 19:58:53 -0600
9090c34f10 | cringe script to process seed-tts-eval's eval dataset into something I can easily use | mrq | 2024-12-17 22:47:12 -0600
ed152f78df | tweaks to prompt duration to allow me to divorce how I use it for training from how I'm using it for the demo page, and demo page tweaks to make my life easier | mrq | 2024-12-17 19:33:04 -0600
7129582303 | actually do proper wer/cer calculation by un-normalizing the scores | mrq | 2024-12-17 14:22:30 -0600
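  (The "un-normalizing" here reads as pooling errors by total reference length rather than averaging per-utterance ratios; a minimal sketch of that pooling, assuming a hypothetical helper that already returns length-normalized per-utterance WER:

      def pooled_wer(per_utt_wer, ref_lengths):
          # Each score is (errors / reference length), so multiply it back out to
          # an error count, sum the counts, then divide by the total reference
          # length instead of averaging the per-utterance ratios.
          errors = sum(w * n for w, n in zip(per_utt_wer, ref_lengths))
          return errors / sum(ref_lengths)
  )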
c2c6d912ac | actually do speaker verification | mrq | 2024-12-17 10:11:14 -0600
cd4a5f427c | KO/ZH model soon | mrq | 2024-12-15 17:01:14 -0600
4800e7179a | removed nan checks because they cause problems in distributed training since I'm not syncing between GPUs (and nan losses get ignored anyways with loss scaling) | mrq | 2024-12-15 09:42:54 -0600
0fbfb8bbe8 | actually save the optimizer for the local engine backend because safetensors doesn't save it | mrq | 2024-12-12 17:12:59 -0600
f41251f648 | more fixes for local engine backend | mrq | 2024-12-12 14:38:42 -0600
6b237ae5e3 | tweaks for the local engine orchestrator (that I never caught since I always used the deepspeed backend) | mrq | 2024-12-12 13:37:38 -0600
9a62e3b824 | APOLLO cringe (doesn't want to work with deepspeed) | mrq | 2024-12-12 00:31:58 -0600
cddf8ca814 | sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them) | mrq | 2024-12-11 22:45:38 -0600
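  (A minimal sketch of the length-sorting idea in the entry above; hypothetical helper, not the repo's actual batching code:

      def sort_into_batches(items, batch_size):
          # Grouping similarly sized inputs means each batch only pads up to the
          # longest item in that batch rather than the longest item overall.
          ordered = sorted(items, key=len)
          return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
  )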
20b87bfbd0 | store metrics and only recalculate them if the output file is newer than the metrics file | mrq | 2024-12-11 20:55:43 -0600
6f1ee0c6fa | added CER, transcription/similarity model args in demo | mrq | 2024-12-10 21:00:51 -0600
8568a93dad | added WER/SIM-O metrics, added APOLLO but I need to test it | mrq | 2024-12-10 20:13:21 -0600
fc5e6d8599 | fixes to process_emilia.py script | mrq | 2024-12-09 14:38:09 -0600
a6c745bafb | chinese (mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), korean validated, vocab adjusted | mrq | 2024-12-09 14:26:19 -0600
1d460b9fe3 | logic fixes, I feel like output is better? (also NAR can have a temperature, I imagine it couldn't because it was having a causal mask passed to it for the longest time before I caught it a month ago) | mrq | 2024-12-08 14:52:47 -0600
0c5a458b00 | deduce language per line as a cheap way to allow for cross-lingual switching, kinda | mrq | 2024-12-07 22:57:29 -0600
a032ff588f | doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O) | mrq | 2024-12-07 22:34:25 -0600
5d80a2d0d4 | fixed NAR-len issues with non-english maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now | mrq | 2024-12-07 19:21:05 -0600
1f54bf5b40 | revert sageattn back to optional dependency because it's not on windows, force resize_modules on by default because I broke something | mrq | 2024-12-07 17:09:39 -0600
218d0e29fd | ugh (batchmean actually expects batch=seq_len, and not the actual batch) | mrq | 2024-12-07 12:39:01 -0600
61ed662856 | ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode) | mrq | 2024-12-07 12:31:54 -0600
f97e8b0c7f | ACTUALLY do KD-loss because of an oversight with masked_select outputting 1D tensors that get softmax'd in total | mrq | 2024-12-07 09:52:51 -0600
42fafbaaca | actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps) | mrq | 2024-12-06 21:55:20 -0600
23d402bf01 | added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match...) | mrq | 2024-12-05 23:05:52 -0600
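  (For the KD-loss entries above: torch.masked_select flattens logits to 1D, so a softmax over the result spans every selected value instead of each token's vocabulary, and kl_div's reduction="batchmean" divides by the first dimension of its input, here the number of selected tokens, not the dataloader batch size. A rough sketch that sidesteps the 1D issue, as my own illustration rather than the trainer's actual code:

      import torch.nn.functional as F

      def kd_loss(student_logits, teacher_logits, mask, T=1.0):
          # logits: (batch, seq_len, vocab); mask: (batch, seq_len) bool.
          # Boolean indexing keeps the vocab dim -> (n_tokens, vocab), whereas
          # masked_select would flatten everything into one 1D tensor.
          s = student_logits[mask]
          t = teacher_logits[mask]
          # "batchmean" divides by s.shape[0], i.e. the number of selected tokens.
          return F.kl_div(
              F.log_softmax(s / T, dim=-1),
              F.log_softmax(t / T, dim=-1),
              reduction="batchmean",
              log_target=True,
          ) * (T * T)
  )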
c66a53492c | forgot to add NLTK as a dependency, promoted sageattn as a default dependency since it works fine enough and seems agnostic | mrq | 2024-12-04 20:33:25 -0600
93d27be539 | rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting) | mrq | 2024-12-04 20:31:44 -0600
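  (Rolling context plus sentence splitting roughly amounts to the sketch below: nltk's sent_tokenize handles the splitting (hence the NLTK dependency above) and a bounded deque keeps the last N utterances as the prefix. The tts.inference call is a placeholder, not the actual API:

      from collections import deque
      import nltk

      nltk.download("punkt", quiet=True)   # sent_tokenize needs the punkt model

      sentences = nltk.tokenize.sent_tokenize(text)   # text: the full input prompt
      context = deque(maxlen=4)                       # last N utterances as the rolling prefix
      for sentence in sentences:
          audio = tts.inference(sentence, prefix=list(context))   # placeholder call
          context.append(audio)
  )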
9dff68c0c5 | NAR-len tweaks (remasks a small number of tokens per step, which seems to help reduce the number of steps needed some of the time; disable CFG for the first half to speed things up) | mrq | 2024-12-04 09:30:29 -0600
cf97560e70 | minimum CFG of 3 for NAR-len because it seems the model will auto-default to NAR-len now | mrq | 2024-12-03 19:40:05 -0600
ca31da0a95 | sageattn (forgot to bother with testing this the other day, seems fine) | mrq | 2024-12-03 15:14:57 -0600
31ab90d84a | cringe code to convert to LlamaForCausalLM-happy weights + tokenizer dict (still need to write logic to actually use these weights for proper inferencing) | mrq | 2024-12-03 10:18:58 -0600
84a05acb6d | touch ups in docs | mrq | 2024-12-02 19:10:42 -0600
dcaf38b359 | fixed training tqdm being stubborn | mrq | 2024-11-23 09:45:23 -0600
41d7c30ea5 | added much cleaner non-causal mask generation | mrq | 2024-11-22 19:43:32 -0600
c99a74e834 | actually generate a causal mask because it seems sometimes it does not actually generate one because it makes assumptions | mrq | 2024-11-22 18:30:24 -0600
ccee5fc11c | that was actually all pointless since sdpa always had an attention mask fed to it and does not need is_causal to implicitly generate one | mrq | 2024-11-22 16:51:50 -0600
4aa685e749 | what has science done | mrq | 2024-11-22 16:45:40 -0600
147219a5e0 | huge oversight in the attention masking... (I realized I have not been providing a non-causal mask to non-causal tasks) | mrq | 2024-11-22 13:44:43 -0600
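  (The masking entries above amount to building the boolean mask explicitly for both causal and non-causal tasks instead of relying on is_causal; a minimal sketch with PyTorch SDPA, as my own illustration:

      import torch
      import torch.nn.functional as F

      def build_attn_mask(seq_len, causal, device=None):
          # Boolean mask where True means "may attend"; SDPA masks out False positions.
          full = torch.ones(seq_len, seq_len, dtype=torch.bool, device=device)
          return full.tril() if causal else full

      q = k = v = torch.randn(1, 8, 16, 64)        # (batch, heads, seq_len, head_dim)
      mask = build_attn_mask(16, causal=False)     # non-causal tasks get a full mask
      # With an explicit attn_mask there is no need for is_causal; the mask decides.
      out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
  )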
24d888c47c | temporarily dropping support for xformers because it's breaking when using an attention mask (which I don't remember commenting out when being passed), default to not use wandb because it's being a pain when doing tests and not actual sessions | mrq | 2024-11-22 11:29:12 -0600
8aafae91fd | don't use timeembedding | mrq | 2024-11-21 23:14:52 -0600