Commit Graph

286 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| mrq | 2dfef693c4 | comments for clarity | 2025-03-16 11:30:23 -05:00 |
| mrq | 0a45c9c042 | fix attention backend not being used | 2025-02-27 21:38:38 -06:00 |
| mrq | eff180248c | decoupled llama backend to avoid any funny changes from transformers, removed other backends since i dont think i'll ever bother using them | 2025-02-27 19:00:37 -06:00 |
| mrq | cbd4d7d7f4 | ugh | 2025-02-26 21:31:10 -06:00 |
| mrq | 2ea387c08a | segregated experimental changes into its own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned up code if it actually works (nothing is working) | 2025-02-26 21:26:13 -06:00 |
| mrq | 95da4e9405 | made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split) | 2025-02-26 10:39:13 -06:00 |
| mrq | de27115bb7 | there's something wrong with it on my 4xV100 rig...... | 2025-02-25 15:14:08 -06:00 |
| mrq | a5a04c39ef | when the | 2025-02-24 21:03:23 -06:00 |
| mrq | 918e0dbac1 | small slop cleanup | 2025-02-24 19:03:53 -06:00 |
| mrq | 0f39f4d7a1 | lol | 2025-02-24 17:51:35 -06:00 |
| mrq | 33d5a7109a | its a miracle i was able to get a semblance of audio with the naive AudioEncoder (now it interleaves properly) | 2025-02-24 14:39:12 -06:00 |
| mrq | 8f5a3997bd | another experimental flag | 2025-02-24 13:50:41 -06:00 |
| mrq | b640fabab5 | borrowed muon since it might better work under deepspeed and not require cruft (even though it really does not like the masked-NAR, also make the masked-NAR faux-causal since it might better help out for cfg.model.version >= 7 | 2025-02-23 17:23:24 -06:00 |
| mrq | 8f3c3e01ee | oops | 2025-02-23 12:09:56 -06:00 |
| mrq | b39aaacd77 | oops | 2025-02-23 11:55:43 -06:00 |
| mrq | 3019c88799 | separate mask token and stop token because this might cause issues | 2025-02-23 11:36:32 -06:00 |
| mrq | 6634d07576 | added muon optimizer through kludge hacks because it necessitates a second optimizer in tandum that seems to only sometimes work with deepspeed | 2025-02-23 11:22:13 -06:00 |
| mrq | ab0abd2b12 | fixes fixes fixes (a quarter of my recently processed audio returned zero'd tensors......) | 2025-02-22 09:07:33 -06:00 |
| mrq | 13c3a08853 | nevermind thats slow | 2025-02-14 16:35:17 -06:00 |
| mrq | 285e493b12 | ugh.......... | 2025-02-14 16:24:34 -06:00 |
| mrq | a65c8144f4 | with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already...... | 2025-02-13 18:38:40 -06:00 |
| mrq | e3becec0e8 | more better-er loss calc I suppose | 2025-02-13 12:49:53 -06:00 |
| mrq | e8f182b634 | cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors) | 2025-02-13 09:35:27 -06:00 |
| mrq | 319ca09a4f | cleanup | 2025-02-12 23:36:32 -06:00 |
| mrq | b52c5c5d80 | this seems to work in testing | 2025-02-12 16:16:04 -06:00 |
| mrq | e029a8804d | ironically none of this cruft gets the loss lower than the original way | 2025-02-12 11:17:00 -06:00 |
| mrq | 4b31f5c808 | this seems preferable | 2025-02-12 00:36:50 -06:00 |
| mrq | 04fef5dad5 | agony | 2025-02-12 00:18:24 -06:00 |
| mrq | 79c504c278 | cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when i fall for the latest meme codec) | 2025-02-05 20:54:31 -06:00 |
| mrq | bb2ebe1ca2 | fixed issues that may rise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies | 2025-02-04 20:30:07 -06:00 |
| mrq | 69c1d2991f | updated mixtral backend (need this for something else) | 2025-01-20 21:50:56 -06:00 |
| mrq | b445f4abb6 | experimental | 2025-01-05 19:05:00 -06:00 |
| mrq | 2e6a7625e4 | experimental | 2025-01-05 12:47:03 -06:00 |
| mrq | 9b0d2ccbe1 |  | 2024-12-26 21:42:17 -06:00 |
| mrq | 59f56ad099 | cleaup | 2024-12-24 23:14:32 -06:00 |
| mrq | 82e8592f2a | working vall_e.cpp | 2024-12-24 17:54:48 -06:00 |
| mrq | 497bdfc67b | more work (the wall is non-causal decoding......) | 2024-12-22 20:11:31 -06:00 |
| mrq | 5f289db275 | ugh | 2024-12-22 16:15:24 -06:00 |
| mrq | 0d4329d2e3 | sanity cleanup | 2024-12-22 15:05:45 -06:00 |
| mrq | 353e478e68 | agony | 2024-12-21 22:52:10 -06:00 |
| mrq | 91caf00212 | ugh | 2024-12-20 17:13:37 -06:00 |
| mrq | 59bf6b8b33 | exposed additional task (ns, sr, vc) (vc is experimental) | 2024-12-20 11:15:29 -06:00 |
| mrq | e7e7f48043 | livid | 2024-12-19 19:25:27 -06:00 |
| mrq | cddf8ca814 | sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them) | 2024-12-11 22:45:38 -06:00 |
| mrq | 61ed662856 | ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode) | 2024-12-07 12:31:54 -06:00 |
| mrq | 34a66e1052 | agnostified KD | 2024-12-06 23:53:46 -06:00 |
| mrq | 953d3eb030 | ugh | 2024-12-06 22:35:30 -06:00 |
| mrq | 42fafbaaca | actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps) | 2024-12-06 21:55:20 -06:00 |
| mrq | 23d402bf01 | added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......) | 2024-12-05 23:05:52 -06:00 |
| mrq | 84a05acb6d | touch ups in docs | 2024-12-02 19:10:42 -06:00 |