|
b640fabab5
|
borrowed muon since it might better work under deepspeed and not require cruft (even though it really does not like the masked-NAR, also make the masked-NAR faux-causal since it might better help out for cfg.model.version >= 7
|
2025-02-23 17:23:24 -06:00 |
|
|
8f3c3e01ee
|
oops
|
2025-02-23 12:09:56 -06:00 |
|
|
b39aaacd77
|
oops
|
2025-02-23 11:55:43 -06:00 |
|
|
3019c88799
|
separate mask token and stop token because this might cause issues
|
2025-02-23 11:36:32 -06:00 |
|
|
6634d07576
|
added muon optimizer through kludge hacks because it necessitates a second optimizer in tandum that seems to only sometimes work with deepspeed
|
2025-02-23 11:22:13 -06:00 |
|
|
ab0abd2b12
|
fixes fixes fixes (a quarter of my recently processed audio returned zero'd tensors......)
|
2025-02-22 09:07:33 -06:00 |
|
|
13c3a08853
|
nevermind thats slow
|
2025-02-14 16:35:17 -06:00 |
|
|
285e493b12
|
ugh..........
|
2025-02-14 16:24:34 -06:00 |
|
|
a65c8144f4
|
with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already......
|
2025-02-13 18:38:40 -06:00 |
|
|
e3becec0e8
|
more better-er loss calc I suppose
|
2025-02-13 12:49:53 -06:00 |
|
|
e8f182b634
|
cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors)
|
2025-02-13 09:35:27 -06:00 |
|
|
319ca09a4f
|
cleanup
|
2025-02-12 23:36:32 -06:00 |
|
|
b52c5c5d80
|
this seems to work in testing
|
2025-02-12 16:16:04 -06:00 |
|
|
e029a8804d
|
ironically none of this cruft gets the loss lower than the original way
|
2025-02-12 11:17:00 -06:00 |
|
|
4b31f5c808
|
this seems preferable
|
2025-02-12 00:36:50 -06:00 |
|
|
04fef5dad5
|
agony
|
2025-02-12 00:18:24 -06:00 |
|
|
79c504c278
|
cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when i fall for the latest meme codec)
|
2025-02-05 20:54:31 -06:00 |
|
|
bb2ebe1ca2
|
fixed issues that may rise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies
|
2025-02-04 20:30:07 -06:00 |
|
|
69c1d2991f
|
updated mixtral backend (need this for something else)
|
2025-01-20 21:50:56 -06:00 |
|
|
b445f4abb6
|
experimental
|
2025-01-05 19:05:00 -06:00 |
|
|
2e6a7625e4
|
experimental
|
2025-01-05 12:47:03 -06:00 |
|
|
9b0d2ccbe1
|
|
2024-12-26 21:42:17 -06:00 |
|
|
59f56ad099
|
cleaup
|
2024-12-24 23:14:32 -06:00 |
|
|
82e8592f2a
|
working vall_e.cpp
|
2024-12-24 17:54:48 -06:00 |
|
|
497bdfc67b
|
more work (the wall is non-causal decoding......)
|
2024-12-22 20:11:31 -06:00 |
|
|
5f289db275
|
ugh
|
2024-12-22 16:15:24 -06:00 |
|
|
0d4329d2e3
|
sanity cleanup
|
2024-12-22 15:05:45 -06:00 |
|
|
353e478e68
|
agony
|
2024-12-21 22:52:10 -06:00 |
|
|
91caf00212
|
ugh
|
2024-12-20 17:13:37 -06:00 |
|
|
59bf6b8b33
|
exposed additional task (ns, sr, vc) (vc is experimental)
|
2024-12-20 11:15:29 -06:00 |
|
|
e7e7f48043
|
livid
|
2024-12-19 19:25:27 -06:00 |
|
|
cddf8ca814
|
sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them)
|
2024-12-11 22:45:38 -06:00 |
|
|
61ed662856
|
ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode)
|
2024-12-07 12:31:54 -06:00 |
|
|
34a66e1052
|
agnostified KD
|
2024-12-06 23:53:46 -06:00 |
|
|
953d3eb030
|
ugh
|
2024-12-06 22:35:30 -06:00 |
|
|
42fafbaaca
|
actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps)
|
2024-12-06 21:55:20 -06:00 |
|
|
23d402bf01
|
added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......)
|
2024-12-05 23:05:52 -06:00 |
|
|
84a05acb6d
|
touch ups in docs
|
2024-12-02 19:10:42 -06:00 |
|
|
4aa685e749
|
what has science done
|
2024-11-22 16:45:40 -06:00 |
|
|
147219a5e0
|
huge oversight in the attention masking......... (i realized I have not been providing a non-causal mask to non-causal tasks)
|
2024-11-22 13:44:43 -06:00 |
|
|
8aafae91fd
|
dont use timeembedding
|
2024-11-21 23:14:52 -06:00 |
|
|
2cef97e43f
|
cleanup
|
2024-11-21 23:08:43 -06:00 |
|
|
190a917b3e
|
I did it.
|
2024-11-19 12:24:33 -06:00 |
|
|
6cfdf94bf9
|
swap priority to use nar-len if available, added notes
|
2024-11-18 09:40:04 -06:00 |
|
|
069b27570f
|
set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things)
|
2024-11-17 17:04:07 -06:00 |
|
|
88d840218d
|
default set cfg strength to 3.0 since the reference model is updated
|
2024-11-17 10:23:40 -06:00 |
|
|
a3e1fa3518
|
ugh
|
2024-11-17 09:28:33 -06:00 |
|
|
23fdba0c98
|
tweaks and changes
|
2024-11-16 15:49:06 -06:00 |
|
|
2fbeacfe92
|
ugh
|
2024-11-14 22:18:33 -06:00 |
|
|
39096f8ff3
|
redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........)
|
2024-11-14 22:17:47 -06:00 |
|