Commit Graph

393 Commits

Author SHA1 Message Date
mrq
a65c8144f4 with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already...... 2025-02-13 18:38:40 -06:00
mrq
e3becec0e8 more better-er loss calc I suppose 2025-02-13 12:49:53 -06:00
mrq
e8f182b634 cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors) 2025-02-13 09:35:27 -06:00
mrq
319ca09a4f cleanup 2025-02-12 23:36:32 -06:00
mrq
b52c5c5d80 this seems to work in testing 2025-02-12 16:16:04 -06:00
mrq
e029a8804d ironically none of this cruft gets the loss lower than the original way 2025-02-12 11:17:00 -06:00
mrq
4b31f5c808 this seems preferable 2025-02-12 00:36:50 -06:00
mrq
04fef5dad5 agony 2025-02-12 00:18:24 -06:00
mrq
075ffef68a ugh 2025-02-09 13:02:51 -06:00
mrq
47eb498046 more tweaks 2025-02-06 23:26:26 -06:00
mrq
79c504c278 cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when i fall for the latest meme codec) 2025-02-05 20:54:31 -06:00
mrq
bb2ebe1ca2 fixed issues that may arise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies) 2025-02-04 20:30:07 -06:00
mrq
0841f366e8 I should really just grab modelling_llama wholesale (fix for the adapted attention class) 2025-01-28 21:55:05 -06:00
mrq
e5f9da2221 oops 2025-01-21 11:59:24 -06:00
mrq
69c1d2991f updated mixtral backend (need this for something else) 2025-01-20 21:50:56 -06:00
mrq
1a26f789a5 added option to playback audio directly, removed no-phonemize option since I swear it worked in testing but it doesn't actually work 2025-01-12 21:52:49 -06:00
mrq
3ab11bdc7b oops 2025-01-05 23:53:17 -06:00
mrq
b445f4abb6 experimental 2025-01-05 19:05:00 -06:00
mrq
2e6a7625e4 experimental 2025-01-05 12:47:03 -06:00
mrq
9b0d2ccbe1 2024-12-26 21:42:17 -06:00
mrq
59f56ad099 cleanup 2024-12-24 23:14:32 -06:00
mrq
82e8592f2a working vall_e.cpp 2024-12-24 17:54:48 -06:00
mrq
497bdfc67b more work (the wall is non-causal decoding......) 2024-12-22 20:11:31 -06:00
mrq
5f289db275 ugh 2024-12-22 16:15:24 -06:00
mrq
0d4329d2e3 sanity cleanup 2024-12-22 15:05:45 -06:00
mrq
353e478e68 agony 2024-12-21 22:52:10 -06:00
mrq
91caf00212 ugh 2024-12-20 17:13:37 -06:00
mrq
59bf6b8b33 exposed additional tasks (ns, sr, vc) (vc is experimental) 2024-12-20 11:15:29 -06:00
mrq
e7e7f48043 livid 2024-12-19 19:25:27 -06:00
mrq
c2c6d912ac actually do speaker verification 2024-12-17 10:11:14 -06:00
mrq
c2e17e287b really shoddy voice conversion implementation (it sort of works...) 2024-12-16 22:54:53 -06:00
mrq
8515038968 imagine my disappointment when the epoch finished just for it to throw an exception 2024-12-16 18:28:01 -06:00
mrq
4a65ac9eb7 oops 2024-12-15 17:21:51 -06:00
mrq
9a62e3b824 APOLLO cringe (doesn't want to work with deepspeed) 2024-12-12 00:31:58 -06:00
mrq
cddf8ca814 sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them) 2024-12-11 22:45:38 -06:00
mrq
6468e5d124 lol 2024-12-11 19:10:32 -06:00
mrq
3ef8894290 oops 2024-12-08 15:24:21 -06:00
mrq
1d460b9fe3 logic fixes, I feel like output is better? (also NAR can have a temperature, I imagine it couldn't because it was having a causal mask passed to it for the longest time before I caught it a month ago) 2024-12-08 14:52:47 -06:00
mrq
5d80a2d0d4 fixed NAR-len issues with non-english maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now 2024-12-07 19:21:05 -06:00
mrq
61ed662856 ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode) 2024-12-07 12:31:54 -06:00
mrq
34a66e1052 agnostified KD 2024-12-06 23:53:46 -06:00
mrq
953d3eb030 ugh 2024-12-06 22:35:30 -06:00
mrq
42fafbaaca actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps) 2024-12-06 21:55:20 -06:00
mrq
23d402bf01 added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......) 2024-12-05 23:05:52 -06:00
mrq
93d27be539 rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting) 2024-12-04 20:31:44 -06:00
mrq
9dff68c0c5 NAR-len tweaks (remasks a small amount of tokens per step, it seems to help with reducing the number of steps needed some of the time?, disable CFG for the first half to speed things up) 2024-12-04 09:30:29 -06:00
mrq
cf97560e70 minimum CFG of 3 for NAR-len because it seems the model will auto-default to NAR-len now 2024-12-03 19:40:05 -06:00
mrq
ca31da0a95 sageattn (forgot to bother with testing this the other day, seems fine) 2024-12-03 15:14:57 -06:00
mrq
84a05acb6d touch ups in docs 2024-12-02 19:10:42 -06:00
mrq
dcaf38b359 fixed training tqdm being stubborn 2024-11-23 09:45:23 -06:00