|
a65c8144f4
|
with the number of tweaks I keep making I probably could have had the nvidia/audio-codec-44khz model realized already......
|
2025-02-13 18:38:40 -06:00 |
|
|
e3becec0e8
|
more better-er loss calc I suppose
|
2025-02-13 12:49:53 -06:00 |
|
|
e8f182b634
|
cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors)
|
2025-02-13 09:35:27 -06:00 |
|
|
319ca09a4f
|
cleanup
|
2025-02-12 23:36:32 -06:00 |
|
|
b52c5c5d80
|
this seems to work in testing
|
2025-02-12 16:16:04 -06:00 |
|
|
e029a8804d
|
ironically none of this cruft gets the loss lower than the original way
|
2025-02-12 11:17:00 -06:00 |
|
|
4b31f5c808
|
this seems preferable
|
2025-02-12 00:36:50 -06:00 |
|
|
04fef5dad5
|
agony
|
2025-02-12 00:18:24 -06:00 |
|
|
075ffef68a
|
ugh
|
2025-02-09 13:02:51 -06:00 |
|
|
47eb498046
|
more tweaks
|
2025-02-06 23:26:26 -06:00 |
|
|
79c504c278
|
cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when I fall for the latest meme codec)
|
2025-02-05 20:54:31 -06:00 |
|
|
bb2ebe1ca2
|
fixed issues that may arise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies)
|
2025-02-04 20:30:07 -06:00 |
|
|
0841f366e8
|
I should really just grab modelling_llama wholesale (fix for the adapted attention class)
|
2025-01-28 21:55:05 -06:00 |
|
|
e5f9da2221
|
oops
|
2025-01-21 11:59:24 -06:00 |
|
|
69c1d2991f
|
updated mixtral backend (need this for something else)
|
2025-01-20 21:50:56 -06:00 |
|
|
1a26f789a5
|
added option to playback audio directly, removed no-phonemize option since I swear it worked in testing but it doesn't actually work
|
2025-01-12 21:52:49 -06:00 |
|
|
3ab11bdc7b
|
oops
|
2025-01-05 23:53:17 -06:00 |
|
|
b445f4abb6
|
experimental
|
2025-01-05 19:05:00 -06:00 |
|
|
2e6a7625e4
|
experimental
|
2025-01-05 12:47:03 -06:00 |
|
|
9b0d2ccbe1
|
|
2024-12-26 21:42:17 -06:00 |
|
|
59f56ad099
|
cleanup
|
2024-12-24 23:14:32 -06:00 |
|
|
82e8592f2a
|
working vall_e.cpp
|
2024-12-24 17:54:48 -06:00 |
|
|
497bdfc67b
|
more work (the wall is non-causal decoding......)
|
2024-12-22 20:11:31 -06:00 |
|
|
5f289db275
|
ugh
|
2024-12-22 16:15:24 -06:00 |
|
|
0d4329d2e3
|
sanity cleanup
|
2024-12-22 15:05:45 -06:00 |
|
|
353e478e68
|
agony
|
2024-12-21 22:52:10 -06:00 |
|
|
91caf00212
|
ugh
|
2024-12-20 17:13:37 -06:00 |
|
|
59bf6b8b33
|
exposed additional task (ns, sr, vc) (vc is experimental)
|
2024-12-20 11:15:29 -06:00 |
|
|
e7e7f48043
|
livid
|
2024-12-19 19:25:27 -06:00 |
|
|
c2c6d912ac
|
actually do speaker verification
|
2024-12-17 10:11:14 -06:00 |
|
|
c2e17e287b
|
really shoddy voice conversion implementation (it sort of works...)
|
2024-12-16 22:54:53 -06:00 |
|
|
8515038968
|
imagine my disappointment when the epoch finished just for it to throw an exception
|
2024-12-16 18:28:01 -06:00 |
|
|
4a65ac9eb7
|
oops
|
2024-12-15 17:21:51 -06:00 |
|
|
9a62e3b824
|
APOLLO cringe (doesn't want to work with deepspeed)
|
2024-12-12 00:31:58 -06:00 |
|
|
cddf8ca814
|
sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them)
|
2024-12-11 22:45:38 -06:00 |
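A rough sketch of the batch-sorting idea from the commit above, assuming prompts are plain strings and using character length as a stand-in for token length. The helpers `sort_into_batches`, `naive_batches`, and `count_padding` are hypothetical illustrations, not the project's actual batching code; original indices are kept so outputs can be put back in the caller's order.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration; character length stands in for token length.
Batch = List[Tuple[int, str]]


def naive_batches(prompts: List[str], batch_size: int) -> List[Batch]:
    """Batch prompts in their given order (baseline for comparison)."""
    indexed = list(enumerate(prompts))
    return [indexed[s:s + batch_size] for s in range(0, len(indexed), batch_size)]


def sort_into_batches(prompts: List[str], batch_size: int) -> List[Batch]:
    """Group prompts into batches of similar length, keeping each prompt's original index."""
    # Sort indices by length so each batch holds similarly sized inputs.
    order = sorted(range(len(prompts)), key=lambda i: len(prompts[i]))
    return [
        [(i, prompts[i]) for i in order[s:s + batch_size]]
        for s in range(0, len(order), batch_size)
    ]


def count_padding(batches: List[Batch]) -> int:
    """Count pad positions needed to right-pad every batch to its longest member."""
    total = 0
    for batch in batches:
        longest = max(len(p) for _, p in batch)
        total += sum(longest - len(p) for _, p in batch)
    return total


prompts = ["a", "abcdef", "ab", "abcdefg"]
print(count_padding(naive_batches(prompts, 2)))      # 10
print(count_padding(sort_into_batches(prompts, 2)))  # 2
```

Sorting only changes which prompts share a batch; the stored indices are what make it safe to return results in the original order afterwards.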
|
|
6468e5d124
|
lol
|
2024-12-11 19:10:32 -06:00 |
|
|
3ef8894290
|
oops
|
2024-12-08 15:24:21 -06:00 |
|
|
1d460b9fe3
|
logic fixes, I feel like output is better? (also NAR can have a temperature; I imagine it couldn't before because it was having a causal mask passed to it for the longest time before I caught it a month ago)
|
2024-12-08 14:52:47 -06:00 |
|
|
5d80a2d0d4
|
fixed NAR-len issues with non-English maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now
|
2024-12-07 19:21:05 -06:00 |
|
|
61ed662856
|
ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode)
|
2024-12-07 12:31:54 -06:00 |
|
|
34a66e1052
|
agnostified KD
|
2024-12-06 23:53:46 -06:00 |
|
|
953d3eb030
|
ugh
|
2024-12-06 22:35:30 -06:00 |
|
|
42fafbaaca
|
actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps)
|
2024-12-06 21:55:20 -06:00 |
|
|
23d402bf01
|
added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......)
|
2024-12-05 23:05:52 -06:00 |
|
|
93d27be539
|
rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting)
|
2024-12-04 20:31:44 -06:00 |
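A minimal sketch of the rolling-context and prompt-splitting behaviour described in the commit above. `split_text` and `generate_with_rolling_context` are hypothetical names, the sentence splitter is a naive regex on `.`, `!`, `?`, and `generate` is an assumed callback standing in for the actual model call; this is not the project's implementation.

```python
import re
from collections import deque
from typing import Callable, List, Tuple


def split_text(text: str, mode: str = "sentences") -> List[str]:
    """Split an input prompt by sentences, by lines, or not at all."""
    if mode == "sentences":
        # Naive split: break after ., ! or ? when followed by whitespace.
        pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    elif mode == "lines":
        pieces = text.splitlines()
    else:
        pieces = [text]
    return [p.strip() for p in pieces if p.strip()]


def generate_with_rolling_context(
    pieces: List[str],
    generate: Callable[[str, List[Tuple[str, str]]], str],
    context_size: int = 2,
) -> List[str]:
    """Generate each piece, passing the last `context_size` (text, output) pairs as the prefix."""
    history: deque = deque(maxlen=context_size)  # oldest pairs drop off automatically
    outputs = []
    for text in pieces:
        out = generate(text, list(history))
        outputs.append(out)
        history.append((text, out))
    return outputs


# Stand-in "model" that just reports how many prior utterances it was given.
fake_generate = lambda text, prefix: f"{text}:{len(prefix)}"
print(generate_with_rolling_context(["a", "b", "c", "d"], fake_generate))
```

With `context_size=2` the prefix grows to two utterances and then stays there, which is the "last N utterances" behaviour in the commit message.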
|
|
9dff68c0c5
|
NAR-len tweaks (remasks a small number of tokens per step, which seems to help reduce the number of steps needed some of the time?; disables CFG for the first half to speed things up)
|
2024-12-04 09:30:29 -06:00 |
|
|
cf97560e70
|
minimum CFG of 3 for NAR-len because it seems the model will auto-default to NAR-len now
|
2024-12-03 19:40:05 -06:00 |
|
|
ca31da0a95
|
sageattn (forgot to bother with testing this the other day, seems fine)
|
2024-12-03 15:14:57 -06:00 |
|
|
84a05acb6d
|
touch ups in docs
|
2024-12-02 19:10:42 -06:00 |
|
|
dcaf38b359
|
fixed training tqdm being stubborn
|
2024-11-23 09:45:23 -06:00 |
|