Commit Graph

138 Commits

Author SHA1 Message Date
mrq
75b04686f8 added prom-less training / inferencing, some other things 2024-07-22 19:36:07 -05:00
mrq
d53038a9e4 actually have split classifiers working 2024-07-19 15:33:31 -05:00
mrq
28a674e0f1 fixes... 2024-07-18 23:25:32 -05:00
mrq
39f961abcd test trainer (vall_e.models.ar_nar) tests some SpeechX features 2024-07-18 18:46:45 -05:00
mrq
83a0954f85 fixes for re-introducing SpeechX tasks (need to actually validate if these all do the right things) 2024-07-18 17:16:32 -05:00
mrq
97e768601c re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways) 2024-07-18 16:16:14 -05:00
mrq
c2b8035e74 oops, kept forgetting to actually pass in lang/tone tokens (despite not really using these at the moment) 2024-07-18 14:18:34 -05:00
mrq
22fe53508c added experimental disjointed position IDs (because I *think* this might help because technically a sequence is made up of several parts, and the position embeddings shouldn't be unified) 2024-07-16 19:52:41 -05:00
mrq
fe0f235335 mechanism to store the model config inside the weights and load it, some other things to allow LoRA training on the RetNet (gradient checkpointing will gripe about inputs not having requires_grad and nothing seems to remedy it) 2024-07-16 18:23:13 -05:00
mrq
3acc54df22 allow loading a different model within the web ui (apparently I did not have the web UI in the documentation) 2024-07-15 19:59:48 -05:00
mrq
f770467eb3 stuff 2024-07-01 18:13:29 -05:00
mrq
dced595391 more cleanup 2024-06-30 11:00:12 -05:00
mrq
bc2a6fa756 sanity cleanup: moved experimental features under their own thing 2024-06-30 10:37:33 -05:00
mrq
b21f74a5c5 added summing of external embeddings (at this point I don't think any amount of cope bandaids will get DAC to train nicely, I think the RVQ levels the NAR tends to add too much noise if they're not accurate) 2024-06-29 23:42:30 -05:00
mrq
793ccb16fb ugh 2024-06-29 22:14:35 -05:00
mrq
2808f881c8 cleaned up subjugated audio embedding into a flag, flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive) 2024-06-29 21:46:35 -05:00
mrq
ec5eaebcbc experimental method of using DAC's quantizer "embeddings" to see if it helps with model quality 2024-06-29 19:46:11 -05:00
mrq
2bfe786ebd ban stop token for NAR levels (because sometimes it gets sampled and causes problems) 2024-06-17 22:14:43 -05:00
mrq
d343bde09b residual_in_fp32=False for mamba arch backends because it breaks the classifier (output projection / lm head / what-have-you) under AMP 2024-06-15 12:08:03 -05:00
mrq
ccb14c06ef mamba2-hf using vasqu/mamba2-torch because it lets me use mamba2 without triton ops (my 4xV100s are not happy training with mamba2 because of triton) 2024-06-14 19:42:17 -05:00
mrq
83eab4fa59 actually going for the suggested "2x layers, no intermediate scaling" is wrong for VALL-E, directly copying the normal transformer structure fixes mamba2 performance in the test trainer 2024-06-13 20:08:22 -05:00
mrq
26da24fd8d mamba updated to fix that pesky NaN error during training 2024-06-13 12:38:33 -05:00
mrq
cca542a4c0 ugh 2024-06-11 23:59:28 -05:00
mrq
65a8960305 option to split classifier per-level instead of sharing one (at this point I'm just scrambling to try and cope with training a DAC model, the NAR is being a pain) 2024-06-11 22:28:59 -05:00
mrq
a7a6e0ac76 validated that inferencing works, changed some defaults (NAR benefits from greedy sampling) 2024-06-09 17:11:38 -05:00
mrq
80f9530840 ugh 2024-06-09 01:43:44 -05:00
mrq
5c732b72ee ugh 2024-06-08 20:34:00 -05:00
mrq
8d068fa3f9 reticulating splines 2024-06-08 20:30:15 -05:00
mrq
b072f9b96b fixes 2024-06-08 16:01:34 -05:00
mrq
58fb0a84db added experimental NAR only model (inferences text length, need more experimenting), AudioEmbedding logic cleanup (I still think it's being done wrong) 2024-06-08 15:42:02 -05:00
mrq
7d6fff24f9 un-tensor'd quant_level marker since it doesn't need to be one (I forgot why I had it as one but nothing seems to need it as a tensor that didn't already make it one) 2024-06-07 20:46:22 -05:00
mrq
b0158a61d5 fixed some logic errors with training (grabbing wrong quant level...) 2024-06-07 20:34:36 -05:00
mrq
eafa622be2 I forgot the actual reason I was cleaning things up was to re-include prom loss calculation (I realized the reason I did this was because of a prom embedding oversight, it seems to work now) 2024-06-07 20:29:25 -05:00
mrq
a5c90348d9 head hurt 2024-06-06 20:51:31 -05:00
mrq
516b0894d7 m 2024-06-06 19:41:26 -05:00
mrq
ee25d2e62e removed the need to supply targ_list + different AudioEmbedding + other things 2024-06-06 18:52:41 -05:00
mrq
b2194b859a re-added loading multiple models because I'm now entertaining having split AR/NAR models again (and need a way to load both at once) 2024-06-06 09:48:43 -05:00
mrq
b05a905b95 ugh 2024-06-05 21:02:05 -05:00
mrq
4073656293 oops 2024-06-05 20:53:10 -05:00
mrq
ff6fe6f1bc cleanup 2024-06-05 20:30:43 -05:00
mrq
880b4ecd1b cleanup, putting some thoughts in comments before I forget about them 2024-06-05 19:50:06 -05:00
mrq
3cfc8a96bb oops 2024-06-05 10:30:04 -05:00
mrq
48cd1054f9 madness 2024-06-04 23:48:51 -05:00
mrq
9e3f2e300f experimental "just have a token for what rvq level we're on" that seems to help all models (mamba almost works, but it might just have to be relegated as a pure AR model) 2024-06-04 23:23:31 -05:00
mrq
e0886c5a78 re-added mamba as a possible non-experimental arch backend (test trainer will set it as AR only, doing any NAR tasks lobotomizes it) 2024-06-04 22:41:22 -05:00
mrq
934672252b feverish cleanup 2024-06-03 21:28:49 -05:00
mrq
7feeb944a0 probably insane with even entertaining going this route 2024-06-03 20:26:27 -05:00
mrq
b482ca19ff added model config option to set KV head count for MQA/GQA instead of MHA for llama-based models (I think it's negligible both ways on such a small model size) 2024-05-31 19:32:37 -05:00
mrq
e15c6c74c3 correctness 2024-05-30 20:50:45 -05:00
mrq
da473295b7 better way to compute per-segment losses 2024-05-28 19:29:54 -05:00