|
11fa3da665
|
some cleanup, fixed the wrapper attention to explicitly use other sdpa backends
|
2024-08-03 19:51:00 -05:00 |
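As a rough illustration of what a wrapper like this can do (the repo's actual attention class and its options are not shown here), PyTorch's `torch.nn.attention.sdpa_kernel` context manager can force scaled-dot-product attention onto a specific backend instead of letting torch auto-select one:

```python
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

def wrapped_attention(q, k, v, mask=None, backend=SDPBackend.MATH):
    # pin scaled_dot_product_attention to one kernel instead of letting
    # torch pick flash / mem-efficient / math automatically
    with sdpa_kernel(backend):
        return torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```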
|
|
9564ecda43
|
wrapper attention class for other sdpa backends + xformers seems to have broke...
|
2024-08-03 15:12:11 -05:00 |
|
|
9e1989be1b
|
tweaked the initial NAR pass's initial token embeddings to use a different value, or something
|
2024-08-03 09:01:37 -05:00 |
|
|
26f74c5739
|
somehow fixed non-unified position IDs for the NAR-len
|
2024-08-03 08:43:42 -05:00 |
|
|
66407e5bdb
|
tweaks for the NAR-len model, maybe
|
2024-08-03 08:40:39 -05:00 |
|
|
97c5241bef
|
fixes; throw an exception when using a NAR-only model with non-unified position IDs, since for some reason it outputs garbage for the NAR
|
2024-08-02 22:25:49 -05:00 |
|
|
b4c895114c
|
naive model offloading support (handles automatically splitting parts of the model to requested devices per memory constraints, either inferred or requested in the YAML; input tensors are automatically migrated to the right device; it SEEMS to work for training under the test trainer when split between GPU and CPU) (this was specifically only because that Flux imagegen model released, so I can test it there)
|
2024-08-01 20:12:06 -05:00 |
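A loose sketch of what "naive" offloading of this kind can look like, assuming a greedy per-module assignment against per-device byte budgets and a forward pre-hook that migrates inputs; none of these names come from the repo:

```python
import torch

def offload_model(model, budgets={"cuda:0": 6 * 2**30, "cpu": 64 * 2**30}):
    devices = list(budgets.keys())
    used = {d: 0 for d in devices}
    for name, module in model.named_children():
        size = sum(p.numel() * p.element_size() for p in module.parameters())
        # greedily place the module on the first device with room left
        device = next((d for d in devices if used[d] + size <= budgets[d]), devices[-1])
        used[device] += size
        module.to(device)

        # migrate incoming tensors to this module's device automatically
        def move_inputs(mod, args, _device=device):
            return tuple(a.to(_device) if torch.is_tensor(a) else a for a in args)
        module.register_forward_pre_hook(move_inputs)
    return model
```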
|
|
387358bc8a
|
fixes for the NAR-len model, documented some config options, and a better way to handle resizing modules on state_dict load
|
2024-07-31 20:35:09 -05:00 |
|
|
07f8e2ad06
|
added an option to set the causal size (how many tokens to sample per AR step), but it requires the model to be trained for this (which explains why recurrent chunk sampling just doesn't work for the retnet tests; obvious in hindsight)
|
2024-07-30 20:53:51 -05:00 |
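For illustration only (the real option is wired through the config and the decode loop), sampling `causal_size` tokens per AR step roughly means drawing one token from each of the last `causal_size` positions' logits:

```python
import torch

def ar_decode_step(logits, causal_size=1, temperature=1.0):
    # logits: [seq_len, vocab]; take the last `causal_size` positions and
    # sample one token for each, so each forward pass emits `causal_size` tokens
    step_logits = logits[-causal_size:] / temperature
    probs = step_logits.softmax(dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)  # [causal_size]
```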
|
|
ebf848d249
|
possible speedup for samplers that require a list of previous tokens (the DRY sampler made me realize that I should copy the tolist() thing from the rep pen sampler for everything else)
|
2024-07-29 20:23:26 -05:00 |
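The gist of the speedup, sketched with made-up function names: flatten the previous-token tensor to a plain Python list once, then hand that list to every sampler, rather than having each sampler poke at the tensor element by element:

```python
def apply_penalties(logits, prev_tokens, samplers):
    prev = prev_tokens.tolist()  # one device sync/copy instead of many small ones
    for sampler in samplers:
        logits = sampler(logits, prev)
    return logits
```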
|
|
55b0121b1a
|
trying (and failing) to nail a weird regression in fancier attentions
|
2024-07-29 19:53:37 -05:00 |
|
|
c2f5b916fc
|
added what I think is DRY sampling
|
2024-07-29 19:15:07 -05:00 |
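A simplified sketch of DRY ("don't repeat yourself") sampling as commonly described: tokens that would extend a sequence already present in the context get an exponentially growing penalty. The parameter names follow the usual description (multiplier, base, allowed_length); the repo's version may differ:

```python
import torch

def dry_penalize(logits, prev, multiplier=0.8, base=1.75, allowed_length=2):
    # logits: [vocab], prev: plain python list of previous token ids
    n = len(prev)
    for i in range(n - 1):
        if prev[i] != prev[-1]:
            continue
        # how far the context's tail matches the sequence ending at position i
        length = 1
        while length <= i and length < n and prev[i - length] == prev[-1 - length]:
            length += 1
        if length < allowed_length:
            continue
        candidate = prev[i + 1]  # token that previously followed this pattern
        logits[candidate] -= multiplier * base ** (length - allowed_length)
    return logits
```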
|
|
1acb0e9c84
|
added experimental training setting to perform token dropout to MAYBE compensate for errors from the preceding RVQ level (two types: token error offset, token dropout embedding replace)
|
2024-07-24 19:35:17 -05:00 |
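A loose sketch of the two dropout flavours applied to the previous RVQ level's tokens during training; the probability, offset range, and dedicated dropout token id here are assumptions, not the repo's settings:

```python
import torch

def corrupt_prev_level(tokens, p=0.05, offset_range=1, dropout_token=1024):
    # "token error offset": randomly nudge a token id to mimic a mis-predicted code
    noise = torch.rand_like(tokens, dtype=torch.float)
    offsets = torch.randint_like(tokens, -offset_range, offset_range + 1)
    tokens = torch.where(noise < p, (tokens + offsets).clamp(min=0), tokens)
    # "token dropout embedding replace": swap some tokens for a dedicated dropout id
    noise = torch.rand_like(tokens, dtype=torch.float)
    return torch.where(noise < p, torch.full_like(tokens, dropout_token), tokens)
```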
|
|
75b04686f8
|
added prom-less training / inferencing, some other things
|
2024-07-22 19:36:07 -05:00 |
|
|
d53038a9e4
|
actually have split classifiers working
|
2024-07-19 15:33:31 -05:00 |
|
|
28a674e0f1
|
fixes...
|
2024-07-18 23:25:32 -05:00 |
|
|
39f961abcd
|
test trainer (vall_e.models.ar_nar) tests some SpeechX features
|
2024-07-18 18:46:45 -05:00 |
|
|
83a0954f85
|
fixes for re-introducing SpeechX tasks (need to actually validate if these all do the right things)
|
2024-07-18 17:16:32 -05:00 |
|
|
97e768601c
|
re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways)
|
2024-07-18 16:16:14 -05:00 |
|
|
c2b8035e74
|
oops, kept forgetting to actually pass in lang/tone tokens (despite not really using these at the moment)
|
2024-07-18 14:18:34 -05:00 |
|
|
22fe53508c
|
added experimental disjointed position IDs (because I *think* this might help, since technically a sequence is made up of several parts and the position embeddings shouldn't be unified)
|
2024-07-16 19:52:41 -05:00 |
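The idea, sketched with illustrative segment names: each part of the concatenated sequence (text / prompt / response) restarts its position IDs at 0 instead of sharing one continuous range:

```python
import torch

def disjoint_position_ids(segment_lengths):
    # e.g. [len(text), len(prom), len(resp)] -> [0..n0-1, 0..n1-1, 0..n2-1]
    return torch.cat([torch.arange(n) for n in segment_lengths])

def unified_position_ids(segment_lengths):
    # the usual single continuous range, for comparison
    return torch.arange(sum(segment_lengths))
```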
|
|
fe0f235335
|
mechanism to store the model config inside the weights and load them, plus some other things to allow LoRA training on the RetNet (gradient checkpointing will gripe about inputs not having requires_grad and nothing seems to remedy it)
|
2024-07-16 18:23:13 -05:00 |
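A minimal sketch of bundling the config with the weights so a checkpoint is self-describing; the key names used here ("module", "config") are assumptions:

```python
import torch
from dataclasses import asdict

def save_checkpoint(model, config, path):
    # store the (dataclass) config alongside the weights
    torch.save({"module": model.state_dict(), "config": asdict(config)}, path)

def load_checkpoint(model_cls, path):
    state = torch.load(path, map_location="cpu")
    model = model_cls(**state["config"])
    model.load_state_dict(state["module"])
    return model
```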
|
|
3acc54df22
|
allow loading a different model within the web ui (apparently I did not have the web UI in the documentation)
|
2024-07-15 19:59:48 -05:00 |
|
|
f770467eb3
|
stuff
|
2024-07-01 18:13:29 -05:00 |
|
|
dced595391
|
more cleanup
|
2024-06-30 11:00:12 -05:00 |
|
|
bc2a6fa756
|
sanity cleanup: moved experimental features under its own thing
|
2024-06-30 10:37:33 -05:00 |
|
|
b21f74a5c5
|
added summing of external embeddings (at this point I don't think any amount of cope bandaids will get DAC to train nicely; I think the RVQ levels the NAR handles add too much noise if they're not accurate)
|
2024-06-29 23:42:30 -05:00 |
|
|
793ccb16fb
|
ugh
|
2024-06-29 22:14:35 -05:00 |
|
|
2808f881c8
|
cleaned up the subjugated audio embedding into a flag; the flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive)
|
2024-06-29 21:46:35 -05:00 |
|
|
ec5eaebcbc
|
experimental method of using DAC's quantizer "embeddings" to see if it helps with model quality
|
2024-06-29 19:46:11 -05:00 |
|
|
2bfe786ebd
|
ban stop token for NAR levels (because sometimes it gets sampled and causes problems)
|
2024-06-17 22:14:43 -05:00 |
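A minimal sketch of the ban, assuming a `stop_token` id and the convention that quant level 0 is the AR:

```python
def mask_stop_token(logits, quant_level, stop_token):
    # NAR levels never need (and shouldn't emit) a stop token, so mask it out
    if quant_level > 0:
        logits[..., stop_token] = float("-inf")
    return logits
```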
|
|
d343bde09b
|
residual_in_fp32=False for mamba arch backends, since having it enabled breaks the classifier (output projection / lm head / what-have-you) under AMP
|
2024-06-15 12:08:03 -05:00 |
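For reference, the flag as exposed by the upstream mamba_ssm package (the repo's own backend wiring may differ); disabling the fp32 residual keeps the hidden states in the AMP dtype all the way into the output head:

```python
from mamba_ssm.models.config_mamba import MambaConfig
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# residual_in_fp32=False keeps the residual stream in the autocast dtype,
# which this commit found necessary for the output head to behave under AMP
config = MambaConfig(d_model=1024, n_layer=12, vocab_size=1025, residual_in_fp32=False)
model = MambaLMHeadModel(config)
```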
|
|
ccb14c06ef
|
mamba2-hf using vasqu/mamba2-torch, because it lets me use mamba2 without triton ops (my 4xV100s are not happy training mamba2 because of triton)
|
2024-06-14 19:42:17 -05:00 |
|
|
83eab4fa59
|
actually, going for the suggested "2x layers, no intermediate scaling" is wrong for VALL-E; directly copying the normal transformer structure fixes mamba2 performance in the test trainer
|
2024-06-13 20:08:22 -05:00 |
|
|
26da24fd8d
|
mamba updated to fix that pesky NaN error during training
|
2024-06-13 12:38:33 -05:00 |
|
|
cca542a4c0
|
ugh
|
2024-06-11 23:59:28 -05:00 |
|
|
65a8960305
|
option to split classifier per-level instead of sharing one (at this point I'm just scrambling to try and cope with training a DAC model, the NAR is being a pain)
|
2024-06-11 22:28:59 -05:00 |
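A sketch of the option with illustrative module names: either one shared linear head, or one head per RVQ level selected by the current quant level:

```python
import torch.nn as nn

class Classifiers(nn.Module):
    def __init__(self, d_model, vocab_size, levels=8, split=False):
        super().__init__()
        self.split = split
        if split:
            # one output projection per RVQ level
            self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(levels))
        else:
            # a single shared output projection
            self.head = nn.Linear(d_model, vocab_size)

    def forward(self, hidden, quant_level=0):
        return self.heads[quant_level](hidden) if self.split else self.head(hidden)
```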
|
|
a7a6e0ac76
|
validated that inferencing works, changed some defaults (NAR benefits from greedy sampling)
|
2024-06-09 17:11:38 -05:00 |
|
|
80f9530840
|
ugh
|
2024-06-09 01:43:44 -05:00 |
|
|
5c732b72ee
|
ugh
|
2024-06-08 20:34:00 -05:00 |
|
|
8d068fa3f9
|
reticulating splines
|
2024-06-08 20:30:15 -05:00 |
|
|
b072f9b96b
|
fixes
|
2024-06-08 16:01:34 -05:00 |
|
|
58fb0a84db
|
added experimental NAR only model (inferences text length, need more experimenting), AudioEmbedding logic cleanup (I still think it's being done wrong)
|
2024-06-08 15:42:02 -05:00 |
|
|
7d6fff24f9
|
un-tensor'd the quant_level marker since it doesn't need to be one (I forgot why I had it as one, but anything that needs it as a tensor already makes it one)
|
2024-06-07 20:46:22 -05:00 |
|
|
b0158a61d5
|
fixed some logic errors with training (grabbing the wrong quant level...)
|
2024-06-07 20:34:36 -05:00 |
|
|
eafa622be2
|
I forgot the actual reason I was cleaning things up was to re-include prom loss calculation (I realized the reason I did this was because of a prom embedding oversight; it seems to work now)
|
2024-06-07 20:29:25 -05:00 |
|
|
a5c90348d9
|
head hurt
|
2024-06-06 20:51:31 -05:00 |
|
|
516b0894d7
|
m
|
2024-06-06 19:41:26 -05:00 |
|
|
ee25d2e62e
|
removed the need to supply targ_list + different AudioEmbedding + other things
|
2024-06-06 18:52:41 -05:00 |
|
|
b2194b859a
|
re-added loading multiple models because I'm now entertaining having split AR/NAR models again (and need a way to load both at once)
|
2024-06-06 09:48:43 -05:00 |
|