|
d33a906119
|
cleanup for AR_NAR inferencing to allow both TTS and STT tasks simultaneously (need to have training eval do this too, though)
|
2024-09-06 14:30:12 -05:00 |
|
|
341e19162b
|
fixes, again
|
2024-09-06 11:41:41 -05:00 |
|
|
94cf81d38c
|
tweak
|
2024-09-05 23:21:18 -05:00 |
|
|
413097f5f7
|
fixes
|
2024-09-05 21:42:59 -05:00 |
|
|
54547b74d8
|
experimental implementation of STT (need to actually test on a model, test trainer seems to work)
|
2024-09-05 20:43:20 -05:00 |
|
|
32287710a2
|
moved prints to use logger, edited readme (fused_attn doesn't seem stable for training)
|
2024-08-29 13:27:16 -05:00 |
|
|
2a1794c084
|
ughghghhhh
|
2024-08-09 21:15:01 -05:00 |
|
|
ed373957e2
|
maybe not
|
2024-08-09 11:38:08 -05:00 |
|
|
debcc93e7e
|
add adapted MixtralAttention for when I make a bad decision to actually train a MoE
|
2024-08-04 22:03:22 -05:00 |
|
|
10aaf840e7
|
added export option to convert Llama to MixtralMoE for another dumb experiment
|
2024-08-04 20:25:06 -05:00 |
|
|
3a65cc4b22
|
fix issue with sft and shared tensors...
|
2024-08-04 19:56:21 -05:00 |
|
|
6a733eb2ed
|
changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype, some 'optimization' to compile the model but it doesn't seem to do anything useful
|
2024-08-03 22:10:21 -05:00 |
|
|
11fa3da665
|
some cleanup, fixed the wrapper attention to explicitly use other sdpa backends
|
2024-08-03 19:51:00 -05:00 |
|
|
97c5241bef
|
fixes, throw an exception when using NAR only model with non-unified position IDs, since for some reason it outputs garbage for the NAR
|
2024-08-02 22:25:49 -05:00 |
|
|
443422ecb5
|
ugh, finally got some form of offloading working (need to test if it works on different GPUs, but GPU and CPU offloading seems to work in the test trainer)
|
2024-08-01 22:43:39 -05:00 |
|
|
c9ec6b28ef
|
it actually wasn't working because Engines.__init__() automatically moves the entire module to the requested device, which was being called after offloading the model in the test trainer (and it seems I can't do it without injecting a bunch of shit in modeling_llama.py)
|
2024-08-01 20:56:28 -05:00 |
|
|
b4c895114c
|
naive model offloading support (handles automatically splitting parts of the model to requested device per memory constraints, either inferred or requested in the yaml, input tensors are automatically migrated to the right device, it SEEMS to work for training under the test trainer when split between GPU and CPU) (this was specifically only because that Flux imagegen model released so I can test it there)
|
2024-08-01 20:12:06 -05:00 |
|
|
07f8e2ad06
|
added option to set the causal size (how many tokens to sample per AR step), but requires the model to be trained for this (which explains why recurrent chunk sampling just doesn't work for the retnet tests, obvious in hindsight)
|
2024-07-30 20:53:51 -05:00 |
|
|
c2f5b916fc
|
added what I think is DRY sampling
|
2024-07-29 19:15:07 -05:00 |
|
|
ce8bb1e4f7
|
sanity cleanups with weird off-by-one-ness, cleaned up and validated vall_e.models.experimental works again
|
2024-07-27 15:36:05 -05:00 |
|
|
06e948aec1
|
suppress warning on exit about distributed not being cleaned up (because I updated my system)
|
2024-07-25 16:50:47 -05:00 |
|
|
1acb0e9c84
|
added experimental training setting to perform token dropout to MAYBE compensate for errors from the preceding RVQ level (two types: token error offset, token dropout embedding replace)
|
2024-07-24 19:35:17 -05:00 |
|
|
75b04686f8
|
added prom-less training / inferencing, some other things
|
2024-07-22 19:36:07 -05:00 |
|
|
e19aa643a6
|
cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training
|
2024-07-21 19:12:03 -05:00 |
|
|
d87b492295
|
added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)
|
2024-07-19 20:49:40 -05:00 |
|
|
39f961abcd
|
test trainer (vall_e.models.ar_nar) tests some SpeechX features
|
2024-07-18 18:46:45 -05:00 |
|
|
97e768601c
|
re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways)
|
2024-07-18 16:16:14 -05:00 |
|
|
c2b8035e74
|
oops, kept forgetting to actually pass in lang/tone tokens (despite not really using these at the moment)
|
2024-07-18 14:18:34 -05:00 |
|
|
3acc54df22
|
allow loading a different model within the web ui (apparently I did not have the web UI in the documentation)
|
2024-07-15 19:59:48 -05:00 |
|
|
7b210d9738
|
sanity cleanup
|
2024-07-04 15:58:08 -05:00 |
|
|
dced595391
|
more cleanup
|
2024-06-30 11:00:12 -05:00 |
|
|
bc2a6fa756
|
sanity cleanup: moved experimental features under its own thing
|
2024-06-30 10:37:33 -05:00 |
|
|
b21f74a5c5
|
added summing of external embeddings (at this point I don't think any amount of cope bandaids will get DAC to train nicely, I think the RVQ levels the NAR tends add too much noise if they're not accurate)
|
2024-06-29 23:42:30 -05:00 |
|
|
2808f881c8
|
cleaned up subjugated audio embedding into a flag, flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive)
|
2024-06-29 21:46:35 -05:00 |
|
|
ec5eaebcbc
|
experimental method of using DAC's quantizer "embeddings" to see if it helps with model quality
|
2024-06-29 19:46:11 -05:00 |
|
|
a8718d35a4
|
nasty bandaid because some of my DAC dataset only has 8 RVQ levels instead of the full 9
|
2024-06-29 10:16:37 -05:00 |
|
|
591d3ac848
|
have eval dataloader use eval batch size for batchedordersampler
|
2024-06-28 22:44:00 -05:00 |
|
|
83075c1505
|
sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because I didn't know that python dicts can have non-strings as keys), added batching samples based on total duration to ensure best training throughput
|
2024-06-28 22:28:54 -05:00 |
|
|
2bfe786ebd
|
ban stop token for NAR levels (because sometimes it gets sampled and causes problems)
|
2024-06-17 22:14:43 -05:00 |
|
|
7cfb78fa64
|
enable LoRA for targeted RVQ levels (to experiment with, seems to help)
|
2024-06-17 21:45:03 -05:00 |
|
|
19410a919e
|
ugh
|
2024-06-15 12:29:03 -05:00 |
|
|
83eab4fa59
|
actually going for the suggested "2x layers, no intermediate scaling" is wrong for VALL-E, directly copying the normal transformer structure fixes mamba2 performance in the test trainer
|
2024-06-13 20:08:22 -05:00 |
|
|
26da24fd8d
|
mamba updated to fix that pesky NaN error during training
|
2024-06-13 12:38:33 -05:00 |
|
|
65a8960305
|
option to split classifier per-level instead of sharing one (at this point I'm just scrambling to try and cope with training a DAC model, the NAR is being a pain)
|
2024-06-11 22:28:59 -05:00 |
|
|
132a02c48b
|
sanity cleanup, backup config yaml for each log file
|
2024-06-09 11:22:52 -05:00 |
|
|
8d068fa3f9
|
reticulating splines
|
2024-06-08 20:30:15 -05:00 |
|
|
b072f9b96b
|
fixes
|
2024-06-08 16:01:34 -05:00 |
|
|
58fb0a84db
|
added experimental NAR only model (inferences text length, need more experimenting), AudioEmbedding logic cleanup (I still think it's being done wrong)
|
2024-06-08 15:42:02 -05:00 |
|
|
7d6fff24f9
|
un-tensor'd quant_level marker since it doesn't need to be one (I forgot why I had it as one but nothing seems to need it as a tensor that didn't already make it one)
|
2024-06-07 20:46:22 -05:00 |
|
|
f9f309281a
|
ugh
|
2024-06-06 20:55:27 -05:00 |
|