bc2a6fa756
sanity cleanup: moved experimental features under their own section
2024-06-30 10:37:33 -05:00

b21f74a5c5
added summing of external embeddings (at this point I don't think any amount of cope bandaids will get DAC to train nicely; I think the RVQ levels the NAR handles tend to add too much noise if they're not accurate)
2024-06-29 23:42:30 -05:00
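The "summing of external embeddings" above amounts to collapsing per-RVQ-level embeddings into one input vector. A minimal sketch of that combination step, with hypothetical names and plain lists instead of tensors — not the repo's actual code:

```python
def sum_embeddings(level_embeddings):
    """Sum per-RVQ-level embeddings elementwise into a single vector.

    `level_embeddings` is a list of equally sized vectors, one per quantizer
    level; the combined embedding is their elementwise sum.
    """
    dim = len(level_embeddings[0])
    return [sum(level[i] for level in level_embeddings) for i in range(dim)]

# three RVQ levels, embedding dim 2
levels = [[1.0, 2.0], [0.5, -1.0], [0.0, 0.25]]
assert sum_embeddings(levels) == [1.5, 1.25]
```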

793ccb16fb
ugh
2024-06-29 22:14:35 -05:00

2808f881c8
cleaned up the subjugated audio embedding into a flag; the flag can also include the original, underlying embedding as well (it seems to do better when set to inclusive)
2024-06-29 21:46:35 -05:00

ec5eaebcbc
experimental method of using DAC's quantizer "embeddings" to see if it helps with model quality
2024-06-29 19:46:11 -05:00

a8718d35a4
nasty bandaid because some of my DAC dataset only has 8 RVQ levels instead of the full 9
2024-06-29 10:16:37 -05:00

c4dd523b6f
changed from chunk-slicing paths for the distributed dataloader to interleaving them instead
2024-06-29 10:10:35 -05:00
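The interleave-instead-of-chunk-slice change above can be sketched in a few lines (function name is hypothetical, not the repo's actual code). With duration-sorted paths, chunk-slicing gives each rank one contiguous duration range; interleaving keeps the distribution roughly identical across ranks:

```python
def shard_paths(paths, global_rank, world_size):
    """Interleave paths across ranks: each rank takes every
    world_size-th path, starting at its own rank offset."""
    return paths[global_rank::world_size]

paths = ["a", "b", "c", "d", "e", "f"]
assert shard_paths(paths, 0, 2) == ["a", "c", "e"]
assert shard_paths(paths, 1, 2) == ["b", "d", "f"]
```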

dd40463803
limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid)
2024-06-29 09:11:28 -05:00

591d3ac848
have the eval dataloader use the eval batch size for batchedordersampler
2024-06-28 22:44:00 -05:00

1a392b69f6
local training backend should be a bit more aware of variable batch sizes, maybe
2024-06-28 22:39:05 -05:00

83075c1505
sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because I didn't know that Python dicts can have non-strings as keys); added batching samples based on total duration to ensure the best training throughput
2024-06-28 22:28:54 -05:00
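The duration-based batching described above — sort numerically (not lexicographically), then fill batches up to a total-seconds cap — can be sketched like this (names and the dict shape are hypothetical):

```python
def batch_by_duration(durations, max_batch_duration):
    """Group samples into batches capped by total duration.

    `durations` maps path -> seconds. Sorting by the numeric value avoids
    the lexicographic-key pitfall; each batch is filled until adding
    another sample would exceed the cap.
    """
    ordered = sorted(durations.items(), key=lambda kv: kv[1])
    batches, batch, total = [], [], 0.0
    for path, dur in ordered:
        if batch and total + dur > max_batch_duration:
            batches.append(batch)
            batch, total = [], 0.0
        batch.append(path)
        total += dur
    if batch:
        batches.append(batch)
    return batches

# "a" (2s) and "c" (3s) fit under a 6s cap together; "b" (10s) goes alone
assert batch_by_duration({"a": 2.0, "b": 10.0, "c": 3.0}, 6.0) == [["a", "c"], ["b"]]
```

Batching by total duration rather than sample count keeps each batch's compute roughly constant, which is the throughput win the commit is after.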

8fffb94964
backported a fix from tortoise_tts for the local trainer + loading state when training a LoRA
2024-06-25 13:41:29 -05:00

62a53eed64
fixed deducing the tokenizer path; added an option to default to the naive tokenizer (for old models, like ar+nar-retnet-8)
2024-06-18 22:11:14 -05:00

8a986eb480
load exported LoRA weights if they exist (to-do: make a better LoRA loading mechanism)
2024-06-18 21:45:46 -05:00

2bfe786ebd
ban the stop token for NAR levels (because sometimes it gets sampled and causes problems)
2024-06-17 22:14:43 -05:00
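Banning the stop token usually means masking its logit before sampling: NAR levels emit a fixed-length sequence, so a sampled stop token is never valid there. A sketch with plain lists (index and names hypothetical):

```python
import math

def ban_stop_token(logits, stop_token):
    """Set the stop token's logit to -inf so it can never be sampled
    (greedy or otherwise) at NAR levels."""
    logits = list(logits)
    logits[stop_token] = -math.inf
    return logits

logits = [0.1, 2.0, 0.5, 1.7]  # pretend index 3 is the stop token
masked = ban_stop_token(logits, stop_token=3)
assert masked.index(max(masked)) == 1  # greedy pick is now a real token
```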

7cfb78fa64
enable LoRA for targeted RVQ levels (to experiment with; seems to help)
2024-06-17 21:45:03 -05:00

7047fcc6e2
actually make deepspeed work with LoRAs
2024-06-17 13:55:37 -05:00

1d159b1476
updated the export routine to split LoRA weights from the state dict (should work with deepspeed)
2024-06-17 13:28:18 -05:00
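Splitting LoRA weights out of a checkpoint typically reduces to partitioning the state dict by key. A sketch of the idea, assuming LoRA parameters are identifiable by a substring in their key (the `lora_` marker and key names here are hypothetical):

```python
def split_lora_weights(state_dict, marker="lora_"):
    """Partition a checkpoint into base-only and LoRA-only state dicts,
    keyed on a substring that marks LoRA parameters."""
    lora = {k: v for k, v in state_dict.items() if marker in k}
    base = {k: v for k, v in state_dict.items() if marker not in k}
    return base, lora

state_dict = {
    "layers.0.attn.q_proj.weight": 0,
    "layers.0.attn.q_proj.lora_A": 1,
    "layers.0.attn.q_proj.lora_B": 2,
}
base, lora = split_lora_weights(state_dict)
assert sorted(lora) == ["layers.0.attn.q_proj.lora_A", "layers.0.attn.q_proj.lora_B"]
assert sorted(base) == ["layers.0.attn.q_proj.weight"]
```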

726a4b613f
naive, rudimentary DeepSpeed support (just live with the LoRA weights living alongside the original weights; they can be split later)
2024-06-17 13:17:24 -05:00

bd0bc10ec0
added a LoRA policy to decide which layers of the model get adapted, based on simple inclusion/exclusion terms
2024-06-17 13:05:06 -05:00
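An inclusion/exclusion policy like the one described above is usually just substring matching on module names, with exclusion taking priority. A hedged sketch (the term lists and module names are illustrative, not the repo's actual defaults):

```python
def lora_policy(module_name, include=(), exclude=()):
    """Return True if the named module should get a LoRA adapter.

    Exclusion terms win over inclusion terms; a module matches only if
    it hits an include term and no exclude term.
    """
    if any(term in module_name for term in exclude):
        return False
    return any(term in module_name for term in include)

include, exclude = ["q_proj", "v_proj"], ["classifier"]
assert lora_policy("layers.3.attn.q_proj", include, exclude)
assert not lora_policy("classifier.q_proj", include, exclude)   # excluded
assert not lora_policy("layers.3.mlp.up_proj", include, exclude)  # not included
```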

be051d9544
added another LoRA method using parametrization rather than linear injection
2024-06-17 09:58:34 -05:00
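The difference between the two methods: linear injection replaces each `Linear` with a wrapper module, while parametrization (in torch, via `torch.nn.utils.parametrize`) rewrites the weight as `W + (alpha/rank) * B @ A` whenever it is accessed, leaving the module itself untouched. A torch-free sketch of just the math (shapes and names hypothetical):

```python
def lora_parametrize(weight, lora_A, lora_B, alpha, rank):
    """Compute the effective weight W + (alpha/rank) * B @ A.

    lora_A is rank x in_features, lora_B is out_features x rank; their
    product is a low-rank delta added to the frozen base weight.
    """
    scale = alpha / rank
    rows, cols, inner = len(lora_B), len(lora_A[0]), len(lora_A)
    delta = [[scale * sum(lora_B[i][k] * lora_A[k][j] for k in range(inner))
              for j in range(cols)] for i in range(rows)]
    return [[weight[i][j] + delta[i][j] for j in range(cols)] for i in range(rows)]

W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # 1 x 2 down projection (rank 1)
B = [[0.5], [0.0]]  # 2 x 1 up projection
assert lora_parametrize(W, A, B, alpha=1, rank=1) == [[1.5, 1.0], [0.0, 1.0]]
```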

45a39fb79f
very rudimentary LoRA support (no deepspeed support; tested training and saving, but not loading yet)
2024-06-17 00:09:16 -05:00

19410a919e
ugh
2024-06-15 12:29:03 -05:00

d343bde09b
residual_in_fp32=False for mamba arch backends because it breaks the classifier (output projection / lm head / what-have-you) under AMP
2024-06-15 12:08:03 -05:00

ccb14c06ef
mamba2-hf using vasqu/mamba2-torch because it lets me use mamba2 without triton ops (my 4xV100s are not happy training mamba2 because of triton)
2024-06-14 19:42:17 -05:00

31f71fa134
sampler update (some brainworm meant there just never was an actual sampler for sample_type=path)
2024-06-14 16:55:40 -05:00

b3b67f34ac
added an option to sort paths by duration to better group equal-length sequences together (and there may have been a logic error from creating the samplers and then interleave-reordering the paths, desyncing them, maybe)
2024-06-13 22:37:34 -05:00

83eab4fa59
actually going for the suggested "2x layers, no intermediate scaling" is wrong for VALL-E; directly copying the normal transformer structure fixes mamba2 performance in the test trainer
2024-06-13 20:08:22 -05:00

26da24fd8d
mamba updated to fix that pesky NaN error during training
2024-06-13 12:38:33 -05:00

bcf3910a17
the NAR-only dream is dead (it just won't work)
2024-06-12 19:49:47 -05:00

a9353cf9fa
ugh
2024-06-12 00:14:29 -05:00

cca542a4c0
ugh
2024-06-11 23:59:28 -05:00

65a8960305
option to split the classifier per level instead of sharing one (at this point I'm just scrambling to try and cope with training a DAC model; the NAR is being a pain)
2024-06-11 22:28:59 -05:00

a7a6e0ac76
validated that inferencing works, changed some defaults (the NAR benefits from greedy sampling)
2024-06-09 17:11:38 -05:00

234f9efc6e
ugh
2024-06-09 11:39:43 -05:00

132a02c48b
sanity cleanup; back up the config YAML for each log file
2024-06-09 11:22:52 -05:00

8d92dac829
forgot I renamed this
2024-06-09 11:12:30 -05:00

80f9530840
ugh
2024-06-09 01:43:44 -05:00

5c732b72ee
ugh
2024-06-08 20:34:00 -05:00

8d068fa3f9
reticulating splines
2024-06-08 20:30:15 -05:00

ead3e2f0cb
ugh
2024-06-08 16:14:57 -05:00

b072f9b96b
fixes
2024-06-08 16:01:34 -05:00

58fb0a84db
added an experimental NAR-only model (infers text length; needs more experimenting), AudioEmbedding logic cleanup (I still think it's being done wrong)
2024-06-08 15:42:02 -05:00

e35a91c67a
ugh
2024-06-07 21:56:14 -05:00

7d6fff24f9
un-tensor'd the quant_level marker since it doesn't need to be one (I forgot why I had it as one, but nothing seems to need it as a tensor that didn't already make it one)
2024-06-07 20:46:22 -05:00

b0158a61d5
fixed some logic errors with training (grabbing the wrong quant level...)
2024-06-07 20:34:36 -05:00

eafa622be2
I forgot the actual reason I was cleaning things up was to re-include prom loss calculation (I realized I'd removed it because of a prom embedding oversight; it seems to work now)
2024-06-07 20:29:25 -05:00

da8242d086
finally got around to removing omegaconf
2024-06-07 20:23:53 -05:00

4ade2b60ee
ugh
2024-06-06 21:57:11 -05:00

f9f309281a
ugh
2024-06-06 20:55:27 -05:00

a5c90348d9
head hurt
2024-06-06 20:51:31 -05:00

516b0894d7
m
2024-06-06 19:41:26 -05:00

ee25d2e62e
removed the need to supply targ_list + different AudioEmbedding + other things
2024-06-06 18:52:41 -05:00

fcac9503e2
cleanup
2024-06-06 13:08:02 -05:00

b2194b859a
re-added loading multiple models because I'm now entertaining having split AR/NAR models again (and need a way to load both at once)
2024-06-06 09:48:43 -05:00

b05a905b95
ugh
2024-06-05 21:02:05 -05:00

4073656293
oops
2024-06-05 20:53:10 -05:00

ff6fe6f1bc
cleanup
2024-06-05 20:30:43 -05:00

880b4ecd1b
cleanup, putting some thoughts in comments before I forget about them
2024-06-05 19:50:06 -05:00

3cfc8a96bb
oops
2024-06-05 10:30:04 -05:00

48cd1054f9
madness
2024-06-04 23:48:51 -05:00

9e3f2e300f
experimental "just have a token for what RVQ level we're on" that seems to help all models (mamba almost works, but it might just have to be relegated to being a pure AR model)
2024-06-04 23:23:31 -05:00

e0886c5a78
re-added mamba as a possible non-experimental arch backend (the test trainer will set it as AR-only; doing any NAR tasks lobotomizes it)
2024-06-04 22:41:22 -05:00

687c71e028
disabled the accuracy calc because it breaks with actual batched training, even though it shouldn't
2024-06-04 22:13:44 -05:00

d005e24953
oops
2024-06-04 22:10:04 -05:00

0f7f3ae754
added loss calc split and accuracy for the experimental model
2024-06-04 22:04:40 -05:00

014e565c4b
tweaks
2024-06-04 20:41:13 -05:00

6d5bd0156a
fixes
2024-06-04 18:50:48 -05:00

ed3aeaf3a1
copy-pasted from the test trainer to the actual trainer
2024-06-04 18:40:30 -05:00

0aa01ba31a
forgot one crucial detail (you *need* the previous RVQ level to keep coherence between all RVQ levels) (experimental deinterleaved is a bit crusty though)
2024-06-04 18:30:30 -05:00

2ffad5cb6f
typo
2024-06-04 14:20:57 -05:00

406ff7bbe1
re-implemented config.model.interleave for the HF-compat experimental method
2024-06-04 14:19:52 -05:00

c93d5863fd
fixes
2024-06-04 00:07:00 -05:00

186b93a77e
oops
2024-06-03 22:35:55 -05:00

e50edc3b48
added a flag to convert to an HF-compatible model on export by stitching things together
2024-06-03 22:34:47 -05:00

934672252b
feverish cleanup
2024-06-03 21:28:49 -05:00

7feeb944a0
probably insane for even entertaining going this route
2024-06-03 20:26:27 -05:00

c2a436d368
somehow, between training sessions, grad_norm = None even though it worked before
2024-06-02 08:29:27 -05:00

c1fcd889d5
reverted automatically disabling split loss calc, since it seems that calculating loss on the prom is what actually causes the oddities, maybe
2024-06-01 12:34:59 -05:00

8cf176ab46
ugh
2024-06-01 10:46:42 -05:00

827cf632e7
report the current loss scale and adjust grad norm by the loss scale (for deepspeed)
2024-06-01 10:44:32 -05:00

d0ebce6bac
ugh
2024-06-01 10:30:13 -05:00

39bc019142
actually save per-rank sampler states
2024-06-01 09:46:32 -05:00

74df2f5332
split the sampler dict by global_rank; also handle splitting dataset paths by global_rank if sampler_type == path (because I do not trust DistributedSampler) (need to test)
2024-06-01 09:29:49 -05:00

31785f4eeb
actually don't default to computing split losses; the test bitnet model doesn't seem to be doing things right (despite debug printouts showing they're roughly the same logit/loss sequences; could just be bitnet linears not being up to par on actual models)
2024-06-01 09:12:51 -05:00

e9c87060df
oops
2024-05-31 22:22:28 -05:00

b482ca19ff
added a model config option to set the KV head count for MQA/GQA instead of MHA for llama-based models (I think it's very negligible either way at such a small model size)
2024-05-31 19:32:37 -05:00
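The KV head count option above is the standard MHA/GQA/MQA spectrum: with `n_kv_heads == n_heads` every query head has its own KV head (MHA); with `n_kv_heads == 1` all query heads share one (MQA); anything in between is GQA. A small sketch of the grouping arithmetic (names hypothetical):

```python
def kv_groups(n_heads, n_kv_heads):
    """Number of query heads that share each KV head.

    MHA: n_kv_heads == n_heads -> 1 query head per KV head.
    GQA: 1 < n_kv_heads < n_heads -> small groups share a KV head.
    MQA: n_kv_heads == 1 -> all query heads share one KV head.
    """
    assert n_heads % n_kv_heads == 0, "head count must divide evenly"
    return n_heads // n_kv_heads

assert kv_groups(16, 16) == 1   # MHA
assert kv_groups(16, 4) == 4    # GQA: 4 query heads per KV head
assert kv_groups(16, 1) == 16   # MQA
```

Shrinking the KV head count shrinks the KV cache proportionally, which is why the effect on quality vs memory is worth exposing as a config knob even on a small model.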

e15c6c74c3
correctness
2024-05-30 20:50:45 -05:00

da473295b7
a better way to compute per-segment losses
2024-05-28 19:29:54 -05:00

6c49ad06a3
forgot to re-include multiplying by loss factors
2024-05-27 20:40:21 -05:00

b82f0d5c0c
finally nailed the issue that caused logging to break on one machine but not another (bitnet includes zetascale, which is a parasite that will break logging)
2024-05-27 19:47:58 -05:00

c0ac84c795
uh
2024-05-27 19:05:56 -05:00

197d517181
ugh
2024-05-27 17:09:35 -05:00

5af6f41c94
added loss calcs against the prom (requires the right settings for not-shit results; disabled by default)
2024-05-27 08:43:00 -05:00

05cd8b797e
nevermind, it breaks training
2024-05-25 18:03:43 -05:00

85f9684720
some cleanup
2024-05-25 17:46:52 -05:00

d760924719
added a kludgy eval-only mode so I don't have to start training, type eval, stop training, then delete the logs for that session
2024-05-25 17:39:51 -05:00

ddbacde0d1
DAC just doesn't work well enough......
2024-05-25 11:07:52 -05:00

e3ef89f5aa
100x better for subtrain/eval to be by group instead
2024-05-19 16:40:14 -05:00

458b95d196
added an option to split between text loss and audio loss (to-do: document this better), because it may or may not be a problem with LLaMA-backed models: my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment
2024-05-19 11:23:56 -05:00
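The text/audio loss split in 458b95d196 (and the per-segment losses in da473295b7) comes down to averaging per-token losses under separate masks, so each modality's loss can be inspected on its own. A torch-free sketch of the split (names and the mask shape are hypothetical; real code would use tensor masks):

```python
def split_losses(token_losses, text_mask):
    """Average loss separately over text and audio token positions.

    `token_losses` is a per-token loss list; `text_mask[i]` is True where
    position i belongs to the text segment, False for audio.
    """
    text = [l for l, is_text in zip(token_losses, text_mask) if is_text]
    audio = [l for l, is_text in zip(token_losses, text_mask) if not is_text]
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return {"text": avg(text), "audio": avg(audio)}

losses = [1.0, 3.0, 2.0, 6.0]
mask = [True, True, False, False]  # first two tokens are text
assert split_losses(losses, mask) == {"text": 2.0, "audio": 4.0}
```

Reporting the two averages separately shows which modality the ~3.9 combined loss is actually coming from.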