Commit Graph

82 Commits

Author SHA1 Message Date
mrq
917eeb40d2 ughhh 2024-05-12 08:22:39 -05:00
mrq
9910c75d5a checkpointing for bitnet impl 2024-05-12 07:52:54 -05:00
mrq
14709ac67f ughh 2024-05-12 07:30:59 -05:00
mrq
a755eb3c62 ugh 2024-05-11 17:34:45 -05:00
mrq
88e9b9caff local ddp fix 2024-05-11 17:29:01 -05:00
mrq
3337c69e5a leverage between xformers and torch.backends.cuda.sdp_kernel for attention 2024-05-11 17:14:05 -05:00
mrq
d33c7bb7cf ugh 2024-05-11 16:47:19 -05:00
mrq
2109712e5b resolve deprecation warning that doesn't show on my old training rig but does on my new one 2024-05-09 23:25:44 -05:00
mrq
1547de5020 haha... 2024-05-09 23:15:52 -05:00
mrq
0d5d545a40 crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc. 2024-05-09 20:28:20 -05:00
mrq
33b7f81b94 small cleanups 2024-05-04 22:37:22 -05:00
mrq
253441b750 forgot to disable verbose flag 2024-05-04 13:13:52 -05:00
mrq
3dca1125f5 implemented xformers in HF's Llama (because theres no flash attention for Volta cards) 2024-05-04 13:07:45 -05:00
mrq
ffa200eec7 added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources)) 2024-05-04 12:05:41 -05:00
mrq
b5d1456a09 backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not)) 2024-04-29 22:14:01 -05:00
mrq
5120ffdda7 god it would be nice to know the best way to handle audio embeddings, because I genuinely don't know without skimming through papers or devoting X amount of GPU hours in training 2024-04-29 18:24:05 -05:00
mrq
b0bd88833c refractor cleanup, had a revelation on how I can handle a batch of varying tasks 2024-04-16 21:04:48 -05:00
mrq
467fa1c5ee wrapper fixes 2024-04-16 10:19:02 -05:00
mrq
aa1e25fbf5 backwards compat for old YAMLs with models, option to set flash attention 2 for Llama (and derivatives), included syncdoth/RetNets torchscale retnet for shits and grins, etc. 2024-04-16 10:02:31 -05:00
mrq
545162195b deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things 2024-04-15 19:54:32 -05:00
mrq
d69a00e389 Properly pass retention_mask for retnet-HF, attempt to fix recurrent forward for retnet (doesn't work still) 2024-04-14 13:12:50 -05:00
mrq
9d97eb5104 added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale) 2024-04-08 20:14:51 -05:00
mrq
7075c2a5f0 added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml) 2024-04-04 19:11:49 -05:00
mrq
35d78a2bb0 Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares) 2024-02-29 20:29:17 -06:00
mrq
3da1518ace added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work) 2024-01-31 21:48:36 -06:00
mrq
cce929e136 nasty hotfix for transformer's Mixtral throwing an error when batch sizes > 1 2024-01-26 19:41:12 -06:00
mrq
e799665759 experimental weighting of prom/resp embeds 2024-01-25 12:18:48 -06:00
mrq
c690aa509d fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...) 2023-12-25 21:20:32 -06:00
mrq
0db3203b21 added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go) 2023-12-22 19:27:36 -06:00
mrq
9c198eb75a added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works) 2023-12-20 18:45:58 -06:00
mrq
9a6040383e make validation samplers ignore sampler type 2023-10-22 09:01:47 -05:00
mrq
a539f6889f mucked around with the loss calculation, this seems better? 2023-10-13 18:22:21 -05:00
mrq
65f500083d tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work 2023-10-12 22:21:43 -05:00
mrq
08bae355eb actually use langs from the dataloader 2023-10-11 21:21:50 -05:00
mrq
3af19d79fd oops 2023-10-11 20:49:54 -05:00
mrq
8740cdefc6 added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested) 2023-10-11 20:38:40 -05:00
mrq
7facacf7c9 separated samplers into its own file, don't bother copying the logits back to the GPU after sampling, it's not necessary 2023-10-11 12:25:31 -05:00
mrq
47b3077415 fixed mirostat issue 2023-10-10 18:09:49 -05:00
mrq
99e980d323 documentation and more better-er attribution 2023-10-10 17:15:16 -05:00
mrq
e727b6e5c1 changed dynamic temperature trigger to be a min-(n)ar-temp value between [0,(n)ar-temp), flags to set min temp, checkbox in web UI to request it 2023-10-10 17:02:33 -05:00
mrq
ec25f56bd9 used torch.max fixes things, somehow, for dynamic temp sampling 2023-10-10 16:42:24 -05:00
mrq
26fbb92ec6 reduced dynamic temperature threshold to > 1.0, as it seems to not quite be useful for audio LMs, sped up any sampling that touches logits by copying them to CPU first, as accessing tensors on the GPU is slow as balls) 2023-10-09 14:46:17 -05:00
mrq
27483e56f0 disabled preparing of SpeechX tasks, added dynamic temperature testing (to-do: test it, credited in the function) 2023-10-09 13:01:40 -05:00
mrq
2deb995cc9 updated setup script 2023-10-06 20:08:28 -05:00
mrq
63cc9cf37a added compat flags for torchscale because the maintainer for torchscale broke compat for existing models 2023-10-05 16:39:46 -05:00
mrq
c0b25541e3 restructured some things with the model to remove dead weights 2023-09-20 19:10:59 -05:00
mrq
a6bfe43590 added mirostat sampling (given a partially trained model, it got far decent output than I expected, need to test on a better trained model) 2023-09-18 18:55:41 -05:00
mrq
2567e082b5 UGH 2023-09-16 00:26:13 -05:00
mrq
22ffaf3a33 have loss for the NAR not-ignore the text prompt, I imagine this should help the NAR and explain why it's always had a bit of an issue with training 2023-09-15 19:08:44 -05:00
mrq
4aef798135 added picking final candidate based on sum of score instead of first candidate (this changes nothing). 2023-09-13 13:19:11 -05:00