Commit Graph

167 Commits (master)

Author SHA1 Message Date
mrq 7075c2a5f0 added an option to allow injecting embeddings from another model, because it dawned on me how valuable embeddings from a good model can be for subsequent training runs (defined under cfg.models._embeddings as a relative path to the yaml; sketch below) 2024-04-04 19:11:49 +07:00
mrq 91062361af tweaks 2024-03-01 20:38:06 +07:00
mrq f3c59c3e7e cleaner replacement code (because I realized BitNet had an implementation for it too); added gradient-norm calculation and gradient clipping to the local (non-deepspeed) trainer (sketch below) 2024-03-01 20:18:43 +07:00
mrq 47435207f7 added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model (sketch below) 2024-03-01 19:20:10 +07:00
mrq 0427d8d076 logger broke for some reason, added a flag to just use tqdm.write instead; denote yamls with cfg.bitsandbytes.bitnet==True since I'm sure they're not interoperable 2024-03-01 10:32:35 +07:00
mrq 35d78a2bb0 Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares) 2024-02-29 20:29:17 +07:00
mrq 3da1518ace added a Mistral (non-Mixtral) backend, a useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe; sketch below), and recurrent sampling for the LLaMA/Mistral/Mixtral backends (again, doesn't actually work) 2024-01-31 21:48:36 +07:00
mrq cce929e136 nasty hotfix for transformers' Mixtral throwing an error when batch sizes > 1 2024-01-26 19:41:12 +07:00
mrq e799665759 experimental weighting of prom/resp embeds 2024-01-25 12:18:48 +07:00
mrq c690aa509d fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...) 2023-12-25 21:20:32 +07:00
mrq e513d2ef19 experts weren't forwarded into the constructor (wasted a few days of training garbage) 2023-12-23 16:08:17 +07:00
mrq 0db3203b21 added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go) 2023-12-22 19:27:36 +07:00
mrq 9c198eb75a added torchscale XMoE integration (because Mixtral 8x7B seems very promising and I want to see if it works) 2023-12-20 18:45:58 +07:00
mrq 6c51a629cc resetting step count resets the samples processed and other metrics 2023-10-29 12:11:19 +07:00
mrq 0aa2a3cc07 evaluation/validation passes language ID during training (oops) 2023-10-29 12:00:40 +07:00
mrq ed54f4ebec promoted the better target sequence preparation out of 'experimental' status 2023-10-22 09:06:59 +07:00
mrq 9a6040383e make validation samplers ignore sampler type 2023-10-22 09:01:47 +07:00
mrq 32d4271ca8 fixed issue with training from scratch (oops) 2023-10-21 09:55:38 +07:00
mrq 3195026dba fixed the 'add another target audio to artificially create longer sequences' feature for HDF5 just duplicating the initially sampled utterance 2023-10-18 20:38:33 +07:00
mrq 09cda7d3f9 added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks pools that dominate by sheer count and emphasize the smaller pools; sketch below), log cleanup 2023-10-16 19:30:38 +07:00
mrq a539f6889f mucked around with the loss calculation, this seems better? 2023-10-13 18:22:21 +07:00
mrq fb467b19ba exposed rolling resp context to the web UI, added passing in language to the inferencing command line 2023-10-12 23:21:01 +07:00
mrq 298fd9a5f9 fixed issue with webui 2023-10-12 22:49:25 +07:00
mrq 65f500083d tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work 2023-10-12 22:21:43 +07:00
mrq 08bae355eb actually use langs from the dataloader 2023-10-11 21:21:50 +07:00
mrq 3af19d79fd oops 2023-10-11 20:49:54 +07:00
mrq 8740cdefc6 added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested) 2023-10-11 20:38:40 +07:00
mrq 6045cbce94 added experimental option to append utterances for training target (emphasis on experimental) 2023-10-11 17:32:45 +07:00
mrq 7facacf7c9 separated samplers into their own file; don't bother copying the logits back to the GPU after sampling, it's not necessary 2023-10-11 12:25:31 +07:00
mrq 100dd164e6 apply phoneme cleanup in inferencing as well 2023-10-10 19:21:19 +07:00
mrq b4405c98ea remove double spaces in the text phonemes (might have caused problems; sketch below) 2023-10-10 19:18:24 +07:00
mrq 47b3077415 fixed mirostat issue 2023-10-10 18:09:49 +07:00
mrq 99e980d323 documentation and more better-er attribution 2023-10-10 17:15:16 +07:00
mrq e727b6e5c1 changed the dynamic temperature trigger to be a min-(n)ar-temp value in [0, (n)ar-temp); added flags to set the min temp and a checkbox in the web UI to request it 2023-10-10 17:02:33 +07:00
mrq ec25f56bd9 using torch.max fixes things, somehow, for dynamic temp sampling 2023-10-10 16:42:24 +07:00
mrq 87db03dd93 trim the input prompt to 3 seconds when training NAR tasks (marked as experimental; the paper mentions doing so, but I don't know how much this would harm the retention heads; sketch below) 2023-10-09 22:03:58 +07:00
mrq 893a610fad cleanup, use deepspeed inferencing pathway if requested 2023-10-09 15:24:04 +07:00
mrq 26fbb92ec6 reduced the dynamic temperature threshold to > 1.0, as it doesn't seem all that useful for audio LMs; sped up any sampling that touches logits by copying them to the CPU first, as accessing tensors on the GPU is slow as balls (sketch below) 2023-10-09 14:46:17 +07:00
mrq 29873e6ded extend the max temps in the web UI to actually allow dynamic temp sampling 2023-10-09 13:30:45 +07:00
mrq 27483e56f0 disabled preparing of SpeechX tasks, added dynamic temperature testing (to-do: test it; credited in the function; sketch below) 2023-10-09 13:01:40 +07:00
mrq 2deb995cc9 updated setup script 2023-10-06 20:08:28 +07:00
mrq 1fd91b6437 cleanup 2023-10-06 10:13:54 +07:00
mrq 3db7e7dea1 implicitly load checkpoint if deepspeed checkpoint not found, updated setup script to grab the diskcached dataloader things 2023-10-06 10:02:45 +07:00
mrq 82f02ae9b1 oops 2023-10-06 09:26:52 +07:00
mrq 2f2505b12f updated setup script 2023-10-06 08:08:28 +07:00
mrq 63cc9cf37a added compat flags for torchscale because the maintainer for torchscale broke compat for existing models 2023-10-05 16:39:46 +07:00
mrq 12cfc9e502 added prodigyopt as a dependency because I keep forgetting 2023-10-04 19:42:56 +07:00
mrq 153f8b293c added min-x and min-y arguments to plot.py, helper script to download from my existing checkpoint 2023-10-04 19:41:37 +07:00
mrq 777ba43305 oops 2023-10-03 15:01:37 +07:00
mrq d12877ee09 added an option to set the probability of selecting the AR during training under a monolithic AR+NAR (sketch below), added some more to-dos while I have them in mind 2023-10-02 16:52:42 +07:00
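The sketches below are illustrative approximations inferred from the commit messages above; they are not the repository's actual code, and any helper names, config attributes, or constants not quoted from a commit message are assumptions.

For 7075c2a5f0 (embeddings injection): one plausible shape for the option is loading a donor checkpoint and copying over any embedding weights whose names and shapes match. The helper name and checkpoint layout here are hypothetical; cfg.models._embeddings is quoted from the commit.

```python
import torch

def inject_embeddings(model: torch.nn.Module, donor_path: str) -> None:
    """Hypothetical helper: copy matching embedding weights from a donor checkpoint."""
    donor = torch.load(donor_path, map_location="cpu")  # assumed to be a plain state dict
    state = model.state_dict()
    matched = {
        name: weight for name, weight in donor.items()
        # only take embedding-like tensors that exist here with the same shape
        if "emb" in name and name in state and state[name].shape == weight.shape
    }
    model.load_state_dict(matched, strict=False)  # leave everything else untouched
```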
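For f3c59c3e7e (gradient norm and clipping in the local trainer): PyTorch's built-in utility computes the total norm and clips in one call, so the step likely reduces to something like the generic loop below; the trainer structure is assumed.

```python
import torch

def training_step(model, batch, optimizer, max_norm: float = 1.0):
    loss = model(**batch)  # assumes the model returns a scalar loss
    loss.backward()
    # clip_grad_norm_ returns the pre-clip total gradient norm, handy for logging
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item(), grad_norm.item()
```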
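For 47435207f7 (cfg.bitsandbytes.replace): replacing all Linear modules after construction is usually a recursive walk over the module tree, which is less intrusive than inject-style patching of torch.nn before construction. A sketch using bitsandbytes' 8-bit linear as the stand-in replacement:

```python
import torch.nn as nn
import bitsandbytes as bnb

def replace_linears(module: nn.Module) -> None:
    """Recursively swap nn.Linear children for bitsandbytes 8-bit linears."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, bnb.nn.Linear8bitLt(
                child.in_features, child.out_features,
                bias=child.bias is not None,
            ))  # (copying the old weights over is omitted for brevity)
        else:
            replace_linears(child)  # descend into containers/blocks
```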
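For 3da1518ace (adjusting the LR for Prodigyopt through d_coeff): Prodigy estimates its own step size d and its lr is conventionally left at 1.0, so the effective learning rate is scaled through the optimizer's d_coef multiplier rather than lr; the value below is a placeholder.

```python
import torch.nn as nn
from prodigyopt import Prodigy

model = nn.Linear(1024, 1024)  # placeholder model
# lr stays at the recommended 1.0; d_coef scales Prodigy's estimated step size d
optimizer = Prodigy(model.parameters(), lr=1.0, d_coef=0.5)
```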
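For 09cda7d3f9 (sampling by speaker group name): picking the group first and only then an utterance flattens the distribution across groups, which is one straightforward way to de-emphasize the oversized LibriVox/Audiobooks pools; whether the repo weights groups exactly uniformly is an assumption.

```python
import random

def sample_utterance(groups: dict[str, list[str]]) -> str:
    """groups maps a speaker-group name to its utterance list."""
    group = random.choice(list(groups))  # uniform over groups, not over utterances,
    return random.choice(groups[group])  # so huge pools no longer dominate
```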
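For b4405c98ea (double spaces in the text phonemes), also applied at inference in 100dd164e6: the generic fix is a one-line regex; the exact pattern the repo uses is not quoted in the commit.

```python
import re

def clean_phonemes(text: str) -> str:
    return re.sub(r" {2,}", " ", text).strip()  # collapse runs of spaces, trim ends
```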
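For 27483e56f0 and e727b6e5c1 (dynamic temperature with a min temp in [0, (n)ar-temp)): the technique is credited in the function per the commit; the common formulation scales temperature between a floor and a ceiling by the normalized entropy of the logits, which is what this sketch assumes.

```python
import torch

def dynamic_temperature(logits: torch.Tensor, min_temp: float, max_temp: float) -> torch.Tensor:
    """Scale temperature by normalized entropy: confident distributions sample
    near min_temp, flat ones approach max_temp."""
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-10)).sum(dim=-1)
    max_entropy = torch.log(torch.tensor(float(logits.shape[-1])))
    temp = min_temp + (max_temp - min_temp) * (entropy / max_entropy)
    temp = temp.clamp(min=1e-3)  # guard against a zero floor dividing the logits
    return torch.softmax(logits / temp.unsqueeze(-1), dim=-1)
```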
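For 87db03dd93 (trimming the input prompt to 3 seconds for NAR tasks): with EnCodec at roughly 75 code frames per second, 3 seconds is about 225 frames, and a random crop is the obvious trim; both the frame rate and the crop strategy are assumptions here.

```python
import random
import torch

FRAMES_PER_SECOND = 75  # EnCodec at 24kHz emits ~75 code frames per second

def trim_prompt(codes: torch.Tensor, seconds: float = 3.0) -> torch.Tensor:
    """codes: (frames, n_quantizer_levels); return a random window of `seconds`."""
    frames = int(seconds * FRAMES_PER_SECOND)
    if codes.shape[0] <= frames:
        return codes
    start = random.randint(0, codes.shape[0] - frames)
    return codes[start:start + frames]
```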
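For 26fbb92ec6 (copying logits to the CPU before sampling), kept in 7facacf7c9: the win comes from paying the device-to-host copy once, after which every sampler pass (top-k/top-p, mirostat, dynamic temp) manipulates a CPU tensor with no per-op GPU synchronization. Schematically:

```python
import torch

def sample_token(logits: torch.Tensor, temperature: float = 1.0) -> int:
    """logits: (vocab,); one device-to-host copy, then sample on the CPU."""
    logits = logits.cpu()  # later sampling math stays on the CPU
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```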
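For d12877ee09 (probability of selecting the AR under a monolithic AR+NAR): per-sample task selection is presumably a weighted coin flip against the configured probability; the function and parameter names are placeholders.

```python
import random

def pick_task(p_ar: float = 0.5) -> str:
    """Choose which objective a monolithic AR+NAR model trains on this step."""
    return "ar" if random.random() < p_ar else "nar"
```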