Commit Graph

  • b0bd88833c refactor cleanup, had a revelation on how I can handle a batch of varying tasks mrq 2024-04-16 21:04:48 -0500
  • 467fa1c5ee wrapper fixes mrq 2024-04-16 10:19:02 -0500
  • aa1e25fbf5 backwards compat for old YAMLs with models, option to set flash attention 2 for Llama (and derivatives), included syncdoth/RetNet's torchscale retnet for shits and grins, etc. mrq 2024-04-16 10:02:31 -0500
  • 545162195b deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things mrq 2024-04-15 19:54:32 -0500
  • d69a00e389 Properly pass retention_mask for retnet-HF, attempt to fix recurrent forward for retnet (doesn't work still) mrq 2024-04-14 13:12:50 -0500
  • 789bb5d11b add an optional label override for model loading (used for easy testing between 12/16/20/24 layered model) mrq 2024-04-13 12:43:35 -0500
  • f0c4baeb25 added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it) mrq 2024-04-09 22:04:01 -0500
  • 4d75ee066c actually do the Linear replacement with TE's Linear mrq 2024-04-09 14:41:13 -0500
  • 9d97eb5104 added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale) mrq 2024-04-08 20:14:51 -0500
  • 7075c2a5f0 added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent training runs (defined under cfg.models._embeddings as a relative path to the yaml; see the embedding-injection sketch below) mrq 2024-04-04 19:11:49 -0500
  • 91062361af tweaks mrq 2024-03-01 20:38:06 -0600
  • f3c59c3e7e cleaner replacement code (because I realized BitNet had an implementation for it too), added gradient-norm calculation and gradient clipping to the local (non-deepspeed) trainer (see the clipping sketch below) mrq 2024-03-01 20:18:43 -0600
  • 47435207f7 Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model (see the Linear-replacement sketch below) mrq 2024-03-01 19:20:10 -0600
  • 0427d8d076 logger broke for some reason, added flag to just tqdm.write instead, denote cfg.bitsandbytes.bitnet==True yamls since I'm sure they're not interoperable mrq 2024-03-01 10:32:35 -0600
  • 35d78a2bb0 Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares) mrq 2024-02-29 20:29:17 -0600
  • 3da1518ace added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work) mrq 2024-01-31 21:48:36 -0600
  • cce929e136 nasty hotfix for transformer's Mixtral throwing an error when batch sizes > 1 mrq 2024-01-26 19:41:12 -0600
  • e799665759 experimental weighting of prom/resp embeds mrq 2024-01-25 12:18:48 -0600
  • c690aa509d fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...) mrq 2023-12-25 21:20:32 -0600
  • e513d2ef19 experts weren't forwarded into constructor (wasted a few days of training garbage) mrq 2023-12-23 16:08:17 -0600
  • 0db3203b21 added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go) mrq 2023-12-22 19:27:36 -0600
  • 9c198eb75a added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works) mrq 2023-12-20 18:45:58 -0600
  • 6c51a629cc resetting step count resets the samples processed and other metrics mrq 2023-10-29 12:11:19 -0500
  • 0aa2a3cc07 evaluation/validation passes language ID during training (oops) mrq 2023-10-29 12:00:40 -0500
  • ed54f4ebec un 'experimental' the better target sequence preparation mrq 2023-10-22 09:06:59 -0500
  • 9a6040383e make validation samplers ignore sampler type mrq 2023-10-22 09:01:47 -0500
  • 32d4271ca8 fixed issue with training from scratch (oops) mrq 2023-10-21 09:55:38 -0500
  • 3195026dba fixed issue with the 'add another target audio to artificially create longer sequences' for HDF5 just duplicating the utterance initially sampled mrq 2023-10-18 20:38:33 -0500
  • 09cda7d3f9 added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup mrq 2023-10-16 19:30:38 -0500
  • a539f6889f mucked around with the loss calculation, this seems better? mrq 2023-10-13 18:22:21 -0500
  • fb467b19ba exposed rolling resp context to the web UI, added passing in language to inferencing command line mrq 2023-10-12 23:21:01 -0500
  • 298fd9a5f9 fixed issue with webui mrq 2023-10-12 22:49:25 -0500
  • 65f500083d tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work mrq 2023-10-12 22:21:43 -0500
  • 08bae355eb actually use langs from the dataloader mrq 2023-10-11 21:21:50 -0500
  • 3af19d79fd oops mrq 2023-10-11 20:49:54 -0500
  • 8740cdefc6 added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested) mrq 2023-10-11 20:38:40 -0500
  • 6045cbce94 added experimental option to append utterances for training target (emphasis on experimental) mrq 2023-10-11 17:32:45 -0500
  • 7facacf7c9 separated samplers into their own file, don't bother copying the logits back to the GPU after sampling, it's not necessary mrq 2023-10-11 12:25:31 -0500
  • 100dd164e6 apply phoneme cleanup in inferencing as well mrq 2023-10-10 19:21:19 -0500
  • b4405c98ea remove double spaces in the text phonemes (might have caused problems.........) mrq 2023-10-10 19:18:24 -0500
  • 47b3077415 fixed mirostat issue mrq 2023-10-10 18:09:49 -0500
  • 99e980d323 documentation and more better-er attribution mrq 2023-10-10 17:15:16 -0500
  • e727b6e5c1 changed dynamic temperature trigger to be a min-(n)ar-temp value between [0,(n)ar-temp), flags to set min temp, checkbox in web UI to request it (see the dynamic-temperature sketch below) mrq 2023-10-10 17:02:33 -0500
  • ec25f56bd9 using torch.max fixes things, somehow, for dynamic temp sampling mrq 2023-10-10 16:42:24 -0500
  • 87db03dd93 trim the input prompt to 3 seconds when training NAR tasks (marked as experimental; the paper mentions doing so, but I don't know how much this would harm the retention heads; see the prompt-trim sketch below) mrq 2023-10-09 22:03:58 -0500
  • 893a610fad cleanup, use deepspeed inferencing pathway if requested mrq 2023-10-09 15:24:04 -0500
  • 26fbb92ec6 reduced dynamic temperature threshold to > 1.0, as it seems to not quite be useful for audio LMs, sped up any sampling that touches logits by copying them to CPU first, as accessing tensors on the GPU is slow as balls mrq 2023-10-09 14:46:17 -0500
  • 29873e6ded extend the max temps in the web UI to actually allow dynamic temp sampling mrq 2023-10-09 13:30:45 -0500
  • 27483e56f0 disabled preparing of SpeechX tasks, added dynamic temperature testing (to-do: test it, credited in the function) mrq 2023-10-09 13:01:40 -0500
  • 2deb995cc9 updated setup script mrq 2023-10-06 20:08:28 -0500
  • 1fd91b6437 cleanup mrq 2023-10-06 10:13:54 -0500
  • 3db7e7dea1 implicitly load checkpoint if deepspeed checkpoint not found, updated setup script to grab the diskcached dataloader things mrq 2023-10-06 10:02:45 -0500
  • 82f02ae9b1 oops mrq 2023-10-06 09:26:52 -0500
  • 2f2505b12f updated setup script mrq 2023-10-06 08:08:28 -0500
  • 63cc9cf37a added compat flags for torchscale because the maintainer for torchscale broke compat for existing models mrq 2023-10-05 16:39:46 -0500
  • 12cfc9e502 added prodigyopt as a dependency because I keep forgetting mrq 2023-10-04 19:42:56 -0500
  • 153f8b293c added min-x and min-y arguments to plot.py, helper script to download from my existing checkpoint mrq 2023-10-04 19:41:37 -0500
  • 777ba43305 oops mrq 2023-10-03 15:01:37 -0500
  • d12877ee09 added option to set probability of selecting the AR during training under a monolithic AR+NAR, added some more to-dos while I have them in mind mrq 2023-10-02 16:52:42 -0500
  • e85b798fbf set default NAR levels to max for the web UI mrq 2023-09-29 19:14:16 -0500
  • c7fb740d41 do not specify a default dtype for the web UI, let it implicitly load from the yaml instead mrq 2023-09-24 17:54:03 -0500
  • 4abd6564d1 fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml mrq 2023-09-23 19:59:00 -0500
  • 9384900ce6 revert the frankensteined "train one model but hotload the other" since it kept loading the last exported weights and I'm not supporting this usecase anymore anyways mrq 2023-09-22 13:04:17 -0500
  • e7da1eb90d edge case mrq 2023-09-20 19:20:17 -0500
  • c0b25541e3 restructured some things with the model to remove dead weights mrq 2023-09-20 19:10:59 -0500
  • a6bfe43590 added mirostat sampling (given a partially trained model, it got far more decent output than I expected, need to test on a better trained model) mrq 2023-09-18 18:55:41 -0500
  • 2567e082b5 UGH mrq 2023-09-16 00:26:13 -0500
  • 22ffaf3a33 have loss for the NAR not-ignore the text prompt, I imagine this should help the NAR and explain why it's always had a bit of an issue with training mrq 2023-09-15 19:08:44 -0500
  • 4aef798135 added picking final candidate based on sum of scores instead of first candidate (this changes nothing) mrq 2023-09-13 13:19:11 -0500
  • 23a5fdd645 implemented a naive beam search (I really should be taking a break; see the beam-search sketch below) mrq 2023-09-12 21:28:07 -0500
  • a6ae344e5b some comments mrq 2023-09-12 16:04:45 -0500
  • d07c63b9d8 unified more things with training the AR+NAR monolithic model mrq 2023-09-12 15:54:41 -0500
  • 40ef34e1ca this embedding class definitely works, and migrating from the previous embedding weights seems to work. mrq 2023-09-11 14:13:42 -0500
  • a1f250ffac set default max_levels for NAR to 0 and implicitly set it to max resps levels because the previous way was implicitly assuming all models were outputting at 1+7 RVQ bins. mrq 2023-09-10 20:33:33 -0500
  • 671dca88ee throw error when no reference audio is provided in the web UI because someone keeps doing that in the HF space mrq 2023-09-10 15:50:50 -0500
  • ba71020318 added option to limit (or exceed) inferenced RVQ-bin levels through the NAR mrq 2023-09-10 13:50:13 -0500
  • c74fe2f718 tweaks to web UI mrq 2023-09-09 22:27:20 -0500
  • 7f8bd2b936 added printing elapsed inference time mrq 2023-09-09 20:05:03 -0500
  • 4f61f5c889 added option to set the trim length for an input prompt mrq 2023-09-09 18:04:44 -0500
  • d10053d11f render README.md markdown for huggingface space mrq 2023-09-09 17:04:51 -0500
  • bc30026377 added advanced sampler parameters to the web UI mrq 2023-09-09 16:51:36 -0500
  • 5ac119a6e7 added light web UI (need to port the telemetry disabling bandaids from aivc) mrq 2023-09-09 16:17:20 -0500
  • 10c34c5b98 added a length-based decay factor for repetition penalty (see the repetition-penalty sketch below) mrq 2023-09-08 21:02:00 -0500
  • b922f35b6b added documentation on how these new sampling parameters are very iffy and you really need to know what you are doing to use them because this is audio generation and not text generation mrq 2023-09-08 20:43:36 -0500
  • 14c78bae39 added lots of sampling options (top-k/top-p, repetition penalty, length penalty) mrq 2023-09-08 20:30:54 -0500
  • f69aad9c65 some day I'll get it right mrq 2023-09-08 15:36:26 -0500
  • b2907ae7e0 seems that my PromEmbedding/RespEmbedding doesn't actually work all that well, naively using dedicated MultiEmbeddings for AR/NAR in the monolithic model is the best way to go mrq 2023-09-08 01:03:24 -0500
  • 67617d7d69 also cull frozen_params in the params the optimizer receives to reduce the VRAM it consumes (see the frozen-params sketch below) mrq 2023-09-07 18:27:02 -0500
  • 8837bc34d7 added option to specify parameters to freeze per-model in YAML (because I need to see about committing atrocities with converting an AR into an AR+NAR) mrq 2023-09-07 18:19:51 -0500
  • c47fc3274e added backwards compat flag mrq 2023-09-07 17:12:17 -0500
  • ab5134f385 tweaks and fixes mrq 2023-09-07 17:08:38 -0500
  • b2c2dec291 added homebrewed per-RVQ-bin embedding solutions mrq 2023-09-07 16:48:02 -0500
  • e7a67410d1 oops mrq 2023-09-07 09:14:03 -0500
  • 712808494f added support for optional prodigy optimizer (https://github.com/konstmish/prodigy) although it consumes a lot more VRAM per parameter mrq 2023-09-06 20:33:16 -0500
  • 7ce06432fd fixed the AR+NAR dual model, the resp_emb has to be split up (classifier might too) mrq 2023-09-06 19:33:39 -0500
  • 100ca6b7d0 added option to use SGD optimizer through the YAML, added option to pass in additional optimizer parameters through the YAML, added experimental unified AR+NAR model (does not seem fruitful in testing) mrq 2023-09-06 18:58:35 -0500
  • 451726fdd5 added ability to disable activation checkpointing through the YAML (it is very VRAM intensive at double layer size) mrq 2023-09-05 15:38:21 -0500
  • 143aee7526 removed dedicated interleaved AR code mrq 2023-09-03 22:47:03 -0500
  • 2f9cd0842f merged dedicated interleaved AR code with the normal AR code mrq 2023-09-03 22:46:08 -0500
  • 3a6bd50322 haha mrq 2023-09-03 21:36:58 -0500
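
Sketches

A few hedged sketches of the mechanisms the commits above describe, in rough list order. Everything here is an illustration: outside of PyTorch's own API, the names, signatures, and defaults are assumptions, not the repository's actual code.

The embedding-injection option (7075c2a5f0) boils down to seeding a fresh model's embedding tables from an earlier checkpoint, on the theory that good embeddings transfer to subsequent trainings. A minimal sketch, assuming the donor checkpoint is a plain state dict and that a name-based heuristic is good enough:

    from pathlib import Path

    import torch

    def inject_embeddings(model: torch.nn.Module, donor_path: Path) -> None:
        donor = torch.load(donor_path, map_location="cpu")  # assumed: a plain state dict
        state = model.state_dict()
        for name, tensor in donor.items():
            # copy only embedding-like weights whose shapes line up
            if "emb" in name and name in state and state[name].shape == tensor.shape:
                state[name] = tensor
        model.load_state_dict(state)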
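The gradient-norm metric and gradient clipping added to the local trainer (f3c59c3e7e) fit in one call, since clip_grad_norm_ clips in place and returns the pre-clip total norm:

    import torch

    def clip_gradients(model: torch.nn.Module, max_norm: float = 1.0) -> float:
        # returns the total grad norm before clipping, handy for logging
        return float(torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm))

Call it between loss.backward() and optimizer.step().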
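The cfg.bitsandbytes.replace route (47435207f7) and the TransformerEngine swap (4d75ee066c) both reduce to a recursive walk that exchanges every nn.Linear for a drop-in replacement. The factory parameter below is a stand-in for whichever Linear class (bitsandbytes, TransformerEngine, BitNet) is configured:

    import torch
    import torch.nn as nn

    def replace_linears(module: nn.Module, factory) -> None:
        # factory(in_features, out_features, bias) must return a Linear-compatible module
        for name, child in module.named_children():
            if isinstance(child, nn.Linear):
                new = factory(child.in_features, child.out_features, child.bias is not None)
                with torch.no_grad():
                    new.weight.copy_(child.weight)
                    if child.bias is not None:
                        new.bias.copy_(child.bias)
                setattr(module, name, new)
            else:
                replace_linears(child, factory)  # recurse into containers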
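Dynamic temperature (27483e56f0, e727b6e5c1) interpolates between a minimum and a maximum temperature per step. One common entropy-based formulation, not necessarily the exact one credited in the repo, is below; per 26fbb92ec6, it also pays to move the logits to the CPU first, since elementwise fiddling with GPU tensors forces a sync per access:

    import torch

    def dynamic_temperature(logits: torch.Tensor, min_temp: float = 0.0,
                            max_temp: float = 1.2) -> torch.Tensor:
        # confident (low-entropy) steps sample near min_temp, uncertain ones near max_temp
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1, keepdim=True)
        max_entropy = torch.log(torch.tensor(logits.shape[-1], dtype=logits.dtype))
        temp = min_temp + (max_temp - min_temp) * entropy / max_entropy
        return torch.softmax(logits / temp.clamp_min(1e-5), dim=-1)  # ready for multinomial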
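Trimming the input prompt for NAR tasks (87db03dd93) is just slicing the codec tokens to a fixed frame budget. The sketch assumes EnCodec at 24 kHz (75 frames per second) and takes a random window rather than the head, an arbitrary choice here:

    import torch

    def trim_prompt(codes: torch.Tensor, seconds: float = 3.0,
                    frame_rate: int = 75) -> torch.Tensor:
        # codes: [frames, rvq_levels] codec tokens
        frames = int(seconds * frame_rate)
        if codes.shape[0] <= frames:
            return codes
        start = int(torch.randint(0, codes.shape[0] - frames, (1,)))
        return codes[start : start + frames]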
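A naive beam search (23a5fdd645) keeps the top `width` partial sequences by summed log-probability; 4aef798135 then picks the final candidate by best summed score rather than taking beam zero. One step, under the assumption that all live beams are scored in a single batch:

    import torch

    def beam_step(logits: torch.Tensor, beams: torch.Tensor,
                  scores: torch.Tensor, width: int):
        # logits: [width, vocab] next-token logits, one row per live beam
        # beams:  [width, t] token ids so far; scores: [width] summed log-probs
        logp = torch.log_softmax(logits, dim=-1)
        total = (scores.unsqueeze(-1) + logp).view(-1)  # score every (beam, token) pair
        scores, flat = total.topk(width)
        src, tok = flat // logp.shape[-1], flat % logp.shape[-1]
        beams = torch.cat([beams[src], tok.unsqueeze(-1)], dim=-1)
        return beams, scores

Initialize scores to zero for one beam and -inf for the rest so the first step doesn't yield width copies of the same token; the final candidate is beams[scores.argmax()].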
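The repetition penalty (14c78bae39) with its length-based decay (10c34c5b98) can be written CTRL-style, with an assumed exponential falloff so tokens further back in the history are penalized less:

    import torch

    def repetition_penalty(logits: torch.Tensor, history: list[int],
                           penalty: float = 1.25, decay: float = 0.9) -> torch.Tensor:
        # a token that repeats in the history gets penalized once per occurrence
        logits = logits.clone()
        for distance, token in enumerate(reversed(history)):
            p = 1.0 + (penalty - 1.0) * (decay ** distance)  # decay=1.0 disables falloff
            logits[token] = logits[token] / p if logits[token] > 0 else logits[token] * p
        return logits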
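Freezing parameters per-model (8837bc34d7) and culling them from what the optimizer receives (67617d7d69) saves the optimizer state (e.g. Adam moments) those weights would otherwise cost. Matching the YAML-listed names by fnmatch pattern is an assumption here:

    import fnmatch

    import torch

    def trainable_parameters(model: torch.nn.Module,
                             frozen: list[str]) -> list[torch.nn.Parameter]:
        for name, param in model.named_parameters():
            if any(fnmatch.fnmatch(name, pattern) for pattern in frozen):
                param.requires_grad_(False)
        # hand the optimizer only what still trains
        return [p for p in model.parameters() if p.requires_grad]

For example, torch.optim.AdamW(trainable_parameters(model, ["*.text_emb.*"]), lr=1e-4) would skip a hypothetical text_emb module entirely.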