aa1e25fbf5  backwards compat for old YAMLs with models, option to set flash attention 2 for Llama (and derivatives), included syncdoth/RetNet's torchscale retnet for shits and grins, etc. (mrq, 2024-04-16 10:02:31 -0500)
545162195b  deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things (mrq, 2024-04-15 19:54:32 -0500)
d69a00e389  properly pass retention_mask for retnet-HF, attempt to fix recurrent forward for retnet (still doesn't work) (mrq, 2024-04-14 13:12:50 -0500)
789bb5d11b  add an optional label override for model loading (used for easy testing between 12/16/20/24-layered models) (mrq, 2024-04-13 12:43:35 -0500)
f0c4baeb25  added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it) (mrq, 2024-04-09 22:04:01 -0500)
4d75ee066c  actually do the Linear replacement with TE's Linear (mrq, 2024-04-09 14:41:13 -0500)
9d97eb5104  added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale) (mrq, 2024-04-08 20:14:51 -0500)
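A minimal sketch of what the TE Linear replacement in the two entries above might look like: recurse through the module tree and swap each nn.Linear for TransformerEngine's Linear so matmuls can run under FP8. The function name and weight-copy details are assumptions, not the repo's actual code.

```python
import torch
import torch.nn as nn
import transformer_engine.pytorch as te

def swap_linears_for_te(module: nn.Module):
    # recurse through children, replacing plain Linears with TE's Linear;
    # forwards can then run under te.fp8_autocast(enabled=True) to hit FP8 kernels
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            new = te.Linear(child.in_features, child.out_features,
                            bias=child.bias is not None)
            with torch.no_grad():  # carry over the trained weights
                new.weight.copy_(child.weight)
                if child.bias is not None:
                    new.bias.copy_(child.bias)
            setattr(module, name, new)
        else:
            swap_linears_for_te(child)
```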
7075c2a5f0  added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml) (mrq, 2024-04-04 19:11:49 -0500)
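The commit only states that the donor is "defined under cfg.models._embeddings as a relative path to the yaml"; a hypothetical YAML shape consistent with that wording:

```yaml
models:
  # hypothetical: point at the donor model's own config, whose embeddings
  # get injected into the model being trained
  _embeddings: "../donor-model/config.yaml"
```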
f3c59c3e7e  cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating the gradient norm and performing gradient clipping in the local (non-deepspeed) trainer (mrq, 2024-03-01 20:18:43 -0600)
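For the gradient-norm/clipping addition, the standard PyTorch pattern; a sketch, with the function name and max_norm value assumed:

```python
import torch

def backward_and_clip(loss, model, optimizer, max_norm=1.0):
    loss.backward()
    # clip_grad_norm_ computes the global grad norm, rescales grads in place if it
    # exceeds max_norm, and returns the pre-clip norm (handy for logging)
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    optimizer.zero_grad()
    return grad_norm
```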
47435207f7  Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model (mrq, 2024-03-01 19:20:10 -0600)
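A sketch of the less intrusive "replace" approach: walk the already-constructed module tree and swap each nn.Linear for a bitsandbytes 8-bit equivalent, rather than patching torch itself. The traversal and the chosen bnb class are assumptions, not the actual cfg.bitsandbytes.replace code.

```python
import torch.nn as nn
import bitsandbytes as bnb

def replace_linears(module: nn.Module):
    # recurse through children, swapping plain Linears for 8-bit ones;
    # unlike 'inject', this only touches the one model instance
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, bnb.nn.Linear8bitLt(
                child.in_features, child.out_features,
                bias=child.bias is not None,
            ))
        else:
            replace_linears(child)
```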
0427d8d076  logger broke for some reason, added a flag to just tqdm.write instead, denote yamls with cfg.bitsandbytes.bitnet==True since I'm sure they're not interoperable (mrq, 2024-03-01 10:32:35 -0600)
35d78a2bb0  Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares) (mrq, 2024-02-29 20:29:17 -0600)
3da1518ace  added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work) (mrq, 2024-01-31 21:48:36 -0600)
cce929e136  nasty hotfix for transformers' Mixtral throwing an error when batch sizes > 1 (mrq, 2024-01-26 19:41:12 -0600)
e799665759  experimental weighting of prom/resp embeds (mrq, 2024-01-25 12:18:48 -0600)
c690aa509d  fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...) (mrq, 2023-12-25 21:20:32 -0600)
e513d2ef19  experts weren't forwarded into the constructor (wasted a few days of training garbage) (mrq, 2023-12-23 16:08:17 -0600)
0db3203b21  added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go) (mrq, 2023-12-22 19:27:36 -0600)
9c198eb75a  added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works) (mrq, 2023-12-20 18:45:58 -0600)
6c51a629cc  resetting the step count resets the samples processed and other metrics (mrq, 2023-10-29 12:11:19 -0500)
0aa2a3cc07  evaluation/validation passes language ID during training (oops) (mrq, 2023-10-29 12:00:40 -0500)
ed54f4ebec  un-'experimental' the better target sequence preparation (mrq, 2023-10-22 09:06:59 -0500)
32d4271ca8  fixed issue with training from scratch (oops) (mrq, 2023-10-21 09:55:38 -0500)
3195026dba  fixed issue with the 'add another target audio to artificially create longer sequences' option for HDF5 just duplicating the utterance initially sampled (mrq, 2023-10-18 20:38:33 -0500)
09cda7d3f9  added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup (mrq, 2023-10-16 19:30:38 -0500)
a539f6889f  mucked around with the loss calculation, this seems better? (mrq, 2023-10-13 18:22:21 -0500)
fb467b19ba  exposed rolling resp context to the web UI, added passing in language to the inferencing command line (mrq, 2023-10-12 23:21:01 -0500)
298fd9a5f9  fixed issue with the web UI (mrq, 2023-10-12 22:49:25 -0500)
65f500083d  tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization; nothing seems to work (mrq, 2023-10-12 22:21:43 -0500)
08bae355eb  actually use langs from the dataloader (mrq, 2023-10-11 21:21:50 -0500)
8740cdefc6  added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested) (mrq, 2023-10-11 20:38:40 -0500)
6045cbce94  added experimental option to append utterances for the training target (emphasis on experimental) (mrq, 2023-10-11 17:32:45 -0500)
7facacf7c9  separated samplers into their own file, don't bother copying the logits back to the GPU after sampling since it's not necessary (mrq, 2023-10-11 12:25:31 -0500)
100dd164e6  apply phoneme cleanup in inferencing as well (mrq, 2023-10-10 19:21:19 -0500)
b4405c98ea  remove double spaces in the text phonemes (might have caused problems...) (mrq, 2023-10-10 19:18:24 -0500)
99e980d323  documentation and more better-er attribution (mrq, 2023-10-10 17:15:16 -0500)
e727b6e5c1  changed dynamic temperature trigger to be a min-(n)ar-temp value between [0,(n)ar-temp), flags to set the min temp, checkbox in the web UI to request it (mrq, 2023-10-10 17:02:33 -0500)
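Dynamic temperature here follows the general entropy-scaled idea: confident (low-entropy) logits sample near the minimum temperature, flat distributions near the configured (n)ar-temp. A sketch of that mapping, assuming 1D logits; the repo's exact trigger and scaling may differ:

```python
import math
import torch

def scale_by_dynamic_temperature(logits: torch.Tensor,
                                 min_temp: float = 0.0,
                                 max_temp: float = 1.0) -> torch.Tensor:
    # normalized entropy in [0, 1]: 0 = one token dominates, 1 = uniform
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)
    ratio = (entropy / math.log(logits.shape[-1])).item()
    # confident outputs sample near min_temp, flat ones near max_temp;
    # clamp so a min_temp of 0 can't divide by zero
    temperature = max(min_temp + (max_temp - min_temp) * ratio, 1e-5)
    return logits / temperature
```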
87db03dd93  trim the input prompt to 3 seconds when training NAR tasks (marked as experimental; the paper mentions doing so, but I don't know how much this would harm the retention heads) (mrq, 2023-10-09 22:03:58 -0500)
893a610fad  cleanup, use the deepspeed inferencing pathway if requested (mrq, 2023-10-09 15:24:04 -0500)
26fbb92ec6  reduced dynamic temperature threshold to > 1.0, as it doesn't seem quite useful for audio LMs; sped up any sampling that touches logits by copying them to the CPU first, as accessing tensors on the GPU is slow as balls (mrq, 2023-10-09 14:46:17 -0500)
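The speedup above comes from doing one bulk device-to-host copy instead of many tiny reads of a CUDA tensor, each of which forces a sync. A minimal illustration (function name assumed):

```python
import torch

def sample_on_cpu(logits_gpu: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # one bulk device-to-host copy; per-element reads of a CUDA tensor each pay
    # a synchronization cost, which dominates at sampling's tiny op sizes
    logits = logits_gpu.to("cpu")
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, 1)
```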
29873e6ded  extend the max temps in the web UI to actually allow dynamic temp sampling (mrq, 2023-10-09 13:30:45 -0500)
27483e56f0  disabled preparing of SpeechX tasks, added dynamic temperature testing (to-do: test it; credited in the function) (mrq, 2023-10-09 13:01:40 -0500)
3db7e7dea1  implicitly load the checkpoint if the deepspeed checkpoint is not found, updated setup script to grab the diskcached dataloader things (mrq, 2023-10-06 10:02:45 -0500)
d12877ee09  added option to set the probability of selecting the AR during training under a monolithic AR+NAR, added some more to-dos while I have them in mind (mrq, 2023-10-02 16:52:42 -0500)
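A sketch of what that probability knob amounts to for a monolithic AR+NAR: each training step rolls whether to train the AR level or one of the NAR levels. The function name is hypothetical, and the 1+7 RVQ-level layout is taken from a later entry below:

```python
import random

def pick_quant_level(p_ar: float = 0.5) -> int:
    # with probability p_ar, train the AR (RVQ level 0, causal);
    # otherwise train one of the 7 NAR levels (parallel)
    if random.random() < p_ar:
        return 0
    return random.randint(1, 7)
```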
e85b798fbf  set default NAR levels to max for the web UI (mrq, 2023-09-29 19:14:16 -0500)
c7fb740d41  do not specify a default dtype for the web UI; let it implicitly load from the yaml instead (mrq, 2023-09-24 17:54:03 -0500)
4abd6564d1  fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml (mrq, 2023-09-23 19:59:00 -0500)
9384900ce6  revert the frankensteined "train one model but hotload the other" since it kept loading the last exported weights and I'm not supporting this usecase anymore anyways (mrq, 2023-09-22 13:04:17 -0500)
c0b25541e3  restructured some things with the model to remove dead weights (mrq, 2023-09-20 19:10:59 -0500)
a6bfe43590  added mirostat sampling (given a partially trained model, it got far more decent output than I expected; need to test on a better-trained model) (mrq, 2023-09-18 18:55:41 -0500)
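Mirostat (v2) keeps output surprise near a target by tracking a running bound mu: candidates more surprising than mu are pruned, and mu is nudged by the error between the sampled token's surprise and the target tau. A sketch under those definitions, not the repo's exact implementation:

```python
import torch

def mirostat_v2_sample(logits: torch.Tensor, state: dict,
                       tau: float = 3.0, eta: float = 0.1) -> torch.Tensor:
    mu = state.setdefault("mu", 2.0 * tau)   # running surprise bound
    probs = torch.softmax(logits, dim=-1)
    surprise = -torch.log2(probs)
    keep = surprise <= mu                    # prune overly surprising candidates
    if not keep.any():
        keep = surprise == surprise.min()    # always keep at least the best token
    probs = torch.where(keep, probs, torch.zeros_like(probs))
    probs = probs / probs.sum()
    token = torch.multinomial(probs, 1)
    # nudge mu by the error between observed and target surprise
    state["mu"] = mu - eta * (surprise[token].item() - tau)
    return token
```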
22ffaf3a33  have the loss for the NAR not ignore the text prompt; I imagine this should help the NAR and explain why it's always had a bit of an issue with training (mrq, 2023-09-15 19:08:44 -0500)
4aef798135  added picking the final candidate based on sum of score instead of the first candidate (this changes nothing). (mrq, 2023-09-13 13:19:11 -0500)
23a5fdd645  implemented a naive beam search (I really should be taking a break) (mrq, 2023-09-12 21:28:07 -0500)
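A naive beam search consistent with the two entries above: keep the `width` best running sequences by summed log-probability, and pick the final candidate by that sum. The step_logprobs interface, mapping a token list to next-token log-probs, is an assumption:

```python
import torch

def beam_search(step_logprobs, prefix: list[int],
                width: int = 4, steps: int = 32) -> list[int]:
    # beams are (summed log-prob, token sequence) pairs
    beams = [(0.0, list(prefix))]
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            logp = step_logprobs(seq)               # [vocab] next-token log-probs
            top_logp, top_idx = logp.topk(width)
            for lp, idx in zip(top_logp.tolist(), top_idx.tolist()):
                candidates.append((score + lp, seq + [idx]))
        beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:width]
    # final pick: highest summed score, per the 'sum of score' entry above
    return max(beams, key=lambda b: b[0])[1]
```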
d07c63b9d8  unified more things with training the AR+NAR monolithic model (mrq, 2023-09-12 15:54:41 -0500)
40ef34e1ca  this embedding class definitely works, and migrating from the previous embedding weights seems to work. (mrq, 2023-09-11 14:13:42 -0500)
a1f250ffac  set default max_levels for the NAR to 0 and implicitly set it to the max resps levels, because the previous way implicitly assumed all models were outputting at 1+7 RVQ bins. (mrq, 2023-09-10 20:33:33 -0500)
671dca88ee  throw an error when no reference audio is provided in the web UI, because someone keeps doing that in the HF space (mrq, 2023-09-10 15:50:50 -0500)
ba71020318  added option to limit (or exceed) inferenced RVQ-bin levels through the NAR (mrq, 2023-09-10 13:50:13 -0500)
c74fe2f718  tweaks to the web UI (mrq, 2023-09-09 22:27:20 -0500)
4f61f5c889  added option to set the trim length for an input prompt (mrq, 2023-09-09 18:04:44 -0500)
d10053d11f  render README.md markdown for the huggingface space (mrq, 2023-09-09 17:04:51 -0500)
bc30026377  added advanced sampler parameters to the web UI (mrq, 2023-09-09 16:51:36 -0500)
5ac119a6e7  added a light web UI (need to port the telemetry-disabling bandaids from aivc) (mrq, 2023-09-09 16:17:20 -0500)
10c34c5b98  added a length-based decay factor for repetition penalty (mrq, 2023-09-08 21:02:00 -0500)
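A sketch of a length-based decay on repetition penalty: the further back a token occurred, the closer its penalty factor shrinks toward 1.0. The commit doesn't state the exact decay curve; this is one plausible form with illustrative defaults:

```python
import torch

def repetition_penalize(logits: torch.Tensor, prev_tokens: list[int],
                        penalty: float = 1.25, decay: float = 0.9) -> torch.Tensor:
    # walk the history newest-first; each step back weakens the penalty
    # (a token appearing more than once gets penalized compoundingly)
    for distance, token in enumerate(reversed(prev_tokens)):
        factor = 1.0 + (penalty - 1.0) * (decay ** distance)
        if logits[token] > 0:
            logits[token] = logits[token] / factor
        else:
            logits[token] = logits[token] * factor
    return logits
```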
b922f35b6b  added documentation on how these new sampling parameters are very iffy and you really need to know what you are doing to use them, because this is audio generation and not text generation (mrq, 2023-09-08 20:43:36 -0500)
f69aad9c65  some day I'll get it right (mrq, 2023-09-08 15:36:26 -0500)
b2907ae7e0  seems that my PromEmbedding/RespEmbedding doesn't actually work all that well; naively using dedicated MultiEmbeddings for the AR/NAR in the monolithic model is the best way to go (mrq, 2023-09-08 01:03:24 -0500)
67617d7d69  also cull frozen_params from the params the optimizer receives, to reduce the VRAM it consumes (mrq, 2023-09-07 18:27:02 -0500)
8837bc34d7  added option to specify parameters to freeze per-model in the YAML (because I need to see about committing atrocities with converting an AR into an AR+NAR) (mrq, 2023-09-07 18:19:51 -0500)
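Taken together with the entry above it, the freeze option plus the frozen-param cull might look like this: mark matching parameters non-trainable, then build the optimizer over only the trainable ones so frozen weights carry no optimizer state. The prefix-matching scheme and names are assumptions:

```python
import torch

def freeze_and_build_optimizer(model: torch.nn.Module,
                               frozen_prefixes: list[str],
                               lr: float = 1e-4) -> torch.optim.Optimizer:
    # freeze any parameter whose name matches the per-model list from the YAML
    for name, param in model.named_parameters():
        if any(name.startswith(p) for p in frozen_prefixes):
            param.requires_grad_(False)
    # hand the optimizer only trainable params; frozen ones then cost no
    # optimizer-state VRAM (e.g. AdamW's two moment buffers per param)
    params = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(params, lr=lr)
```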
7ce06432fd  fixed the AR+NAR dual model; the resp_emb has to be split up (the classifier might too) (mrq, 2023-09-06 19:33:39 -0500)
100ca6b7d0  added option to use the SGD optimizer through the YAML, added option to pass in additional optimizer parameters through the YAML, added experimental unified AR+NAR model (does not seem fruitful in testing) (mrq, 2023-09-06 18:58:35 -0500)
451726fdd5  added ability to disable activation checkpointing through the YAML (it is very VRAM-intensive at double layer size) (mrq, 2023-09-05 15:38:21 -0500)
143aee7526  removed dedicated interleaved AR code (mrq, 2023-09-03 22:47:03 -0500)
2f9cd0842f  merged dedicated interleaved AR code with the normal AR code (mrq, 2023-09-03 22:46:08 -0500)