Commit Graph

539 Commits

Author SHA1 Message Date
mrq
5120ffdda7 god it would be nice to know the best way to handle audio embeddings, because I genuinely don't know without skimming through papers or devoting X amount of GPU hours in training 2024-04-29 18:24:05 -05:00
mrq
6a11bc9cb6 update tokenizer because, for some reason, it had the wrong order for the special tokens to where eos = unk 2024-04-29 09:09:26 -05:00
mrq
57810e4ba4 metadata only path (might drop HDF5 since its giving file sizes twice as large as my actual unpacked dataset) 2024-04-28 23:03:09 -05:00
mrq
caad7ee3c9 final tweaks, hopefully 2024-04-28 22:28:29 -05:00
mrq
ffc334cf58 added dataset transcription helper script (now I don't ever have to touch ai-voice-cloning) (to-do: unify scripts into the module) 2024-04-21 17:43:20 -05:00
mrq
b251669536 forgot to fix up the test trainer 2024-04-21 14:58:04 -05:00
mrq
071fb97777 dataset preparation script updates, caved and am using HF tokenizer now 2024-04-21 14:49:18 -05:00
mrq
a8ffa88844 it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior 2024-04-19 18:36:54 -05:00
mrq
00804a47e9 Forgot to copy intermediary dataset conversion script 2024-04-18 21:34:28 -05:00
mrq
8214aa23d7 converting over to a different intermediary dataset format 2024-04-18 21:24:06 -05:00
mrq
4f5c9e518a actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess 2024-04-18 13:32:41 -05:00
mrq
2e9e6e68f7 Forgot I need to use the DAC's 44K model because 24K model has 32 codebooks instead of 9. 2024-04-17 20:59:25 -05:00
mrq
5ff2b4aab5 finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset) 2024-04-17 20:39:35 -05:00
mrq
b0bd88833c refractor cleanup, had a revelation on how I can handle a batch of varying tasks 2024-04-16 21:04:48 -05:00
mrq
467fa1c5ee wrapper fixes 2024-04-16 10:19:02 -05:00
mrq
aa1e25fbf5 backwards compat for old YAMLs with models, option to set flash attention 2 for Llama (and derivatives), included syncdoth/RetNets torchscale retnet for shits and grins, etc. 2024-04-16 10:02:31 -05:00
mrq
545162195b deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things 2024-04-15 19:54:32 -05:00
mrq
d69a00e389 Properly pass retention_mask for retnet-HF, attempt to fix recurrent forward for retnet (doesn't work still) 2024-04-14 13:12:50 -05:00
mrq
789bb5d11b add an optional label override for model loading (used for easy testing between 12/16/20/24 layered model) 2024-04-13 12:43:35 -05:00
mrq
f0c4baeb25 added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it) 2024-04-09 22:04:01 -05:00
mrq
4d75ee066c actually do the Linear replacement with TE's Linear 2024-04-09 14:41:13 -05:00
mrq
9d97eb5104 added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale) 2024-04-08 20:14:51 -05:00
mrq
7075c2a5f0 added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml) 2024-04-04 19:11:49 -05:00
mrq
91062361af tweaks 2024-03-01 20:38:06 -06:00
mrq
f3c59c3e7e cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed) 2024-03-01 20:18:43 -06:00
mrq
47435207f7 Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model 2024-03-01 19:20:10 -06:00
mrq
0427d8d076 logger broke for some reason, added flag to just tqdm.write instead, make cfg.bitsandbytes.bitnet==True yamls denoted since I'm sure they're not interoperable 2024-03-01 10:32:35 -06:00
mrq
35d78a2bb0 Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares) 2024-02-29 20:29:17 -06:00
mrq
3da1518ace added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work) 2024-01-31 21:48:36 -06:00
mrq
cce929e136 nasty hotfix for transformer's Mixtral throwing an error when batch sizes > 1 2024-01-26 19:41:12 -06:00
mrq
e799665759 experimental weighting of prom/resp embeds 2024-01-25 12:18:48 -06:00
mrq
c690aa509d fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...) 2023-12-25 21:20:32 -06:00
mrq
e513d2ef19 experts weren't forwarded into constructer (wasted a few days of training garbage) 2023-12-23 16:08:17 -06:00
mrq
0db3203b21 added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go) 2023-12-22 19:27:36 -06:00
mrq
9c198eb75a added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works) 2023-12-20 18:45:58 -06:00
mrq
6c51a629cc resetting step count resets the samples processed and other metrics 2023-10-29 12:11:19 -05:00
mrq
0aa2a3cc07 evaluation/validation passes language ID during training (oops) 2023-10-29 12:00:40 -05:00
mrq
ed54f4ebec un 'experimental' the better target sequence preparation 2023-10-22 09:06:59 -05:00
mrq
9a6040383e make validation samplers ignore sampler type 2023-10-22 09:01:47 -05:00
mrq
32d4271ca8 fixed issue with training from scratch (oops) 2023-10-21 09:55:38 -05:00
mrq
3195026dba fixed issue with the 'add another target audio to artificially create longer sequences' for HDF5 just duplicating the utterance initially sampled 2023-10-18 20:38:33 -05:00
mrq
09cda7d3f9 added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup 2023-10-16 19:30:38 -05:00
mrq
a539f6889f mucked around with the loss calculation, this seems better? 2023-10-13 18:22:21 -05:00
mrq
fb467b19ba exposed rolling resp context to the web UI, added passing in language to inferencing command line 2023-10-12 23:21:01 -05:00
mrq
298fd9a5f9 fixed issue with webui 2023-10-12 22:49:25 -05:00
mrq
65f500083d tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work 2023-10-12 22:21:43 -05:00
mrq
08bae355eb actually use langs from the dataloader 2023-10-11 21:21:50 -05:00
mrq
3af19d79fd oops 2023-10-11 20:49:54 -05:00
mrq
8740cdefc6 added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested) 2023-10-11 20:38:40 -05:00
mrq
6045cbce94 added experimental option to append utterances for training target (emphasis on experimental) 2023-10-11 17:32:45 -05:00