Commit Graph

194 Commits

Author SHA1 Message Date
mrq
b6131565ad autotune? 2024-05-09 21:25:40 -05:00
mrq
6ed6ab8c03 a bit more cleanup for deepspeed ds_cfg creation 2024-05-09 21:00:26 -05:00
mrq
0d5d545a40 crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc. 2024-05-09 20:28:20 -05:00
mrq
c6e0f905b5 final tweaks (again) before training restarts 2024-05-08 02:11:38 -05:00
mrq
215800484d correcting my wrong of assuming I could just use raw 24Khz audio in the 44Khz DAC without too much of an issue (there are issues) 2024-05-04 23:49:15 -05:00
mrq
9f738fbd5b seems I actually don't need RVQ bins 9-32 with the 24Khz DAC model........ (time to requantize my audio...) 2024-05-04 23:09:18 -05:00
mrq
33b7f81b94 small cleanups 2024-05-04 22:37:22 -05:00
mrq
8aa1b2dabf documentation update 2024-05-04 21:03:46 -05:00
mrq
253441b750 forgot to disable verbose flag 2024-05-04 13:13:52 -05:00
mrq
3dca1125f5 implemented xformers in HF's Llama (because theres no flash attention for Volta cards) 2024-05-04 13:07:45 -05:00
mrq
277dcec484 apparently I got an error for trying to serialize an errant tensor that made its way into the json, this could be remedied easily with recursively traversing the dict and coercing any objects to primitives, but I'm tired and I just want to start training and nap 2024-05-04 12:33:43 -05:00
mrq
ffa200eec7 added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources)) 2024-05-04 12:05:41 -05:00
mrq
c494894261 simple DDP wrapper (for my NVlink test) 2024-05-04 11:48:26 -05:00
mrq
a7b43b98b5 renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes) 2024-05-02 20:08:59 -05:00
mrq
b5d1456a09 backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not)) 2024-04-29 22:14:01 -05:00
mrq
5120ffdda7 god it would be nice to know the best way to handle audio embeddings, because I genuinely don't know without skimming through papers or devoting X amount of GPU hours in training 2024-04-29 18:24:05 -05:00
mrq
6a11bc9cb6 update tokenizer because, for some reason, it had the wrong order for the special tokens to where eos = unk 2024-04-29 09:09:26 -05:00
mrq
57810e4ba4 metadata only path (might drop HDF5 since its giving file sizes twice as large as my actual unpacked dataset) 2024-04-28 23:03:09 -05:00
mrq
caad7ee3c9 final tweaks, hopefully 2024-04-28 22:28:29 -05:00
mrq
ffc334cf58 added dataset transcription helper script (now I don't ever have to touch ai-voice-cloning) (to-do: unify scripts into the module) 2024-04-21 17:43:20 -05:00
mrq
b251669536 forgot to fix up the test trainer 2024-04-21 14:58:04 -05:00
mrq
071fb97777 dataset preparation script updates, caved and am using HF tokenizer now 2024-04-21 14:49:18 -05:00
mrq
a8ffa88844 it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior 2024-04-19 18:36:54 -05:00
mrq
8214aa23d7 converting over to a different intermediary dataset format 2024-04-18 21:24:06 -05:00
mrq
4f5c9e518a actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess 2024-04-18 13:32:41 -05:00
mrq
2e9e6e68f7 Forgot I need to use the DAC's 44K model because 24K model has 32 codebooks instead of 9. 2024-04-17 20:59:25 -05:00
mrq
5ff2b4aab5 finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset) 2024-04-17 20:39:35 -05:00
mrq
b0bd88833c refractor cleanup, had a revelation on how I can handle a batch of varying tasks 2024-04-16 21:04:48 -05:00
mrq
467fa1c5ee wrapper fixes 2024-04-16 10:19:02 -05:00
mrq
aa1e25fbf5 backwards compat for old YAMLs with models, option to set flash attention 2 for Llama (and derivatives), included syncdoth/RetNets torchscale retnet for shits and grins, etc. 2024-04-16 10:02:31 -05:00
mrq
545162195b deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things 2024-04-15 19:54:32 -05:00
mrq
d69a00e389 Properly pass retention_mask for retnet-HF, attempt to fix recurrent forward for retnet (doesn't work still) 2024-04-14 13:12:50 -05:00
mrq
789bb5d11b add an optional label override for model loading (used for easy testing between 12/16/20/24 layered model) 2024-04-13 12:43:35 -05:00
mrq
f0c4baeb25 added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it) 2024-04-09 22:04:01 -05:00
mrq
4d75ee066c actually do the Linear replacement with TE's Linear 2024-04-09 14:41:13 -05:00
mrq
9d97eb5104 added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale) 2024-04-08 20:14:51 -05:00
mrq
7075c2a5f0 added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml) 2024-04-04 19:11:49 -05:00
mrq
91062361af tweaks 2024-03-01 20:38:06 -06:00
mrq
f3c59c3e7e cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed) 2024-03-01 20:18:43 -06:00
mrq
47435207f7 Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model 2024-03-01 19:20:10 -06:00
mrq
0427d8d076 logger broke for some reason, added flag to just tqdm.write instead, make cfg.bitsandbytes.bitnet==True yamls denoted since I'm sure they're not interoperable 2024-03-01 10:32:35 -06:00
mrq
35d78a2bb0 Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares) 2024-02-29 20:29:17 -06:00
mrq
3da1518ace added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work) 2024-01-31 21:48:36 -06:00
mrq
cce929e136 nasty hotfix for transformer's Mixtral throwing an error when batch sizes > 1 2024-01-26 19:41:12 -06:00
mrq
e799665759 experimental weighting of prom/resp embeds 2024-01-25 12:18:48 -06:00
mrq
c690aa509d fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...) 2023-12-25 21:20:32 -06:00
mrq
e513d2ef19 experts weren't forwarded into constructer (wasted a few days of training garbage) 2023-12-23 16:08:17 -06:00
mrq
0db3203b21 added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go) 2023-12-22 19:27:36 -06:00
mrq
9c198eb75a added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works) 2023-12-20 18:45:58 -06:00
mrq
6c51a629cc resetting step count resets the samples processed and other metrics 2023-10-29 12:11:19 -05:00