Commit Graph

  • 880b4ecd1b cleanup, putting some thoughts in comments before I forget about them mrq 2024-06-05 19:50:06 -0500
  • 3cfc8a96bb oops mrq 2024-06-05 10:30:04 -0500
  • 48cd1054f9 madness mrq 2024-06-04 23:48:51 -0500
  • 9e3f2e300f experimental "just have a token for what rvq level we're on" that seems to help all models (mamba almost works, but it might just have to be relegated to being a pure AR model) (sketched after this log) mrq 2024-06-04 23:23:31 -0500
  • e0886c5a78 re-added mamba as a possible non-experimental arch backend (test trainer will set it as AR only, since doing any NAR tasks lobotomizes it) mrq 2024-06-04 22:41:22 -0500
  • 687c71e028 disable accuracy calc because it breaks with actual batched training even though it shouldn't mrq 2024-06-04 22:13:44 -0500
  • d005e24953 oops mrq 2024-06-04 22:10:04 -0500
  • 0f7f3ae754 added loss calc split and acc for experimental model mrq 2024-06-04 22:04:40 -0500
  • 014e565c4b tweaks mrq 2024-06-04 20:41:13 -0500
  • 6d5bd0156a fixes mrq 2024-06-04 18:50:48 -0500
  • ed3aeaf3a1 copy pasted from test to actual trainer mrq 2024-06-04 18:40:30 -0500
  • 0aa01ba31a forgot one crucial detail (you *need* the previous RVQ level to keep coherence between all RVQ levels) (experimental deinterleaved is a bit crusty though) mrq 2024-06-04 18:30:30 -0500
  • 2ffad5cb6f typo mrq 2024-06-04 14:20:57 -0500
  • 406ff7bbe1 re-implemented config.model.interleave for the HF-compat experimental method mrq 2024-06-04 14:19:52 -0500
  • c93d5863fd fixes mrq 2024-06-04 00:07:00 -0500
  • 186b93a77e oops mrq 2024-06-03 22:35:55 -0500
  • e50edc3b48 added a flag to convert to an HF-compatible model on export by stitching things mrq 2024-06-03 22:34:47 -0500
  • 934672252b feverish cleanup mrq 2024-06-03 21:28:49 -0500
  • 7feeb944a0 probably insane for even entertaining going this route mrq 2024-06-03 20:26:27 -0500
  • c2a436d368 somehow between training sessions grad_norm = None even though it worked before mrq 2024-06-02 08:29:27 -0500
  • c1fcd889d5 reverted automatically disabling split loss calc, since it seems that calculating loss on the prom is what actually causes the oddities, maybe mrq 2024-06-01 12:34:59 -0500
  • 8cf176ab46 ugh mrq 2024-06-01 10:46:42 -0500
  • 827cf632e7 report current loss scale and adjust grad norm by loss scale (for deepspeed) mrq 2024-06-01 10:44:32 -0500
  • d0ebce6bac ugh mrq 2024-06-01 10:30:13 -0500
  • 39bc019142 actually save per-rank sampler states mrq 2024-06-01 09:46:32 -0500
  • 74df2f5332 split sampler dict by global_rank, also handle splitting dataset paths by global_rank if sampler_type == path (because I do not trust DistributedSampler) (need to test) (sketched after this log) mrq 2024-06-01 09:29:49 -0500
  • 31785f4eeb actually don't default to computing split losses; the test bitnet model doesn't seem to be doing things right (despite debug printouts showing they're roughly the same logit/loss sequences, could just be bitnet linears not being up to par on actual models) mrq 2024-06-01 09:12:51 -0500
  • e9c87060df oops mrq 2024-05-31 22:22:28 -0500
  • b482ca19ff added model config option to set KV head count for MQA/GQA instead of MHA for llama-based models (I think it's very negligible either way at such a small model size) (sketched after this log) mrq 2024-05-31 19:32:37 -0500
  • e15c6c74c3 correctness mrq 2024-05-30 20:50:45 -0500
  • da473295b7 better way to compute per-segment losses mrq 2024-05-28 19:29:54 -0500
  • 6c49ad06a3 forgot to reinclude mult by loss factors mrq 2024-05-27 20:40:21 -0500
  • b82f0d5c0c finally nailed the issue that caused logging to break on one machine but not another (bitnet includes zetascale which is a parasite that will break logging) mrq 2024-05-27 19:47:58 -0500
  • c0ac84c795 uh mrq 2024-05-27 19:05:56 -0500
  • 197d517181 ugh mrq 2024-05-27 17:09:35 -0500
  • 5af6f41c94 added loss calcs against prom (requires the right settings for not-shit results, disabled by default) mrq 2024-05-27 08:43:00 -0500
  • 05cd8b797e nevermind it breaks training mrq 2024-05-25 18:03:43 -0500
  • 85f9684720 some cleanup mrq 2024-05-25 17:46:52 -0500
  • d760924719 added a kludgy eval-only mode so I don't have to start training, type eval, stop training, then delete the logs for that session mrq 2024-05-25 17:39:51 -0500
  • ddbacde0d1 DAC just doesn't work well enough...... mrq 2024-05-25 11:07:52 -0500
  • e3ef89f5aa 100x better for subtrain/eval to be by group instead mrq 2024-05-19 16:40:14 -0500
  • 458b95d196 added option to split between text loss and audio loss (to-do: document this better), because it may or may not be a problem with LLaMA-backed models: my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment (sketched after this log) mrq 2024-05-19 11:23:56 -0500
  • 74e531d391 ugh mrq 2024-05-18 12:02:56 -0500
  • 59ef9461f8 ugh mrq 2024-05-18 10:13:58 -0500
  • 4bc7e5a6d1 fix loading without needing an HDF5 dataset already prepped (and some other incidental speedups during dataloader prep) mrq 2024-05-18 07:14:26 -0500
  • d88a5ca183 ugh mrq 2024-05-16 07:25:33 -0500
  • d9aabfa3ae final tweaks, hopefully, again mrq 2024-05-15 23:04:19 -0500
  • 8d79f78e0a god I need to replace omegaconf mrq 2024-05-12 14:01:52 -0500
  • 5eb5db7f7f just don't use DAC 24KHz, it's bad mrq 2024-05-12 13:41:17 -0500
  • 230da8b559 these should be the final things to scramble around for: DAC's 24KHz model is unusable for this, but both Encodec's 24KHz and DAC's 44KHz work mrq 2024-05-12 13:22:08 -0500
  • 2437a86efa ugh mrq 2024-05-12 13:02:15 -0500
  • 4f1593c8db a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge mrq 2024-05-12 10:17:29 -0500
  • 917eeb40d2 ughhh mrq 2024-05-12 08:22:39 -0500
  • 9910c75d5a checkpointing for bitnet impl mrq 2024-05-12 07:52:54 -0500
  • 14709ac67f ughh mrq 2024-05-12 07:30:59 -0500
  • 3774fcbdee ugh mrq 2024-05-11 22:58:38 -0500
  • 856545f8bb NaN loss detection (should have added it earlier), loss scaling for local backend + fp16 (sketched after this log) mrq 2024-05-11 22:23:29 -0500
  • a755eb3c62 ugh mrq 2024-05-11 17:34:45 -0500
  • 88e9b9caff local ddp fix mrq 2024-05-11 17:29:01 -0500
  • 3337c69e5a switch between xformers and torch.backends.cuda.sdp_kernel for attention (sketched after this log) mrq 2024-05-11 17:14:05 -0500
  • d33c7bb7cf ugh mrq 2024-05-11 16:47:19 -0500
  • 0b6499601b sanitizing mrq 2024-05-11 16:31:05 -0500
  • 71e373064f remove redundant loss, tweak readme mrq 2024-05-11 15:02:47 -0500
  • 04a80d6b55 maybe it's better to be more explicit in deepspeed configs mrq 2024-05-11 13:57:43 -0500
  • 4d93a16ef7 might just be better to explicitly define prompt duration ranges, especially under a "train small contexts then increase it" training paradigm mrq 2024-05-11 09:50:54 -0500
  • bd0a36ba8d I swear I keep seeing tqdm flicker back a number mrq 2024-05-10 18:36:01 -0500
  • 2109712e5b resolve deprecation warning that doesn't show on my old training rig but does on my new one mrq 2024-05-09 23:25:44 -0500
  • 1547de5020 haha... mrq 2024-05-09 23:15:52 -0500
  • b7bd885651 some possible sanity with deepspeed config mrq 2024-05-09 22:48:42 -0500
  • c4b696ebeb oops mrq 2024-05-09 22:33:40 -0500
  • c22a177cf8 forgot to pass warmup to schedule free mrq 2024-05-09 22:18:49 -0500
  • b6131565ad autotune? mrq 2024-05-09 21:25:40 -0500
  • 6ed6ab8c03 a bit more cleanup for deepspeed ds_cfg creation mrq 2024-05-09 21:00:26 -0500
  • 0d5d545a40 crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to try it weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc. mrq 2024-05-09 20:28:20 -0500
  • c6e0f905b5 final tweaks (again) before training restarts mrq 2024-05-08 02:11:38 -0500
  • 215800484d correcting my wrong assumption that I could just use raw 24KHz audio in the 44KHz DAC without too much of an issue (there are issues) mrq 2024-05-04 23:49:15 -0500
  • 9f738fbd5b seems I actually don't need RVQ bins 9-32 with the 24KHz DAC model... (time to requantize my audio...) mrq 2024-05-04 23:09:18 -0500
  • 33b7f81b94 small cleanups mrq 2024-05-04 22:37:22 -0500
  • 8aa1b2dabf documentation update mrq 2024-05-04 21:03:46 -0500
  • 253441b750 forgot to disable verbose flag mrq 2024-05-04 13:13:52 -0500
  • 3dca1125f5 implemented xformers in HF's Llama (because there's no flash attention for Volta cards) mrq 2024-05-04 13:07:45 -0500
  • 277dcec484 apparently I got an error for trying to serialize an errant tensor that made its way into the JSON; this could be remedied easily by recursively traversing the dict and coercing any objects to primitives, but I'm tired and I just want to start training and nap (sketched after this log) mrq 2024-05-04 12:33:43 -0500
  • ffa200eec7 added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources)) (sketched after this log) mrq 2024-05-04 12:05:41 -0500
  • c494894261 simple DDP wrapper (for my NVLink test) (sketched after this log) mrq 2024-05-04 11:48:26 -0500
  • 783db3d2c5 forgot to commit the DAC test utterance mrq 2024-05-04 09:46:51 -0500
  • a7b43b98b5 renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes) mrq 2024-05-02 20:08:59 -0500
  • b5d1456a09 backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not)) mrq 2024-04-29 22:14:01 -0500
  • 5120ffdda7 god it would be nice to know the best way to handle audio embeddings, because I genuinely don't know without skimming through papers or devoting X amount of GPU hours in training mrq 2024-04-29 18:24:05 -0500
  • 6a11bc9cb6 update tokenizer because, for some reason, it had the special tokens in the wrong order, such that eos = unk mrq 2024-04-29 09:09:26 -0500
  • 57810e4ba4 metadata-only path (might drop HDF5 since it's giving file sizes twice as large as my actual unpacked dataset) mrq 2024-04-28 23:03:09 -0500
  • caad7ee3c9 final tweaks, hopefully mrq 2024-04-28 22:28:29 -0500
  • ffc334cf58 added dataset transcription helper script (now I don't ever have to touch ai-voice-cloning) (to-do: unify scripts into the module) mrq 2024-04-21 17:43:20 -0500
  • b251669536 forgot to fix up the test trainer mrq 2024-04-21 14:58:04 -0500
  • 071fb97777 dataset preparation script updates, caved and am using HF tokenizer now mrq 2024-04-21 14:49:18 -0500
  • a8ffa88844 it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior mrq 2024-04-19 18:36:54 -0500
  • 00804a47e9 Forgot to copy intermediary dataset conversion script mrq 2024-04-18 21:34:28 -0500
  • 8214aa23d7 converting over to a different intermediary dataset format mrq 2024-04-18 21:24:06 -0500
  • 4f5c9e518a actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess mrq 2024-04-18 13:32:41 -0500
  • 2e9e6e68f7 Forgot I need to use DAC's 44K model because the 24K model has 32 codebooks instead of 9. mrq 2024-04-17 20:59:25 -0500
  • 5ff2b4aab5 finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset) mrq 2024-04-17 20:39:35 -0500
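
Sketches

The RVQ-level token from 9e3f2e300f boils down to a learned per-level embedding injected into the input sequence so the model knows which codebook level it is predicting. A minimal sketch, assuming hypothetical names (RVQLevelToken, n_rvq_levels, d_model); the repo's actual embedding scheme may differ:

```python
import torch
import torch.nn as nn

class RVQLevelToken(nn.Module):
    """Prepend a learned per-RVQ-level embedding so the model knows
    which codebook level it is currently predicting."""
    def __init__(self, n_rvq_levels: int = 8, d_model: int = 1024):
        super().__init__()
        self.level_emb = nn.Embedding(n_rvq_levels, d_model)

    def forward(self, x: torch.Tensor, level: int) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings for the current level
        tok = self.level_emb.weight[level].expand(x.shape[0], 1, -1)
        return torch.cat([tok, x], dim=1)
```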
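
For 74df2f5332, splitting dataset paths by global_rank without DistributedSampler can be a deterministic round-robin shard plus one sampler-state file per rank. A sketch with hypothetical helper names (shard_paths, sampler_state_path):

```python
def shard_paths(paths: list[str], global_rank: int, world_size: int) -> list[str]:
    # Deterministic round-robin shard: every rank sees a disjoint subset,
    # sidestepping DistributedSampler entirely.
    return sorted(paths)[global_rank::world_size]

def sampler_state_path(out_dir: str, global_rank: int) -> str:
    # One sampler-state file per rank, so each rank saves and restores
    # its own position independently.
    return f"{out_dir}/sampler.rank{global_rank}.pt"
```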
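
The KV head count option from b482ca19ff maps onto HF's num_key_value_heads field on LlamaConfig: equal to num_attention_heads gives MHA, 1 gives MQA, anything in between gives GQA. A sketch with made-up model sizes:

```python
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=1024,
    num_hidden_layers=12,
    num_attention_heads=16,
    num_key_value_heads=4,  # 16 = MHA, 4 = GQA, 1 = MQA
)
model = LlamaForCausalLM(config)
```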
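
For the text/audio loss split in 458b95d196, the idea is to mask the flattened targets by segment and report a cross-entropy per segment. A sketch with hypothetical names (split_losses, is_text):

```python
import torch
import torch.nn.functional as F

def split_losses(logits: torch.Tensor, targets: torch.Tensor,
                 is_text: torch.Tensor) -> dict[str, torch.Tensor]:
    # logits: (N, vocab), targets: (N,), is_text: (N,) bool mask marking
    # which flattened positions belong to the text segment vs the audio codes.
    return {
        "text": F.cross_entropy(logits[is_text], targets[is_text]),
        "audio": F.cross_entropy(logits[~is_text], targets[~is_text]),
    }
```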
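
For the NaN loss detection and fp16 loss scaling in 856545f8bb, a generic PyTorch sketch using GradScaler (the repo's local backend may implement this differently; model, dataloader, and optimizer are assumed to exist):

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling for fp16

for batch in dataloader:
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(batch)

    if not torch.isfinite(loss):
        # NaN/inf loss: skip the step instead of poisoning the weights.
        optimizer.zero_grad(set_to_none=True)
        continue

    scaler.scale(loss).backward()
    scaler.step(optimizer)  # internally skips the update if grads overflowed
    scaler.update()
    optimizer.zero_grad(set_to_none=True)
```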
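
The backend switch in 3337c69e5a could look like the following; attend is a hypothetical name, and note that xformers and torch's SDPA expect different tensor layouts (torch.backends.cuda.sdp_kernel is the torch 2.x context manager, since superseded by torch.nn.attention.sdpa_kernel):

```python
import torch
import torch.nn.functional as F

try:
    from xformers.ops import memory_efficient_attention
    HAS_XFORMERS = True
except ImportError:
    HAS_XFORMERS = False

def attend(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    if HAS_XFORMERS:
        # xformers expects (batch, seq_len, heads, head_dim)
        out = memory_efficient_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
        )
        return out.transpose(1, 2)
    # otherwise constrain which fused SDPA kernels torch may pick
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=True, enable_mem_efficient=True
    ):
        return F.scaled_dot_product_attention(q, k, v)
```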
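
The recursive coercion floated in 277dcec484 is a few lines; a sketch (to_primitives is a hypothetical name):

```python
import torch

def to_primitives(obj):
    """Recursively coerce tensors (and tensor-bearing containers) to plain
    Python values so json.dumps won't choke on an errant tensor."""
    if isinstance(obj, torch.Tensor):
        return obj.item() if obj.ndim == 0 else obj.tolist()
    if isinstance(obj, dict):
        return {k: to_primitives(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_primitives(v) for v in obj]
    return obj
```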
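
The frames-per-second option from ffa200eec7 reduces to a duration-to-frames conversion. A sketch using the rates quoted in the commit message (the mapping keys and function name are hypothetical):

```python
# rates quoted in the commit message above; keys are hypothetical
FRAMES_PER_SECOND = {"encodec": 75, "dac_24khz": 41}

def duration_to_frames(seconds: float, codec: str = "encodec") -> int:
    # number of codec frames (tokens per RVQ level) a clip of this length yields
    return int(seconds * FRAMES_PER_SECOND[codec])
```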
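
A simple DDP wrapper along the lines of c494894261 (wrap_ddp is a hypothetical name; assumes torchrun-style environment variables):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_ddp(model: torch.nn.Module) -> torch.nn.Module:
    # Assumes torchrun populated RANK / LOCAL_RANK / WORLD_SIZE.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    if not dist.is_initialized():
        dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)
    return DDP(model.to(local_rank), device_ids=[local_rank])
```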