Commit Graph

  • 00d1fed217 another optimization (within the dataloader because the similar utterance sampler was mondo slow) master mrq 2025-03-08 17:10:50 -0600
  • 5e9d1a5302 one more time one more time (this normalization isn't a spook) mrq 2025-03-07 19:32:42 -0600
  • 93044829af one more time (could have sworn i tested it with batch size > 1) mrq 2025-03-07 19:14:33 -0600
  • 6cea840710 oops mrq 2025-03-07 18:57:25 -0600
  • dbd34b6430 add specialized calc_loss because schizo mrq 2025-03-07 18:44:11 -0600
  • 8d848ed549 handle case of dropping cond for segment mask mrq 2025-03-07 14:11:58 -0600
  • 89e52b9877 ugh mrq 2025-03-07 13:55:57 -0600
  • 6afc2b7526 gut feeling to change the attention mask mrq 2025-03-07 13:51:59 -0600
  • 91ede71cf0 ugh mrq 2025-03-06 17:19:27 -0600
  • 2dd80a03ff stuff for interfacing with the loss scaler value (because I want to cap it; sketch below) mrq 2025-03-06 17:07:29 -0600
  • a30dffcca7 wandb additions (to-do eventually, upload samples as artifacts) mrq 2025-03-06 15:44:40 -0600
  • ec87308d75 final tweaks before training this meme 44khz model for the 3rd time mrq 2025-03-06 15:31:15 -0600
  • 5cd71ef238 QoL so I can stop having to manually inject different configs mrq 2025-03-06 14:48:14 -0600
  • 0d809561c6 accuracy k=1 and k=80 because i'm probably dumb for k=10 as the default since it does not represent any use case (sketch below) mrq 2025-03-05 16:35:34 -0600
  • 2fb2b732fc wow that was fast mrq 2025-03-04 23:17:18 -0600
  • 462f71e2f7 ugh mrq 2025-03-04 14:57:00 -0600
  • 1cd24f3381 a birdie tells me i should probably use a different optimizer (also preliminary support for native sparse attention but I don't know if I'll use it) mrq 2025-03-04 14:53:02 -0600
  • 0451f75e33 now that the new model seems a little more promising, i can re-document things non-cynically mrq 2025-03-03 13:21:41 -0600
  • 3f1070f575 tweaks mrq 2025-03-02 22:36:25 -0600
  • 4afa4ccce5 at wits' end (perhaps the semantic token approach is the toughest pill to swallow) mrq 2025-03-01 21:03:25 -0600
  • 1d3290b023 could have sworn this worked before, might have broken it when i decoupled from omegaconf mrq 2025-03-01 19:30:26 -0600
  • 17094b8002 reticulating splines mrq 2025-03-01 17:48:51 -0600
  • 56f8be4d62 lol mrq 2025-02-28 22:15:37 -0600
  • ddc49c89c5 the learning rate scheduler pill is a tough pill to swallow mrq 2025-02-28 22:12:19 -0600
  • b97faa8173 fixes... mrq 2025-02-28 18:53:07 -0600
  • 4e7d885542 lol mrq 2025-02-28 18:06:41 -0600
  • a174c33db6 a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow) mrq 2025-02-28 17:56:50 -0600
  • 09d82a26fe ugh mrq 2025-02-28 01:06:38 -0600
  • 93feb5660f do not like that mrq 2025-02-27 23:59:56 -0600
  • f4f435d7f5 when you already had these ideas to stabilize training but you just ignored them mrq 2025-02-27 23:39:20 -0600
  • 0a45c9c042 fix attention backend not being used mrq 2025-02-27 21:38:38 -0600
  • b8e9f3d785 maybe this will work mrq 2025-02-27 20:42:12 -0600
  • 01e96bafc9 ugh mrq 2025-02-27 19:05:32 -0600
  • eff180248c decoupled llama backend to avoid any funny changes from transformers, removed other backends since i don't think i'll ever bother using them mrq 2025-02-27 19:00:37 -0600
  • ceecac6ffe I think I made resp_parallel_training=True faster with loss factoring? mrq 2025-02-26 23:13:32 -0600
  • 06ef3daf3c require minimum of 1 second durations for training because of my slop code auto-transposing that I don't wanna fix right now mrq 2025-02-26 22:00:33 -0600
  • cbd4d7d7f4 ugh mrq 2025-02-26 21:31:10 -0600
  • 2ea387c08a segregated experimental changes into their own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned up code if it actually works (nothing is working) mrq 2025-02-26 21:26:13 -0600
  • 7d2e64630c lol mrq 2025-02-26 10:49:06 -0600
  • 95da4e9405 made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split; sketch below) mrq 2025-02-26 10:39:13 -0600
  • de27115bb7 there's something wrong with it on my 4xV100 rig...... mrq 2025-02-25 15:14:08 -0600
  • db181f8e88 only do auto=equal for nemo as its an FSQ mrq 2025-02-24 21:07:44 -0600
  • a5a04c39ef when the mrq 2025-02-24 21:03:23 -0600
  • 918e0dbac1 small slop cleanup mrq 2025-02-24 19:03:53 -0600
  • 3330b5bb00 maybe fix NaNs being thrown for immature models at fp16 for training evals mrq 2025-02-24 18:25:54 -0600
  • 0f39f4d7a1 lol mrq 2025-02-24 17:51:35 -0600
  • 33d5a7109a it's a miracle i was able to get a semblance of audio with the naive AudioEncoder (now it interleaves properly) mrq 2025-02-24 14:39:12 -0600
  • 6e7b269147 ugh mrq 2025-02-24 13:54:21 -0600
  • 8f5a3997bd another experimental flag mrq 2025-02-24 13:50:41 -0600
  • f593ee98fc ugh mrq 2025-02-23 21:20:36 -0600
  • cbf6b84e27 fixed grad norm and loss scale not reporting for local trainer mrq 2025-02-23 19:08:26 -0600
  • b640fabab5 borrowed muon since it might work better under deepspeed and not require cruft (even though it really does not like the masked-NAR), also make the masked-NAR faux-causal since it might help out for cfg.model.version >= 7 mrq 2025-02-23 17:23:24 -0600
  • d33ccd188a ugh mrq 2025-02-23 12:31:07 -0600
  • 8f3c3e01ee oops mrq 2025-02-23 12:09:56 -0600
  • b39aaacd77 oops mrq 2025-02-23 11:55:43 -0600
  • 3019c88799 separate mask token and stop token because this might cause issues mrq 2025-02-23 11:36:32 -0600
  • 6634d07576 added muon optimizer through kludge hacks because it necessitates a second optimizer in tandem that seems to only sometimes work with deepspeed mrq 2025-02-23 11:22:13 -0600
  • 67a6009555 (finally) added parallel AR for cfg.model.version >= 7 (nvidia/audio-codec-44khz is being a pain and it might require training purely AR first......) mrq 2025-02-23 08:31:03 -0600
  • 15b3c20e19 also throw exception for zero'd out tensor during training (I am very paranoid now; sketch below) mrq 2025-02-22 14:09:41 -0600
  • ab0abd2b12 fixes fixes fixes (a quarter of my recently processed audio returned zero'd tensors......) mrq 2025-02-22 09:07:33 -0600
  • 50506e5ebc oops mrq 2025-02-20 20:55:58 -0600
  • fc1ec2019d added option to buffer process jobs across multiple speakers to maybe squeeze out some extra throughput for vall_e.emb.process (in the event of lots of speakers with low file counts, such as Emilia) mrq 2025-02-20 14:56:32 -0600
  • ce1ca0124a lol... mrq 2025-02-20 13:40:36 -0600
  • 92139b6da9 additional cruft, added a note in documentation to be aware of NUMA node topology when running vall_e.emb.process with more than one process mrq 2025-02-18 19:56:30 -0600
  • 596c2df11c added arg to skip processing speakers with not enough utterances for whenever I get around to processing my subset of Emilia for nvidia/audio-codec-44khz (because Emilia has a ton of speakers with low utterance counts and right now my focus with the nemo model is on getting it to actually speak without many problems rather than feeding it a gorillion speakers) mrq 2025-02-18 10:49:21 -0600
  • 8331eee6fa added arg to limit vall_e.emb.process batch size since there are some speaker groups in LibriLight/Speech/whatever that have 10K utterances and I'm growing impatient mrq 2025-02-18 10:19:17 -0600
  • 8f86cf0e4e possible logic optimization so I don't spend another 15 minutes simply iterating back to the point I was at in vall_e.emb.process mrq 2025-02-16 11:34:05 -0600
  • 0dc49ef4d5 documentation update while I wait for more audio (between 4 and 8 seconds per utterance) to quantize for nvidia/audio-codec-44khz (I was foolish to think I could get something serviceable with just 4 seconds max for an utterance) mrq 2025-02-15 17:42:06 -0600
  • 13c3a08853 nevermind that's slow mrq 2025-02-14 16:35:17 -0600
  • 285e493b12 ugh.......... mrq 2025-02-14 16:24:34 -0600
  • a65c8144f4 with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already...... mrq 2025-02-13 18:38:40 -0600
  • e3becec0e8 more better-er loss calc I suppose mrq 2025-02-13 12:49:53 -0600
  • e8f182b634 cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors; sketch below) mrq 2025-02-13 09:35:27 -0600
  • 319ca09a4f cleanup mrq 2025-02-12 23:36:32 -0600
  • b52c5c5d80 this seems to work in testing mrq 2025-02-12 16:16:04 -0600
  • e029a8804d ironically none of this cruft gets the loss lower than the original way mrq 2025-02-12 11:17:00 -0600
  • 4b31f5c808 this seems preferable mrq 2025-02-12 00:36:50 -0600
  • 04fef5dad5 agony mrq 2025-02-12 00:18:24 -0600
  • 1c0ed6abac added notes on this unfruitful experiment mrq 2025-02-11 16:21:43 -0600
  • e5916ea519 for my sanity: it seems having extraneous tokens in the embedding/classifier keeps the loss/acc a little higher than it should be mrq 2025-02-11 14:47:35 -0600
  • d4a6709fb4 stopgap cringe to get this training session working (it does not seem fruitful) mrq 2025-02-11 13:45:09 -0600
  • c0b46b82eb tweaks mrq 2025-02-10 21:48:29 -0600
  • d6a679ca5c tweaks mrq 2025-02-10 20:53:08 -0600
  • 276a2342a4 tweaks to processing script mrq 2025-02-10 19:18:13 -0600
  • b3f9b76fd9 invalidate a path if loading via metadata and the entry is not in the hdf5 (to avoid reparsing my metadata since I'm using a partial copy of my dataset at the moment; sketch below) mrq 2025-02-10 14:43:15 -0600
  • 075ffef68a ugh mrq 2025-02-09 13:02:51 -0600
  • 953015748f ugh mrq 2025-02-07 20:49:28 -0600
  • ed94b261dc could have sworn i had 'vall_e.emb.process --dtype' working, also possible RAM optimization so I can stop locking up my server when firing four encoding processes mrq 2025-02-07 18:52:19 -0600
  • 47eb498046 more tweaks mrq 2025-02-06 23:26:26 -0600
  • 67a9401cce oops mrq 2025-02-06 15:14:14 -0600
  • 712ce4af5d maybe fixed errors with DAC backend, added option to limit by duration in emb.process (because I only really need short utterances right now and I'm not ready to spend a week on processing everything again; sketch below) mrq 2025-02-06 12:37:18 -0600
  • 299cc88821 re-added amp encoding/decoding for audio, possible bad idea to ignore using amp instead if requested mrq 2025-02-05 21:55:06 -0600
  • 7592befc53 updated vall_e.emb.process to allow for batched processing, some typo fixes (it's painfully slow on my 7900XTX...) mrq 2025-02-05 21:13:20 -0600
  • 79c504c278 cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when i fall for the latest meme codec; sketch below) mrq 2025-02-05 20:54:31 -0600
  • 84174c1c1b oops mrq 2025-02-05 10:25:03 -0600
  • bb2ebe1ca2 fixed issues that may arise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies) mrq 2025-02-04 20:30:07 -0600
  • 0841f366e8 I should really just grab modelling_llama wholesale (fix for the adapted attention class) mrq 2025-01-28 21:55:05 -0600
  • e5f9da2221 oops mrq 2025-01-21 11:59:24 -0600
  • 69c1d2991f updated mixtral backend (need this for something else) mrq 2025-01-20 21:50:56 -0600
  • 1a26f789a5 added option to playback audio directly, removed no-phonemize option since I swear it worked in testing but it doesn't actually work mrq 2025-01-12 21:52:49 -0600
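
Sketches

The commits marked "sketch below" describe their changes concretely enough to illustrate. None of these are the repository's actual code; each is a minimal reconstruction under stated assumptions.

Commit 2dd80a03ff wants to cap the loss scaler. A minimal sketch of one way to do that, assuming a plain torch.cuda.amp.GradScaler rather than whatever scaler wrapper the trainer actually exposes; MAX_LOSS_SCALE is a hypothetical knob, not a value from the repo:

```python
import torch

MAX_LOSS_SCALE = 2 ** 16  # hypothetical cap, not a value from the repo

scaler = torch.cuda.amp.GradScaler()

def capped_step(scaler, optimizer):
    scaler.step(optimizer)
    scaler.update()
    # GradScaler doubles its scale after enough clean steps;
    # pin it back down whenever it grows past the cap.
    if scaler.get_scale() > MAX_LOSS_SCALE:
        scaler.update(new_scale=MAX_LOSS_SCALE)
```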
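Commit 0d809561c6 reports accuracy at k=1 and k=80 instead of k=10. A minimal sketch of top-k token accuracy, assuming flattened logits of shape [N, vocab] and integer targets of shape [N]; k=1 is exact-match, while k=80 roughly asks whether the right token is in the sampling pool:

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int = 1) -> float:
    # logits: [N, vocab], targets: [N]
    topk = logits.topk(k, dim=-1).indices               # [N, k]
    hits = (topk == targets.unsqueeze(-1)).any(dim=-1)  # [N]
    return hits.float().mean().item()
```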
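Commits 6634d07576, b640fabab5, and 95da4e9405 work through running muon alongside a second optimizer, and the last one lands on plain param groups: muon only applies to 2-D hidden weight matrices, so embeddings, norms, biases, and the classifier head fall back to AdamW. A minimal sketch of the split; the name-based heuristic here is an assumption, not the repo's actual rule:

```python
import torch

def split_param_groups(model: torch.nn.Module):
    muon_params, adamw_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # heuristic: 2-D hidden weights go to muon, everything else to AdamW
        if param.ndim == 2 and "embed" not in name and "head" not in name:
            muon_params.append(param)
        else:
            adamw_params.append(param)
    return muon_params, adamw_params
```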
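Commits ab0abd2b12 and 15b3c20e19 add paranoia around zero'd out tensors slipping into training after a bad preprocessing run. A minimal sketch of the guard; the function name and message are illustrative:

```python
import torch

def assert_not_zeroed(tensor: torch.Tensor, path: str = "?"):
    # a fully zero quantized-audio tensor means the source file was silently
    # corrupted during processing; fail loudly instead of training on it
    if torch.count_nonzero(tensor) == 0:
        raise ValueError(f"zero'd out tensor during training: {path}")
```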
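Commit e8f182b634 notes the loss calc "REALLY hates" ignore_loss_for_inputs. The usual way to drop input (prompt) positions from a language-model loss is cross_entropy's ignore_index; a minimal sketch with illustrative names, not the repo's actual calc_loss:

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # cross_entropy's default ignore value

def lm_loss(logits, targets, is_input):
    # logits: [B, T, vocab], targets: [B, T] (long), is_input: [B, T] (bool)
    targets = targets.masked_fill(is_input, IGNORE_INDEX)
    # cross_entropy wants the class dim second: [B, vocab, T]
    return F.cross_entropy(logits.transpose(1, 2), targets, ignore_index=IGNORE_INDEX)
```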
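Commit b3f9b76fd9 drops a dataset path when the metadata lists it but the HDF5 has no matching entry. A minimal sketch assuming h5py and metadata keys that mirror the HDF5 group layout:

```python
import h5py

def filter_valid_paths(paths, hdf5_path):
    # keep only paths whose keys actually exist in the (possibly partial)
    # HDF5, rather than reparsing all of the metadata
    with h5py.File(hdf5_path, "r") as hf:
        return [p for p in paths if p in hf]
```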
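Commit 712ce4af5d gates vall_e.emb.process by duration (06ef3daf3c adds a 1-second floor for training, and 0dc49ef4d5 mentions a 4-to-8-second window). A minimal sketch that probes metadata with torchaudio so out-of-range files are skipped before any decoding happens; the bounds are illustrative defaults:

```python
import torchaudio

def within_duration(path, min_seconds=1.0, max_seconds=8.0) -> bool:
    # reads header metadata only; no audio is decoded
    info = torchaudio.info(path)
    duration = info.num_frames / info.sample_rate
    return min_seconds <= duration <= max_seconds
```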
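Commit 79c504c278 batches the encode/decode path, and 299cc88821 re-adds amp around it. A minimal sketch of padded batch encoding under autocast; `codec.encode` stands in for a DAC/EnCodec-style interface and is not the repo's actual one:

```python
import torch
import torch.nn.functional as F

@torch.inference_mode()
def batch_encode(waves, codec, device="cuda", use_amp=True):
    # waves: list of [1, T_i] mono waveforms; right-pad each to the longest
    longest = max(wave.shape[-1] for wave in waves)
    batch = torch.stack(
        [F.pad(wave, (0, longest - wave.shape[-1])) for wave in waves]
    ).to(device)  # [B, 1, T_max]
    with torch.autocast(device_type="cuda", dtype=torch.float16, enabled=use_amp):
        return codec.encode(batch)
```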