1cd24f3381a birdie tells me I should probably use a different optimizer (also preliminary support for native sparse attention, but I don't know if I'll use it) (mrq, 2025-03-04 14:53:02 -0600)
0451f75e33 now that the new model seems a little more promising, I can re-document things non-cynically (mrq, 2025-03-03 13:21:41 -0600)
eff180248c decoupled the llama backend to avoid any funny changes from transformers; removed the other backends since I don't think I'll ever bother using them (mrq, 2025-02-27 19:00:37 -0600)
ceecac6ffe I think I made resp_parallel_training=True faster with loss factoring? (mrq, 2025-02-26 23:13:32 -0600)
06ef3daf3c require a minimum duration of 1 second for training, because of my slop code auto-transposing that I don't wanna fix right now (mrq, 2025-02-26 22:00:33 -0600)
2ea387c08a segregated experimental changes into their own streamlined file to avoid breaking the existing model; it can pivot to the cleaned-up code if it actually works (nothing is working) (mrq, 2025-02-26 21:26:13 -0600)
95da4e9405 made muon actually work by actually utilizing param groups (thanks, APOLLO, for reminding me this is the sane way to handle this split) (mrq, 2025-02-26 10:39:13 -0600)
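The param-group split that commit refers to is how Muon is typically run in tandem with a conventional optimizer: Muon only makes sense for the 2D+ weight matrices, while embeddings, norms, biases, and the output head fall back to something AdamW-like. A minimal sketch of just the partitioning logic (the parameter names and the exclusion list here are illustrative assumptions, not this repo's actual code):

```python
def split_param_groups(named_params):
    """Split (name, shape) pairs into a Muon-eligible group and a fallback group.

    Muon is only appropriate for weight matrices (ndim >= 2); embeddings,
    output heads, norms, and biases should stay on a conventional optimizer.
    """
    muon_group, fallback_group = [], []
    for name, shape in named_params:
        is_matrix = len(shape) >= 2
        is_excluded = any(k in name for k in ("embed", "lm_head", "norm", "bias"))
        if is_matrix and not is_excluded:
            muon_group.append(name)
        else:
            fallback_group.append(name)
    return muon_group, fallback_group

# Hypothetical llama-style parameter names for illustration.
params = [
    ("model.embed_tokens.weight", (32000, 1024)),
    ("model.layers.0.self_attn.q_proj.weight", (1024, 1024)),
    ("model.layers.0.input_layernorm.weight", (1024,)),
    ("model.layers.0.mlp.up_proj.bias", (4096,)),
    ("lm_head.weight", (32000, 1024)),
]
muon, fallback = split_param_groups(params)
# muon     -> only the q_proj weight matrix
# fallback -> embedding, norm, bias, lm_head
```

The two name lists would then be handed to the two optimizers as separate param groups, rather than kludging two optimizers over overlapping parameters.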
de27115bb7 there's something wrong with it on my 4xV100 rig... (mrq, 2025-02-25 15:14:08 -0600)
db181f8e88 only do auto=equal for nemo, as it's an FSQ (mrq, 2025-02-24 21:07:44 -0600)
cbf6b84e27 fixed grad norm and loss scale not reporting for the local trainer (mrq, 2025-02-23 19:08:26 -0600)
b640fabab5 borrowed muon since it might work better under deepspeed and not require cruft (even though it really does not like the masked-NAR); also made the masked-NAR faux-causal since it might help out for cfg.model.version >= 7 (mrq, 2025-02-23 17:23:24 -0600)
3019c88799 separate the mask token and stop token, because this might cause issues (mrq, 2025-02-23 11:36:32 -0600)
6634d07576 added the muon optimizer through kludge hacks, because it necessitates a second optimizer in tandem that seems to only sometimes work with deepspeed (mrq, 2025-02-23 11:22:13 -0600)
67a6009555 (finally) added parallel AR for cfg.model.version >= 7 (nvidia/audio-codec-44khz is being a pain and it might require training purely AR first...) (mrq, 2025-02-23 08:31:03 -0600)
15b3c20e19 also throw an exception for zero'd-out tensors during training (I am very paranoid now) (mrq, 2025-02-22 14:09:41 -0600)
ab0abd2b12 fixes, fixes, fixes (a quarter of my recently processed audio returned zero'd tensors...) (mrq, 2025-02-22 09:07:33 -0600)
fc1ec2019d added an option to buffer process jobs across multiple speakers, to maybe squeeze some extra throughput out of vall_e.emb.process (in the event of lots of speakers with low file counts, such as Emilia) (mrq, 2025-02-20 14:56:32 -0600)
92139b6da9 additional cruft; added a note in the documentation to be aware of NUMA node topology when running vall_e.emb.process with more than one process (mrq, 2025-02-18 19:56:30 -0600)
596c2df11c added an arg to skip processing speakers without enough utterances, for whenever I get around to processing my subset of Emilia for nvidia/audio-codec-44khz (Emilia has a ton of low-utterance-count speakers, and right now my focus with the nemo model is on getting it to actually speak without many problems rather than feeding it a gorillion speakers) (mrq, 2025-02-18 10:49:21 -0600)
8331eee6fa added an arg to limit the vall_e.emb.process batch size, since there are some speaker groups in LibriLight/Speech/whatever that have 10K utterances and I'm growing impatient (mrq, 2025-02-18 10:19:17 -0600)
8f86cf0e4e possible logic optimization so I don't spend another 15 minutes simply iterating back to the point I was at in vall_e.emb.process (mrq, 2025-02-16 11:34:05 -0600)
0dc49ef4d5 documentation update while I wait for more audio (between 4 and 8 seconds per utterance) to quantize for nvidia/audio-codec-44khz (I was foolish to think I could get something serviceable with just 4 seconds max per utterance) (mrq, 2025-02-15 17:42:06 -0600)
a65c8144f4 with the amount of tweaks I keep making, I could have probably had the nvidia/audio-codec-44khz model realized already... (mrq, 2025-02-13 18:38:40 -0600)
e3becec0e8 more better-er loss calc, I suppose (mrq, 2025-02-13 12:49:53 -0600)
e8f182b634 cleaned up the loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting via loss factors) (mrq, 2025-02-13 09:35:27 -0600)
1c0ed6abac added notes on this unfruitful experiment (mrq, 2025-02-11 16:21:43 -0600)
e5916ea519 for my sanity: it seems having extraneous tokens in the embedding/classifier keeps the loss/acc a little higher than it should be (mrq, 2025-02-11 14:47:35 -0600)
d4a6709fb4 stopgap cringe to get this training session working (it does not seem fruitful) (mrq, 2025-02-11 13:45:09 -0600)
276a2342a4 tweaks to the processing script (mrq, 2025-02-10 19:18:13 -0600)
b3f9b76fd9 invalidate a path if loading via metadata and the entry is not in the hdf5 (to avoid reparsing my metadata, since I'm using a partial copy of my dataset at the moment) (mrq, 2025-02-10 14:43:15 -0600)
ed94b261dc could have sworn I had 'vall_e.emb.process --dtype' working; also a possible RAM optimization so I can stop locking up my server when firing off four encoding processes (mrq, 2025-02-07 18:52:19 -0600)
712ce4af5d maybe fixed errors with the DAC backend; added an option to limit by duration in emb.process (because I only really need short utterances right now, and I'm not ready to spend a week on processing everything again) (mrq, 2025-02-06 12:37:18 -0600)
299cc88821 re-added amp encoding/decoding for audio; possibly a bad idea to ignore using amp instead if requested (mrq, 2025-02-05 21:55:06 -0600)
7592befc53 updated vall_e.emb.process to allow for batched processing, plus some typo fixes (it's painfully slow on my 7900XTX...) (mrq, 2025-02-05 21:13:20 -0600)
79c504c278 cleaned up the encode/decode functions to make them a little more coherent; added an option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when I fall for the latest meme codec) (mrq, 2025-02-05 20:54:31 -0600)
bb2ebe1ca2 fixed issues that may arise from updating transformers with attention; added nvidia/audio-codec-44khz backend support (by gutting everything necessary, because I do NOT want to install more dependencies) (mrq, 2025-02-04 20:30:07 -0600)
0841f366e8 I should really just grab modelling_llama wholesale (fix for the adapted attention class) (mrq, 2025-01-28 21:55:05 -0600)
69c1d2991f updated the mixtral backend (need this for something else) (mrq, 2025-01-20 21:50:56 -0600)
1a26f789a5 added an option to play back audio directly; removed the no-phonemize option, since I swear it worked in testing but it doesn't actually work (mrq, 2025-01-12 21:52:49 -0600)