717 Commits

Author SHA1 Message Date
mrq
b640fabab5 borrowed muon since it might work better under deepspeed and not require cruft (even though it really does not like the masked-NAR), also make the masked-NAR faux-causal since it might help out for cfg.model.version >= 7 2025-02-23 17:23:24 -06:00
mrq
d33ccd188a ugh 2025-02-23 12:31:07 -06:00
mrq
8f3c3e01ee oops 2025-02-23 12:09:56 -06:00
mrq
b39aaacd77 oops 2025-02-23 11:55:43 -06:00
mrq
3019c88799 separate mask token and stop token because this might cause issues 2025-02-23 11:36:32 -06:00
mrq
6634d07576 added muon optimizer through kludge hacks because it necessitates a second optimizer in tandem that seems to only sometimes work with deepspeed 2025-02-23 11:22:13 -06:00
mrq
67a6009555 (finally) added parallel AR for cfg.model.version >= 7 (nvidia/audio-codec-44khz is being a pain and it might require training purely AR first......) 2025-02-23 08:31:03 -06:00
mrq
15b3c20e19 also throw exception for zero'd out tensor during training (I am very paranoid now) 2025-02-22 14:09:41 -06:00
mrq
ab0abd2b12 fixes fixes fixes (a quarter of my recently processed audio returned zero'd tensors......) 2025-02-22 09:07:33 -06:00
mrq
50506e5ebc oops 2025-02-20 20:55:58 -06:00
mrq
fc1ec2019d added option to buffer process jobs across multiple speakers to maybe squeeze out some throughput speeds for vall_e.emb.process (in the event of lots of speakers with low file counts, such as Emilia) 2025-02-20 14:56:32 -06:00
mrq
ce1ca0124a lol... 2025-02-20 13:40:36 -06:00
mrq
92139b6da9 additional cruft, added a note in documentation to be aware of NUMA node topology when running vall_e.emb.process with more than one process 2025-02-18 19:56:30 -06:00
mrq
596c2df11c added arg to skip processing speakers with not enough utterances for whenever I get around to processing my subset of Emilia for nvidia/audio-codec-44khz (because Emilia has a ton of low-utterance speaker counts and right now my focus with the nemo model is on getting it to actually speak without much problems rather than feed it a gorillion speakers) 2025-02-18 10:49:21 -06:00
mrq
8331eee6fa added arg to limit vall_e.emb.process batch size since there are some speaker groups in LibriLight/Speech/whatever that have 10K utterances and I'm getting impatient 2025-02-18 10:19:17 -06:00
mrq
8f86cf0e4e possible logic optimization so I don't spend another 15 minutes simply iterating back to the point I was at in vall_e.emb.process 2025-02-16 11:34:05 -06:00
mrq
13c3a08853 never mind, that's slow 2025-02-14 16:35:17 -06:00
mrq
285e493b12 ugh.......... 2025-02-14 16:24:34 -06:00
mrq
a65c8144f4 with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already...... 2025-02-13 18:38:40 -06:00
mrq
e3becec0e8 more better-er loss calc I suppose 2025-02-13 12:49:53 -06:00
mrq
e8f182b634 cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors) 2025-02-13 09:35:27 -06:00
mrq
319ca09a4f cleanup 2025-02-12 23:36:32 -06:00
mrq
b52c5c5d80 this seems to work in testing 2025-02-12 16:16:04 -06:00
mrq
e029a8804d ironically none of this cruft gets the loss lower than the original way 2025-02-12 11:17:00 -06:00
mrq
4b31f5c808 this seems preferable 2025-02-12 00:36:50 -06:00
mrq
04fef5dad5 agony 2025-02-12 00:18:24 -06:00
mrq
e5916ea519 for my sanity it seems having extraneous tokens in the embedding/classifier has the loss/acc a little higher than it should 2025-02-11 14:47:35 -06:00
mrq
d4a6709fb4 stopgap cringe to get this training session working (it does not seem fruitful) 2025-02-11 13:45:09 -06:00
mrq
c0b46b82eb tweaks 2025-02-10 21:48:29 -06:00
mrq
d6a679ca5c tweaks 2025-02-10 20:53:08 -06:00
mrq
276a2342a4 tweaks to processing script 2025-02-10 19:18:13 -06:00
mrq
b3f9b76fd9 invalidate a path if loading via metadata and entry is not in hdf5 (to avoid reparsing my metadata since I'm using a partial copy of my dataset at the moment) 2025-02-10 14:43:15 -06:00
mrq
075ffef68a ugh 2025-02-09 13:02:51 -06:00
mrq
953015748f ugh 2025-02-07 20:49:28 -06:00
mrq
ed94b261dc could have sworn I had 'vall_e.emb.process --dtype' working, also possible RAM optimization so I can stop locking up my server when firing four encoding processes 2025-02-07 18:52:19 -06:00
mrq
47eb498046 more tweaks 2025-02-06 23:26:26 -06:00
mrq
67a9401cce oops 2025-02-06 15:14:14 -06:00
mrq
712ce4af5d maybe fixed errors with DAC backend, added option to limit by duration in emb.process (because I only really need short utterances right now and I'm not ready to spend a week on processing everything again) 2025-02-06 12:37:18 -06:00
mrq
299cc88821 re-added amp encoding/decoding for audio, possible bad idea to ignore using amp instead if requested 2025-02-05 21:55:06 -06:00
mrq
7592befc53 updated vall_e.emb.process to allow for batched processing, some typo fixes (it's painfully slow on my 7900XTX...) 2025-02-05 21:13:20 -06:00
mrq
79c504c278 cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when I fall for the latest meme codec) 2025-02-05 20:54:31 -06:00
mrq
84174c1c1b oops 2025-02-05 10:25:03 -06:00
mrq
bb2ebe1ca2 fixed issues that may arise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies) 2025-02-04 20:30:07 -06:00
mrq
0841f366e8 I should really just grab modelling_llama wholesale (fix for the adapted attention class) 2025-01-28 21:55:05 -06:00
mrq
e5f9da2221 oops 2025-01-21 11:59:24 -06:00
mrq
69c1d2991f updated mixtral backend (need this for something else) 2025-01-20 21:50:56 -06:00
mrq
1a26f789a5 added option to playback audio directly, removed no-phonemize option since I swear it worked in testing but it doesn't actually work 2025-01-12 21:52:49 -06:00
mrq
9fa87c417a added option to use raw text rather than the IPA phonemes (it requires a model trained on raw text) 2025-01-06 00:10:43 -06:00
mrq
3ab11bdc7b oops 2025-01-05 23:53:17 -06:00
mrq
b445f4abb6 experimental 2025-01-05 19:05:00 -06:00