|
1cd24f3381
|
a birdie tells me i should probably use a different optimizer (also preliminary support for native sparse attention but I don't know if I'll use it)
|
2025-03-04 14:53:02 -06:00 |
|
|
0451f75e33
|
now that the new model seems a little more promising, i can re-document things non-cynically
|
2025-03-03 13:21:41 -06:00 |
|
|
3f1070f575
|
tweaks
|
2025-03-02 22:36:25 -06:00 |
|
|
1d3290b023
|
could have sworn this worked before, might have broken it when i decoupled from omegaconf
|
2025-03-01 19:30:26 -06:00 |
|
|
17094b8002
|
reticulating splines
|
2025-03-01 17:48:51 -06:00 |
|
|
56f8be4d62
|
lol
|
2025-02-28 22:15:37 -06:00 |
|
|
ddc49c89c5
|
the learning rate scheduler pill is a tough pill to swallow
|
2025-02-28 22:12:19 -06:00 |
|
|
b97faa8173
|
fixes...
|
2025-02-28 18:53:07 -06:00 |
|
|
4e7d885542
|
lol
|
2025-02-28 18:06:41 -06:00 |
|
|
a174c33db6
|
a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow)
|
2025-02-28 17:56:50 -06:00 |
|
|
09d82a26fe
|
ugh
|
2025-02-28 01:06:38 -06:00 |
|
|
93feb5660f
|
do not like that
|
2025-02-27 23:59:56 -06:00 |
|
|
f4f435d7f5
|
when you already had these ideas to stabilize training but you just ignored them
|
2025-02-27 23:39:20 -06:00 |
|
|
0a45c9c042
|
fix attention backend not being used
|
2025-02-27 21:38:38 -06:00 |
|
|
b8e9f3d785
|
maybe this will work
|
2025-02-27 20:42:12 -06:00 |
|
|
01e96bafc9
|
ugh
|
2025-02-27 19:05:32 -06:00 |
|
|
eff180248c
|
decoupled llama backend to avoid any funny changes from transformers, removed other backends since i don't think i'll ever bother using them
|
2025-02-27 19:00:37 -06:00 |
|
|
ceecac6ffe
|
I think I made resp_parallel_training=True faster with loss factoring?
|
2025-02-26 23:13:32 -06:00 |
|
|
06ef3daf3c
|
require minimum of 1 second durations for training because of my slop code auto-transposing that I don't wanna fix right now
|
2025-02-26 22:00:33 -06:00 |
|
|
cbd4d7d7f4
|
ugh
|
2025-02-26 21:31:10 -06:00 |
|
|
2ea387c08a
|
segregated experimental changes into their own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned up code if it actually works (nothing is working)
|
2025-02-26 21:26:13 -06:00 |
|
|
7d2e64630c
|
lol
|
2025-02-26 10:49:06 -06:00 |
|
|
95da4e9405
|
made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split)
|
2025-02-26 10:39:13 -06:00 |
|
|
de27115bb7
|
there's something wrong with it on my 4xV100 rig......
|
2025-02-25 15:14:08 -06:00 |
|
|
db181f8e88
|
only do auto=equal for nemo as it's an FSQ
|
2025-02-24 21:07:44 -06:00 |
|
|
a5a04c39ef
|
when the
|
2025-02-24 21:03:23 -06:00 |
|
|
918e0dbac1
|
small slop cleanup
|
2025-02-24 19:03:53 -06:00 |
|
|
3330b5bb00
|
maybe fix NaNs being thrown for immature models at fp16 for training evals
|
2025-02-24 18:25:54 -06:00 |
|
|
0f39f4d7a1
|
lol
|
2025-02-24 17:51:35 -06:00 |
|
|
33d5a7109a
|
it's a miracle i was able to get a semblance of audio with the naive AudioEncoder (now it interleaves properly)
|
2025-02-24 14:39:12 -06:00 |
|
|
6e7b269147
|
ugh
|
2025-02-24 13:54:21 -06:00 |
|
|
8f5a3997bd
|
another experimental flag
|
2025-02-24 13:50:41 -06:00 |
|
|
f593ee98fc
|
ugh
|
2025-02-23 21:20:36 -06:00 |
|
|
cbf6b84e27
|
fixed grad norm and loss scale not reporting for local trainer
|
2025-02-23 19:08:26 -06:00 |
|
|
b640fabab5
|
borrowed muon since it might work better under deepspeed and not require cruft (even though it really does not like the masked-NAR), also made the masked-NAR faux-causal since it might help out more for cfg.model.version >= 7
|
2025-02-23 17:23:24 -06:00 |
|
|
d33ccd188a
|
ugh
|
2025-02-23 12:31:07 -06:00 |
|
|
8f3c3e01ee
|
oops
|
2025-02-23 12:09:56 -06:00 |
|
|
b39aaacd77
|
oops
|
2025-02-23 11:55:43 -06:00 |
|
|
3019c88799
|
separate mask token and stop token because this might cause issues
|
2025-02-23 11:36:32 -06:00 |
|
|
6634d07576
|
added muon optimizer through kludge hacks because it necessitates a second optimizer in tandem that seems to only sometimes work with deepspeed
|
2025-02-23 11:22:13 -06:00 |
|
|
67a6009555
|
(finally) added parallel AR for cfg.model.version >= 7 (nvidia/audio-codec-44khz is being a pain and it might require training purely AR first......)
|
2025-02-23 08:31:03 -06:00 |
|
|
15b3c20e19
|
also throw exception for zero'd out tensor during training (I am very paranoid now)
|
2025-02-22 14:09:41 -06:00 |
|
|
ab0abd2b12
|
fixes fixes fixes (a quarter of my recently processed audio returned zero'd tensors......)
|
2025-02-22 09:07:33 -06:00 |
|
|
50506e5ebc
|
oops
|
2025-02-20 20:55:58 -06:00 |
|
|
fc1ec2019d
|
added option to buffer process jobs across multiple speakers to maybe squeeze out some throughput speeds for vall_e.emb.process (in the event of lots of speakers with low file counts, such as Emilia)
|
2025-02-20 14:56:32 -06:00 |
|
|
ce1ca0124a
|
lol...
|
2025-02-20 13:40:36 -06:00 |
|
|
92139b6da9
|
additional cruft, added a note in documentation to be aware of NUMA node topology when running vall_e.emb.process with more than one process
|
2025-02-18 19:56:30 -06:00 |
|
|
596c2df11c
|
added arg to skip processing speakers with not enough utterances for whenever I get around to processing my subset of Emilia for nvidia/audio-codec-44khz (because Emilia has a ton of speakers with low utterance counts and right now my focus with the nemo model is on getting it to actually speak without much problems rather than feed it a gorillion speakers)
|
2025-02-18 10:49:21 -06:00 |
|
|
8331eee6fa
|
added arg to limit vall_e.emb.process batch size since there are some speaker groups in LibriLight/Speech/whatever that have 10K utterances and I'm growing impatient
|
2025-02-18 10:19:17 -06:00 |
|
|
8f86cf0e4e
|
possible logic optimization so I don't spend another 15 minutes simply iterating back to the point I was at in vall_e.emb.process
|
2025-02-16 11:34:05 -06:00 |
|
|
13c3a08853
|
never mind, that's slow
|
2025-02-14 16:35:17 -06:00 |
|
|
285e493b12
|
ugh..........
|
2025-02-14 16:24:34 -06:00 |
|
|
a65c8144f4
|
with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already......
|
2025-02-13 18:38:40 -06:00 |
|
|
e3becec0e8
|
more better-er loss calc I suppose
|
2025-02-13 12:49:53 -06:00 |
|
|
e8f182b634
|
cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors)
|
2025-02-13 09:35:27 -06:00 |
|
|
319ca09a4f
|
cleanup
|
2025-02-12 23:36:32 -06:00 |
|
|
b52c5c5d80
|
this seems to work in testing
|
2025-02-12 16:16:04 -06:00 |
|
|
e029a8804d
|
ironically none of this cruft gets the loss lower than the original way
|
2025-02-12 11:17:00 -06:00 |
|
|
4b31f5c808
|
this seems preferable
|
2025-02-12 00:36:50 -06:00 |
|
|
04fef5dad5
|
agony
|
2025-02-12 00:18:24 -06:00 |
|
|
e5916ea519
|
for my sanity it seems having extraneous tokens in the embedding/classifier has the loss/acc a little higher than it should
|
2025-02-11 14:47:35 -06:00 |
|
|
d4a6709fb4
|
stopgap cringe to get this training session working (it does not seem fruitful)
|
2025-02-11 13:45:09 -06:00 |
|
|
c0b46b82eb
|
tweaks
|
2025-02-10 21:48:29 -06:00 |
|
|
d6a679ca5c
|
tweaks
|
2025-02-10 20:53:08 -06:00 |
|
|
276a2342a4
|
tweaks to processing script
|
2025-02-10 19:18:13 -06:00 |
|
|
b3f9b76fd9
|
invalidate a path if loading via metadata and entry is not in hdf5 (to avoid reparsing my metadata since I'm using a partial copy of my dataset at the moment)
|
2025-02-10 14:43:15 -06:00 |
|
|
075ffef68a
|
ugh
|
2025-02-09 13:02:51 -06:00 |
|
|
953015748f
|
ugh
|
2025-02-07 20:49:28 -06:00 |
|
|
ed94b261dc
|
could have sworn i had 'vall_e.emb.process --dtype' working, also possible RAM optimization so I can stop locking up my server when firing four encoding processes
|
2025-02-07 18:52:19 -06:00 |
|
|
47eb498046
|
more tweaks
|
2025-02-06 23:26:26 -06:00 |
|
|
67a9401cce
|
oops
|
2025-02-06 15:14:14 -06:00 |
|
|
712ce4af5d
|
maybe fixed errors with DAC backend, added option to limit by duration in emb.process (because I only really need short utterances right now and I'm not ready to spend a week on processing everything again)
|
2025-02-06 12:37:18 -06:00 |
|
|
299cc88821
|
re-added amp encoding/decoding for audio, possibly a bad idea to ignore using amp if requested
|
2025-02-05 21:55:06 -06:00 |
|
|
7592befc53
|
updated vall_e.emb.process to allow for batched processing, some typo fixes (it's painfully slow on my 7900XTX...)
|
2025-02-05 21:13:20 -06:00 |
|
|
79c504c278
|
cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when i fall for the latest meme codec)
|
2025-02-05 20:54:31 -06:00 |
|
|
84174c1c1b
|
oops
|
2025-02-05 10:25:03 -06:00 |
|
|
bb2ebe1ca2
|
fixed issues that may arise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies)
|
2025-02-04 20:30:07 -06:00 |
|
|
0841f366e8
|
I should really just grab modelling_llama wholesale (fix for the adapted attention class)
|
2025-01-28 21:55:05 -06:00 |
|
|
e5f9da2221
|
oops
|
2025-01-21 11:59:24 -06:00 |
|
|
69c1d2991f
|
updated mixtral backend (need this for something else)
|
2025-01-20 21:50:56 -06:00 |
|
|
1a26f789a5
|
added option to play back audio directly, removed no-phonemize option since I swear it worked in testing but it doesn't actually work
|
2025-01-12 21:52:49 -06:00 |
|
|
9fa87c417a
|
added option to use raw text rather than the IPA phonemes (it requires a model trained on raw text)
|
2025-01-06 00:10:43 -06:00 |
|
|
3ab11bdc7b
|
oops
|
2025-01-05 23:53:17 -06:00 |
|
|
b445f4abb6
|
experimental
|
2025-01-05 19:05:00 -06:00 |
|
|
2e6a7625e4
|
experimental
|
2025-01-05 12:47:03 -06:00 |
|
|
31cfef59c4
|
when you do more training thinking the original model that can do NS/SR got deleted but it was actually a string not having its quotes in the right place.......
|
2024-12-27 18:16:57 -06:00 |
|
|
9b0d2ccbe1
|
|
2024-12-26 21:42:17 -06:00 |
|
|
59f56ad099
|
cleanup
|
2024-12-24 23:14:32 -06:00 |
|
|
82e8592f2a
|
working vall_e.cpp
|
2024-12-24 17:54:48 -06:00 |
|
|
497bdfc67b
|
more work (the wall is non-causal decoding......)
|
2024-12-22 20:11:31 -06:00 |
|
|
5f289db275
|
ugh
|
2024-12-22 16:15:24 -06:00 |
|
|
0d4329d2e3
|
sanity cleanup
|
2024-12-22 15:05:45 -06:00 |
|
|
353e478e68
|
agony
|
2024-12-21 22:52:10 -06:00 |
|
|
5788db849b
|
added extremely barebones vall_e.cpp so I can stop having to juggle this file around so much
|
2024-12-21 10:57:02 -06:00 |
|
|
91caf00212
|
ugh
|
2024-12-20 17:13:37 -06:00 |
|
|
d85273609e
|
corrected export.py's --hf
|
2024-12-20 15:17:13 -06:00 |
|
|
59bf6b8b33
|
exposed additional tasks (ns, sr, vc) (vc is experimental)
|
2024-12-20 11:15:29 -06:00 |
|
|
53230efd74
|
changed prompt_inject_noise to prompt_inject_noise_p so I can have another reason to do this post-training
|
2024-12-19 19:28:50 -06:00 |
|
|
e7e7f48043
|
livid
|
2024-12-19 19:25:27 -06:00 |
|
|
8838babcba
|
sanity checks (and I realized that the model actually had langs set to 4 in the yaml for KO/ZH so................)
|
2024-12-19 19:08:57 -06:00 |
|