00d1fed217 | another optimization (within the dataloader, because the similar utterance sampler was mondo slow) | 2025-03-08 17:10:50 -06:00
5e9d1a5302 | one more time one more time (this normalization isn't a spook) | 2025-03-07 19:32:42 -06:00
93044829af | one more time (could have sworn I tested it with batch size > 1) | 2025-03-07 19:14:33 -06:00
6cea840710 | oops | 2025-03-07 18:57:25 -06:00
dbd34b6430 | add specialized calc_loss because schizo | 2025-03-07 18:44:11 -06:00
8d848ed549 | handle case of dropping cond for segment mask | 2025-03-07 14:11:58 -06:00
89e52b9877 | ugh | 2025-03-07 13:55:57 -06:00
6afc2b7526 | gut feeling to change the attention mask | 2025-03-07 13:51:59 -06:00
91ede71cf0 | ugh | 2025-03-06 17:19:27 -06:00
2dd80a03ff | stuff for interfacing with the loss scaler value (because I want to cap it) | 2025-03-06 17:07:29 -06:00
a30dffcca7 | wandb additions (to-do eventually: upload samples as artifacts) | 2025-03-06 15:44:40 -06:00
ec87308d75 | final tweaks before training this meme 44khz model for the 3rd time | 2025-03-06 15:31:15 -06:00
5cd71ef238 | QoL so I can stop having to manually inject different configs | 2025-03-06 14:48:14 -06:00
0d809561c6 | accuracy at k=1 and k=80, because I'm probably dumb for having k=10 as the default since it does not represent any use case | 2025-03-05 16:35:34 -06:00
2fb2b732fc | wow that was fast | 2025-03-04 23:17:18 -06:00
462f71e2f7 | ugh | 2025-03-04 14:57:00 -06:00
1cd24f3381 | a birdie tells me I should probably use a different optimizer (also preliminary support for native sparse attention, but I don't know if I'll use it) | 2025-03-04 14:53:02 -06:00
0451f75e33 | now that the new model seems a little more promising, I can re-document things non-cynically | 2025-03-03 13:21:41 -06:00
3f1070f575 | tweaks | 2025-03-02 22:36:25 -06:00
4afa4ccce5 | at wits' end (perhaps the semantic token approach is the toughest pill to swallow) | 2025-03-01 21:03:25 -06:00
1d3290b023 | could have sworn this worked before; might have broken it when I decoupled from omegaconf | 2025-03-01 19:30:26 -06:00
17094b8002 | reticulating splines | 2025-03-01 17:48:51 -06:00
56f8be4d62 | lol | 2025-02-28 22:15:37 -06:00
ddc49c89c5 | the learning rate scheduler pill is a tough pill to swallow | 2025-02-28 22:12:19 -06:00
b97faa8173 | fixes... | 2025-02-28 18:53:07 -06:00
4e7d885542 | lol | 2025-02-28 18:06:41 -06:00
a174c33db6 | a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow) | 2025-02-28 17:56:50 -06:00
09d82a26fe | ugh | 2025-02-28 01:06:38 -06:00
93feb5660f | do not like that | 2025-02-27 23:59:56 -06:00
f4f435d7f5 | when you already had these ideas to stabilize training but you just ignored them | 2025-02-27 23:39:20 -06:00
0a45c9c042 | fix attention backend not being used | 2025-02-27 21:38:38 -06:00
b8e9f3d785 | maybe this will work | 2025-02-27 20:42:12 -06:00
01e96bafc9 | ugh | 2025-02-27 19:05:32 -06:00
eff180248c | decoupled llama backend to avoid any funny changes from transformers; removed other backends since I don't think I'll ever bother using them | 2025-02-27 19:00:37 -06:00
ceecac6ffe | I think I made resp_parallel_training=True faster with loss factoring? | 2025-02-26 23:13:32 -06:00
06ef3daf3c | require a minimum duration of 1 second for training, because of my slop code auto-transposing that I don't wanna fix right now | 2025-02-26 22:00:33 -06:00
cbd4d7d7f4 | ugh | 2025-02-26 21:31:10 -06:00
2ea387c08a | segregated experimental changes into their own streamlined file to avoid breaking the existing model; it can pivot to the cleaned-up code if it actually works (nothing is working) | 2025-02-26 21:26:13 -06:00
7d2e64630c | lol | 2025-02-26 10:49:06 -06:00
95da4e9405 | made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split) | 2025-02-26 10:39:13 -06:00
de27115bb7 | there's something wrong with it on my 4xV100 rig...... | 2025-02-25 15:14:08 -06:00
db181f8e88 | only do auto=equal for nemo, as it's an FSQ | 2025-02-24 21:07:44 -06:00
a5a04c39ef | when the | 2025-02-24 21:03:23 -06:00
918e0dbac1 | small slop cleanup | 2025-02-24 19:03:53 -06:00
3330b5bb00 | maybe fix NaNs being thrown for immature models at fp16 for training evals | 2025-02-24 18:25:54 -06:00
0f39f4d7a1 | lol | 2025-02-24 17:51:35 -06:00
33d5a7109a | it's a miracle I was able to get a semblance of audio with the naive AudioEncoder (now it interleaves properly) | 2025-02-24 14:39:12 -06:00
6e7b269147 | ugh | 2025-02-24 13:54:21 -06:00
8f5a3997bd | another experimental flag | 2025-02-24 13:50:41 -06:00
f593ee98fc | ugh | 2025-02-23 21:20:36 -06:00