vall-e

mrq/vall-e

mrq 95da4e9405 made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split)		2025-02-26 10:39:13 -06:00
ext	made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split)	2025-02-26 10:39:13 -06:00
__init__.py	added WER/SIM-O metrics, added APOLLO but I need to test it	2024-12-10 20:13:21 -06:00
distributed.py	moved prints to use logger, edited readme (fused_attn doesn't seem stable for training)	2024-08-29 13:27:16 -05:00
io.py	agony	2024-12-21 22:52:10 -06:00
ml.py	borrowed muon since it might work better under deepspeed and not require cruft (even though it really does not like the masked-NAR); also made the masked-NAR faux-causal since it might help for cfg.model.version >= 7	2025-02-23 17:23:24 -06:00
pattern.py	oops, kept forgetting to actually pass in lang/tone tokens (despite not really using these at the moment)	2024-07-18 14:18:34 -05:00
sampler.py	tweaks to bucket sampling	2024-11-13 11:09:24 -06:00
trainer.py	added muon optimizer through kludge hacks because it necessitates a second optimizer in tandem that seems to only sometimes work with deepspeed	2025-02-23 11:22:13 -06:00
utils.py	added muon optimizer through kludge hacks because it necessitates a second optimizer in tandem that seems to only sometimes work with deepspeed	2025-02-23 11:22:13 -06:00