Commit Graph

136 Commits

Author SHA1 Message Date
mrq
95da4e9405 made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split) 2025-02-26 10:39:13 -06:00
mrq
cbf6b84e27 fixed grad norm and loss scale not reporting for local trainer 2025-02-23 19:08:26 -06:00
mrq
b640fabab5 borrowed muon since it might better work under deepspeed and not require cruft (even though it really does not like the masked-NAR, also make the masked-NAR faux-causal since it might better help out for cfg.model.version >= 7 2025-02-23 17:23:24 -06:00
mrq
3019c88799 separate mask token and stop token because this might cause issues 2025-02-23 11:36:32 -06:00
mrq
6634d07576 added muon optimizer through kludge hacks because it necessitates a second optimizer in tandum that seems to only sometimes work with deepspeed 2025-02-23 11:22:13 -06:00
mrq
a65c8144f4 with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already...... 2025-02-13 18:38:40 -06:00
mrq
d4a6709fb4 stopgap cringe to get this training session working (it does not seem fruitful) 2025-02-11 13:45:09 -06:00
mrq
353e478e68 agony 2024-12-21 22:52:10 -06:00
mrq
d85273609e corrected export.py's --hf 2024-12-20 15:17:13 -06:00
mrq
c2c6d912ac actually do speaker verification 2024-12-17 10:11:14 -06:00
mrq
8515038968 imagine my disappointment when the epoch finished just for it to throw an exception 2024-12-16 18:28:01 -06:00
mrq
2ba6b483dc ugh 2024-12-14 22:43:51 -06:00
mrq
3dd31e74d1 finally figured out a clean way to handle "resuming" the tqdm bar 2024-12-14 18:44:43 -06:00
mrq
35389481ee move lazy-stored ortho matrix to the grad device for apollo because agony 2024-12-13 23:22:26 -06:00
mrq
09804ecc16 APOLLO tweaks to make it work with deepspeed 2024-12-13 23:03:52 -06:00
mrq
64c67160a3 tweaks 2024-12-13 19:00:35 -06:00
mrq
f41251f648 more fixes for local engine backend 2024-12-12 14:38:42 -06:00
mrq
9a62e3b824 APOLLO cringe (doesn't want to work with deepspeed) 2024-12-12 00:31:58 -06:00
mrq
b81a98799b uplifting transformer's WavLM stuff to do speaker verification instead 2024-12-11 19:30:05 -06:00
mrq
6f1ee0c6fa Added CER, transcription/similarity model args in demo 2024-12-10 21:00:51 -06:00
mrq
8568a93dad added WER/SIM-O metrics, added APOLLO but I need to test it 2024-12-10 20:13:21 -06:00
mrq
34a66e1052 agnostified KD 2024-12-06 23:53:46 -06:00
mrq
42fafbaaca actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps) 2024-12-06 21:55:20 -06:00
mrq
dcaf38b359 fixed training tqdm being stubborn 2024-11-23 09:45:23 -06:00
mrq
88d840218d default set cfg strength to 3.0 since the reference model is updated 2024-11-17 10:23:40 -06:00
mrq
29e45be0b4 tweaks to bucket sampling 2024-11-13 11:09:24 -06:00
mrq
b2eca271a8 ugh 2024-11-13 10:35:44 -06:00
mrq
ad7cfffc00 NAR-len RVQ-0 was being trained causally............. 2024-11-13 09:43:50 -06:00
mrq
976ee87f6f resume iteration step in tqdm trainer, warn to logger if the sampler state dict was invalidated 2024-11-13 09:09:28 -06:00
mrq
0f2584eba7 new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?) 2024-11-12 22:30:09 -06:00
mrq
2f56696506 overhauled inference/sampler kwargs to stop being a bloated mess 2024-11-11 20:21:16 -06:00
mrq
354f8e059d store dataset hash alongside state dict so it can be ignored if mismatched 2024-11-11 18:16:56 -06:00
mrq
f7b8b1e825 dropped subtrain dataloader since its useless to duplicate 2024-11-11 17:00:49 -06:00
mrq
cf9df71f2c use homwbrewed caching system for dataloader paths / durations (I'm pretty sure I am now triggering OOM killers with my entire dataset used) 2024-11-11 16:32:08 -06:00
mrq
a9d2faf2d7 all I can do now until I wait for the model to (re)train for pure NAR 2024-11-09 22:57:34 -06:00
mrq
8eb9a4056b modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling 2024-10-22 18:12:39 -05:00
mrq
fc8dfd8617 made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar) 2024-10-18 16:55:00 -05:00
mrq
75b90be325 cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified 2024-10-17 17:06:48 -05:00
mrq
a507b769a1 sped up inferencing by not doing .tolist() for rep pen / length pen (and a bug fix in the web UI from prev commit) 2024-10-04 22:18:20 -05:00
mrq
769f67dcfe actually fix validation of phonemes in the symmap 2024-09-21 12:19:34 -05:00
mrq
b5bec0c9ce oops, turns out these are not split by speaker names already........ (also added sampling the dataset in the webui for easy viewing) 2024-09-18 20:19:46 -05:00
mrq
84647f588a more tweaks 2024-09-18 16:43:57 -05:00
mrq
ebac1db16c maybe final tweaks, I really needed to unify my json read/write and orjson is proven to be fast enough for me to try and rely on it more 2024-09-17 22:57:04 -05:00
mrq
a9fbe81f98 oops 2024-09-17 15:25:12 -05:00
mrq
94cf81d38c tweak 2024-09-05 23:21:18 -05:00
mrq
32287710a2 moved prints to use logger, edited readme (fused_attn doesnt seem stable for training) 2024-08-29 13:27:16 -05:00
mrq
3a65cc4b22 fix issue with sft and shared tensors... 2024-08-04 19:56:21 -05:00
mrq
2cb465018b implicitly load either normal pickled weights or safetensors on loading the model 2024-08-03 23:34:18 -05:00
mrq
c09133d00f added safetensors support (with metadata) and feed whatever torch.load/torch.save into it 2024-08-03 23:15:20 -05:00
mrq
6a733eb2ed changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype, some 'optimization' to compile the model but it doesnt seem to do anything useful 2024-08-03 22:10:21 -05:00