Commit Graph

205 Commits

Author SHA1 Message Date
mrq d33ccd188a ugh 2025-02-23 12:31:07 -06:00
mrq 67a6009555 (finally) added parallel AR for cfg.model.version >= 7 (nvidia/audio-codec-44khz is being a pain and it might require training purely AR first......) 2025-02-23 08:31:03 -06:00
mrq 15b3c20e19 also throw exception for zero'd out tensor during training (I am very paranoid now) 2025-02-22 14:09:41 -06:00
mrq ab0abd2b12 fixes fixes fixes (a quarter of my recently processed audio returned zero'd tensors......) 2025-02-22 09:07:33 -06:00
mrq e8f182b634 cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors) 2025-02-13 09:35:27 -06:00
mrq e029a8804d ironically none of this cruft gets the loss lower than the original way 2025-02-12 11:17:00 -06:00
mrq e5916ea519 for my sanity: it seems having extraneous tokens in the embedding/classifier keeps the loss/acc a little higher than it should be 2025-02-11 14:47:35 -06:00
mrq d4a6709fb4 stopgap cringe to get this training session working (it does not seem fruitful) 2025-02-11 13:45:09 -06:00
mrq d6a679ca5c tweaks 2025-02-10 20:53:08 -06:00
mrq b3f9b76fd9 invalidate a path if loading via metadata and entry is not in hdf5 (to avoid reparsing my metadata since I'm using a partial copy of my dataset at the moment) 2025-02-10 14:43:15 -06:00
mrq 47eb498046 more tweaks 2025-02-06 23:26:26 -06:00
mrq 3ab11bdc7b oops 2025-01-05 23:53:17 -06:00
mrq 2e6a7625e4 experimental 2025-01-05 12:47:03 -06:00
mrq 9b0d2ccbe1 2024-12-26 21:42:17 -06:00
mrq d85273609e corrected export.py's --hf 2024-12-20 15:17:13 -06:00
mrq 53230efd74 changed prompt_inject_noise to prompt_inject_noise_p so I can have another reason to do this post-training 2024-12-19 19:28:50 -06:00
mrq 8838babcba sanity checks (and I realized that the model actually had langs set to 4 in the yaml for KO/ZH, so................) 2024-12-19 19:08:57 -06:00
mrq 7617b6485f instead just compute a bunch of stuff on the transcriptions to store later under different names so I can just retrieve what I want, also added tongue twisters for nefarious reasons 2024-12-18 23:43:11 -06:00
mrq 4775edaa41 added text cleaning/normalization for WER purposes but it amounts to nothing desu 2024-12-18 19:58:53 -06:00
mrq ed152f78df tweaks to prompt duration to allow me to divorce how I use it for training from how I'm using it for the demo page, and demo page tweaks to make my life easier 2024-12-17 19:33:04 -06:00
mrq 9a62e3b824 APOLLO cringe (doesn't want to work with deepspeed) 2024-12-12 00:31:58 -06:00
mrq 20b87bfbd0 store metrics and only recalculate them if the output file is newer than the metrics file (see the first sketch after this listing) 2024-12-11 20:55:43 -06:00
mrq 6468e5d124 lol 2024-12-11 19:10:32 -06:00
mrq 8568a93dad added WER/SIM-O metrics, added APOLLO but I need to test it 2024-12-10 20:13:21 -06:00
mrq a6c745bafb chinese (mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), korean validated, vocab adjusted 2024-12-09 14:26:19 -06:00
mrq 1d460b9fe3 logic fixes, I feel like output is better? (also NAR can have a temperature; I imagine it couldn't because it was having a causal mask passed to it for the longest time before I caught it a month ago) 2024-12-08 14:52:47 -06:00
mrq 4e21df8092 oops 2024-12-04 21:24:22 -06:00
mrq 93d27be539 rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting) 2024-12-04 20:31:44 -06:00
mrq dcaf38b359 fixed training tqdm being stubborn 2024-11-23 09:45:23 -06:00
mrq 24d888c47c temporarily dropping support for xformers because it's breaking when using an attention mask (which I don't remember commenting out when being passed), default to not use wandb because it's being a pain when doing tests and not actual sessions 2024-11-22 11:29:12 -06:00
mrq 6845c447c9 added more harvard sentences to load from a text file 2024-11-21 13:18:11 -06:00
mrq 2b29790173 oops 2024-11-18 14:12:26 -06:00
mrq 4a71981456 normalize sampler index by batch size (if not using batched sampler), add option to cap out utterances for a speaker, some other things 2024-11-18 12:46:50 -06:00
mrq 39096f8ff3 redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........) 2024-11-14 22:17:47 -06:00
mrq e412e98125 ugh 2024-11-14 07:34:22 -06:00
mrq c00fc18b62 actually use the right embedding for nar-len 2024-11-13 18:04:04 -06:00
mrq 976ee87f6f resume iteration step in tqdm trainer, warn to logger if the sampler state dict was invalidated 2024-11-13 09:09:28 -06:00
mrq 0f2584eba7 new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?) 2024-11-12 22:30:09 -06:00
mrq 2495a7ef67 Fixed STT in the web UI 2024-11-12 12:49:53 -06:00
mrq 2f56696506 overhauled inference/sampler kwargs to stop being a bloated mess 2024-11-11 20:21:16 -06:00
mrq 354f8e059d store dataset hash alongside state dict so it can be ignored if mismatched 2024-11-11 18:16:56 -06:00
mrq f7b8b1e825 dropped subtrain dataloader since it's useless to duplicate 2024-11-11 17:00:49 -06:00
mrq cf9df71f2c use homebrewed caching system for dataloader paths / durations (I'm pretty sure I am now triggering OOM killers with my entire dataset used) 2024-11-11 16:32:08 -06:00
mrq 9def34cd66 lol 2024-11-10 12:48:41 -06:00
mrq 9cb0b6901b unified nar.py into ar_nar.py 2024-11-10 12:19:48 -06:00
mrq 3826f9bae4 saner mask creation? (it doesn't matter, kv cache won't work) 2024-11-02 21:00:21 -05:00
mrq ef1c17430f skip step on nan loss (ironically I have not had a nan loss after adding this), throw exception with invalid cfg.dataset.sample_type and sample_order combination (because I was tricked by this in my yaml and had inconsistent vram usage) (see the second sketch after this listing) 2024-11-01 20:54:53 -05:00
mrq 8eb9a4056b modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things I tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling 2024-10-22 18:12:39 -05:00
mrq 0dfab973e7 oops 2024-10-18 09:40:06 -05:00
mrq 75b90be325 cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified 2024-10-17 17:06:48 -05:00
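
Commit 20b87bfbd0 above describes caching computed metrics and only recalculating them when the output file is newer than the metrics file. A minimal sketch of that mtime comparison, assuming JSON metrics stored next to each output file (the function name, file suffix, and compute_fn hook are illustrative placeholders, not the repo's actual code):

    import json
    from pathlib import Path

    def load_or_compute_metrics(output_path: Path, compute_fn):
        """Reuse cached metrics unless the output file is newer than the metrics file."""
        # Hypothetical convention: metrics live next to the output, e.g. out.wav -> out.metrics.json
        metrics_path = output_path.with_suffix(".metrics.json")

        # Cache hit: the metrics file exists and is at least as new as the output file.
        if metrics_path.exists() and metrics_path.stat().st_mtime >= output_path.stat().st_mtime:
            return json.loads(metrics_path.read_text())

        # Cache miss (or stale metrics): recompute and store alongside the output.
        metrics = compute_fn(output_path)
        metrics_path.write_text(json.dumps(metrics))
        return metrics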
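
Commit ef1c17430f above mentions skipping the training step on a NaN loss. A minimal sketch of that guard in a generic PyTorch-style loop (maybe_step and its arguments are hypothetical names, not the repo's own code):

    import math

    def maybe_step(loss, optimizer):
        """Backprop and step only when the loss is finite; otherwise skip the step."""
        # Guard against NaN/inf loss: skipping the step keeps bad gradients away from the weights.
        if not math.isfinite(loss.item()):
            optimizer.zero_grad()
            return False

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return True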