41d7c30ea5 added much cleaner non-causal mask generation (mrq, 2024-11-22 19:43:32 -0600)
c99a74e834 actually generate a causal mask, since it seems one sometimes does not get generated implicitly because of assumptions made (mrq, 2024-11-22 18:30:24 -0600)
ccee5fc11c that was actually all pointless, since sdpa always had an attention mask fed to it and does not need is_causal to implicitly generate one (mrq, 2024-11-22 16:51:50 -0600)
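The mask commits above (41d7c30ea5 / c99a74e834 / ccee5fc11c) come down to always building an explicit boolean mask and handing it to sdpa rather than relying on is_causal. A minimal sketch of that idea, assuming a per-sequence lengths tensor (the helper and tensor names are illustrative, not the repo's actual code):

    import torch

    def build_attention_mask(lengths: torch.Tensor, causal: bool) -> torch.Tensor:
        # lengths: (B,) valid token counts per sequence
        max_len = int(lengths.max())
        # padding mask: True where a key position is a real token
        padding = torch.arange(max_len, device=lengths.device)[None, :] < lengths[:, None]  # (B, T)
        mask = padding[:, None, None, :].expand(-1, 1, max_len, -1)  # (B, 1, T_q, T_k)
        if causal:
            # only attend to current and previous positions
            tri = torch.tril(torch.ones(max_len, max_len, dtype=torch.bool, device=lengths.device))
            mask = mask & tri[None, None, :, :]
        return mask

    # q, k, v: (B, heads, T, head_dim); pass the explicit mask instead of is_causal=True
    # out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask)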
4aa685e749 what has science done (mrq, 2024-11-22 16:45:40 -0600)
147219a5e0 huge oversight in the attention masking......... (I realized I have not been providing a non-causal mask to non-causal tasks) (mrq, 2024-11-22 13:44:43 -0600)
24d888c47c temporarily dropping support for xformers because it's breaking when using an attention mask (which I don't remember commenting out when being passed); default to not use wandb because it's being a pain when doing tests and not actual sessions (mrq, 2024-11-22 11:29:12 -0600)
8aafae91fd don't use time embedding (mrq, 2024-11-21 23:14:52 -0600)
6845c447c9 added more Harvard sentences to load from a text file (mrq, 2024-11-21 13:18:11 -0600)
2a084544e8 moved duration padding for NAR-len to be a scalar instead (since it seems longer utterances need it much more so than shorter utterances) (mrq, 2024-11-21 13:04:07 -0600)
6aee08f9c0 moved stuff in the web UI around (un-experimented the max NAR-len steps because it's kind of important to adjust this value for better-sounding audio / quicker generated audio) (mrq, 2024-11-20 20:37:33 -0600)
1a73ac6a20 I cannot believe it's not actually called Wand DB (added wandb logging support since I think it would have been a much better way to look at my metrics) (mrq, 2024-11-20 16:10:47 -0600)
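For 1a73ac6a20, hooking wandb into a training loop is roughly the following; the project name, metric key, and stand-in metric are placeholders, and mode="disabled" mirrors the later default of not logging during quick tests:

    import math
    import wandb

    # mode="disabled" keeps test runs from cluttering the dashboard
    run = wandb.init(project="vall-e", mode="disabled", config={"lr": 1.0e-4})
    for step in range(100):
        loss = math.exp(-step / 50)  # stand-in for the real training loss
        wandb.log({"loss": loss}, step=step)
    run.finish()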
67f7bad168 added mixed modality AR+NAR-len to generate a short prefix through the AR, then inference with said prefix through the NAR-len (need to experiment with it more to ensure that the masked-off tokens are the only tokens getting updated) (mrq, 2024-11-20 14:22:12 -0600)
db64e6cb59 dependency updates (gradio 5.x now works on my machine) (mrq, 2024-11-20 12:33:01 -0600)
b1369e7824 better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model); lowered default CFG because too high a value makes the AR+NAR output sound sped up (but it can't be too low since CFG is required for the NAR-len) (mrq, 2024-11-19 18:51:17 -0600)
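The CFG commits (b1369e7824 here, b1f4db39c8 and b09328069e further down) amount to classifier-free-guidance-style mixing of conditional and unconditional logits before sampling. A rough sketch of the usual formulation, which may not be the exact variant the repo uses; cfg_strength=3.0 matches the later default from 88d840218d:

    import torch

    def cfg_logits(cond: torch.Tensor, uncond: torch.Tensor, cfg_strength: float) -> torch.Tensor:
        # extrapolate away from the unconditional pass toward the conditional one;
        # cfg_strength = 1.0 reduces to the conditional logits alone
        return uncond + cfg_strength * (cond - uncond)

    cond = torch.randn(1, 1024)
    uncond = torch.randn(1, 1024)
    probs = torch.softmax(cfg_logits(cond, uncond, cfg_strength=3.0), dim=-1)
    token = torch.multinomial(probs, num_samples=1)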
4a71981456 normalize sampler index by batch size (if not using batched sampler), add option to cap out utterances for a speaker, some other things (mrq, 2024-11-18 12:46:50 -0600)
6cfdf94bf9 swap priority to use nar-len if available, added notes (mrq, 2024-11-18 09:40:04 -0600)
069b27570f set option to set training masking ratio (I don't think a fixed masking ratio is beneficial for TTS, since the magic of the AR+NAR is being able to still reference the prior sequence of tokens when predicting things) (mrq, 2024-11-17 17:04:07 -0600)
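For 069b27570f (and the fixed 80% rate in 8286aa54c8 further down), the training masking ratio just controls how many target positions get replaced by the mask token each step, with loss computed only over those positions. A hypothetical sketch; the mask-token value and p argument are illustrative:

    import torch

    def apply_token_mask(tokens: torch.Tensor, mask_token: int, p: float = 0.8):
        # tokens: (B, T) target codes; p is the masking ratio (0.8 per the paper-derived default)
        is_masked = torch.rand_like(tokens, dtype=torch.float) < p
        noised = torch.where(is_masked, torch.full_like(tokens, mask_token), tokens)
        # loss should only be computed over the masked positions
        return noised, is_masked

    tokens = torch.randint(0, 1024, (2, 75))
    noised, is_masked = apply_token_mask(tokens, mask_token=1024, p=0.8)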
88d840218d default set cfg strength to 3.0 since the reference model is updated (mrq, 2024-11-17 10:23:40 -0600)
39096f8ff3 redid loss calculation and position ID generation to be cleaner, among other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........) (mrq, 2024-11-14 22:17:47 -0600)
ef05c951ff adjust fp16 loss scaling since I fried a model overnight when it hit 8K scale (mrq, 2024-11-14 09:23:52 -0600)
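For ef05c951ff, one way to keep the dynamic loss scale from running away under fp16 is to clamp the GradScaler's scale after each update. A sketch assuming the trainer uses torch.cuda.amp.GradScaler on a CUDA device; the 8192 cap is only taken from the commit's "8K scale" anecdote:

    import torch

    model = torch.nn.Linear(16, 16).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler(init_scale=1024)
    MAX_SCALE = 8192.0  # cap suggested by the "fried at 8K scale" anecdote

    for _ in range(10):
        optimizer.zero_grad()
        with torch.autocast("cuda", dtype=torch.float16):
            loss = model(torch.randn(4, 16, device="cuda")).square().mean()
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        # keep the dynamic loss scale from growing past the point that fried the model
        if scaler.get_scale() > MAX_SCALE:
            scaler.update(new_scale=MAX_SCALE)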
ad7cfffc00 NAR-len RVQ-0 was being trained causally............. (mrq, 2024-11-13 09:43:50 -0600)
976ee87f6f resume iteration step in tqdm trainer, warn to logger if the sampler state dict was invalidated (mrq, 2024-11-13 09:09:28 -0600)
8286aa54c8 do not pass timestep token/embedding since it doesn't seem to matter at all after all, fixed the training masking rate to 80% because a paper said so (mrq, 2024-11-13 09:07:10 -0600)
caf721c67b set it to zero because it'll make the stop token hide more often than not (mrq, 2024-11-12 22:30:50 -0600)
0f2584eba7 new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?) (mrq, 2024-11-12 22:30:09 -0600)
663f07038d haha... (do not create a token dropout/noise mask when not training (this sadly didn't fix NAR-len output)) (mrq, 2024-11-12 16:41:58 -0600)
b09328069e actually do CFG sampling for base AR+NAR tasks (mrq, 2024-11-12 13:42:39 -0600)
2495a7ef67 Fixed STT in the web UI (mrq, 2024-11-12 12:49:53 -0600)
8927bad7bc actually fixed rep pen (for AR and NAR; it seems to help with NAR unmasking) (mrq, 2024-11-11 21:40:19 -0600)
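The rep pen commits (8927bad7bc here, b1df6a7bed below, and the 1.5 default in d229725c76 near the bottom) follow the usual CTRL-style repetition penalty, where logits of already-emitted tokens are pushed down before sampling. A minimal sketch of that formulation, not necessarily the exact variant the repo settled on:

    import torch

    def apply_repetition_penalty(logits: torch.Tensor, previous: torch.Tensor, penalty: float = 1.5) -> torch.Tensor:
        # logits: (vocab,); previous: (n,) token ids already generated
        penalized = logits.clone()
        scores = penalized[previous]
        # divide positive logits, multiply negative ones, so repeats become less likely either way
        penalized[previous] = torch.where(scores > 0, scores / penalty, scores * penalty)
        return penalized

    logits = torch.randn(1024)
    previous = torch.tensor([5, 5, 42])
    logits = apply_repetition_penalty(logits, previous, penalty=1.5)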
ec92613847 actually pass input prompt length size to inference (mrq, 2024-11-11 20:39:48 -0600)
b1df6a7bed reverted rep pen sampler due to a regression (mrq, 2024-11-11 20:35:08 -0600)
b1f4db39c8 threw in CFG sampling for normal model as well to experiment with (mrq, 2024-11-11 20:27:38 -0600)
2f56696506 overhauled inference/sampler kwargs to stop being a bloated mess (mrq, 2024-11-11 20:21:16 -0600)
354f8e059d store dataset hash alongside state dict so it can be ignored if mismatched (mrq, 2024-11-11 18:16:56 -0600)
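For 354f8e059d, storing a dataset hash next to the sampler state lets the trainer detect a mismatched dataset on resume and throw the stale state away (warning to the logger, per 976ee87f6f above). A hypothetical sketch; the key names and hashing scheme are illustrative:

    import hashlib, json
    import torch

    def dataset_hash(paths: list) -> str:
        # hash the sorted list of dataset paths so any change invalidates the sampler state
        return hashlib.sha256(json.dumps(sorted(paths)).encode()).hexdigest()

    def save_state(path: str, sampler_state: dict, paths: list) -> None:
        torch.save({"sampler": sampler_state, "dataset_hash": dataset_hash(paths)}, path)

    def load_state(path: str, paths: list):
        state = torch.load(path)
        if state.get("dataset_hash") != dataset_hash(paths):
            return None  # mismatch: ignore the stored sampler state
        return state["sampler"]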
f7b8b1e825 dropped subtrain dataloader since it's useless to duplicate (mrq, 2024-11-11 17:00:49 -0600)
cf9df71f2c use homebrewed caching system for dataloader paths / durations (I'm pretty sure I am now triggering OOM killers with my entire dataset used) (mrq, 2024-11-11 16:32:08 -0600)
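For cf9df71f2c, a homebrewed path/duration cache is essentially a one-time walk of the dataset that gets persisted so later runs don't re-scan it. A sketch under an assumed file layout (per-utterance JSON metadata with a "duration" field); none of this is the repo's actual cache format:

    import json
    from pathlib import Path

    def build_duration_cache(root: str, cache_file: str = "durations.json") -> dict:
        cache_path = Path(root) / cache_file
        if cache_path.exists():
            return json.loads(cache_path.read_text())
        durations = {}
        for path in Path(root).rglob("*.json"):  # assumed per-utterance metadata files
            metadata = json.loads(path.read_text())
            durations[str(path)] = metadata.get("duration", 0.0)
        cache_path.write_text(json.dumps(durations))
        return durations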
9cb0b6901b unified nar.py into ar_nar.py (mrq, 2024-11-10 12:19:48 -0600)
a9d2faf2d7 all I can do now until I wait for the model to (re)train for pure NAR (mrq, 2024-11-09 22:57:34 -0600)
ad7e290a5e ugh (ROCm seems to silently clamp any token value >= logits.shape[-1] for loss calculation, while CUDA will throw an assert, making it hard to find this dumb fuckup) (mrq, 2024-11-09 19:40:02 -0600)
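For ad7e290a5e, the cheap guard against that footgun is to assert the targets are in range before handing them to cross-entropy, so ROCm never gets the chance to clamp them silently. A sketch:

    import torch
    import torch.nn.functional as F

    def safe_cross_entropy(logits: torch.Tensor, targets: torch.Tensor, ignore_index: int = -100) -> torch.Tensor:
        # logits: (N, vocab), targets: (N,)
        valid = targets != ignore_index
        if valid.any():
            # ROCm silently clamps out-of-range targets where CUDA throws an assert,
            # so check explicitly before the loss call
            assert int(targets[valid].max()) < logits.shape[-1], "target token id exceeds vocab size"
            assert int(targets[valid].min()) >= 0, "negative target token id"
        return F.cross_entropy(logits, targets, ignore_index=ignore_index)

    logits = torch.randn(8, 1024)
    targets = torch.randint(0, 1024, (8,))
    loss = safe_cross_entropy(logits, targets)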
943fe70c10 I don't know why this fixes an assert thrown but it does (mrq, 2024-11-09 19:04:13 -0600)
f50d92ba6c Almost made a mistake (mrq, 2024-11-09 18:12:54 -0600)
dcd5fecff3 some cleanup while I wait for the NAR-len to train to an acceptable state (currently it performs okay, but only on audio after 3 seconds or so) (mrq, 2024-11-09 12:12:46 -0600)
69b0b3b854 set timestep tensor to whatever the time embedding's dtype is, because it'll gripe under AMP (mrq, 2024-11-09 00:11:16 -0600)
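For 69b0b3b854, casting the timestep tensor to the time-embedding module's own dtype is what keeps autocast from complaining about a mismatch. A hypothetical sketch; the embedding module itself is illustrative, not the repo's implementation:

    import torch

    class TimeEmbedding(torch.nn.Module):
        def __init__(self, dim: int = 256):
            super().__init__()
            self.proj = torch.nn.Linear(dim, dim)

        def forward(self, t: torch.Tensor) -> torch.Tensor:
            # match the timestep tensor to the embedding weights' dtype so AMP doesn't gripe
            t = t.to(dtype=self.proj.weight.dtype)
            return self.proj(t.unsqueeze(-1).expand(-1, self.proj.in_features))

    emb = TimeEmbedding().half()
    timesteps = torch.rand(4)   # float32 by default
    out = emb(timesteps)        # cast internally to float16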
5a09a5f6e9 I forgot about the time embedding... (mrq, 2024-11-08 22:46:26 -0600)
811b15d280 I suppose I just have a shit training method since the sampler is as solid as I can get it............... (mrq, 2024-11-08 22:05:41 -0600)
77ff23e319 repeat-extend the prom to fill the initial tokens for nar-len (it somewhat works, the model just needs to train more) (mrq, 2024-11-06 23:29:53 -0600)
d606a693ff eval fix for nar-len (mrq, 2024-11-06 23:14:16 -0600)
105ed51159 I guess I'll fall for the NAR-len meme again (I don't know where my previous weights are, so I need to train it again to test something) (mrq, 2024-11-06 19:17:12 -0600)
9e65e05e83 more Windows-specific fixes, limit gradio to <5.0.0 on Linux (it works on Windows, but not on my Linux machine (tm)) (mrq, 2024-11-04 18:00:33 -0600)
c83670c38c Windows-specific fixes (to-do: find libespeak-ng.dll automatically because it cannot be trusted to do it by default) (mrq, 2024-11-03 19:19:15 -0600)
d229725c76 more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.) (mrq, 2024-11-03 18:31:28 -0600)
aee08b7307 changed layerskip float16 training warning (since it didn't seem to fry on my 4xV100 system) (mrq, 2024-11-03 09:58:29 -0600)
ec79230965 shuffled web UI options hidden by cfg.experimental to its own tab, expose early exit selection to inferencing (it kinda works naively, still need to implement self-speculation) (mrq, 2024-11-01 21:30:06 -0500)
ef1c17430f skip step on nan loss (ironically I have not had a nan loss after adding this), throw exception on invalid cfg.dataset.sample_type and sample_order combination (because I was tricked by this in my yaml and had inconsistent vram usage) (mrq, 2024-11-01 20:54:53 -0500)
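For ef1c17430f, skipping the optimizer step on a non-finite loss is a small guard in the training loop; a minimal sketch, with the loss computation and logging stand-ins being illustrative:

    import torch

    def training_step(model, optimizer, batch):
        optimizer.zero_grad()
        loss = model(batch).mean()  # stand-in for the real loss computation
        if not torch.isfinite(loss):
            # skip the step entirely instead of poisoning the weights with NaN/inf gradients
            print("skipping step: non-finite loss", float(loss))
            return None
        loss.backward()
        optimizer.step()
        return float(loss)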
fb8faa295b actually, float16(+AMP) and layerskip are bad and will kill the model...... (mrq, 2024-11-01 18:36:44 -0500)