|
4800e7179a
|
remove nan checks because it causes problems in distributed training because I'm not syncing between GPUs (and nan losses gets ignored anyways with loss scaling)
|
2024-12-15 09:42:54 -06:00 |
|
|
218d0e29fd
|
ugh (batchmean actually expects batch=seq_len, and not the actual batch)
|
2024-12-07 12:39:01 -06:00 |
|
|
61ed662856
|
ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode)
|
2024-12-07 12:31:54 -06:00 |
|
|
f97e8b0c7f
|
ACTUALLY do KD-loss because of an oversight with masked_select outputting 1D tensors that get softmax'd in total
|
2024-12-07 09:52:51 -06:00 |
|
|
34a66e1052
|
agnostified KD
|
2024-12-06 23:53:46 -06:00 |
|
|
23d402bf01
|
added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......)
|
2024-12-05 23:05:52 -06:00 |
|
|
88d840218d
|
default set cfg strength to 3.0 since the reference model is updated
|
2024-11-17 10:23:40 -06:00 |
|
|
23fdba0c98
|
tweaks and changes
|
2024-11-16 15:49:06 -06:00 |
|
|
f7b8b1e825
|
dropped subtrain dataloader since its useless to duplicate
|
2024-11-11 17:00:49 -06:00 |
|
|
48490757da
|
fixes
|
2024-11-10 20:37:50 -06:00 |
|
|
9def34cd66
|
lol
|
2024-11-10 12:48:41 -06:00 |
|
|
a9d2faf2d7
|
all I can do now until I wait for the model to (re)train for pure NAR
|
2024-11-09 22:57:34 -06:00 |
|
|
d606a693ff
|
eval fix for nar-len
|
2024-11-06 23:14:16 -06:00 |
|
|
aee08b7307
|
changed layerskip float16 training warning (since it didnt seem to fry on my 4xV100 system)
|
2024-11-03 09:58:29 -06:00 |
|
|
62fe5b0943
|
ughh
|
2024-11-01 22:36:48 -05:00 |
|
|
fb8faa295b
|
actually float16(+AMP) and layerskip is bad and will kill the model......
|
2024-11-01 18:36:44 -05:00 |
|
|
a96f5aee32
|
adjusted how i want to pass eval kwargs
|
2024-10-25 20:38:09 -05:00 |
|
|
8920e5e86b
|
actually have beam_width in the webUI work
|
2024-10-22 22:06:22 -05:00 |
|
|
2ea978f318
|
added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval
|
2024-10-10 13:40:25 -05:00 |
|
|
039482a48e
|
don't do eval on stt because it's so slow and I don't even bother doing any metrics against it anyways (to-do: make this a flag)
|
2024-09-26 18:56:57 -05:00 |
|
|
fa93061b3e
|
more fixes, moved sampler state dict to a better place, eval works again
|
2024-09-06 16:59:56 -05:00 |
|
|
d33a906119
|
cleanup for AR_NAR inferencing to allow both TTS and STT tasks simultaneously (need to have training eval do this to though)
|
2024-09-06 14:30:12 -05:00 |
|
|
32287710a2
|
moved prints to use logger, edited readme (fused_attn doesnt seem stable for training)
|
2024-08-29 13:27:16 -05:00 |
|
|
ab673e0426
|
add cap for NAR-len training, to avoid any weird cases in early training where it'll just mess up and generate long lengths
|
2024-08-03 21:00:32 -05:00 |
|
|
ce8bb1e4f7
|
sanity cleanups with weird off-by-one-ness, cleaned up and validated vall_e.models.experimental works again
|
2024-07-27 15:36:05 -05:00 |
|
|
75b04686f8
|
added prom-less training / inferencing, some other things
|
2024-07-22 19:36:07 -05:00 |
|
|
692d09f9c1
|
eval/validation fix for SpeechX tasks
|
2024-07-19 09:16:37 -05:00 |
|
|
97e768601c
|
re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways)
|
2024-07-18 16:16:14 -05:00 |
|
|
f770467eb3
|
stuff
|
2024-07-01 18:13:29 -05:00 |
|
|
bc2a6fa756
|
sanity cleanup: moved experimental features under its own thing
|
2024-06-30 10:37:33 -05:00 |
|
|
a8718d35a4
|
nasty bandaid because some of my DAC dataset only has 8 RVQ levels instead of the full 9
|
2024-06-29 10:16:37 -05:00 |
|
|
dd40463803
|
limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid)
|
2024-06-29 09:11:28 -05:00 |
|
|
1a392b69f6
|
local training backend should be a bit more aware of variable batch sizes, maybe
|
2024-06-28 22:39:05 -05:00 |
|
|
234f9efc6e
|
ugh
|
2024-06-09 11:39:43 -05:00 |
|
|
132a02c48b
|
sanity cleanup, backup config yaml for each log file
|
2024-06-09 11:22:52 -05:00 |
|
|
ead3e2f0cb
|
ugh
|
2024-06-08 16:14:57 -05:00 |
|
|
b072f9b96b
|
fixes
|
2024-06-08 16:01:34 -05:00 |
|
|
7d6fff24f9
|
un-tensor'd quant_level marker since it doesn't need to be one (I forgot why I had it as one but nothing seems to need it as a tensor that didn't already make it one)
|
2024-06-07 20:46:22 -05:00 |
|
|
4ade2b60ee
|
ugh
|
2024-06-06 21:57:11 -05:00 |
|
|
880b4ecd1b
|
cleanup, putting some thoughts in comments before I forget about them
|
2024-06-05 19:50:06 -05:00 |
|
|
3cfc8a96bb
|
oops
|
2024-06-05 10:30:04 -05:00 |
|
|
48cd1054f9
|
madness
|
2024-06-04 23:48:51 -05:00 |
|
|
0f7f3ae754
|
added loss calc split and acc for experimental model
|
2024-06-04 22:04:40 -05:00 |
|
|
6d5bd0156a
|
fixes
|
2024-06-04 18:50:48 -05:00 |
|
|
ed3aeaf3a1
|
copy pasted from test to actual trainer
|
2024-06-04 18:40:30 -05:00 |
|
|
c93d5863fd
|
fixes
|
2024-06-04 00:07:00 -05:00 |
|
|
186b93a77e
|
oops
|
2024-06-03 22:35:55 -05:00 |
|
|
e50edc3b48
|
added a flag to convert to a HF compatible model on export by stitching things
|
2024-06-03 22:34:47 -05:00 |
|
|
934672252b
|
feverish cleanup
|
2024-06-03 21:28:49 -05:00 |
|
|
05cd8b797e
|
nevermind it breaks training
|
2024-05-25 18:03:43 -05:00 |
|