vall-e

mrq fb8faa295b actually float16(+AMP) and layerskip is bad and will kill the model......		2024-11-01 18:36:44 -05:00
arch	actually float16(+AMP) and layerskip is bad and will kill the model......	2024-11-01 18:36:44 -05:00
__init__.py	added option to load from a model state dict directly instead of a yaml (to-do: do this for LoRAs too), automatically download the default model if none is provided	2024-10-25 22:15:15 -05:00
ar_nar.py	third time's the charm (for some reason it escaped me that I should treat early exit loss as an aux_loss to be used with the normal loss, as if I was training a MoE's router)	2024-11-01 12:50:37 -05:00
ar.py	added prefixing with silence (was to test something, currently hidden under cfg.experimental=True)	2024-10-18 17:19:52 -05:00
base.py	third time's the charm (for some reason it escaped me that I should treat early exit loss as an aux_loss to be used with the normal loss, as if I was training a MoE's router)	2024-11-01 12:50:37 -05:00
experimental.py	moved prints to use logger, edited readme (fused_attn doesnt seem stable for training)	2024-08-29 13:27:16 -05:00
lora.py
nar.py	layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this)	2024-10-30 20:05:45 -05:00