Commit Graph

243 Commits

Author SHA1 Message Date
mrq
069b27570f set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things) 2024-11-17 17:04:07 -06:00
mrq
23fdba0c98 tweaks and changes 2024-11-16 15:49:06 -06:00
mrq
39096f8ff3 redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........) 2024-11-14 22:17:47 -06:00
mrq
ef05c951ff adjust fp16 loss scaling since I fried a model overnight when it hit 8K scale 2024-11-14 09:23:52 -06:00
mrq
910033343c overhauled how the right resp level / classifier gets picked to avoid cringemath 2024-11-13 13:31:17 -06:00
mrq
269648605e move NAR-len rvq level 0 to separate embedding 2024-11-13 11:38:58 -06:00
mrq
2f56696506 overhauled inference/sampler kwargs to stop being a bloated mess 2024-11-11 20:21:16 -06:00
mrq
cf9df71f2c use homwbrewed caching system for dataloader paths / durations (I'm pretty sure I am now triggering OOM killers with my entire dataset used) 2024-11-11 16:32:08 -06:00
mrq
a748e223ce tweaks 2024-11-11 12:40:41 -06:00
mrq
9cb0b6901b unified nar.py into ar_nar.py 2024-11-10 12:19:48 -06:00
mrq
c6a38693a2 This better work 2024-11-09 18:04:59 -06:00
mrq
bcabde3454 more notes 2024-11-06 13:51:28 -06:00
mrq
c83670c38c Windows specific fixes (to-do: find libespeak-ng.dll automatically because it cannot be trusted to do it by default) 2024-11-03 19:19:15 -06:00
mrq
d229725c76 more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.) 2024-11-03 18:31:28 -06:00
mrq
fb8faa295b actually float16(+AMP) and layerskip is bad and will kill the model...... 2024-11-01 18:36:44 -05:00
mrq
edf1e66bf9 layerskip_r=6 fries the model so hard the loss is sub-1... 2024-11-01 17:06:07 -05:00
mrq
9b6c57bc57 third time's the charm (for some reason it escaped me that I should treat early exit loss as an aux_loss to be used with the normal loss, as if I was training a MoE's router) 2024-11-01 12:50:37 -05:00
mrq
76ebef45dc off-by-one... 2024-10-31 13:24:48 -05:00
mrq
a22534e8f4 layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this) 2024-10-30 20:05:45 -05:00
mrq
4049f51ba9 added option to load lora directly from the model file itself with --lora 2024-10-26 00:13:10 -05:00
mrq
ccf71dc1b6 added option to load from a model state dict directly instead of a yaml (to-do: do this for LoRAs too), automatically download the default model if none is provided 2024-10-25 22:15:15 -05:00
mrq
a96f5aee32 adjusted how i want to pass eval kwargs 2024-10-25 20:38:09 -05:00
mrq
8920e5e86b actually have beam_width in the webUI work 2024-10-22 22:06:22 -05:00
mrq
8eb9a4056b modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling 2024-10-22 18:12:39 -05:00
mrq
c8f31db1de default to greedy sample AR (i should probably test this more but it seems to pass my harvard sentences and tongue twisters) 2024-10-18 16:58:56 -05:00
mrq
07f4935a75 more tweaks 2024-10-18 13:19:36 -05:00
mrq
0dfab973e7 oops 2024-10-18 09:40:06 -05:00
mrq
75b90be325 cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified 2024-10-17 17:06:48 -05:00
mrq
f88097ccf6 add config option to set the rate of sampling randomly vs similar speakers during training 2024-10-16 14:27:58 -05:00
mrq
04e983b86b modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now 2024-10-12 11:27:55 -05:00
mrq
bef43a0c18 added experimental entropix sampling support 2024-10-11 21:18:26 -05:00
mrq
f24547ad4e add top_k sampling / offset for prompt similar utterance sampling 2024-09-26 16:26:40 -05:00
mrq
c5e9142863 added option to retokenize phonemes for hdf5 (to save having to remake my hdf5 file) 2024-09-21 13:08:01 -05:00
mrq
fe241f6a99 support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder) 2024-09-18 21:34:43 -05:00
mrq
ebac1db16c maybe final tweaks, I really needed to unify my json read/write and orjson is proven to be fast enough for me to try and rely on it more 2024-09-17 22:57:04 -05:00
mrq
56f25f7a9b more stuff for similar-speaker prompt sampling (to-do: actually test if this works...) 2024-09-16 23:10:29 -05:00
mrq
54203c059d validated rep pen for STT (sometimes needed to wrangle the model) 2024-09-08 08:30:30 -05:00
mrq
5d66a7db52 webui cleanup, more tweaks, default to safetensors in config 2024-09-07 21:45:05 -05:00
mrq
54547b74d8 experimental implementation of STT (need to actually test on a model, test trainer seems to work) 2024-09-05 20:43:20 -05:00
mrq
32287710a2 moved prints to use logger, edited readme (fused_attn doesnt seem stable for training) 2024-08-29 13:27:16 -05:00
mrq
ed373957e2 maybe not 2024-08-09 11:38:08 -05:00
mrq
c658a7b440 make loss scaling opt-in rather than automatically determined (because it seems a DAC-based model really doesnt like loss scaling) 2024-08-09 10:51:36 -05:00
mrq
eac353cd0b busy work and cleanup while I wait for 1TB of audio to quantize... again. 2024-08-06 20:23:33 -05:00
mrq
2cb465018b implicitly load either normal pickled weights or safetensors on loading the model 2024-08-03 23:34:18 -05:00
mrq
c09133d00f added safetensors support (with metadata) and feed whatever torch.load/torch.save into it 2024-08-03 23:15:20 -05:00
mrq
6a733eb2ed changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype, some 'optimization' to compile the model but it doesnt seem to do anything useful 2024-08-03 22:10:21 -05:00
mrq
11fa3da665 some cleanup, fixed the wrapper attention to explicitly use other sdpa backends 2024-08-03 19:51:00 -05:00
mrq
c9ec6b28ef it actually wasn't working because Engines.__init__() automatically moves the entire module to the requested device, which was being called after offloading the model in the test trainer (and it seems I cant do it without injecting a bunch of shit in modeling_llama.py) 2024-08-01 20:56:28 -05:00
mrq
b4c895114c naive model offloading support (handles automatically splitting parts of the model to requested device per memory constraints, either inferred or requested in the yaml, input tensors are automatically migrated to the right device, it SEEMS to work for training under the test trainer when split between GPU and CPU) (this was specifically only because that Flux imagegen model released so I can test it there) 2024-08-01 20:12:06 -05:00
mrq
387358bc8a fixes for the NAR-len model, and documentation some config options, and a better way to handle resizing modules on state_dict load 2024-07-31 20:35:09 -05:00
mrq
52d13b321f I rather have it default to non-strict loading instead so I can clean up YAMLs 2024-07-30 22:24:38 -05:00
mrq
07f8e2ad06 added option to set the causal size (how many tokens to sample per AR step), but requires the model to be trained for this (which explains why recurrent chunk sampling just doesn't work for the retnet tests, obvious in hindsight) 2024-07-30 20:53:51 -05:00
mrq
682e4387dc oops (fixed proms being erased from a config oversight) 2024-07-25 12:39:57 -05:00
mrq
1acb0e9c84 added experimental training setting to perform token dropout to MAYBE compensate for errors from the preceding RVQ level (two types: token error offset, token dropout embedding replace) 2024-07-24 19:35:17 -05:00
mrq
75b04686f8 added prom-less training / inferencing, some other things 2024-07-22 19:36:07 -05:00
mrq
e19aa643a6 cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training 2024-07-21 19:12:03 -05:00
mrq
d53038a9e4 actually have split classifiers working 2024-07-19 15:33:31 -05:00
mrq
83a0954f85 fixes for re-introducing SpeechX tasks (need to actually validate if these all do the right things) 2024-07-18 17:16:32 -05:00
mrq
bccbb77a1a added option to either naively concat codes to concat audio waveforms (prior behavior) or to decode => concat => encode instead (although this only currently happens for prom sampling if an utternace is too small) 2024-07-18 16:48:41 -05:00
mrq
97e768601c re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways) 2024-07-18 16:16:14 -05:00
mrq
22fe53508c added experimental disjointed position IDs (because I *think* this might help because technically a sequence is made up of several parts, and the position embeddings shouldn't be unified) 2024-07-16 19:52:41 -05:00
mrq
fe0f235335 mechanism to store the model config inside the weights and load them, some other things to allow LoRA training on the RetNet (gradient checkpointing will gripe about inputs not having require_grad and nothing seems to remedy it) 2024-07-16 18:23:13 -05:00
mrq
3acc54df22 allow loading a different model within the web ui (apparently I did not have the web UI in the documentation) 2024-07-15 19:59:48 -05:00
mrq
7b210d9738 sanity cleanup 2024-07-04 15:58:08 -05:00
mrq
1ecf2793f4 (commented-out) support for facebookresearch/AudioDec, but support really didn't wow me (so I commented it out until I figure out why my output audio is super crusty with AudioDec) 2024-07-04 15:40:51 -05:00
mrq
f770467eb3 stuff 2024-07-01 18:13:29 -05:00
mrq
312a8e3ead add shuffle to samplers that can support it 2024-06-30 11:36:46 -05:00
mrq
396af541c5 ugh 2024-06-30 11:11:58 -05:00
mrq
dced595391 more cleanup 2024-06-30 11:00:12 -05:00
mrq
bc2a6fa756 sanity cleanup: moved experimental features under its own thing 2024-06-30 10:37:33 -05:00
mrq
2808f881c8 cleaned up subjugated audio embedding into a flag, flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive) 2024-06-29 21:46:35 -05:00
mrq
ec5eaebcbc experimental method of using DACs quantizer ""embeddings"" to see if it helps with model quality 2024-06-29 19:46:11 -05:00
mrq
83075c1505 sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because i didnt know that python dicts can have non-strings as keys), added batching samples based on total duration to ensure best training throughput 2024-06-28 22:28:54 -05:00
mrq
8fffb94964 backport fix from tortoise_tts with local trainer + loading state when training lora 2024-06-25 13:41:29 -05:00
mrq
62a53eed64 fixed deducing tokenizer path, added option to default to naive tokenizer (for old models, like ar+nar-retnet-8) 2024-06-18 22:11:14 -05:00
mrq
8a986eb480 load exported LoRA weights if exists (to-do: make a better LoRA loading mechanism) 2024-06-18 21:45:46 -05:00
mrq
7cfb78fa64 enable LoRA for targetted RVQ levels (to experiment with, seems to help) 2024-06-17 21:45:03 -05:00
mrq
1d159b1476 updated export routine to split LoRA weights from the state dict (should work with deepspeed) 2024-06-17 13:28:18 -05:00
mrq
bd0bc10ec0 added LoRA policy to decide what layer of the model gets adapted based on simple inclusion/exclusion terms 2024-06-17 13:05:06 -05:00
mrq
45a39fb79f very rudimentary lora support (no deepspeed support, tested training and saving but not loading yet) 2024-06-17 00:09:16 -05:00
mrq
b3b67f34ac added option to sort paths by durations to better group equally lengthed sequences together (and there was maybe a logic error from creating the samplers and then interleave-reordering paths, desyncing them, maybe) 2024-06-13 22:37:34 -05:00
mrq
65a8960305 option to split classifier per-level instead of sharing one (at this point I'm just scrambling to try and cope with training a DAC model, the NAR is being a pain) 2024-06-11 22:28:59 -05:00
mrq
a7a6e0ac76 validated that inferencing works, changed some defaults (NAR benefits from greedy sampling) 2024-06-09 17:11:38 -05:00
mrq
132a02c48b sanity cleanup, backup config yaml for each log file 2024-06-09 11:22:52 -05:00
mrq
58fb0a84db added experimental NAR only model (inferences text length, need more experimenting), AudioEmbedding logic cleanup (I still think it's being done wrong) 2024-06-08 15:42:02 -05:00
mrq
e35a91c67a ugh 2024-06-07 21:56:14 -05:00
mrq
eafa622be2 I forgot the actual reason I was cleaning things up was to re-include prom loss calculation (I realized the reason I did this was because of an prom embedding oversight, it seems to work now) 2024-06-07 20:29:25 -05:00
mrq
da8242d086 finally got around to removing omegaconf 2024-06-07 20:23:53 -05:00
mrq
b2194b859a re-added loading multiple models because I'm now entertaining having split AR/NAR models again (and need a way to load both at once) 2024-06-06 09:48:43 -05:00
mrq
4073656293 oops 2024-06-05 20:53:10 -05:00
mrq
48cd1054f9 madness 2024-06-04 23:48:51 -05:00
mrq
406ff7bbe1 re-implemented config.model.interleave for the HF-compat experimental method 2024-06-04 14:19:52 -05:00
mrq
934672252b feverish cleanup 2024-06-03 21:28:49 -05:00
mrq
c1fcd889d5 reverted automatically disabling split loss calc, since it seems that it's actually cacling loss on prom causes the oddities, maybe 2024-06-01 12:34:59 -05:00
mrq
31785f4eeb actually don't default to compute split losses, test bitnet model doesn't seem to be doing things right (despite debug printouts showing theyre roughly the same logit/loss sequences, could just be bitnet linears being not up to par on actual models) 2024-06-01 09:12:51 -05:00
mrq
e9c87060df oops 2024-05-31 22:22:28 -05:00
mrq
b482ca19ff added model config option to set KV head count for MQA/GQA instead of MHA for llama-based models (i think its very negligible both ways on such a small model size) 2024-05-31 19:32:37 -05:00
mrq
da473295b7 better way to compute per-segment losses 2024-05-28 19:29:54 -05:00
mrq
5af6f41c94 added loss calcs against prom (requires the right settings for not shit results, disabled by default) 2024-05-27 08:43:00 -05:00
mrq
ddbacde0d1 DAC just doesn't work well enough...... 2024-05-25 11:07:52 -05:00