|
682e4387dc
|
oops (fixed proms being erased from a config oversight)
|
2024-07-25 12:39:57 -05:00 |
|
|
1acb0e9c84
|
added experimental training setting to perform token dropout to MAYBE compensate for errors from the preceding RVQ level (two types: token error offset, token dropout embedding replace)
|
2024-07-24 19:35:17 -05:00 |
|
|
75b04686f8
|
added prom-less training / inferencing, some other things
|
2024-07-22 19:36:07 -05:00 |
|
|
e19aa643a6
|
cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training
|
2024-07-21 19:12:03 -05:00 |
|
|
d53038a9e4
|
actually have split classifiers working
|
2024-07-19 15:33:31 -05:00 |
|
|
83a0954f85
|
fixes for re-introducing SpeechX tasks (need to actually validate if these all do the right things)
|
2024-07-18 17:16:32 -05:00 |
|
|
bccbb77a1a
|
added option to either naively concat codes to concat audio waveforms (prior behavior) or to decode => concat => encode instead (although this currently only happens for prom sampling if an utterance is too small)
|
2024-07-18 16:48:41 -05:00 |
|
|
97e768601c
|
re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways)
|
2024-07-18 16:16:14 -05:00 |
|
|
22fe53508c
|
added experimental disjointed position IDs (because I *think* this might help because technically a sequence is made up of several parts, and the position embeddings shouldn't be unified)
|
2024-07-16 19:52:41 -05:00 |
|
|
fe0f235335
|
mechanism to store the model config inside the weights and load it, some other things to allow LoRA training on the RetNet (gradient checkpointing will gripe about inputs not having requires_grad and nothing seems to remedy it)
|
2024-07-16 18:23:13 -05:00 |
|
|
3acc54df22
|
allow loading a different model within the web ui (apparently I did not have the web UI in the documentation)
|
2024-07-15 19:59:48 -05:00 |
|
|
7b210d9738
|
sanity cleanup
|
2024-07-04 15:58:08 -05:00 |
|
|
1ecf2793f4
|
(commented-out) support for facebookresearch/AudioDec, but support really didn't wow me (so I commented it out until I figure out why my output audio is super crusty with AudioDec)
|
2024-07-04 15:40:51 -05:00 |
|
|
f770467eb3
|
stuff
|
2024-07-01 18:13:29 -05:00 |
|
|
312a8e3ead
|
add shuffle to samplers that can support it
|
2024-06-30 11:36:46 -05:00 |
|
|
396af541c5
|
ugh
|
2024-06-30 11:11:58 -05:00 |
|
|
dced595391
|
more cleanup
|
2024-06-30 11:00:12 -05:00 |
|
|
bc2a6fa756
|
sanity cleanup: moved experimental features under their own thing
|
2024-06-30 10:37:33 -05:00 |
|
|
2808f881c8
|
cleaned up subjugated audio embedding into a flag, flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive)
|
2024-06-29 21:46:35 -05:00 |
|
|
ec5eaebcbc
|
experimental method of using DAC's quantizer "embeddings" to see if it helps with model quality
|
2024-06-29 19:46:11 -05:00 |
|
|
83075c1505
|
sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because I didn't know that python dicts can have non-strings as keys), added batching samples based on total duration to ensure best training throughput
|
2024-06-28 22:28:54 -05:00 |
|
|
8fffb94964
|
backport fix from tortoise_tts with local trainer + loading state when training lora
|
2024-06-25 13:41:29 -05:00 |
|
|
62a53eed64
|
fixed deducing tokenizer path, added option to default to naive tokenizer (for old models, like ar+nar-retnet-8)
|
2024-06-18 22:11:14 -05:00 |
|
|
8a986eb480
|
load exported LoRA weights if they exist (to-do: make a better LoRA loading mechanism)
|
2024-06-18 21:45:46 -05:00 |
|
|
7cfb78fa64
|
enable LoRA for targeted RVQ levels (to experiment with, seems to help)
|
2024-06-17 21:45:03 -05:00 |
|
|
1d159b1476
|
updated export routine to split LoRA weights from the state dict (should work with deepspeed)
|
2024-06-17 13:28:18 -05:00 |
|
|
bd0bc10ec0
|
added LoRA policy to decide what layer of the model gets adapted based on simple inclusion/exclusion terms
|
2024-06-17 13:05:06 -05:00 |
|
|
45a39fb79f
|
very rudimentary lora support (no deepspeed support, tested training and saving but not loading yet)
|
2024-06-17 00:09:16 -05:00 |
|
|
b3b67f34ac
|
added option to sort paths by duration to better group equal-length sequences together (and there was maybe a logic error from creating the samplers and then interleave-reordering paths, desyncing them, maybe)
|
2024-06-13 22:37:34 -05:00 |
|
|
65a8960305
|
option to split classifier per-level instead of sharing one (at this point I'm just scrambling to try and cope with training a DAC model, the NAR is being a pain)
|
2024-06-11 22:28:59 -05:00 |
|
|
a7a6e0ac76
|
validated that inferencing works, changed some defaults (NAR benefits from greedy sampling)
|
2024-06-09 17:11:38 -05:00 |
|
|
132a02c48b
|
sanity cleanup, backup config yaml for each log file
|
2024-06-09 11:22:52 -05:00 |
|
|
58fb0a84db
|
added experimental NAR only model (inferences text length, need more experimenting), AudioEmbedding logic cleanup (I still think it's being done wrong)
|
2024-06-08 15:42:02 -05:00 |
|
|
e35a91c67a
|
ugh
|
2024-06-07 21:56:14 -05:00 |
|
|
eafa622be2
|
I forgot the actual reason I was cleaning things up was to re-include prom loss calculation (I realized the reason I did this was because of a prom embedding oversight, it seems to work now)
|
2024-06-07 20:29:25 -05:00 |
|
|
da8242d086
|
finally got around to removing omegaconf
|
2024-06-07 20:23:53 -05:00 |
|
|
b2194b859a
|
re-added loading multiple models because I'm now entertaining having split AR/NAR models again (and need a way to load both at once)
|
2024-06-06 09:48:43 -05:00 |
|
|
4073656293
|
oops
|
2024-06-05 20:53:10 -05:00 |
|
|
48cd1054f9
|
madness
|
2024-06-04 23:48:51 -05:00 |
|
|
406ff7bbe1
|
re-implemented config.model.interleave for the HF-compat experimental method
|
2024-06-04 14:19:52 -05:00 |
|
|
934672252b
|
feverish cleanup
|
2024-06-03 21:28:49 -05:00 |
|
|
c1fcd889d5
|
reverted automatically disabling split loss calc, since it seems that it's actually calculating loss on the prom that causes the oddities, maybe
|
2024-06-01 12:34:59 -05:00 |
|
|
31785f4eeb
|
actually don't default to computing split losses, test bitnet model doesn't seem to be doing things right (despite debug printouts showing they're roughly the same logit/loss sequences, could just be bitnet linears being not up to par on actual models)
|
2024-06-01 09:12:51 -05:00 |
|
|
e9c87060df
|
oops
|
2024-05-31 22:22:28 -05:00 |
|
|
b482ca19ff
|
added model config option to set KV head count for MQA/GQA instead of MHA for llama-based models (I think it's negligible either way on such a small model size)
|
2024-05-31 19:32:37 -05:00 |
|
|
da473295b7
|
better way to compute per-segment losses
|
2024-05-28 19:29:54 -05:00 |
|
|
5af6f41c94
|
added loss calcs against prom (requires the right settings for not shit results, disabled by default)
|
2024-05-27 08:43:00 -05:00 |
|
|
ddbacde0d1
|
DAC just doesn't work well enough......
|
2024-05-25 11:07:52 -05:00 |
|
|
458b95d196
|
added option to split between text loss and audio loss (to-do: document this better), because it may or may not be a problem with LLaMA-backed models because my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment
|
2024-05-19 11:23:56 -05:00 |
|
|
8d79f78e0a
|
god I need to replace omegaconf
|
2024-05-12 14:01:52 -05:00 |
|
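
Commit 83075c1505 above mentions sorting duration buckets and batching samples by total duration. Here is a minimal sketch of that idea, not the repository's actual code. The function names, the integer-second bucket key, and the `max_seconds` budget are all assumptions made for illustration.

```python
from collections import defaultdict


def bucket_by_duration(durations):
    """Group paths into integer-second buckets, ordered by ascending duration.

    `durations` maps a path to its length in seconds (hypothetical input shape).
    """
    buckets = defaultdict(list)
    for path, seconds in durations.items():
        buckets[int(seconds)].append(path)
    # Dict keys do not have to be strings, so the integer keys sort numerically.
    return dict(sorted(buckets.items()))


def batch_by_total_duration(durations, max_seconds):
    """Yield lists of paths whose summed duration stays within `max_seconds`.

    A single sample longer than the budget is still emitted, alone in its batch.
    """
    batch, total = [], 0.0
    for paths in bucket_by_duration(durations).values():
        for path in paths:
            seconds = durations[path]
            # Close the current batch before it would exceed the budget.
            if batch and total + seconds > max_seconds:
                yield batch
                batch, total = [], 0.0
            batch.append(path)
            total += seconds
    if batch:
        yield batch


if __name__ == "__main__":
    example = {"a.wav": 1.2, "b.wav": 7.9, "c.wav": 1.8, "d.wav": 3.1}
    for group in batch_by_total_duration(example, max_seconds=5.0):
        print(group)
```

Walking the buckets in sorted order keeps similar-length samples next to each other, which cuts padding within a batch. The duration budget bounds how much audio each batch holds, whatever the sample count.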