Commit Graph

367 Commits

Author SHA1 Message Date
mrq
1acb0e9c84 added experimental training setting to perform token dropout to MAYBE compensate for errors from the preceding RVQ level (two types: token error offset, token dropout embedding replace) 2024-07-24 19:35:17 -05:00
mrq
611a1c4bdc might help 2024-07-22 20:57:01 -05:00
mrq
188d116222 some weird fixes for an equally weird regression with LoRA loading 2024-07-22 20:47:24 -05:00
mrq
e33c4b0cb1 oops 2024-07-22 19:38:39 -05:00
mrq
75b04686f8 added prom-less training / inferencing, some other things 2024-07-22 19:36:07 -05:00
mrq
491ae2a684 some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...) 2024-07-22 00:30:40 -05:00
mrq
ad024f400f actually pass language into dataset process script, fix coercing japanese into hiragana because espeak does not like kanji 2024-07-21 23:21:37 -05:00
mrq
3e5ca3a201 more demo page tweaks 2024-07-21 19:31:13 -05:00
mrq
7366f36f81 oops 2024-07-21 19:17:25 -05:00
mrq
e19aa643a6 cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training 2024-07-21 19:12:03 -05:00
mrq
ba7ee8c0ee added demo link to readme 2024-07-19 21:22:30 -05:00
mrq
9ec88d9444 validated passing URI path for assets instead of base64 encoding them 2024-07-19 21:07:17 -05:00
mrq
d87b492295 added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that) 2024-07-19 20:49:40 -05:00
mrq
d53038a9e4 actually have split classifiers working 2024-07-19 15:33:31 -05:00
mrq
692d09f9c1 eval/validation fix for SpeechX tasks 2024-07-19 09:16:37 -05:00
mrq
28a674e0f1 fixes... 2024-07-18 23:25:32 -05:00
mrq
39f961abcd test trainer (vall_e.models.ar_nar) tests some SpeechX features 2024-07-18 18:46:45 -05:00
mrq
83a0954f85 fixes for re-introducing SpeechX tasks (need to actually validate if these all do the right things) 2024-07-18 17:16:32 -05:00
mrq
bccbb77a1a added option to either naively concat codes to concat audio waveforms (prior behavior) or to decode => concat => encode instead (although this only currently happens for prom sampling if an utterance is too small) 2024-07-18 16:48:41 -05:00
mrq
97e768601c re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways) 2024-07-18 16:16:14 -05:00
mrq
c2b8035e74 oops, kept forgetting to actually pass in lang/tone tokens (despite not really using these at the moment) 2024-07-18 14:18:34 -05:00
mrq
22fe53508c added experimental disjointed position IDs (because I *think* this might help because technically a sequence is made up of several parts, and the position embeddings shouldn't be unified) 2024-07-16 19:52:41 -05:00
mrq
fe0f235335 mechanism to store the model config inside the weights and load them, some other things to allow LoRA training on the RetNet (gradient checkpointing will gripe about inputs not having require_grad and nothing seems to remedy it) 2024-07-16 18:23:13 -05:00
mrq
3acc54df22 allow loading a different model within the web ui (apparently I did not have the web UI in the documentation) 2024-07-15 19:59:48 -05:00
mrq
7b210d9738 sanity cleanup 2024-07-04 15:58:08 -05:00
mrq
1ecf2793f4 (commented-out) support for facebookresearch/AudioDec, but support really didn't wow me (so I commented it out until I figure out why my output audio is super crusty with AudioDec) 2024-07-04 15:40:51 -05:00
mrq
db62e55a38 oops, I forgot to use the new thing for audio_backend 2024-07-04 14:54:11 -05:00
mrq
f770467eb3 stuff 2024-07-01 18:13:29 -05:00
mrq
312a8e3ead add shuffle to samplers that can support it 2024-06-30 11:36:46 -05:00
mrq
396af541c5 ugh 2024-06-30 11:11:58 -05:00
mrq
dced595391 more cleanup 2024-06-30 11:00:12 -05:00
mrq
bc2a6fa756 sanity cleanup: moved experimental features under its own thing 2024-06-30 10:37:33 -05:00
mrq
b21f74a5c5 added summing of external embeddings (at this point i dont think any amount of cope bandaids will get DAC to train nicely, I think the RVQ levels the NAR tends to add too much noise if they're not accurate) 2024-06-29 23:42:30 -05:00
mrq
793ccb16fb ugh 2024-06-29 22:14:35 -05:00
mrq
2808f881c8 cleaned up subjugated audio embedding into a flag, flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive) 2024-06-29 21:46:35 -05:00
mrq
ec5eaebcbc experimental method of using DAC's quantizer "embeddings" to see if it helps with model quality 2024-06-29 19:46:11 -05:00
mrq
a8718d35a4 nasty bandaid because some of my DAC dataset only has 8 RVQ levels instead of the full 9 2024-06-29 10:16:37 -05:00
mrq
c4dd523b6f change from chunk-slicing paths for distributed dataloader to instead interleave 2024-06-29 10:10:35 -05:00
mrq
dd40463803 limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid) 2024-06-29 09:11:28 -05:00
mrq
591d3ac848 have eval dataloader use eval batch size for batchedordersampler 2024-06-28 22:44:00 -05:00
mrq
1a392b69f6 local training backend should be a bit more aware of variable batch sizes, maybe 2024-06-28 22:39:05 -05:00
mrq
83075c1505 sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because i didnt know that python dicts can have non-strings as keys), added batching samples based on total duration to ensure best training throughput 2024-06-28 22:28:54 -05:00
mrq
5176ced35f readme tweaks 2024-06-28 21:02:54 -05:00
mrq
8fffb94964 backport fix from tortoise_tts with local trainer + loading state when training lora 2024-06-25 13:41:29 -05:00
mrq
62a53eed64 fixed deducing tokenizer path, added option to default to naive tokenizer (for old models, like ar+nar-retnet-8) 2024-06-18 22:11:14 -05:00
mrq
8a986eb480 load exported LoRA weights if exists (to-do: make a better LoRA loading mechanism) 2024-06-18 21:45:46 -05:00
mrq
2bfe786ebd ban stop token for NAR levels (because sometimes it gets sampled and causes problems) 2024-06-17 22:14:43 -05:00
mrq
7cfb78fa64 enable LoRA for targeted RVQ levels (to experiment with, seems to help) 2024-06-17 21:45:03 -05:00
mrq
7047fcc6e2 actually make deepspeed work with LoRAs 2024-06-17 13:55:37 -05:00
mrq
1d159b1476 updated export routine to split LoRA weights from the state dict (should work with deepspeed) 2024-06-17 13:28:18 -05:00