Commit Graph

450 Commits

Author SHA1 Message Date
mrq
ed373957e2 maybe not 2024-08-09 11:38:08 -05:00
mrq
c658a7b440 make loss scaling opt-in rather than automatically determined (because it seems a DAC-based model really doesnt like loss scaling) 2024-08-09 10:51:36 -05:00
mrq
d04f6911b4 oops 2024-08-08 19:38:55 -05:00
mrq
0aa59e6f3f uncommented block that writes the metadata on HDF5 creation 2024-08-08 19:21:29 -05:00
mrq
79a6781c9e fix vall_e.data --action=hdf5 actually transcribing because past me completely forgot it tried to already put the transcribe/process dataset scripts inside the module before 2024-08-08 07:51:42 -05:00
mrq
949339a3fa do not include SDPA attention if there's no available SDPA backends 2024-08-06 20:42:39 -05:00
mrq
613024ec0d ugh 2024-08-06 20:35:15 -05:00
mrq
eac353cd0b busy work and cleanup while I wait for 1TB of audio to quantize... again. 2024-08-06 20:23:33 -05:00
mrq
f284c7ea9c do mixed-precision for AMP inside the compress function itself, because the loudness function gripes when using a float16 (non-power of 2 lengths) or bfloat16 (something about views for bfloat16) 2024-08-06 15:08:37 -05:00
mrq
b6ba2cc8e7 tweaked vall_e.emb.process to instead process audio one file at a time instead of all the files for a given speaker to avoid OOMing on less-memory-filled systems with --low-memory 2024-08-06 14:24:40 -05:00
mrq
9710b06b74 tweaks and things 2024-08-06 08:17:25 -05:00
mrq
134dac8c2b re-adapted process_libritts.py to a 'better' way (better because it processed without needing to shuffle a bunch of things and adapt to cope or something) 2024-08-05 20:34:58 -05:00
mrq
3f73fcca29 oops 2024-08-05 20:12:13 -05:00
mrq
597441e48b moved transcribe and process dataset scripts to vall_e/emb within the module itself, argparse-ified transcription script 2024-08-05 19:40:50 -05:00
mrq
7cdfa3dc0c updated process_datasets.py, added argparsing so I can mostly stop manually editing things, and some other cleanup 2024-08-05 15:59:25 -05:00
mrq
debcc93e7e add adapted MixtralAttention for when I make a bad decision to actually train a MoE 2024-08-04 22:03:22 -05:00
mrq
10aaf840e7 added export option to convert Llama to MixtralMoE for another dumb experiment 2024-08-04 20:25:06 -05:00
mrq
3a65cc4b22 fix issue with sft and shared tensors... 2024-08-04 19:56:21 -05:00
mrq
23f3b56fda oops 2024-08-04 08:18:57 -05:00
mrq
d19f93a2c0 documentation update 2024-08-04 00:14:49 -05:00
mrq
2cb465018b implicitly load either normal pickled weights or safetensors on loading the model 2024-08-03 23:34:18 -05:00
mrq
c09133d00f added safetensors support (with metadata) and feed whatever torch.load/torch.save into it 2024-08-03 23:15:20 -05:00
mrq
6a733eb2ed changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype, some 'optimization' to compile the model but it doesnt seem to do anything useful 2024-08-03 22:10:21 -05:00
mrq
ab673e0426 add cap for NAR-len training, to avoid any weird cases in early training where it'll just mess up and generate long lengths 2024-08-03 21:00:32 -05:00
mrq
4d2b88b164 throw exception if training, but no model is set to train (because i ran into this wondering what the hell was happening) 2024-08-03 20:51:23 -05:00
mrq
d0a5c7eca2 more coping with the NAR len 2024-08-03 20:23:36 -05:00
mrq
11fa3da665 some cleanup, fixed the wrapper attention to explicitly use other sdpa backends 2024-08-03 19:51:00 -05:00
mrq
9564ecda43 wrapper attention class for other sdpa backends + xformers seems to have broke... 2024-08-03 15:12:11 -05:00
mrq
9e1989be1b tweaked initial NAR pass's initial token embeddings to use a different value, or osmething 2024-08-03 09:01:37 -05:00
mrq
26f74c5739 somehow fixed non-unified position IDs for the NAR-len 2024-08-03 08:43:42 -05:00
mrq
66407e5bdb tweaks for the NAR-len model, maybe 2024-08-03 08:40:39 -05:00
mrq
97c5241bef fixes, throw an exception when using NAR only model with non-unified position IDs, since for some reason it outputs garbage for the NAR 2024-08-02 22:25:49 -05:00
mrq
4456d3172b that's what I get for testing without hdf5 on my previous machine.... 2024-08-02 20:44:01 -05:00
mrq
7a77978096 oversight with using resize_modules 2024-08-02 20:28:49 -05:00
mrq
808a79ebaf oops 2024-08-01 22:56:04 -05:00
mrq
443422ecb5 ugh, finally got some form of offloading working (need to test if it works on different GPUs, but GPU and CPU offloading seems to work in the test trainer) 2024-08-01 22:43:39 -05:00
mrq
c9ec6b28ef it actually wasn't working because Engines.__init__() automatically moves the entire module to the requested device, which was being called after offloading the model in the test trainer (and it seems I cant do it without injecting a bunch of shit in modeling_llama.py) 2024-08-01 20:56:28 -05:00
mrq
b4c895114c naive model offloading support (handles automatically splitting parts of the model to requested device per memory constraints, either inferred or requested in the yaml, input tensors are automatically migrated to the right device, it SEEMS to work for training under the test trainer when split between GPU and CPU) (this was specifically only because that Flux imagegen model released so I can test it there) 2024-08-01 20:12:06 -05:00
mrq
387358bc8a fixes for the NAR-len model, and documentation some config options, and a better way to handle resizing modules on state_dict load 2024-07-31 20:35:09 -05:00
mrq
52d13b321f I rather have it default to non-strict loading instead so I can clean up YAMLs 2024-07-30 22:24:38 -05:00
mrq
d7c6be6f78 fix weird regression in handling checkpoints when backend is local, but deepspeed checkpoints are in (it was handled with LoRA loading but not real loading...) 2024-07-30 22:15:56 -05:00
mrq
07f8e2ad06 added option to set the causal size (how many tokens to sample per AR step), but requires the model to be trained for this (which explains why recurrent chunk sampling just doesn't work for the retnet tests, obvious in hindsight) 2024-07-30 20:53:51 -05:00
mrq
ebf848d249 possible speedup for samplers that require a list of previous tokens (the DRY sampler made me realize that I should copy the tolist() thing from the rep pen sampler for everything else) 2024-07-29 20:23:26 -05:00
mrq
55b0121b1a trying (and failing) to nail a weird regression in fancier attentions 2024-07-29 19:53:37 -05:00
mrq
c2f5b916fc added what I think is DRY sampling 2024-07-29 19:15:07 -05:00
mrq
ce8bb1e4f7 sanity cleanups with weird off-by-one-ness, cleaned up and validated vall_e.models.experimental works again 2024-07-27 15:36:05 -05:00
mrq
06e948aec1 suppress warning on exit about distributed not being cleaned up (because I updated my system) 2024-07-25 16:50:47 -05:00
mrq
682e4387dc oops (fixed proms being erased from a config oversight) 2024-07-25 12:39:57 -05:00
mrq
1acb0e9c84 added experimental training setting to perform token dropout to MAYBE compensate for errors from the preceding RVQ level (two types: token error offset, token dropout embedding replace) 2024-07-24 19:35:17 -05:00
mrq
611a1c4bdc might help 2024-07-22 20:57:01 -05:00
mrq
188d116222 some weird fixes for an equally weird regression with LoRA loading 2024-07-22 20:47:24 -05:00
mrq
e33c4b0cb1 oops 2024-07-22 19:38:39 -05:00
mrq
75b04686f8 added prom-less training / inferencing, some other things 2024-07-22 19:36:07 -05:00
mrq
491ae2a684 some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...) 2024-07-22 00:30:40 -05:00
mrq
ad024f400f actually pass language into dataset process script, fix coercing japanese into hiragana because espeak does not like kanji 2024-07-21 23:21:37 -05:00
mrq
3e5ca3a201 more demo page tweaks 2024-07-21 19:31:13 -05:00
mrq
7366f36f81 oops 2024-07-21 19:17:25 -05:00
mrq
e19aa643a6 cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training 2024-07-21 19:12:03 -05:00
mrq
ba7ee8c0ee added demo link to readme 2024-07-19 21:22:30 -05:00
mrq
9ec88d9444 validated passing URI path for assets instead of base64 encoding them 2024-07-19 21:07:17 -05:00
mrq
d87b492295 added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that) 2024-07-19 20:49:40 -05:00
mrq
d53038a9e4 actually have split classifiers working 2024-07-19 15:33:31 -05:00
mrq
692d09f9c1 eval/validation fix for SpeechX tasks 2024-07-19 09:16:37 -05:00
mrq
28a674e0f1 fixes... 2024-07-18 23:25:32 -05:00
mrq
39f961abcd test trainer (vall_e.models.ar_nar) tests some SpeechX features 2024-07-18 18:46:45 -05:00
mrq
83a0954f85 fixes for re-introducing SpeechX tasks (need to actually validate if these all do the right things) 2024-07-18 17:16:32 -05:00
mrq
bccbb77a1a added option to either naively concat codes to concat audio waveforms (prior behavior) or to decode => concat => encode instead (although this only currently happens for prom sampling if an utternace is too small) 2024-07-18 16:48:41 -05:00
mrq
97e768601c re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways) 2024-07-18 16:16:14 -05:00
mrq
c2b8035e74 oops, kept forgetting to actually pass in lang/tone tokens (despite not really using these at the moment) 2024-07-18 14:18:34 -05:00
mrq
22fe53508c added experimental disjointed position IDs (because I *think* this might help because technically a sequence is made up of several parts, and the position embeddings shouldn't be unified) 2024-07-16 19:52:41 -05:00
mrq
fe0f235335 mechanism to store the model config inside the weights and load them, some other things to allow LoRA training on the RetNet (gradient checkpointing will gripe about inputs not having require_grad and nothing seems to remedy it) 2024-07-16 18:23:13 -05:00
mrq
3acc54df22 allow loading a different model within the web ui (apparently I did not have the web UI in the documentation) 2024-07-15 19:59:48 -05:00
mrq
7b210d9738 sanity cleanup 2024-07-04 15:58:08 -05:00
mrq
1ecf2793f4 (commented-out) support for facebookresearch/AudioDec, but support really didn't wow me (so I commented it out until I figure out why my output audio is super crusty with AudioDec) 2024-07-04 15:40:51 -05:00
mrq
f770467eb3 stuff 2024-07-01 18:13:29 -05:00
mrq
312a8e3ead add shuffle to samplers that can support it 2024-06-30 11:36:46 -05:00
mrq
396af541c5 ugh 2024-06-30 11:11:58 -05:00
mrq
dced595391 more cleanup 2024-06-30 11:00:12 -05:00
mrq
bc2a6fa756 sanity cleanup: moved experimental features under its own thing 2024-06-30 10:37:33 -05:00
mrq
b21f74a5c5 added summing of external embeddings (at this point i dont think any amount of cope bandaids will get DAC to train nicely, I think the RVQ levels the NAR tends add too much noise if they're not accurate) 2024-06-29 23:42:30 -05:00
mrq
793ccb16fb ugh 2024-06-29 22:14:35 -05:00
mrq
2808f881c8 cleaned up subjugated audio embedding into a flag, flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive) 2024-06-29 21:46:35 -05:00
mrq
ec5eaebcbc experimental method of using DACs quantizer ""embeddings"" to see if it helps with model quality 2024-06-29 19:46:11 -05:00
mrq
a8718d35a4 nasty bandaid because some of my DAC dataset only has 8 RVQ levels instead of the full 9 2024-06-29 10:16:37 -05:00
mrq
c4dd523b6f change from chunk-slicing paths for distributed dataloader to instead interleave 2024-06-29 10:10:35 -05:00
mrq
dd40463803 limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid) 2024-06-29 09:11:28 -05:00
mrq
591d3ac848 have eval dataloader use eval batch size for batchedordersampler 2024-06-28 22:44:00 -05:00
mrq
1a392b69f6 local training backend should be a bit more aware of variable batch sizes, maybe 2024-06-28 22:39:05 -05:00
mrq
83075c1505 sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because i didnt know that python dicts can have non-strings as keys), added batching samples based on total duration to ensure best training throughput 2024-06-28 22:28:54 -05:00
mrq
8fffb94964 backport fix from tortoise_tts with local trainer + loading state when training lora 2024-06-25 13:41:29 -05:00
mrq
62a53eed64 fixed deducing tokenizer path, added option to default to naive tokenizer (for old models, like ar+nar-retnet-8) 2024-06-18 22:11:14 -05:00
mrq
8a986eb480 load exported LoRA weights if exists (to-do: make a better LoRA loading mechanism) 2024-06-18 21:45:46 -05:00
mrq
2bfe786ebd ban stop token for NAR levels (because sometimes it gets sampled and causes problems) 2024-06-17 22:14:43 -05:00
mrq
7cfb78fa64 enable LoRA for targetted RVQ levels (to experiment with, seems to help) 2024-06-17 21:45:03 -05:00
mrq
7047fcc6e2 actually make deepspeed work with LoRAs 2024-06-17 13:55:37 -05:00
mrq
1d159b1476 updated export routine to split LoRA weights from the state dict (should work with deepspeed) 2024-06-17 13:28:18 -05:00
mrq
726a4b613f naive, rudimentary DeepSpeed support (just live with the LoRA weights living with the original weights, they can be split later) 2024-06-17 13:17:24 -05:00
mrq
bd0bc10ec0 added LoRA policy to decide what layer of the model gets adapted based on simple inclusion/exclusion terms 2024-06-17 13:05:06 -05:00
mrq
be051d9544 added other LoRA method using parametrization rather than linear injection 2024-06-17 09:58:34 -05:00
mrq
45a39fb79f very rudimentary lora support (no deepspeed support, tested training and saving but not loading yet) 2024-06-17 00:09:16 -05:00