|
7cdfa3dc0c
|
updated process_datasets.py, added argparsing so I can mostly stop manually editing things, and some other cleanup
|
2024-08-05 15:59:25 -05:00 |
|
|
debcc93e7e
|
add adapted MixtralAttention for when I make a bad decision to actually train a MoE
|
2024-08-04 22:03:22 -05:00 |
|
|
10aaf840e7
|
added export option to convert Llama to MixtralMoE for another dumb experiment
|
2024-08-04 20:25:06 -05:00 |
|
|
3a65cc4b22
|
fix issue with sft and shared tensors...
|
2024-08-04 19:56:21 -05:00 |
|
|
23f3b56fda
|
oops
|
2024-08-04 08:18:57 -05:00 |
|
|
d19f93a2c0
|
documentation update
|
2024-08-04 00:14:49 -05:00 |
|
|
2cb465018b
|
implicitly load either normal pickled weights or safetensors on loading the model
|
2024-08-03 23:34:18 -05:00 |
|
|
c09133d00f
|
added safetensors support (with metadata) and feed whatever torch.load/torch.save into it
|
2024-08-03 23:15:20 -05:00 |
|
|
6a733eb2ed
|
changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype, some 'optimization' to compile the model but it doesnt seem to do anything useful
|
2024-08-03 22:10:21 -05:00 |
|
|
ab673e0426
|
add cap for NAR-len training, to avoid any weird cases in early training where it'll just mess up and generate long lengths
|
2024-08-03 21:00:32 -05:00 |
|
|
4d2b88b164
|
throw exception if training, but no model is set to train (because i ran into this wondering what the hell was happening)
|
2024-08-03 20:51:23 -05:00 |
|
|
d0a5c7eca2
|
more coping with the NAR len
|
2024-08-03 20:23:36 -05:00 |
|
|
11fa3da665
|
some cleanup, fixed the wrapper attention to explicitly use other sdpa backends
|
2024-08-03 19:51:00 -05:00 |
|
|
9564ecda43
|
wrapper attention class for other sdpa backends + xformers seems to have broke...
|
2024-08-03 15:12:11 -05:00 |
|
|
9e1989be1b
|
tweaked initial NAR pass's initial token embeddings to use a different value, or osmething
|
2024-08-03 09:01:37 -05:00 |
|
|
26f74c5739
|
somehow fixed non-unified position IDs for the NAR-len
|
2024-08-03 08:43:42 -05:00 |
|
|
66407e5bdb
|
tweaks for the NAR-len model, maybe
|
2024-08-03 08:40:39 -05:00 |
|
|
97c5241bef
|
fixes, throw an exception when using NAR only model with non-unified position IDs, since for some reason it outputs garbage for the NAR
|
2024-08-02 22:25:49 -05:00 |
|
|
4456d3172b
|
that's what I get for testing without hdf5 on my previous machine....
|
2024-08-02 20:44:01 -05:00 |
|
|
7a77978096
|
oversight with using resize_modules
|
2024-08-02 20:28:49 -05:00 |
|
|
808a79ebaf
|
oops
|
2024-08-01 22:56:04 -05:00 |
|
|
443422ecb5
|
ugh, finally got some form of offloading working (need to test if it works on different GPUs, but GPU and CPU offloading seems to work in the test trainer)
|
2024-08-01 22:43:39 -05:00 |
|
|
c9ec6b28ef
|
it actually wasn't working because Engines.__init__() automatically moves the entire module to the requested device, which was being called after offloading the model in the test trainer (and it seems I cant do it without injecting a bunch of shit in modeling_llama.py)
|
2024-08-01 20:56:28 -05:00 |
|
|
b4c895114c
|
naive model offloading support (handles automatically splitting parts of the model to requested device per memory constraints, either inferred or requested in the yaml, input tensors are automatically migrated to the right device, it SEEMS to work for training under the test trainer when split between GPU and CPU) (this was specifically only because that Flux imagegen model released so I can test it there)
|
2024-08-01 20:12:06 -05:00 |
|
|
387358bc8a
|
fixes for the NAR-len model, and documentation some config options, and a better way to handle resizing modules on state_dict load
|
2024-07-31 20:35:09 -05:00 |
|
|
52d13b321f
|
I rather have it default to non-strict loading instead so I can clean up YAMLs
|
2024-07-30 22:24:38 -05:00 |
|
|
d7c6be6f78
|
fix weird regression in handling checkpoints when backend is local, but deepspeed checkpoints are in (it was handled with LoRA loading but not real loading...)
|
2024-07-30 22:15:56 -05:00 |
|
|
07f8e2ad06
|
added option to set the causal size (how many tokens to sample per AR step), but requires the model to be trained for this (which explains why recurrent chunk sampling just doesn't work for the retnet tests, obvious in hindsight)
|
2024-07-30 20:53:51 -05:00 |
|
|
ebf848d249
|
possible speedup for samplers that require a list of previous tokens (the DRY sampler made me realize that I should copy the tolist() thing from the rep pen sampler for everything else)
|
2024-07-29 20:23:26 -05:00 |
|
|
55b0121b1a
|
trying (and failing) to nail a weird regression in fancier attentions
|
2024-07-29 19:53:37 -05:00 |
|
|
c2f5b916fc
|
added what I think is DRY sampling
|
2024-07-29 19:15:07 -05:00 |
|
|
ce8bb1e4f7
|
sanity cleanups with weird off-by-one-ness, cleaned up and validated vall_e.models.experimental works again
|
2024-07-27 15:36:05 -05:00 |
|
|
06e948aec1
|
suppress warning on exit about distributed not being cleaned up (because I updated my system)
|
2024-07-25 16:50:47 -05:00 |
|
|
682e4387dc
|
oops (fixed proms being erased from a config oversight)
|
2024-07-25 12:39:57 -05:00 |
|
|
1acb0e9c84
|
added experimental training setting to perform token dropout to MAYBE compensate for errors from the preceding RVQ level (two types: token error offset, token dropout embedding replace)
|
2024-07-24 19:35:17 -05:00 |
|
|
611a1c4bdc
|
might help
|
2024-07-22 20:57:01 -05:00 |
|
|
188d116222
|
some weird fixes for an equally weird regression with LoRA loading
|
2024-07-22 20:47:24 -05:00 |
|
|
e33c4b0cb1
|
oops
|
2024-07-22 19:38:39 -05:00 |
|
|
75b04686f8
|
added prom-less training / inferencing, some other things
|
2024-07-22 19:36:07 -05:00 |
|
|
491ae2a684
|
some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...)
|
2024-07-22 00:30:40 -05:00 |
|
|
ad024f400f
|
actually pass language into dataset process script, fix coercing japanese into hiragana because espeak does not like kanji
|
2024-07-21 23:21:37 -05:00 |
|
|
3e5ca3a201
|
more demo page tweaks
|
2024-07-21 19:31:13 -05:00 |
|
|
7366f36f81
|
oops
|
2024-07-21 19:17:25 -05:00 |
|
|
e19aa643a6
|
cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training
|
2024-07-21 19:12:03 -05:00 |
|
|
ba7ee8c0ee
|
added demo link to readme
|
2024-07-19 21:22:30 -05:00 |
|
|
9ec88d9444
|
validated passing URI path for assets instead of base64 encoding them
|
2024-07-19 21:07:17 -05:00 |
|
|
d87b492295
|
added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)
|
2024-07-19 20:49:40 -05:00 |
|
|
d53038a9e4
|
actually have split classifiers working
|
2024-07-19 15:33:31 -05:00 |
|
|
692d09f9c1
|
eval/validation fix for SpeechX tasks
|
2024-07-19 09:16:37 -05:00 |
|
|
28a674e0f1
|
fixes...
|
2024-07-18 23:25:32 -05:00 |
|