|
54547b74d8
|
experimental implementation of STT (need to actually test on a model, test trainer seems to work)
|
2024-09-05 20:43:20 -05:00 |
|
|
32287710a2
|
moved prints to use logger, edited readme (fused_attn doesnt seem stable for training)
|
2024-08-29 13:27:16 -05:00 |
|
|
6a733eb2ed
|
changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype, some 'optimization' to compile the model but it doesnt seem to do anything useful
|
2024-08-03 22:10:21 -05:00 |
|
|
11fa3da665
|
some cleanup, fixed the wrapper attention to explicitly use other sdpa backends
|
2024-08-03 19:51:00 -05:00 |
|
|
66407e5bdb
|
tweaks for the NAR-len model, maybe
|
2024-08-03 08:40:39 -05:00 |
|
|
97c5241bef
|
fixes, throw an exception when using NAR only model with non-unified position IDs, since for some reason it outputs garbage for the NAR
|
2024-08-02 22:25:49 -05:00 |
|
|
387358bc8a
|
fixes for the NAR-len model, and documentation some config options, and a better way to handle resizing modules on state_dict load
|
2024-07-31 20:35:09 -05:00 |
|
|
ce8bb1e4f7
|
sanity cleanups with weird off-by-one-ness, cleaned up and validated vall_e.models.experimental works again
|
2024-07-27 15:36:05 -05:00 |
|
|
e19aa643a6
|
cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training
|
2024-07-21 19:12:03 -05:00 |
|
|
d87b492295
|
added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)
|
2024-07-19 20:49:40 -05:00 |
|
|
97e768601c
|
re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways)
|
2024-07-18 16:16:14 -05:00 |
|
|
3acc54df22
|
allow loading a different model within the web ui (apparently I did not have the web UI in the documentation)
|
2024-07-15 19:59:48 -05:00 |
|
|
7b210d9738
|
sanity cleanup
|
2024-07-04 15:58:08 -05:00 |
|
|
dced595391
|
more cleanup
|
2024-06-30 11:00:12 -05:00 |
|
|
bc2a6fa756
|
sanity cleanup: moved experimental features under its own thing
|
2024-06-30 10:37:33 -05:00 |
|
|
bcf3910a17
|
the NAR only dream is dead (it just won't work)
|
2024-06-12 19:49:47 -05:00 |
|
|
a9353cf9fa
|
ugh
|
2024-06-12 00:14:29 -05:00 |
|
|
cca542a4c0
|
ugh
|
2024-06-11 23:59:28 -05:00 |
|
|
132a02c48b
|
sanity cleanup, backup config yaml for each log file
|
2024-06-09 11:22:52 -05:00 |
|
|
8d068fa3f9
|
reticulating splines
|
2024-06-08 20:30:15 -05:00 |
|
|
545162195b
|
deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things
|
2024-04-15 19:54:32 -05:00 |
|
|
08bae355eb
|
actually use langs from the dataloader
|
2023-10-11 21:21:50 -05:00 |
|
|
8740cdefc6
|
added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested)
|
2023-10-11 20:38:40 -05:00 |
|
|
7facacf7c9
|
separated samplers into its own file, don't bother copying the logits back to the GPU after sampling, it's not necessary
|
2023-10-11 12:25:31 -05:00 |
|
|
e727b6e5c1
|
changed dynamic temperature trigger to be a min-(n)ar-temp value between [0,(n)ar-temp), flags to set min temp, checkbox in web UI to request it
|
2023-10-10 17:02:33 -05:00 |
|
|
87db03dd93
|
trim the input prompt to 3 seconds when training NAR tasks (marked as experimental; the paper mentions doing so, but I don't know how much this would harm the retention heads)
|
2023-10-09 22:03:58 -05:00 |
|
|
c0b25541e3
|
restructured some things with the model to remove dead weights
|
2023-09-20 19:10:59 -05:00 |
|
|
a6bfe43590
|
added mirostat sampling (given a partially trained model, it got far decent output than I expected, need to test on a better trained model)
|
2023-09-18 18:55:41 -05:00 |
|
|
2567e082b5
|
UGH
|
2023-09-16 00:26:13 -05:00 |
|
|
23a5fdd645
|
implemented a naive beam search (I really should be taking a break)
|
2023-09-12 21:28:07 -05:00 |
|
|
40ef34e1ca
|
this embedding class definitely works, and migrating from the previous embedding weights seems to work.
|
2023-09-11 14:13:42 -05:00 |
|
|
a1f250ffac
|
set default max_levels for NAR to 0 and implicitly set it to max resps levels because the previous way was implicitly assuming all models were outputting at 1+7 RVQ bins.
|
2023-09-10 20:33:33 -05:00 |
|
|
ba71020318
|
added option to limit (or exceed) inferenced RVQ-bin levels through the NAR
|
2023-09-10 13:50:13 -05:00 |
|
|
10c34c5b98
|
added a length-based decay factor for repetition penalty
|
2023-09-08 21:02:00 -05:00 |
|
|
14c78bae39
|
added lots of sampling options (top-k/top-p, repetition penalty, length penalty)
|
2023-09-08 20:30:54 -05:00 |
|
|
ab5134f385
|
tweaks and fixes
|
2023-09-07 17:08:38 -05:00 |
|
|
b2c2dec291
|
added homebrewed per-RVQ-bin embedding solutions
|
2023-09-07 16:48:02 -05:00 |
|
|
e7a67410d1
|
oops
|
2023-09-07 09:14:03 -05:00 |
|
|
7ce06432fd
|
fixed the AR+NAR dual model, the resp_emb has to be split up (classifier might too)
|
2023-09-06 19:33:39 -05:00 |
|
|
100ca6b7d0
|
added option to use SGD optimizer through the YAML, added option to pass in additional optimizer parameters through the YAML, added experimental unified AR+NAR model (does not seem fruitful in testing)
|
2023-09-06 18:58:35 -05:00 |
|
|
2f9cd0842f
|
merged dedicated interleaved AR code with the normal AR code
|
2023-09-03 22:46:08 -05:00 |
|
|
8a6c203277
|
added per-speaker samplers
|
2023-09-03 21:27:13 -05:00 |
|
|
2f06166ddd
|
cleanups
|
2023-09-01 21:33:51 -05:00 |
|
|
e40c0d34a0
|
somewhat got recurrent forward working (it's as accurate as chunkwise forward: it's not accurate at all), added option to use AMP instead of blanket setting the weight's dtype
|
2023-09-01 20:58:29 -05:00 |
|
|
2bc2d08b09
|
(need to verify) added modifying model size and config bool to align with VALL-E continuous' methodology
|
2023-09-01 17:19:34 -05:00 |
|
|
165a1154e0
|
Undo naive=False test flag, this shouldn't have made its way in
|
2023-08-26 22:00:43 -05:00 |
|
|
2d1a9f10c0
|
nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trim off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now)
|
2023-08-19 15:06:33 -05:00 |
|
|
2a71486cb6
|
preparing for SpeechX extensions
|
2023-08-18 20:58:07 -05:00 |
|
|
c85101403f
|
big cleanup
|
2023-08-03 20:26:36 -05:00 |
|
|
7a06b27a9c
|
Tweaks
|
2023-08-02 22:06:39 +00:00 |
|