|
4a8e3ccf06
|
README tweaks, added --input-prompt-prefix as an experiment (it's literally better to just not do this, but I'll retain it in case I have a revelation on how to improve it)
|
2024-10-04 18:57:19 -05:00 |
|
|
9da630f73a
|
swap order of demo entries, as the model prioritizes adhering to the speaker prompt more (instead of trying to match the ground truth magically)
|
2024-09-25 23:31:24 -05:00 |
|
|
31e8b7edb8
|
tweaks and fixes for LoRA stuff
|
2024-09-08 18:05:21 -05:00 |
|
|
5d66a7db52
|
webui cleanup, more tweaks, default to safetensors in config
|
2024-09-07 21:45:05 -05:00 |
|
|
32287710a2
|
moved prints to use logger, edited readme (fused_attn doesn't seem stable for training)
|
2024-08-29 13:27:16 -05:00 |
|
|
0d706ec6a1
|
added fused_attn (Triton-based fused attention) and simply query for flash_attn under ROCm
|
2024-08-26 19:13:34 -05:00 |
|
|
d636edd3a2
|
added flash_attn LlamaAttention (including flash_attn==1.0.9)
|
2024-08-18 20:51:14 -05:00 |
|
|
eac353cd0b
|
busy work and cleanup while I wait for 1TB of audio to quantize... again.
|
2024-08-06 20:23:33 -05:00 |
|
|
d19f93a2c0
|
documentation update
|
2024-08-04 00:14:49 -05:00 |
|
|
11fa3da665
|
some cleanup, fixed the wrapper attention to explicitly use other sdpa backends
|
2024-08-03 19:51:00 -05:00 |
|
|
ebf848d249
|
possible speedup for samplers that require a list of previous tokens (the DRY sampler made me realize that I should copy the tolist() thing from the rep pen sampler for everything else)
|
2024-07-29 20:23:26 -05:00 |
|
|
55b0121b1a
|
trying (and failing) to nail a weird regression in fancier attentions
|
2024-07-29 19:53:37 -05:00 |
|
|
ba7ee8c0ee
|
added demo link to readme
|
2024-07-19 21:22:30 -05:00 |
|
|
3acc54df22
|
allow loading a different model within the web ui (apparently I did not have the web UI in the documentation)
|
2024-07-15 19:59:48 -05:00 |
|
|
7b210d9738
|
sanity cleanup
|
2024-07-04 15:58:08 -05:00 |
|
|
bc2a6fa756
|
sanity cleanup: moved experimental features under their own thing
|
2024-06-30 10:37:33 -05:00 |
|
|
dd40463803
|
limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid)
|
2024-06-29 09:11:28 -05:00 |
|
|
5176ced35f
|
readme tweaks
|
2024-06-28 21:02:54 -05:00 |
|
|
7cfb78fa64
|
enable LoRA for targeted RVQ levels (to experiment with, seems to help)
|
2024-06-17 21:45:03 -05:00 |
|
|
da8242d086
|
finally got around to removing omegaconf
|
2024-06-07 20:23:53 -05:00 |
|
|
880b4ecd1b
|
cleanup, putting some thoughts in comments before I forget about them
|
2024-06-05 19:50:06 -05:00 |
|
|
ddbacde0d1
|
DAC just doesn't work well enough......
|
2024-05-25 11:07:52 -05:00 |
|
|
230da8b559
|
should be the final things to scramble around for: DAC's 24kHz model is unusable for this, but both EnCodec's 24kHz and DAC's 44kHz work
|
2024-05-12 13:22:08 -05:00 |
|
|
2437a86efa
|
ugh
|
2024-05-12 13:02:15 -05:00 |
|
|
71e373064f
|
remove redundant loss, tweak readme
|
2024-05-11 15:02:47 -05:00 |
|
|
33b7f81b94
|
small cleanups
|
2024-05-04 22:37:22 -05:00 |
|
|
8aa1b2dabf
|
documentation update
|
2024-05-04 21:03:46 -05:00 |
|
|
277dcec484
|
apparently I got an error for trying to serialize an errant tensor that made its way into the JSON; this could be remedied easily by recursively traversing the dict and coercing any objects to primitives, but I'm tired and I just want to start training and nap
|
2024-05-04 12:33:43 -05:00 |
|
|
545162195b
|
deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things
|
2024-04-15 19:54:32 -05:00 |
|
|
35d78a2bb0
|
Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares)
|
2024-02-29 20:29:17 -06:00 |
|
|
0db3203b21
|
added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go)
|
2023-12-22 19:27:36 -06:00 |
|
|
6c51a629cc
|
resetting step count resets the samples processed and other metrics
|
2023-10-29 12:11:19 -05:00 |
|
|
fb467b19ba
|
exposed rolling resp context to the web UI, added passing in language to inferencing command line
|
2023-10-12 23:21:01 -05:00 |
|
|
99e980d323
|
documentation and more better-er attribution
|
2023-10-10 17:15:16 -05:00 |
|
|
1fd91b6437
|
cleanup
|
2023-10-06 10:13:54 -05:00 |
|
|
d12877ee09
|
added option to set probability of selecting the AR during training under a monolithic AR+NAR, added some more to-dos while I have them in mind
|
2023-10-02 16:52:42 -05:00 |
|
|
4abd6564d1
|
fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml
|
2023-09-23 19:59:00 -05:00 |
|
|
a6bfe43590
|
added mirostat sampling (given a partially trained model, it got far more decent output than I expected, need to test on a better trained model)
|
2023-09-18 18:55:41 -05:00 |
|
|
23a5fdd645
|
implemented a naive beam search (I really should be taking a break)
|
2023-09-12 21:28:07 -05:00 |
|
|
5ac119a6e7
|
added light web UI (need to port the telemetry disabling bandaids from aivc)
|
2023-09-09 16:17:20 -05:00 |
|
|
10c34c5b98
|
added a length-based decay factor for repetition penalty
|
2023-09-08 21:02:00 -05:00 |
|
|
b922f35b6b
|
added documentation on how these new sampling parameters are very iffy and you really need to know what you are doing to use them because this is audio generation and not text generation
|
2023-09-08 20:43:36 -05:00 |
|
|
4613781e23
|
integrated plot script, added tts-c task token to help the model be able to mix between normal VALL-E and VALL-E continuous
|
2023-09-02 16:29:53 -05:00 |
|
|
f7e942ec99
|
modified plotting script to be more agnostic to X
|
2023-09-02 13:59:43 -05:00 |
|
|
6455a2f9d7
|
I think I fixed a bug?
|
2023-08-24 23:33:36 -05:00 |
|
|
f3fbed5ffd
|
updated notices tailored for windows / low VRAM cards
|
2023-08-24 17:19:10 -05:00 |
|
|
00ad4af651
|
updated draconian requirement for espeak-ng to be installed and the env var set to the dll for Windows
|
2023-08-24 14:57:01 -05:00 |
|
|
9c5a33bfd2
|
added repo with my weights so far
|
2023-08-22 13:09:44 -05:00 |
|
|
f7f6d3bf6d
|
validated that SpeechX tasks cse and nse work, added a method to test each task by invoking python3 -m vall_e.data --action=tasks --tasks='sr,se,cse,nse'
|
2023-08-19 09:50:07 -05:00 |
|
|
0b46c1e312
|
god I am inexperienced with retaining compat from previous weights, I hope no one actually has weights
|
2023-08-18 21:29:20 -05:00 |
|