| e5f9da2221 | oops | 2025-01-21 11:59:24 -06:00 |
| 69c1d2991f | updated mixtral backend (need this for something else) | 2025-01-20 21:50:56 -06:00 |
| 1a26f789a5 | added option to playback audio directly, removed no-phonemize option since I swear it worked in testing but it doesn't actually work | 2025-01-12 21:52:49 -06:00 |
| 9fa87c417a | added option to use raw text rather than the IPA phonemes (it requires a model trained on raw text) | 2025-01-06 00:10:43 -06:00 |
| 3ab11bdc7b | oops | 2025-01-05 23:53:17 -06:00 |
| b445f4abb6 | experimental | 2025-01-05 19:05:00 -06:00 |
| 2e6a7625e4 | experimental | 2025-01-05 12:47:03 -06:00 |
| 31cfef59c4 | when you do more training thinking the original model that can do NS/SR got deleted but it was actually a string not having its quotes in the right place....... | 2024-12-27 18:16:57 -06:00 |
| 9b0d2ccbe1 | | 2024-12-26 21:42:17 -06:00 |
| 25a02f2c3f | oops | 2024-12-25 00:36:19 -06:00 |
| b9d2cd5513 | vall_e.cpp cli | 2024-12-25 00:28:34 -06:00 |
| 59f56ad099 | cleanup | 2024-12-24 23:14:32 -06:00 |
| 6bf59bbd8b | vall_e.cpp phonemizing and tokenizing | 2024-12-24 22:39:32 -06:00 |
| 8516bab15c | cleanup | 2024-12-24 20:29:03 -06:00 |
| 82e8592f2a | working vall_e.cpp | 2024-12-24 17:54:48 -06:00 |
| 2b4d783299 | ugh | 2024-12-23 23:42:44 -06:00 |
| 532200de2a | nvm fixed | 2024-12-23 22:23:43 -06:00 |
| f62f99b8de | more work on vall_e.cpp (need to resolve why the embeddings (and maybe the weights as a whole) are different from the base model) | 2024-12-23 20:36:40 -06:00 |
| 6ecdb715b6 | more work on vall_e.cpp (some more cleanup, NAR-len demasking, but still need to iron out some kinks) | 2024-12-23 17:20:04 -06:00 |
| a6945f981d | vall_e.cpp cleanup (having to keep a map of something that can work without touching llama.cpp AND something minimally invasive, AND adhere to a C++ style that isn't mine, is making me bipolar) | 2024-12-23 14:16:16 -06:00 |
| 497bdfc67b | more work (the wall is non-causal decoding......) | 2024-12-22 20:11:31 -06:00 |
| 5f289db275 | ugh | 2024-12-22 16:15:24 -06:00 |
| 0d4329d2e3 | sanity cleanup | 2024-12-22 15:05:45 -06:00 |
| 353e478e68 | agony | 2024-12-21 22:52:10 -06:00 |
| 2542ed067d | ugh | 2024-12-21 19:59:56 -06:00 |
| 70a0f5724b | i hate learning APIs so much | 2024-12-21 19:40:19 -06:00 |
| 1b4a69ce29 | more updates to vall_e.cpp | 2024-12-21 19:16:44 -06:00 |
| 503124d0d3 | crammed encodec.cpp in | 2024-12-21 15:48:12 -06:00 |
| 979c1f797c | quant | 2024-12-21 11:56:22 -06:00 |
| 5788db849b | added extremely barebones vall_e.cpp so I can stop having to juggle this file around so much | 2024-12-21 10:57:02 -06:00 |
| 91caf00212 | ugh | 2024-12-20 17:13:37 -06:00 |
| d85273609e | corrected export.py's --hf | 2024-12-20 15:17:13 -06:00 |
| 59bf6b8b33 | exposed additional tasks (ns, sr, vc) (vc is experimental) | 2024-12-20 11:15:29 -06:00 |
| 53230efd74 | changed prompt_inject_noise to prompt_inject_noise_p so I can have another reason to do this post-training | 2024-12-19 19:28:50 -06:00 |
| e7e7f48043 | livid | 2024-12-19 19:25:27 -06:00 |
| 8838babcba | sanity checks (and I realized that the model actually had langs set to 4 in the yaml for KO/ZH so................) | 2024-12-19 19:08:57 -06:00 |
| 7617b6485f | instead just compute a bunch of stuff on the transcriptions to store later in different names so I can just retrieve what I want, also added tongue twisters for nefarious reasons | 2024-12-18 23:43:11 -06:00 |
| 4775edaa41 | added text cleaning/normalization for wer purposes but it amounts to nothing desu | 2024-12-18 19:58:53 -06:00 |
| 9f2bd7f6e4 | ugh | 2024-12-17 23:17:12 -06:00 |
| 9090c34f10 | cringe script to process seed-tts-eval's eval dataset into something i can easily use | 2024-12-17 22:47:12 -06:00 |
| ed152f78df | tweaks to prompt duration to allow me to divorce how i use it for training from how I'm using it for the demo page, and demo page tweaks to make my life easier | 2024-12-17 19:33:04 -06:00 |
| 7129582303 | actually do proper wer/cer calculation by un-normalizing the scores | 2024-12-17 14:22:30 -06:00 |
| c2c6d912ac | actually do speaker verification | 2024-12-17 10:11:14 -06:00 |
| c2e17e287b | really shoddy voice conversion implementation (it sort of works...) | 2024-12-16 22:54:53 -06:00 |
| 8515038968 | imagine my disappointment when the epoch finished just for it to throw an exception | 2024-12-16 18:28:01 -06:00 |
| 4a65ac9eb7 | oops | 2024-12-15 17:21:51 -06:00 |
| cd4a5f427c | KO/ZH model soon | 2024-12-15 17:01:14 -06:00 |
| 4800e7179a | remove nan checks because they cause problems in distributed training because I'm not syncing between GPUs (and nan losses get ignored anyways with loss scaling) | 2024-12-15 09:42:54 -06:00 |
| 2ba6b483dc | ugh | 2024-12-14 22:43:51 -06:00 |
| 3dd31e74d1 | finally figured out a clean way to handle "resuming" the tqdm bar | 2024-12-14 18:44:43 -06:00 |