|
6ecdb715b6
|
more work on vall_e.cpp (some more cleanup, NAR-len demasking, but still need to iron out some kinks)
|
2024-12-23 17:20:04 -06:00 |
|
|
a6945f981d
|
vall_e.cpp cleanup (having to keep a map of something that can work without touching llama.cpp AND something minimally invasive, AND adhere to a C++ style that isn't mine, is making me bipolar)
|
2024-12-23 14:16:16 -06:00 |
|
|
497bdfc67b
|
more work (the wall is non-causal decoding......)
|
2024-12-22 20:11:31 -06:00 |
|
|
5f289db275
|
ugh
|
2024-12-22 16:15:24 -06:00 |
|
|
0d4329d2e3
|
sanity cleanup
|
2024-12-22 15:05:45 -06:00 |
|
|
353e478e68
|
agony
|
2024-12-21 22:52:10 -06:00 |
|
|
2542ed067d
|
ugh
|
2024-12-21 19:59:56 -06:00 |
|
|
70a0f5724b
|
I hate learning APIs so much
|
2024-12-21 19:40:19 -06:00 |
|
|
1b4a69ce29
|
more updates to vall_e.cpp
|
2024-12-21 19:16:44 -06:00 |
|
|
503124d0d3
|
crammed encodec.cpp in
|
2024-12-21 15:48:12 -06:00 |
|
|
979c1f797c
|
quant
|
2024-12-21 11:56:22 -06:00 |
|
|
5788db849b
|
added extremely barebones vall_e.cpp so I can stop having to juggle this file around so much
|
2024-12-21 10:57:02 -06:00 |
|
|
91caf00212
|
ugh
|
2024-12-20 17:13:37 -06:00 |
|
|
d85273609e
|
corrected export.py's --hf
|
2024-12-20 15:17:13 -06:00 |
|
|
59bf6b8b33
|
exposed additional tasks (ns, sr, vc) (vc is experimental)
|
2024-12-20 11:15:29 -06:00 |
|
|
53230efd74
|
changed prompt_inject_noise to prompt_inject_noise_p so I can have another reason to do this post-training
|
2024-12-19 19:28:50 -06:00 |
|
|
e7e7f48043
|
livid
|
2024-12-19 19:25:27 -06:00 |
|
|
8838babcba
|
sanity checks (and I realized that the model actually had langs set to 4 in the yaml for KO/ZH, so...)
|
2024-12-19 19:08:57 -06:00 |
|
|
7617b6485f
|
instead just compute a bunch of stuff on the transcriptions to store later in different names so I can just retrieve what I want, also added tongue twisters for nefarious reasons
|
2024-12-18 23:43:11 -06:00 |
|
|
4775edaa41
|
added text cleaning/normalization for WER purposes but it amounts to nothing desu
|
2024-12-18 19:58:53 -06:00 |
|
|
9f2bd7f6e4
|
ugh
|
2024-12-17 23:17:12 -06:00 |
|
|
9090c34f10
|
cringe script to process seed-tts-eval's eval dataset into something I can easily use
|
2024-12-17 22:47:12 -06:00 |
|
|
ed152f78df
|
tweaks to prompt duration to allow me to divorce how I use it for training from how I'm using it for the demo page, and demo page tweaks to make my life easier
|
2024-12-17 19:33:04 -06:00 |
|
|
7129582303
|
actually do proper WER/CER calculation by un-normalizing the scores
|
2024-12-17 14:22:30 -06:00 |
|
|
c2c6d912ac
|
actually do speaker verification
|
2024-12-17 10:11:14 -06:00 |
|
|
c2e17e287b
|
really shoddy voice conversion implementation (it sort of works...)
|
2024-12-16 22:54:53 -06:00 |
|
|
8515038968
|
imagine my disappointment when the epoch finished just for it to throw an exception
|
2024-12-16 18:28:01 -06:00 |
|
|
4a65ac9eb7
|
oops
|
2024-12-15 17:21:51 -06:00 |
|
|
cd4a5f427c
|
KO/ZH model soon
|
2024-12-15 17:01:14 -06:00 |
|
|
4800e7179a
|
remove NaN checks because they cause problems in distributed training since I'm not syncing between GPUs (and NaN losses get ignored anyway with loss scaling)
|
2024-12-15 09:42:54 -06:00 |
|
|
2ba6b483dc
|
ugh
|
2024-12-14 22:43:51 -06:00 |
|
|
3dd31e74d1
|
finally figured out a clean way to handle "resuming" the tqdm bar
|
2024-12-14 18:44:43 -06:00 |
|
|
35389481ee
|
move lazy-stored ortho matrix to the grad device for APOLLO because agony
|
2024-12-13 23:22:26 -06:00 |
|
|
09804ecc16
|
APOLLO tweaks to make it work with deepspeed
|
2024-12-13 23:03:52 -06:00 |
|
|
64c67160a3
|
tweaks
|
2024-12-13 19:00:35 -06:00 |
|
|
0fbfb8bbe8
|
actually save the optimizer for the local engine backend because safetensors doesn't save it
|
2024-12-12 17:12:59 -06:00 |
|
|
f41251f648
|
more fixes for local engine backend
|
2024-12-12 14:38:42 -06:00 |
|
|
6b237ae5e3
|
tweaks for the local engine orchestrator (that I never caught since I always used the deepspeed backend)
|
2024-12-12 13:37:38 -06:00 |
|
|
9a62e3b824
|
APOLLO cringe (doesn't want to work with deepspeed)
|
2024-12-12 00:31:58 -06:00 |
|
|
cddf8ca814
|
sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them)
|
2024-12-11 22:45:38 -06:00 |
|
|
20b87bfbd0
|
store metrics and only recalculate them if the output file is newer than the metrics file
|
2024-12-11 20:55:43 -06:00 |
|
|
0c69e798f7
|
template cleanup
|
2024-12-11 20:06:55 -06:00 |
|
|
7e54e897f7
|
also shifted to transformers' pipeline for transcribing
|
2024-12-11 19:57:53 -06:00 |
|
|
b81a98799b
|
uplifting transformers' WavLM stuff to do speaker verification instead
|
2024-12-11 19:30:05 -06:00 |
|
|
6468e5d124
|
lol
|
2024-12-11 19:10:32 -06:00 |
|
|
6f1ee0c6fa
|
Added CER, transcription/similarity model args in demo
|
2024-12-10 21:00:51 -06:00 |
|
|
8568a93dad
|
added WER/SIM-O metrics, added APOLLO but I need to test it
|
2024-12-10 20:13:21 -06:00 |
|
|
fc5e6d8599
|
fixes to process_emilia.py script
|
2024-12-09 14:38:09 -06:00 |
|
|
a6c745bafb
|
Chinese (Mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), Korean validated, vocab adjusted
|
2024-12-09 14:26:19 -06:00 |
|
|
3ef8894290
|
oops
|
2024-12-08 15:24:21 -06:00 |
|
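Commit cddf8ca814 above mentions sorting batches to reduce the number of padded tokens in batched inference. The log does not show how the repository does this, so the following is only a minimal sketch of the general idea, assuming each sample is described by its token length. The names `make_batches`, `sorted_batches`, and `padded_tokens` are hypothetical and do not come from the project.

```python
from typing import List, Sequence


def make_batches(order: Sequence[int], batch_size: int) -> List[List[int]]:
    """Split a sequence of sample indices into consecutive batches."""
    return [list(order[i:i + batch_size]) for i in range(0, len(order), batch_size)]


def sorted_batches(lengths: Sequence[int], batch_size: int) -> List[List[int]]:
    """Group sample indices so that each batch holds similarly sized inputs.

    Hypothetical helper: sorts indices by length (stable, so ties keep
    their original order) and then slices the result into batches.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return make_batches(order, batch_size)


def padded_tokens(batches: Sequence[Sequence[int]], lengths: Sequence[int]) -> int:
    """Count the padding needed when every batch is padded to its longest item."""
    total = 0
    for batch in batches:
        longest = max(lengths[i] for i in batch)
        total += sum(longest - lengths[i] for i in batch)
    return total


if __name__ == "__main__":
    lengths = [5, 1, 4, 2]  # made-up token counts for four inputs
    naive = make_batches(range(len(lengths)), batch_size=2)
    by_length = sorted_batches(lengths, batch_size=2)
    print("padding without sorting:", padded_tokens(naive, lengths))
    print("padding with sorting:", padded_tokens(by_length, lengths))
```

Because the batches hold indices and not the samples themselves, outputs can be written back in the original input order after inference.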