|
e7e7f48043
|
livid
|
2024-12-19 19:25:27 -06:00 |
|
|
8838babcba
|
sanity checks (and I realized that the model actually had langs set to 4 in the yaml for KO/ZH so................
|
2024-12-19 19:08:57 -06:00 |
|
|
7617b6485f
|
instead just compute a bunch of stuff on the transcriptions to store later in different names so I can just retrieve what I want, also added tongue twisters for nefarious reasons
|
2024-12-18 23:43:11 -06:00 |
|
|
4775edaa41
|
added text cleaning/normalization for wer purposes but it amounts to nothing desu
|
2024-12-18 19:58:53 -06:00 |
|
|
9f2bd7f6e4
|
ugh
|
2024-12-17 23:17:12 -06:00 |
|
|
9090c34f10
|
cringe script to process seed-tts-eval's eval dataset into something i can easily use
|
2024-12-17 22:47:12 -06:00 |
|
|
ed152f78df
|
tweaks to prompt duration to allow me to divorce how i use it for training with how I'm using it for the demo page, and demo page tweaks to make my life easier
|
2024-12-17 19:33:04 -06:00 |
|
|
7129582303
|
actually do proper wer/cer calculation by un-normalizing the scores
|
2024-12-17 14:22:30 -06:00 |
|
|
c2c6d912ac
|
actually do speaker verification
|
2024-12-17 10:11:14 -06:00 |
|
|
c2e17e287b
|
really shoddy voice conversion implementation (it sort of works...)
|
2024-12-16 22:54:53 -06:00 |
|
|
8515038968
|
imagine my disappointment when the epoch finished just for it to throw an exception
|
2024-12-16 18:28:01 -06:00 |
|
|
4a65ac9eb7
|
oops
|
2024-12-15 17:21:51 -06:00 |
|
|
cd4a5f427c
|
KO/ZH model soon
|
2024-12-15 17:01:14 -06:00 |
|
|
4800e7179a
|
remove nan checks because it causes problems in distributed training because I'm not syncing between GPUs (and nan losses gets ignored anyways with loss scaling)
|
2024-12-15 09:42:54 -06:00 |
|
|
2ba6b483dc
|
ugh
|
2024-12-14 22:43:51 -06:00 |
|
|
3dd31e74d1
|
finally figured out a clean way to handle "resuming" the tqdm bar
|
2024-12-14 18:44:43 -06:00 |
|
|
35389481ee
|
move lazy-stored ortho matrix to the grad device for apollo because agony
|
2024-12-13 23:22:26 -06:00 |
|
|
09804ecc16
|
APOLLO tweaks to make it work with deepspeed
|
2024-12-13 23:03:52 -06:00 |
|
|
64c67160a3
|
tweaks
|
2024-12-13 19:00:35 -06:00 |
|
|
0fbfb8bbe8
|
actually save the optimizer for the local engine backend because safetensors doesn't save it
|
2024-12-12 17:12:59 -06:00 |
|
|
f41251f648
|
more fixes for local engine backend
|
2024-12-12 14:38:42 -06:00 |
|
|
6b237ae5e3
|
tweaks for the local engine orchestrator (that I never caught since I always used the deepspeed backend)
|
2024-12-12 13:37:38 -06:00 |
|
|
9a62e3b824
|
APOLLO cringe (doesn't want to work with deepspeed)
|
2024-12-12 00:31:58 -06:00 |
|
|
cddf8ca814
|
sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them)
|
2024-12-11 22:45:38 -06:00 |
|
|
20b87bfbd0
|
store metrics and only recalculate them if the output file is newer than the metrics file
|
2024-12-11 20:55:43 -06:00 |
|
|
0c69e798f7
|
template cleanup
|
2024-12-11 20:06:55 -06:00 |
|
|
7e54e897f7
|
also shifted to transformer's pipeline for transcribing
|
2024-12-11 19:57:53 -06:00 |
|
|
b81a98799b
|
uplifting transformer's WavLM stuff to do speaker verification instead
|
2024-12-11 19:30:05 -06:00 |
|
|
6468e5d124
|
lol
|
2024-12-11 19:10:32 -06:00 |
|
|
6f1ee0c6fa
|
Added CER, transcription/similarity model args in demo
|
2024-12-10 21:00:51 -06:00 |
|
|
8568a93dad
|
added WER/SIM-O metrics, added APOLLO but I need to test it
|
2024-12-10 20:13:21 -06:00 |
|
|
fc5e6d8599
|
fixes to process_emilia.py script
|
2024-12-09 14:38:09 -06:00 |
|
|
a6c745bafb
|
chinese (mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), korean validated, vocab adjusted
|
2024-12-09 14:26:19 -06:00 |
|
|
3ef8894290
|
oops
|
2024-12-08 15:24:21 -06:00 |
|
|
1d460b9fe3
|
logic fixes, I feel like output is better? (also NAR can have a temperature, I imagine it couldn't because it was having a causal masked passed to it for the longest time before I caught it a month ago)
|
2024-12-08 14:52:47 -06:00 |
|
|
0c5a458b00
|
deduce language per line to allow for a cheap way to allow for cross-lingual switching, kinda
|
2024-12-07 22:57:29 -06:00 |
|
|
a032ff588f
|
doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O)
|
2024-12-07 22:34:25 -06:00 |
|
|
5d80a2d0d4
|
fixed NAR-len issues with non-english maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now
|
2024-12-07 19:21:05 -06:00 |
|
|
1f54bf5b40
|
revert sageattn back to optional dependency because it's not on windows, force resize_modules on by default because I broke something
|
2024-12-07 17:09:39 -06:00 |
|
|
218d0e29fd
|
ugh (batchmean actually expects batch=seq_len, and not the actual batch)
|
2024-12-07 12:39:01 -06:00 |
|
|
61ed662856
|
ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode)
|
2024-12-07 12:31:54 -06:00 |
|
|
f97e8b0c7f
|
ACTUALLY do KD-loss because of an oversight with masked_select outputting 1D tensors that get softmax'd in total
|
2024-12-07 09:52:51 -06:00 |
|
|
34a66e1052
|
agnostified KD
|
2024-12-06 23:53:46 -06:00 |
|
|
953d3eb030
|
ugh
|
2024-12-06 22:35:30 -06:00 |
|
|
42fafbaaca
|
actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps)
|
2024-12-06 21:55:20 -06:00 |
|
|
23d402bf01
|
added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......)
|
2024-12-05 23:05:52 -06:00 |
|
|
4e21df8092
|
oops
|
2024-12-04 21:24:22 -06:00 |
|
|
c66a53492c
|
forgot to add NTLK as a dependency, promoted sageattn as a default dependency since it works fine enough and seems agnostic
|
2024-12-04 20:33:25 -06:00 |
|
|
93d27be539
|
rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting)
|
2024-12-04 20:31:44 -06:00 |
|
|
9dff68c0c5
|
NAR-len tweaks (remasks a small amount of tokens per step, it seems to help with reducing the number of steps needed some of the time?, disable CFG for the first half to speed things up)
|
2024-12-04 09:30:29 -06:00 |
|