ff7a1b4163 | coerce into path for other sampler_types (it's required for sampling for similar utterances) | 2024-09-26 18:37:56 -05:00
f24547ad4e | add top_k sampling / offset for prompt similar utterance sampling | 2024-09-26 16:26:40 -05:00
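A minimal sketch of what top-k plus offset selection over similarity scores could look like. It assumes every candidate utterance already has a similarity score against the reference utterance. The function name `pick_similar` and its parameters are hypothetical and are not the project's actual API.

```python
import random

def pick_similar(scores, top_k=4, offset=1, rng=random):
    """Hypothetical helper: pick one candidate index from the top_k most
    similar candidates, after skipping the first `offset` ranks."""
    # Rank candidate indices from most to least similar.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Skip the first `offset` ranks, then keep at most top_k of the remainder.
    window = ranked[offset:offset + top_k]
    if not window:
        raise ValueError("offset leaves no candidates to sample from")
    # Sample uniformly among the remaining top candidates.
    return rng.choice(window)
```

An `offset` of 1 is one plausible way to avoid always returning the best match (for example, the utterance itself, if it is part of the candidate pool), while `top_k` adds some variety among close matches.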
9da630f73a | swap order of demo entries, as the model prioritizes adhering to the speaker prompt more (instead of trying to match the ground truth magically) | 2024-09-25 23:31:24 -05:00
e84d466261 | vall_e.plot tweaks | 2024-09-24 20:05:10 -05:00
2266d34818 | oops | 2024-09-21 16:06:01 -05:00
c5e9142863 | added option to retokenize phonemes for hdf5 (to save having to remake my hdf5 file) | 2024-09-21 13:08:01 -05:00
536c11c4ac | actually validated and fixed sampling similar utterances for the prompt (hopefully nothing else is needed) | 2024-09-21 12:59:51 -05:00
d31f27119a | regex replace out the (lang) markers in espeak, updated tokenizer vocab as lazily as possible to not have unk tokens | 2024-09-21 12:29:28 -05:00
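A minimal sketch of a regex that strips espeak's language-switch markers from a phoneme string. It assumes the markers look like `(en)` or `(fr-fr)`: a lowercase language code, optionally followed by a region suffix, wrapped in parentheses. The names `LANG_MARKER` and `strip_lang_markers` are hypothetical.

```python
import re

# Assumed marker shape: "(en)", "(fr)", "(fr-fr)", i.e. a 2-3 letter
# lowercase code plus an optional "-region" suffix, inside parentheses.
LANG_MARKER = re.compile(r"\([a-z]{2,3}(?:-[a-z]+)?\)")

def strip_lang_markers(phonemes: str) -> str:
    """Remove language-switch markers and keep the phonemes between them."""
    return LANG_MARKER.sub("", phonemes)
```

This works on the assumption that phoneme output never contains other parenthesized lowercase codes. If that assumption does not hold, the pattern would need to be narrowed to the known language list.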
769f67dcfe | actually fix validation of phonemes in the symmap | 2024-09-21 12:19:34 -05:00
c8d4716a9f | ugh | 2024-09-18 21:40:57 -05:00
fe241f6a99 | support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder) | 2024-09-18 21:34:43 -05:00
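A minimal sketch of how wildcard entries in a dataset path list might be expanded, using the standard-library `glob` module. The helper name `expand_dataset_paths` is hypothetical. Treating entries without glob characters as literal paths is an assumption, not necessarily what the project does.

```python
import glob

def expand_dataset_paths(entries):
    """Hypothetical helper: expand wildcard entries in a dataset path list."""
    expanded = []
    for entry in entries:
        if any(ch in entry for ch in "*?["):
            # sorted() keeps the expansion order deterministic across runs.
            expanded.extend(sorted(glob.glob(entry)))
        else:
            # Literal paths are kept as-is, even if they do not exist yet.
            expanded.append(entry)
    # Drop duplicates while preserving first-seen order.
    return list(dict.fromkeys(expanded))
```

For example, `["./training/LibriTTS/*", "./training/extra/spk3"]` would expand to every speaker folder under `LibriTTS` plus the one literal entry.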
b5bec0c9ce | oops, turns out these are not split by speaker names already........ (also added sampling the dataset in the webui for easy viewing) | 2024-09-18 20:19:46 -05:00
fa9d3f6c06 | lang fixes / reworked phoneme symmap validation | 2024-09-18 19:36:03 -05:00
84647f588a | more tweaks | 2024-09-18 16:43:57 -05:00
ebac1db16c | maybe final tweaks, I really needed to unify my json read/write and orjson is proven to be fast enough for me to try and rely on it more | 2024-09-17 22:57:04 -05:00
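A minimal sketch of unified JSON read/write helpers that use `orjson` when it is installed and otherwise fall back to the standard `json` module. The helper names `json_read` and `json_write` are hypothetical. `orjson.loads`, `orjson.dumps`, and `orjson.OPT_INDENT_2` are real `orjson` APIs.

```python
import json

try:
    import orjson  # optional fast backend
except ImportError:
    orjson = None

def json_read(path):
    """Read a JSON file; both backends accept raw bytes."""
    with open(path, "rb") as f:
        data = f.read()
    return orjson.loads(data) if orjson else json.loads(data)

def json_write(obj, path, pretty=False):
    """Write obj as UTF-8 JSON, optionally indented by two spaces."""
    if orjson:
        # orjson.dumps returns bytes directly.
        data = orjson.dumps(obj, option=orjson.OPT_INDENT_2) if pretty else orjson.dumps(obj)
    else:
        data = json.dumps(obj, indent=2 if pretty else None, ensure_ascii=False).encode("utf-8")
    with open(path, "wb") as f:
        f.write(data)
```

Keeping all reads and writes behind two functions means the backend can be swapped in one place.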
6ceed866b5 | *faster* | 2024-09-17 22:44:36 -05:00
f00283440c | faster | 2024-09-17 22:26:31 -05:00
be22b65300 | solved my problem | 2024-09-17 21:58:44 -05:00
8f41d1b324 | more tweaks | 2024-09-17 16:26:30 -05:00
804ddb5182 | optimizations (6 hours to do cosine similarities on a speaker set of just 17k utterances................) | 2024-09-17 15:51:45 -05:00
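A minimal sketch of the usual way to speed up all-pairs cosine similarity. It normalizes every embedding once, then computes every pair with a single matrix multiply instead of one call per pair. This uses NumPy. The function name `pairwise_cosine` is hypothetical, and it is not confirmed that the commit takes this approach.

```python
import numpy as np

def pairwise_cosine(embeddings):
    """Cosine similarity between every pair of rows in an (N, D) array."""
    x = np.asarray(embeddings, dtype=np.float32)
    # L2-normalize each row once; the clip avoids division by zero.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.clip(norms, 1e-8, None)
    # One (N, D) @ (D, N) product yields the full (N, N) similarity matrix.
    return x @ x.T
```

For 17k utterances, the full float32 matrix is about 17000² × 4 bytes ≈ 1.16 GB. Processing the rows in chunks, or keeping only each row's top matches, may be needed on smaller machines.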
a9fbe81f98 | oops | 2024-09-17 15:25:12 -05:00
c440c4fe7e | relegated processing similarity data into vall_e.emb.similarity since it's easier, seems to work? | 2024-09-17 14:37:21 -05:00
56f25f7a9b | more stuff for similar-speaker prompt sampling (to-do: actually test if this works...) | 2024-09-16 23:10:29 -05:00
69f140ba45 | fix oversight with phonemizing french because espeak defines french as fr-fr instead of fr (even though spain spanish is es and not es-sp or some shit, but portugal portuguese is pt-pt) | 2024-09-13 12:53:36 -05:00
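A minimal sketch of a language-code override table applied before phonemizing. Only the `fr` → `fr-fr` entry comes from the commit message. The names `ESPEAK_LANGUAGE_OVERRIDES` and `to_espeak_language` are hypothetical, and any further entries would need checking against espeak's voice list.

```python
# Hypothetical override table; only "fr" -> "fr-fr" is taken from the commit.
ESPEAK_LANGUAGE_OVERRIDES = {
    "fr": "fr-fr",
}

def to_espeak_language(lang: str) -> str:
    """Map a dataset language code to the identifier espeak expects."""
    lang = lang.lower()
    # Codes without an override (e.g. "es") pass through unchanged.
    return ESPEAK_LANGUAGE_OVERRIDES.get(lang, lang)
```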
4f3c7a37c8 | also do text similarities (don't know what use I'll have for this) | 2024-09-10 16:45:59 -05:00
1c615a0f52 | helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (iunno how to go about it desu) | 2024-09-10 16:34:23 -05:00
17487ad70a | weird quirk in process_emilia.py where language gets mutated, somehow (I hate python) | 2024-09-10 14:00:27 -05:00
d059f6f56d | added helper script to process Emilia (amphion/Emilia-Dataset), clean up espeak phonemes for non-English transcriptions with English words (because for some reason espeak injects (en){word}(lang) markers and it's annoying) | 2024-09-09 09:57:32 -05:00
31e8b7edb8 | tweaks and fixes for lora stuffs | 2024-09-08 18:05:21 -05:00
54203c059d | validated rep pen for STT (sometimes needed to wrangle the model) | 2024-09-08 08:30:30 -05:00
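A minimal sketch of one common repetition-penalty formulation. This is the CTRL-style rule also used by Hugging Face's `RepetitionPenaltyLogitsProcessor`: logits of previously generated tokens are divided by the penalty when positive and multiplied by it when negative. The commit does not say which variant is used here, and `apply_repetition_penalty` is a hypothetical name operating on plain Python lists.

```python
def apply_repetition_penalty(logits, previous_tokens, penalty=1.25):
    """Return a penalized copy of `logits` (one score per vocabulary id)."""
    out = list(logits)  # copy so the caller's logits are not mutated
    # Penalize each previously seen token once, however often it occurred.
    for token in set(previous_tokens):
        if out[token] > 0:
            out[token] /= penalty  # shrink positive scores toward zero
        else:
            out[token] *= penalty  # push negative scores further down
    return out
```

With `penalty > 1`, repeated tokens always become less likely, which is the kind of nudge that can stop a transcription from looping.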
6a967f91b9 | oops | 2024-09-07 22:13:49 -05:00
5d66a7db52 | webui cleanup, more tweaks, default to safetensors in config | 2024-09-07 21:45:05 -05:00
a6ad0577b8 | cleanup the resultant text from STT | 2024-09-06 18:44:25 -05:00
fa93061b3e | more fixes, moved sampler state dict to a better place, eval works again | 2024-09-06 16:59:56 -05:00
4bd9bb39c8 | webui for STT (still need to bake the model to handle it better, a few hours so far has it generate what looks like a normal transcription but does not correlate to the audio right now) | 2024-09-06 15:13:04 -05:00
d33a906119 | cleanup for AR_NAR inferencing to allow both TTS and STT tasks simultaneously (need to have training eval do this too, though) | 2024-09-06 14:30:12 -05:00
341e19162b | fixes, again | 2024-09-06 11:41:41 -05:00
94cf81d38c | tweak | 2024-09-05 23:21:18 -05:00
413097f5f7 | fixes | 2024-09-05 21:42:59 -05:00
54547b74d8 | experimental implementation of STT (need to actually test on a model, test trainer seems to work) | 2024-09-05 20:43:20 -05:00
d319d33368 | haha | 2024-09-04 14:52:26 -05:00
619369236b | ugh | 2024-08-30 21:10:57 -05:00
168e203942 | ugh | 2024-08-30 14:39:07 -05:00
685f4faec0 | ugh | 2024-08-30 10:46:26 -05:00
32287710a2 | moved prints to use logger, edited readme (fused_attn doesn't seem stable for training) | 2024-08-29 13:27:16 -05:00
d423bc03c2 | fixed attentions for MoE | 2024-08-27 17:02:42 -05:00
b7b99a25f1 | added ability to specify attention backend for CLI and webui (because I'm tired of editing the yaml) | 2024-08-26 19:33:51 -05:00
0d706ec6a1 | added fused_attn (triton-based fused attention) and simply just query for flash_attn under rocm | 2024-08-26 19:13:34 -05:00
6b0891448c | pain (some shit to try and get some flash attention for ROCm (gfx1100) through triton fused attention but no good) | 2024-08-25 20:07:27 -05:00
40e1799adc | fixed xformers and flash_attn to actually work now | 2024-08-19 01:03:35 -05:00