Commit Graph

102 Commits

Author SHA1 Message Date
mrq
c0b46b82eb tweaks 2025-02-10 21:48:29 -06:00
mrq
d6a679ca5c tweaks 2025-02-10 20:53:08 -06:00
mrq
276a2342a4 tweaks to processing script 2025-02-10 19:18:13 -06:00
mrq
b3f9b76fd9 invalidate a path if loading via metadata and entry is not in hdf5 (to avoid reparsing my metadata since I'm using a partial copy of my dataset at the moment) 2025-02-10 14:43:15 -06:00
mrq
075ffef68a ugh 2025-02-09 13:02:51 -06:00
mrq
953015748f ugh 2025-02-07 20:49:28 -06:00
mrq
ed94b261dc could have sworn I had 'vall_e.emb.process --dtype' working, also possible RAM optimization so I can stop locking up my server when firing four encoding processes 2025-02-07 18:52:19 -06:00
mrq
67a9401cce oops 2025-02-06 15:14:14 -06:00
mrq
712ce4af5d maybe fixed errors with DAC backend, added option to limit by duration in emb.process (because I only really need short utterances right now and I'm not ready to spend a week on processing everything again) 2025-02-06 12:37:18 -06:00
mrq
299cc88821 re-added amp encoding/decoding for audio, possible bad idea to ignore using amp instead if requested 2025-02-05 21:55:06 -06:00
mrq
7592befc53 updated vall_e.emb.process to allow for batched processing, some typo fixes (it's painfully slow on my 7900XTX...) 2025-02-05 21:13:20 -06:00
mrq
79c504c278 cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when I fall for the latest meme codec) 2025-02-05 20:54:31 -06:00
mrq
84174c1c1b oops 2025-02-05 10:25:03 -06:00
mrq
bb2ebe1ca2 fixed issues that may arise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies) 2025-02-04 20:30:07 -06:00
mrq
c2c6d912ac actually do speaker verification 2024-12-17 10:11:14 -06:00
mrq
cd4a5f427c KO/ZH model soon 2024-12-15 17:01:14 -06:00
mrq
20b87bfbd0 store metrics and only recalculate them if the output file is newer than the metrics file 2024-12-11 20:55:43 -06:00
mrq
0c69e798f7 template cleanup 2024-12-11 20:06:55 -06:00
mrq
7e54e897f7 also shifted to transformers' pipeline for transcribing 2024-12-11 19:57:53 -06:00
mrq
b81a98799b uplifting transformers' WavLM stuff to do speaker verification instead 2024-12-11 19:30:05 -06:00
mrq
6f1ee0c6fa Added CER, transcription/similarity model args in demo 2024-12-10 21:00:51 -06:00
mrq
8568a93dad added WER/SIM-O metrics, added APOLLO but I need to test it 2024-12-10 20:13:21 -06:00
mrq
a6c745bafb chinese (mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), korean validated, vocab adjusted 2024-12-09 14:26:19 -06:00
mrq
a032ff588f doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O) 2024-12-07 22:34:25 -06:00
mrq
48490757da fixes 2024-11-10 20:37:50 -06:00
mrq
bbc2de3713 ugh 2024-11-05 11:50:05 -06:00
mrq
3826f9bae4 saner mask creation? (it doesn't matter, kv cache won't work) 2024-11-02 21:00:21 -05:00
mrq
bef43a0c18 added experimental entropix sampling support 2024-10-11 21:18:26 -05:00
mrq
2ea978f318 added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval 2024-10-10 13:40:25 -05:00
mrq
52299127ab fix vall_e.emb.process 2024-10-08 20:00:34 -05:00
mrq
0656a762af fix vall_e.emb.transcriber 2024-10-08 19:24:43 -05:00
mrq
10df2ef5f3 fixed oversight where input audio does not resample (lol...) 2024-09-27 20:27:53 -05:00
mrq
c8d4716a9f ugh 2024-09-18 21:40:57 -05:00
mrq
fa9d3f6c06 lang fixes / reworked phoneme symmap validation 2024-09-18 19:36:03 -05:00
mrq
84647f588a more tweaks 2024-09-18 16:43:57 -05:00
mrq
ebac1db16c maybe final tweaks, I really needed to unify my json read/write and orjson is proven to be fast enough for me to try and rely on it more 2024-09-17 22:57:04 -05:00
mrq
6ceed866b5 *faster* 2024-09-17 22:44:36 -05:00
mrq
f00283440c faster 2024-09-17 22:26:31 -05:00
mrq
be22b65300 solved my problem 2024-09-17 21:58:44 -05:00
mrq
8f41d1b324 more tweaks 2024-09-17 16:26:30 -05:00
mrq
804ddb5182 optimizations (6 hours to do cosine similarities on a speaker set of just 17k utterances................) 2024-09-17 15:51:45 -05:00
mrq
a9fbe81f98 oops 2024-09-17 15:25:12 -05:00
mrq
c440c4fe7e relegated processing similarity data into vall_e.emb.similarity since it's easier, seems to work? 2024-09-17 14:37:21 -05:00
mrq
56f25f7a9b more stuff for similar-speaker prompt sampling (to-do: actually test if this works...) 2024-09-16 23:10:29 -05:00
mrq
69f140ba45 fix oversight with phonemizing french because espeak defines french as fr-fr instead of fr (even though spain spanish is es and not es-sp or some shit, but portugal portuguese is pt-pt) 2024-09-13 12:53:36 -05:00
mrq
4f3c7a37c8 also do text similarities (don't know what use I'll have for this) 2024-09-10 16:45:59 -05:00
mrq
1c615a0f52 helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (iunno how to go about it desu) 2024-09-10 16:34:23 -05:00
mrq
32287710a2 moved prints to use logger, edited readme (fused_attn doesn't seem stable for training) 2024-08-29 13:27:16 -05:00
mrq
054d28573a my DAC dataset again managed to only have some utterances with only 8 of 9 RVQ levels, this fixes an oversight from it 2024-08-09 21:18:01 -05:00
mrq
79a6781c9e fix vall_e.data --action=hdf5 actually transcribing because past me completely forgot it tried to already put the transcribe/process dataset scripts inside the module before 2024-08-08 07:51:42 -05:00