|
6ee505cffd
|
fixed dac
|
2025-03-12 23:17:27 -05:00 |
|
|
1d3290b023
|
could have sworn this worked before, might have broken it when I decoupled from omegaconf
|
2025-03-01 19:30:26 -06:00 |
|
|
17094b8002
|
reticulating splines
|
2025-03-01 17:48:51 -06:00 |
|
|
b640fabab5
|
borrowed muon since it might work better under deepspeed and not require cruft (even though it really does not like the masked-NAR), also made the masked-NAR faux-causal since it might help out for cfg.model.version >= 7
|
2025-02-23 17:23:24 -06:00 |
|
|
ab0abd2b12
|
fixes fixes fixes (a quarter of my recently processed audio returned zero'd tensors......)
|
2025-02-22 09:07:33 -06:00 |
|
|
50506e5ebc
|
oops
|
2025-02-20 20:55:58 -06:00 |
|
|
fc1ec2019d
|
added option to buffer process jobs across multiple speakers to maybe squeeze out some throughput speeds for vall_e.emb.process (in the event of lots of speakers with low file counts, such as Emilia)
|
2025-02-20 14:56:32 -06:00 |
|
|
ce1ca0124a
|
lol...
|
2025-02-20 13:40:36 -06:00 |
|
|
92139b6da9
|
additional cruft, added a note in documentation to be aware of NUMA node topology when running vall_e.emb.process with more than one process
|
2025-02-18 19:56:30 -06:00 |
|
|
596c2df11c
|
added arg to skip processing speakers with not enough utterances for whenever I get around to processing my subset of Emilia for nvidia/audio-codec-44khz (because Emilia has a ton of speakers with low utterance counts, and right now my focus with the nemo model is on getting it to actually speak without much problems rather than feeding it a gorillion speakers)
|
2025-02-18 10:49:21 -06:00 |
|
|
8331eee6fa
|
added arg to limit vall_e.emb.process batch size since there are some speaker groups in LibriLight/Speech/whatever that have 10K utterances and I'm growing impatient
|
2025-02-18 10:19:17 -06:00 |
|
|
8f86cf0e4e
|
possible logic optimization so I don't spend another 15 minutes simply iterating back to the point I was at in vall_e.emb.process
|
2025-02-16 11:34:05 -06:00 |
|
|
d4a6709fb4
|
stopgap cringe to get this training session working (it does not seem fruitful)
|
2025-02-11 13:45:09 -06:00 |
|
|
c0b46b82eb
|
tweaks
|
2025-02-10 21:48:29 -06:00 |
|
|
d6a679ca5c
|
tweaks
|
2025-02-10 20:53:08 -06:00 |
|
|
276a2342a4
|
tweaks to processing script
|
2025-02-10 19:18:13 -06:00 |
|
|
b3f9b76fd9
|
invalidate a path if loading via metadata and entry is not in hdf5 (to avoid reparsing my metadata since I'm using a partial copy of my dataset at the moment)
|
2025-02-10 14:43:15 -06:00 |
|
|
075ffef68a
|
ugh
|
2025-02-09 13:02:51 -06:00 |
|
|
953015748f
|
ugh
|
2025-02-07 20:49:28 -06:00 |
|
|
ed94b261dc
|
could have sworn i had 'vall_e.emb.process --dtype' working, also possible RAM optimization so I can stop locking up my server when firing four encoding processes
|
2025-02-07 18:52:19 -06:00 |
|
|
67a9401cce
|
oops
|
2025-02-06 15:14:14 -06:00 |
|
|
712ce4af5d
|
maybe fixed errors with DAC backend, added option to limit by duration in emb.process (because I only really need short utterances right now and I'm not ready to spend a week on processing everything again)
|
2025-02-06 12:37:18 -06:00 |
|
|
299cc88821
|
re-added amp encoding/decoding for audio, possible bad idea to ignore using amp instead if requested
|
2025-02-05 21:55:06 -06:00 |
|
|
7592befc53
|
updated vall_e.emb.process to allow for batched processing, some typo fixes (it's painfully slow on my 7900XTX...)
|
2025-02-05 21:13:20 -06:00 |
|
|
79c504c278
|
cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when i fall for the latest meme codec)
|
2025-02-05 20:54:31 -06:00 |
|
|
84174c1c1b
|
oops
|
2025-02-05 10:25:03 -06:00 |
|
|
bb2ebe1ca2
|
fixed issues that may arise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies)
|
2025-02-04 20:30:07 -06:00 |
|
|
c2c6d912ac
|
actually do speaker verification
|
2024-12-17 10:11:14 -06:00 |
|
|
cd4a5f427c
|
KO/ZH model soon
|
2024-12-15 17:01:14 -06:00 |
|
|
20b87bfbd0
|
store metrics and only recalculate them if the output file is newer than the metrics file
|
2024-12-11 20:55:43 -06:00 |
|
|
0c69e798f7
|
template cleanup
|
2024-12-11 20:06:55 -06:00 |
|
|
7e54e897f7
|
also shifted to transformers' pipeline for transcribing
|
2024-12-11 19:57:53 -06:00 |
|
|
b81a98799b
|
uplifting transformers' WavLM stuff to do speaker verification instead
|
2024-12-11 19:30:05 -06:00 |
|
|
6f1ee0c6fa
|
Added CER, transcription/similarity model args in demo
|
2024-12-10 21:00:51 -06:00 |
|
|
8568a93dad
|
added WER/SIM-O metrics, added APOLLO but I need to test it
|
2024-12-10 20:13:21 -06:00 |
|
|
a6c745bafb
|
chinese (mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), korean validated, vocab adjusted
|
2024-12-09 14:26:19 -06:00 |
|
|
a032ff588f
|
doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O)
|
2024-12-07 22:34:25 -06:00 |
|
|
48490757da
|
fixes
|
2024-11-10 20:37:50 -06:00 |
|
|
bbc2de3713
|
ugh
|
2024-11-05 11:50:05 -06:00 |
|
|
3826f9bae4
|
saner mask creation? (it doesn't matter, kv cache won't work)
|
2024-11-02 21:00:21 -05:00 |
|
|
bef43a0c18
|
added experimental entropix sampling support
|
2024-10-11 21:18:26 -05:00 |
|
|
2ea978f318
|
added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval
|
2024-10-10 13:40:25 -05:00 |
|
|
52299127ab
|
fix vall_e.emb.process
|
2024-10-08 20:00:34 -05:00 |
|
|
0656a762af
|
fix vall_e.emb.transcriber
|
2024-10-08 19:24:43 -05:00 |
|
|
10df2ef5f3
|
fixed oversight where input audio does not resample (lol...)
|
2024-09-27 20:27:53 -05:00 |
|
|
c8d4716a9f
|
ugh
|
2024-09-18 21:40:57 -05:00 |
|
|
fa9d3f6c06
|
lang fixes / reworked phoneme symmap validation
|
2024-09-18 19:36:03 -05:00 |
|
|
84647f588a
|
more tweaks
|
2024-09-18 16:43:57 -05:00 |
|
|
ebac1db16c
|
maybe final tweaks, I really needed to unify my json read/write and orjson is proven to be fast enough for me to try and rely on it more
|
2024-09-17 22:57:04 -05:00 |
|
|
6ceed866b5
|
*faster*
|
2024-09-17 22:44:36 -05:00 |
|