|
92139b6da9
|
additional cruft, added a note in documentation to be aware of NUMA node topology when running vall_e.emb.process with more than one process
|
2025-02-18 19:56:30 -06:00 |
|
|
0dc49ef4d5
|
documentation update while I wait for more audio (between 4 and 8 seconds per utterance) quantize for nvidia/audio-codec-44khz (I was foolish to think I can get something servicable with just 4 seconds max for an utterance)
|
2025-02-15 17:42:06 -06:00 |
|
|
04fef5dad5
|
agony
|
2025-02-12 00:18:24 -06:00 |
|
|
1c0ed6abac
|
added notes on this unfruitful experiment
|
2025-02-11 16:21:43 -06:00 |
|
|
59bf6b8b33
|
exposed additional task (ns, sr, vc) (vc is experimental)
|
2024-12-20 11:15:29 -06:00 |
|
|
8568a93dad
|
added WER/SIM-O metrics, added APOLLO but I need to test it
|
2024-12-10 20:13:21 -06:00 |
|
|
a6c745bafb
|
chinese (mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), korean validated, vocab adjusted
|
2024-12-09 14:26:19 -06:00 |
|
|
a032ff588f
|
doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O)
|
2024-12-07 22:34:25 -06:00 |
|
|
39096f8ff3
|
redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........)
|
2024-11-14 22:17:47 -06:00 |
|
|
f7b8b1e825
|
dropped subtrain dataloader since its useless to duplicate
|
2024-11-11 17:00:49 -06:00 |
|
|
bcabde3454
|
more notes
|
2024-11-06 13:51:28 -06:00 |
|
|
9901c4f8ca
|
documentation under ./docs/
|
2024-11-05 16:11:01 -06:00 |
|