vall-e

mrq 7cdfa3dc0c updated process_datasets.py, added argparsing so I can mostly stop manually editing things, and some other cleanup		2024-08-05 15:59:25 -05:00
cleanup_dataset.py	documentation update	2024-05-04 21:03:46 -05:00
deduplicate_librilight_libritts.py	added helper scripts to process LibriTTS/LibriLight, detect duplicate speaker+books between them, and script to directly phonemize and quantize LibriTTS	2023-08-26 10:21:12 -05:00
parse_ppp.py	added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup	2023-10-16 19:30:38 -05:00
prepare_librilight.py	dataset preparation script updates, caved and am using HF tokenizer now	2024-04-21 14:49:18 -05:00
prepare_libritts.py	added helper scripts to process LibriTTS/LibriLight, detect duplicate speaker+books between them, and script to directly phonemize and quantize LibriTTS	2023-08-26 10:21:12 -05:00
process_dataset.py	updated process_datasets.py, added argparsing so I can mostly stop manually editing things, and some other cleanup	2024-08-05 15:59:25 -05:00
process_libritts.py	sanity cleanup	2024-07-04 15:58:08 -05:00
run.sh	nasty bandaid if there's no validation dataset specified during training (for example, during finetunes)	2023-08-30 18:23:05 -05:00
setup.sh	documentation update	2024-08-04 00:14:49 -05:00
train_tokenizer.py	final tweaks, hopefully, again	2024-05-15 23:04:19 -05:00
transcribe_dataset.py	documentation update	2024-05-04 21:03:46 -05:00