mrq/vall-e

Author	SHA1	Message	Date
mrq	9f2bd7f6e4	ugh	2024-12-17 23:17:12 -06:00
mrq	9090c34f10	cringe script to process seed-tts-eval's eval dataset into something i can easily use	2024-12-17 22:47:12 -06:00
mrq	fc5e6d8599	fixes to process_emilia.py script	2024-12-09 14:38:09 -06:00
mrq	fe241f6a99	support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder)	2024-09-18 21:34:43 -05:00
mrq	b5bec0c9ce	oops, turns out these are not split by speaker names already........ (also added sampling the dataset in the webui for easy viewing)	2024-09-18 20:19:46 -05:00
mrq	56f25f7a9b	more stuff for similar-speaker prompt sampling (to-do: actually test if this works...)	2024-09-16 23:10:29 -05:00
mrq	17487ad70a	weird quirk in process_emilia.py where language gets mutated, somehow (I hate python)	2024-09-10 14:00:27 -05:00
mrq	d059f6f56d	added helper script to process Emilia (amphion/Emilia-Dataset), clean up espeak phonemes for non-English transcriptions with English words (because for some reason espeak injects (en){word}(lang) markers and it's annoying)	2024-09-09 09:57:32 -05:00
mrq	9710b06b74	tweaks and things	2024-08-06 08:17:25 -05:00
mrq	8bac8fe902	oops	2024-08-05 20:38:29 -05:00
mrq	134dac8c2b	re-adapted process_libritts.py to a 'better' way (better because it processed without needing to shuffle a bunch of things and adapt to cope or something)	2024-08-05 20:34:58 -05:00
mrq	597441e48b	moved transcribe and process dataset scripts to vall_e/emb within the module itself, argparse-ified transcription script	2024-08-05 19:40:50 -05:00
mrq	7cdfa3dc0c	updated process_datasets.py, added argparsing so I can mostly stop manually editing things, and some other cleanup	2024-08-05 15:59:25 -05:00
mrq	d19f93a2c0	documentation update	2024-08-04 00:14:49 -05:00
mrq	11fa3da665	some cleanup, fixed the wrapper attention to explicitly use other sdpa backends	2024-08-03 19:51:00 -05:00
mrq	9564ecda43	wrapper attention class for other sdpa backends + xformers seems to have broke...	2024-08-03 15:12:11 -05:00
mrq	ad024f400f	actually pass language into dataset process script, fix coercing japanese into hiragana because espeak does not like kanji	2024-07-21 23:21:37 -05:00
mrq	7b210d9738	sanity cleanup	2024-07-04 15:58:08 -05:00
mrq	db62e55a38	oops, I forgot to use the new thing for audio_backend	2024-07-04 14:54:11 -05:00
mrq	7feeb944a0	probably insane with even entertaining going this route	2024-06-03 20:26:27 -05:00
mrq	ddbacde0d1	DAC just doesn't work well enough......	2024-05-25 11:07:52 -05:00
mrq	74e531d391	ugh	2024-05-18 12:02:56 -05:00
mrq	59ef9461f8	ugh	2024-05-18 10:13:58 -05:00
mrq	d9aabfa3ae	final tweaks, hopefully, again	2024-05-15 23:04:19 -05:00
mrq	2437a86efa	ugh	2024-05-12 13:02:15 -05:00
mrq	4f1593c8db	a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge	2024-05-12 10:17:29 -05:00
mrq	c6e0f905b5	final tweaks (again) before training restarts	2024-05-08 02:11:38 -05:00
mrq	8aa1b2dabf	documentation update	2024-05-04 21:03:46 -05:00
mrq	caad7ee3c9	final tweaks, hopefully	2024-04-28 22:28:29 -05:00
mrq	ffc334cf58	added dataset transcription helper script (now I don't ever have to touch ai-voice-cloning) (to-do: unify scripts into the module)	2024-04-21 17:43:20 -05:00
mrq	071fb97777	dataset preparation script updates, caved and am using HF tokenizer now	2024-04-21 14:49:18 -05:00
mrq	a8ffa88844	it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior	2024-04-19 18:36:54 -05:00
mrq	00804a47e9	Forgot to copy intermediary dataset conversion script	2024-04-18 21:34:28 -05:00
mrq	4f5c9e518a	actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess	2024-04-18 13:32:41 -05:00
mrq	09cda7d3f9	added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup	2023-10-16 19:30:38 -05:00
mrq	2deb995cc9	updated setup script	2023-10-06 20:08:28 -05:00
mrq	1fd91b6437	cleanup	2023-10-06 10:13:54 -05:00
mrq	3db7e7dea1	implicitly load checkpoint if deepspeed checkpoint not found, updated setup script to grab the diskcached dataloader things	2023-10-06 10:02:45 -05:00
mrq	2f2505b12f	updated setup script	2023-10-06 08:08:28 -05:00
mrq	153f8b293c	added min-x and min-y arguments to plot.py, helper script to download from my existing checkpoint	2023-10-04 19:41:37 -05:00
mrq	5ac119a6e7	added light web UI (need to port the telemetry disabling bandaids from aivc)	2023-09-09 16:17:20 -05:00
mrq	4613781e23	integrated plot script, added tts-c task token to help the model be able to mix between normal VALL-E and VALL-E continuous	2023-09-02 16:29:53 -05:00
mrq	f7e942ec99	modified plotting script to be more agnostic to X	2023-09-02 13:59:43 -05:00
mrq	21e5d250cc	fixed up plot script that I forgot about	2023-09-02 13:31:04 -05:00
mrq	5c8694db8e	nasty bandaid if there's no validation dataset specified during training (for example, during finetunes)	2023-08-30 18:23:05 -05:00
mrq	7b3be3d7bf	added helper scripts to process LibriTTS/LibriLight, detect duplicate speaker+books between them, and script to directly phonemize and quantize LibriTTS	2023-08-26 10:21:12 -05:00
mrq	bf8cedc9dd	Rewrite init	2023-08-02 21:53:35 +00:00

47 Commits