Commit Graph

183 Commits

Author SHA1 Message Date
mrq
712808494f added support for optional prodigy optimizer (https://github.com/konstmish/prodigy) although it consumes a lot more VRAM per parameter 2023-09-06 20:33:16 -05:00
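For reference, a minimal sketch of swapping in Prodigy (the linked repo installs as the prodigyopt package); the extra VRAM cost comes from Prodigy tracking additional per-parameter state on top of Adam's two moments:

    # pip install prodigyopt
    import torch
    from prodigyopt import Prodigy

    model = torch.nn.Linear(16, 16)  # stand-in for the actual model
    # Prodigy estimates the step size itself; its README recommends lr=1.0
    optimizer = Prodigy(model.parameters(), lr=1.0)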
mrq
7ce06432fd fixed the AR+NAR dual model, the resp_emb has to be split up (classifier might too) 2023-09-06 19:33:39 -05:00
mrq
100ca6b7d0 added option to use SGD optimizer through the YAML, added option to pass in additional optimizer parameters through the YAML, added experimental unified AR+NAR model (does not seem fruitful in testing) 2023-09-06 18:58:35 -05:00
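What YAML-driven optimizer selection plus parameter passthrough might reduce to, as a hedged sketch (the key names here are illustrative, not the repo's actual schema):

    import torch

    def build_optimizer(params, cfg: dict):
        # cfg comes from the YAML, e.g. {"optimizer": "sgd", "optimizer_params": {"momentum": 0.9}}
        choices = {"adamw": torch.optim.AdamW, "sgd": torch.optim.SGD}
        cls = choices[cfg.get("optimizer", "adamw").lower()]
        # any additional kwargs from the YAML are forwarded untouched
        return cls(params, lr=cfg.get("lr", 1.0e-4), **cfg.get("optimizer_params", {}))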
mrq
451726fdd5 added ability to disable activation checkpointing through the YAML (it is very VRAM intensive at double layer size) 2023-09-05 15:38:21 -05:00
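Activation checkpointing trades compute for VRAM by recomputing activations during backward, which is why turning it off balloons memory at double the layer count. A minimal sketch of gating it behind a flag, using PyTorch's stock utility (the flag name is hypothetical):

    from torch.utils.checkpoint import checkpoint

    def forward_layers(layers, x, use_checkpointing: bool):
        for layer in layers:
            if use_checkpointing:
                # recompute this layer's activations on backward instead of storing them
                x = checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
        return x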
mrq
143aee7526 removed dedicated interleaved AR code 2023-09-03 22:47:03 -05:00
mrq
2f9cd0842f merged dedicated interleaved AR code with the normal AR code 2023-09-03 22:46:08 -05:00
mrq
3a6bd50322 haha 2023-09-03 21:36:58 -05:00
mrq
c56ce033d9 work on an interleaved AR (spoiler: it does not work) 2023-09-03 21:27:58 -05:00
mrq
8a6c203277 added per-speaker samplers 2023-09-03 21:27:13 -05:00
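A rough sketch of the idea behind a per-speaker sampler: pick a speaker uniformly first, then an utterance, so prolific speakers don't dominate batches (assumes pathlib paths grouped by parent directory; names are illustrative):

    import random
    from collections import defaultdict

    class PerSpeakerSampler:
        def __init__(self, paths):
            # bucket dataset paths by speaker (here, the parent directory name)
            self.buckets = defaultdict(list)
            for path in paths:
                self.buckets[path.parent.name].append(path)
            self.speakers = list(self.buckets)

        def sample(self):
            speaker = random.choice(self.speakers)       # uniform over speakers
            return random.choice(self.buckets[speaker])  # then over that speaker's utterances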
mrq
81b05dabb9 accurate epoch metric is now reported (based on samples processed / length of dataset's paths, rather than naive assumptions) 2023-09-03 08:03:36 -05:00
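In other words, the epoch becomes a fraction of the dataset actually seen rather than a step-count guess; assuming the trainer tracks samples processed, it is just:

    # fractional epochs derived from real progress
    epoch = samples_processed / len(dataset.paths)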
mrq
922404285c fixed segfault from the tts-c task token being too big (slotted it into the hypothetical svc task token instead, since in reality that is never going to be a feasible task to train against) 2023-09-02 19:25:43 -05:00
mrq
4613781e23 integrated plot script, added tts-c task token to help the model be able to mix between normal VALL-E and VALL-E continuous 2023-09-02 16:29:53 -05:00
mrq
f7e942ec99 modified plotting script to be more agnostic to X 2023-09-02 13:59:43 -05:00
mrq
71e68a8528 tweaked tts-continuous task 2023-09-02 13:39:17 -05:00
mrq
21e5d250cc fixed up plot script that I forgot about 2023-09-02 13:31:04 -05:00
mrq
57db3ccfa8 shuffled VALL-E continuous as a task tts-c instead, logic fixes for it 2023-09-02 12:23:40 -05:00
mrq
2f06166ddd cleanups 2023-09-01 21:33:51 -05:00
mrq
e40c0d34a0 somewhat got recurrent forward working (it's as accurate as chunkwise forward: not accurate at all), added option to use AMP instead of blanket-setting the weights' dtype 2023-09-01 20:58:29 -05:00
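A minimal sketch of the AMP route versus blanket-casting the weights: parameters stay fp32 and only the forward math runs in half precision, with a loss scaler to keep fp16 gradients from underflowing (assumes the model returns its loss):

    import torch

    scaler = torch.cuda.amp.GradScaler()

    def train_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)
        scaler.scale(loss).backward()   # scale up to avoid fp16 underflow
        scaler.step(optimizer)          # unscales, skips the step if grads overflowed
        scaler.update()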
mrq
2bc2d08b09 (need to verify) added options for modifying the model size and a config bool to align with VALL-E continuous' methodology 2023-09-01 17:19:34 -05:00
mrq
5c8694db8e nasty bandaid for when there's no validation dataset specified during training (for example, during finetunes) 2023-08-30 18:23:05 -05:00
mrq
7f4388e591 added total samples processed and tokens processed (len of text tokens + len of target response tokens) 2023-08-28 11:02:45 -05:00
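The bookkeeping itself is trivial, assuming each batch exposes its text and response token sequences (names here are illustrative):

    def count_batch(batch_text_tokens, batch_resp_tokens):
        # "tokens" counts both the text prompt and the target audio codes
        samples = len(batch_text_tokens)
        tokens = sum(len(t) for t in batch_text_tokens) + sum(len(r) for r in batch_resp_tokens)
        return samples, tokens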
mrq
87c4bfedba added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU) 2023-08-27 12:26:12 -05:00
mrq
165a1154e0 Undo naive=False test flag, this shouldn't have made its way in 2023-08-26 22:00:43 -05:00
mrq
78378ed1ce overhauled dataloading code to be marginally faster, mostly cleaned up, and can leverage a metadata json to help things out 2023-08-26 19:53:23 -05:00
mrq
7b3be3d7bf added helper scripts to process LibriTTS/LibriLight and detect duplicate speaker+books between them, plus a script to directly phonemize and quantize LibriTTS 2023-08-26 10:21:12 -05:00
mrq
16e0020901 disabled chunkwise_recurrent for 2x speed gains (I suppose it has been working the entire time, but I have not been properly grabbing things, and this might explain why the output is bad) 2023-08-25 19:50:19 -05:00
mrq
6455a2f9d7 I think I fixed a bug? 2023-08-24 23:33:36 -05:00
mrq
f3fbed5ffd updated notices tailored for windows / low VRAM cards 2023-08-24 17:19:10 -05:00
mrq
0517d620b8 fixes with the local backend 2023-08-24 17:05:56 -05:00
mrq
00ad4af651 updated draconian requirement for espeak-ng to be installed and the env var set to the dll for Windows 2023-08-24 14:57:01 -05:00
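The env var in question is presumably PHONEMIZER_ESPEAK_LIBRARY, which phonemizer checks to locate the eSpeak NG shared library; a sketch for Windows (the DLL path below is the default install location, adjust to taste):

    import os
    # must be set before the espeak backend is initialized
    os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = r"C:\Program Files\eSpeak NG\libespeak-ng.dll"

    from phonemizer import phonemize
    print(phonemize("hello world", language="en-us", backend="espeak"))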
mrq
b6c9686f7d Do not install DeepSpeed under Windows (to-do: default backend to use local if on Windows) 2023-08-24 14:27:36 -05:00
mrq
22904a8639 more oversights fixed because I've been using a cached dataloader forever now and didn't catch these problems 2023-08-24 10:25:33 -05:00
mrq
5873c27f1a ops 2023-08-24 09:20:47 -05:00
mrq
501a857d5d ops 2023-08-23 17:03:25 -05:00
mrq
4585824cd3 tweaks, including exporting on save/quit 2023-08-23 16:43:03 -05:00
mrq
d106598403 do not utilize diskcache if a config yaml is not loaded 2023-08-23 11:02:15 -05:00
mrq
524d289c9c Forgot to re-add in setting the weight's dtype on model load 2023-08-22 22:57:23 -05:00
mrq
9c5a33bfd2 added repo with my weights so far 2023-08-22 13:09:44 -05:00
mrq
7b1b82e0e5 inferencing cleanup 2023-08-20 21:36:02 -05:00
mrq
a47029065b I don't know if the lack of start/stop tokens being added was causing my inference tests to fail, but it seems better now 2023-08-20 19:21:54 -05:00
mrq
736c077282 ops 2023-08-20 13:42:18 -05:00
mrq
b105f6211e added ability to export weights mid-training, to avoid the CBT of yanking the weights while the training script is running 2023-08-20 13:39:58 -05:00
mrq
fc576010ce wrapped saving the checkpoint in a try/catch so I can stop waking up to the damn trainer crashing because it ran out of disk space; I'd much rather it keep training to give me time to eventually clear up disk space rather than it silently restarting on its own 2023-08-20 06:29:17 -05:00
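A sketch of the bandaid: swallow the failure (most likely ENOSPC), log it, and keep training so there is time to free space; trainer.save_checkpoint is a hypothetical stand-in for the actual save call:

    def save_checkpoint_safely(trainer, path):
        try:
            trainer.save_checkpoint(path)
        except OSError as e:
            # likely out of disk space; keep training rather than crash the run
            print(f"checkpoint save failed, continuing anyway: {e}")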
mrq
2d1a9f10c0 nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at the max RVQ level and trims off excess levels according to what the model receives, some other things I already forgot (I really hope no one else has weights being baked right now) 2023-08-19 15:06:33 -05:00
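Growing RVQ bins without retraining presumably amounts to enlarging the token embedding (and classifier) while preserving the trained rows; a hedged sketch:

    import torch

    def grow_embedding(old: torch.nn.Embedding, extra_tokens: int) -> torch.nn.Embedding:
        # keep every trained row; only the appended rows start from random init
        new = torch.nn.Embedding(old.num_embeddings + extra_tokens, old.embedding_dim)
        with torch.no_grad():
            new.weight[: old.num_embeddings] = old.weight
        return new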
mrq
f7f6d3bf6d validated that SpeechX tasks cse and nse work, added a method to test each task by invoking python3 -m vall_e.data --action=tasks --tasks='sr,se,cse,nse' 2023-08-19 09:50:07 -05:00
mrq
6ca347e1e1 literally had a urethra moment before going to bed with a way to implement cse/nse tasks 2023-08-19 01:16:46 -05:00
mrq
8f42c578c9 setting up for allowing training on a partial set of the SpeechX tasks (do NOT try this at home yet without a proper model, as performance is predicated on having a solid base VALL-E model for the tasks) 2023-08-19 00:16:08 -05:00
mrq
ae9d38aa31 forgot to have it pull the specified noise from the HDF5 dataset 2023-08-18 23:57:07 -05:00
mrq
77292c42f9 tested the training preparation for tasks ns, sr, and tse (I don't expect it to go well with only 2 RVQ bins) 2023-08-18 23:55:40 -05:00
mrq
bbb0563b3d pseudocode / polyfill / stub / some other flavor of working on adding the tasks 2023-08-18 22:22:13 -05:00