vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	f69aad9c65	some day I'll get it right	2023-09-08 15:36:26 -05:00
mrq	b2907ae7e0	seems that my PromEmbedding/RespEmbedding doesn't actually work all that well, naively using dedicated MultiEmbeddings for AR/NAR in the monolithic model is the best way to go	2023-09-08 01:03:24 -05:00
mrq	67617d7d69	also cull frozen_params in the params optimizer receives to reduce VRAM it consumes	2023-09-07 18:27:02 -05:00
mrq	8837bc34d7	added option to specify parameters to freeze per-model in YAML (because I need to see about committing atrocities with converting an AR into an AR+NAR)	2023-09-07 18:19:51 -05:00
mrq	c47fc3274e	added backwards compat flag	2023-09-07 17:12:17 -05:00
mrq	ab5134f385	tweaks and fixes	2023-09-07 17:08:38 -05:00
mrq	b2c2dec291	added homebrewed per-RVQ-bin embedding solutions	2023-09-07 16:48:02 -05:00
mrq	e7a67410d1	oops	2023-09-07 09:14:03 -05:00
mrq	712808494f	added support for optional prodigy optimizer (https://github.com/konstmish/prodigy ) although it consumes a lot more VRAM per parameter	2023-09-06 20:33:16 -05:00
mrq	7ce06432fd	fixed the AR+NAR dual model, the resp_emb has to be split up (classifier might too)	2023-09-06 19:33:39 -05:00
mrq	100ca6b7d0	added option to use SGD optimizer through the YAML, added option to pass in additional optimizer parameters through the YAML, added experimental unified AR+NAR model (does not seem fruitful in testing)	2023-09-06 18:58:35 -05:00
mrq	451726fdd5	added ability to disable activation checkpointing through the YAML (it is very VRAM intensive at double layer size)	2023-09-05 15:38:21 -05:00
mrq	143aee7526	removed dedicated interleaved AR code	2023-09-03 22:47:03 -05:00
mrq	2f9cd0842f	merged dedicated interleaved AR code with the normal AR code	2023-09-03 22:46:08 -05:00
mrq	3a6bd50322	haha	2023-09-03 21:36:58 -05:00
mrq	c56ce033d9	work on an interleaved AR (spoiler: it does not work)	2023-09-03 21:27:58 -05:00
mrq	8a6c203277	added per-speaker samplers	2023-09-03 21:27:13 -05:00
mrq	81b05dabb9	accurate epoch metric is now reported (based on samples processed / length of dataset's paths, rather than naive assumptions)	2023-09-03 08:03:36 -05:00
mrq	922404285c	fixed segfault from tts-c task token being too big (inserted it in the hypothetical svc task token because in reality that is never ever going to be a feasible task to train against)	2023-09-02 19:25:43 -05:00
mrq	4613781e23	integrated plot script, added tts-c task token to help the model be able to mix between normal VALL-E and VALL-E continuous	2023-09-02 16:29:53 -05:00
mrq	f7e942ec99	modified plotting script to be more agnostic to X	2023-09-02 13:59:43 -05:00
mrq	71e68a8528	tweaked tts-continuous task	2023-09-02 13:39:17 -05:00
mrq	21e5d250cc	fixed up plot script that I forgot about	2023-09-02 13:31:04 -05:00
mrq	57db3ccfa8	shuffled VALL-E continuous as a task tts-c instead, logic fixes for it	2023-09-02 12:23:40 -05:00
mrq	2f06166ddd	cleanups	2023-09-01 21:33:51 -05:00
mrq	e40c0d34a0	somewhat got recurrent forward working (it's as accurate as chunkwise forward: it's not accurate at all), added option to use AMP instead of blanket setting the weight's dtype	2023-09-01 20:58:29 -05:00
mrq	2bc2d08b09	(need to verify) added modifying model size and config bool to align with VALL-E continuous' methodology	2023-09-01 17:19:34 -05:00
mrq	5c8694db8e	nasty bandaid if there's no validation dataset specified during training (for example, during finetunes)	2023-08-30 18:23:05 -05:00
mrq	7f4388e591	added total samples processed and tokens processed (len of text tokens + len of target response tokens)	2023-08-28 11:02:45 -05:00
mrq	87c4bfedba	added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU)	2023-08-27 12:26:12 -05:00
mrq	165a1154e0	Undo naive=False test flag, this shouldn't have made its way in	2023-08-26 22:00:43 -05:00
mrq	78378ed1ce	overhauled dataloading code to be marginally faster, mostly cleaned up, and can leverage a metadata json to help things out	2023-08-26 19:53:23 -05:00
mrq	7b3be3d7bf	added helper scripts to process LibriTTS/LibriLight, detect duplicate speaker+books between them, and script to directly phonemize and quantize LibriTTS	2023-08-26 10:21:12 -05:00
mrq	16e0020901	disabled chunkwise_recurrent for 2x speed gains (I suppose it has been working the entire time, but I have not been properly grabbing things, and this might explain why the output is bad)	2023-08-25 19:50:19 -05:00
mrq	6455a2f9d7	I think I fixed a bug?	2023-08-24 23:33:36 -05:00
mrq	f3fbed5ffd	updated notices tailored for windows / low VRAM cards	2023-08-24 17:19:10 -05:00
mrq	0517d620b8	fixes with the local backend	2023-08-24 17:05:56 -05:00
mrq	00ad4af651	updated draconian requirement for espeak-ng to be installed and the env var set to the dll for Windows	2023-08-24 14:57:01 -05:00
mrq	b6c9686f7d	Do not install DeepSpeed under Windows (to-do: default backend to use local if on Windows)	2023-08-24 14:27:36 -05:00
mrq	22904a8639	more oversights fixed because I've been using a cached dataloader forever now and didn't catch these problems	2023-08-24 10:25:33 -05:00
mrq	5873c27f1a	ops	2023-08-24 09:20:47 -05:00
mrq	501a857d5d	ops	2023-08-23 17:03:25 -05:00
mrq	4585824cd3	tweaks, including exporting on save/quit	2023-08-23 16:43:03 -05:00
mrq	d106598403	do not utilize diskcache if a config yaml is not loaded	2023-08-23 11:02:15 -05:00
mrq	524d289c9c	Forgot to re-add in setting the weight's dtype on model load	2023-08-22 22:57:23 -05:00
mrq	9c5a33bfd2	added repo with my weights so far	2023-08-22 13:09:44 -05:00
mrq	7b1b82e0e5	inferencing cleanup	2023-08-20 21:36:02 -05:00
mrq	a47029065b	I don't know if the lack of start/stop tokens being added was causing my inference tests to fail, but it seems better now	2023-08-20 19:21:54 -05:00
mrq	736c077282	ops	2023-08-20 13:42:18 -05:00
mrq	b105f6211e	added ability to export weights mid-training to avoid CBT to yank the weights while the training script is running	2023-08-20 13:39:58 -05:00


541 Commits