Commit Graph

175 Commits

Author SHA1 Message Date
mrq  6845c447c9  added more harvard sentences to load from a text file  2024-11-21 13:18:11 -06:00
mrq  2b29790173  oops  2024-11-18 14:12:26 -06:00
mrq  4a71981456  normalize sampler index by batch size (if not using batched sampler), add option to cap out utterances for a speaker, some other things  2024-11-18 12:46:50 -06:00
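A minimal sketch of that index normalization, with hypothetical names (the repo's actual trainer code likely differs):

```python
def resume_index(stored_index: int, batch_size: int, batched_sampler: bool) -> int:
    # a batched sampler already counts in batches; a per-sample sampler's
    # stored index counts individual samples, so divide by the batch size
    # to resume at the correct batch
    return stored_index if batched_sampler else stored_index // batch_size
```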
mrq  39096f8ff3  redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint...)  2024-11-14 22:17:47 -06:00
mrq  e412e98125  ugh  2024-11-14 07:34:22 -06:00
mrq  c00fc18b62  actually use the right embedding for nar-len  2024-11-13 18:04:04 -06:00
mrq  976ee87f6f  resume iteration step in tqdm trainer, warn to logger if the sampler state dict was invalidated  2024-11-13 09:09:28 -06:00
mrq  0f2584eba7  new meme sampler PogChamp (it sort of helps?)  2024-11-12 22:30:09 -06:00
mrq  2495a7ef67  Fixed STT in the web UI  2024-11-12 12:49:53 -06:00
mrq  2f56696506  overhauled inference/sampler kwargs to stop being a bloated mess  2024-11-11 20:21:16 -06:00
mrq  354f8e059d  store dataset hash alongside state dict so it can be ignored if mismatched  2024-11-11 18:16:56 -06:00
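A hedged sketch of that hash-guarded state dict idea; `dataset_hash`, `save_sampler`, and `load_sampler` are hypothetical helpers, not the repo's actual API:

```python
import hashlib
import json

def dataset_hash(paths: list[str]) -> str:
    # stable digest of the dataset's file list; adding, removing, or
    # renaming files changes the hash
    return hashlib.sha256(json.dumps(sorted(paths)).encode()).hexdigest()

def save_sampler(sampler_state: dict, paths: list[str]) -> dict:
    return {"dataset_hash": dataset_hash(paths), "sampler": sampler_state}

def load_sampler(checkpoint: dict, paths: list[str]) -> dict | None:
    # ignore the saved sampler state if the dataset changed underneath it
    if checkpoint.get("dataset_hash") != dataset_hash(paths):
        return None
    return checkpoint["sampler"]
```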
mrq  f7b8b1e825  dropped subtrain dataloader since it's useless to duplicate  2024-11-11 17:00:49 -06:00
mrq  cf9df71f2c  use homebrewed caching system for dataloader paths / durations (I'm pretty sure I am now triggering OOM killers with my entire dataset used)  2024-11-11 16:32:08 -06:00
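One plausible shape for such a cache, assuming a JSON file of path-to-duration pairs; `probe_duration` is a stand-in for whatever actually measures the audio:

```python
import json
from pathlib import Path

def cached_durations(paths: list[Path], cache_file: Path) -> dict[str, float]:
    # load previously computed durations so startup doesn't re-probe
    # every audio file in the dataset
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    for path in paths:
        key = str(path)
        if key not in cache:
            cache[key] = probe_duration(path)  # hypothetical, e.g. via torchaudio.info
    cache_file.write_text(json.dumps(cache))
    return cache
```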
mrq  9def34cd66  lol  2024-11-10 12:48:41 -06:00
mrq  9cb0b6901b  unified nar.py into ar_nar.py  2024-11-10 12:19:48 -06:00
mrq  3826f9bae4  saner mask creation? (it doesn't matter, kv cache won't work)  2024-11-02 21:00:21 -05:00
mrq  ef1c17430f  skip step on nan loss (ironically I have not had a nan loss after adding this), throw exception with invalid cfg.dataset.sample_type and sample_order combination (because I was tricked by this in my yaml and had inconsistent VRAM usage)  2024-11-01 20:54:53 -05:00
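The NaN-skip guard is a common pattern; a minimal sketch, assuming a plain PyTorch training step (hypothetical names, not the repo's trainer):

```python
import torch

def training_step(model, batch, optimizer) -> float | None:
    loss = model(batch)  # hypothetical forward pass returning a scalar loss
    # skip the update entirely on a non-finite loss instead of letting
    # NaN gradients corrupt the weights
    if not torch.isfinite(loss):
        optimizer.zero_grad()
        return None
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```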
mrq  8eb9a4056b  modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things I tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling  2024-10-22 18:12:39 -05:00
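Those defaults amount to greedy decoding with a repetition penalty; a standalone sketch of that combination (hypothetical, not the repo's sampler):

```python
import torch

def sample(logits: torch.Tensor, prev_tokens: torch.Tensor,
           temperature: float = 0.0, rep_pen: float = 1.125) -> torch.Tensor:
    # CTRL-style repetition penalty: dampen logits of already-emitted tokens
    scores = logits.index_select(-1, prev_tokens)
    scores = torch.where(scores > 0, scores / rep_pen, scores * rep_pen)
    logits = logits.index_copy(-1, prev_tokens, scores)
    # temperature 0 means pure argmax, where top-k/top-p/min-p are no-ops
    if temperature <= 0:
        return logits.argmax(dim=-1)
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```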
mrq  0dfab973e7  oops  2024-10-18 09:40:06 -05:00
mrq  75b90be325  cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified  2024-10-17 17:06:48 -05:00
mrq  f88097ccf6  add config option to set the rate of sampling randomly vs similar speakers during training  2024-10-16 14:27:58 -05:00
mrq  bef43a0c18  added experimental entropix sampling support  2024-10-11 21:18:26 -05:00
mrq  85d85c1351  more arg creep for demo page  2024-10-10 19:40:01 -05:00
mrq  75a4c866d6  more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: setup arg flags to handle this, and checkbox in web UI)  2024-10-10 19:04:12 -05:00
mrq  96d05be73c  demo page tweaks  2024-10-10 13:52:37 -05:00
mrq  2ea978f318  added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval  2024-10-10 13:40:25 -05:00
mrq  ff7a1b4163  coerce into path for other sampler_types (it's required for sampling for similar utterances)  2024-09-26 18:37:56 -05:00
mrq  f24547ad4e  add top_k sampling / offset for prompt similar utterance sampling  2024-09-26 16:26:40 -05:00
mrq  c5e9142863  added option to retokenize phonemes for hdf5 (to save having to remake my hdf5 file)  2024-09-21 13:08:01 -05:00
mrq  536c11c4ac  actually validated and fixed sampling similar utterances for the prompt (hopefully nothing else is needed)  2024-09-21 12:59:51 -05:00
mrq  d31f27119a  regex replace out the (lang) markers in espeak, updated tokenizer vocab as lazily as possible to not have unk tokens  2024-09-21 12:29:28 -05:00
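A minimal sketch of that cleanup, assuming the markers look like the `(en)…(lang)` switches described in commit d059f6f56d below:

```python
import re

# espeak injects language-switch markers such as "(en)" or "(fr)" around
# words it phonemizes in another language; strip any parenthesized
# two-to-three letter language code (optionally with a region suffix)
LANG_MARKER = re.compile(r"\([a-z]{2,3}(?:-[a-z0-9]+)?\)")

def strip_lang_markers(phonemes: str) -> str:
    return LANG_MARKER.sub("", phonemes)

assert strip_lang_markers("(en)wˈɜːd(fr)") == "wˈɜːd"
```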
mrq  769f67dcfe  actually fix validation of phonemes in the symmap  2024-09-21 12:19:34 -05:00
mrq  fe241f6a99  support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder)  2024-09-18 21:34:43 -05:00
mrq  fa9d3f6c06  lang fixes / reworked phoneme symmap validation  2024-09-18 19:36:03 -05:00
mrq  84647f588a  more tweaks  2024-09-18 16:43:57 -05:00
mrq  ebac1db16c  maybe final tweaks; I really needed to unify my json read/write, and orjson has proven fast enough for me to try and rely on it more  2024-09-17 22:57:04 -05:00
mrq  a9fbe81f98  oops  2024-09-17 15:25:12 -05:00
mrq  c440c4fe7e  relegated processing similarity data into vall_e.emb.similarity since it's easier, seems to work?  2024-09-17 14:37:21 -05:00
mrq  56f25f7a9b  more stuff for similar-speaker prompt sampling (to-do: actually test if this works...)  2024-09-16 23:10:29 -05:00
mrq  1c615a0f52  helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (iunno how to go about it desu)  2024-09-10 16:34:23 -05:00
mrq  d059f6f56d  added helper script to process Emilia (amphion/Emilia-Dataset), clean up espeak phonemes for non-English transcriptions with English words (because for some reason espeak injects (en){word}(lang) markers and it's annoying)  2024-09-09 09:57:32 -05:00
mrq  31e8b7edb8  tweaks and fixes for LoRA stuff  2024-09-08 18:05:21 -05:00
mrq  fa93061b3e  more fixes, moved sampler state dict to a better place, eval works again  2024-09-06 16:59:56 -05:00
mrq  341e19162b  fixes, again  2024-09-06 11:41:41 -05:00
mrq  94cf81d38c  tweak  2024-09-05 23:21:18 -05:00
mrq  54547b74d8  experimental implementation of STT (need to actually test on a model, test trainer seems to work)  2024-09-05 20:43:20 -05:00
mrq  32287710a2  moved prints to use logger, edited readme (fused_attn doesn't seem stable for training)  2024-08-29 13:27:16 -05:00
mrq  d636edd3a2  added flash_attn LlamaAttention (including flash_attn==1.0.9)  2024-08-18 20:51:14 -05:00
mrq  2a1794c084  ughghghhhh  2024-08-09 21:15:01 -05:00
mrq  c658a7b440  make loss scaling opt-in rather than automatically determined (because it seems a DAC-based model really doesn't like loss scaling)  2024-08-09 10:51:36 -05:00
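In stock PyTorch terms, opt-in loss scaling looks roughly like gating a `GradScaler` behind a config flag; a hedged sketch (the repo's own mechanism may differ):

```python
import torch

def make_step_fn(use_loss_scaling: bool):
    # construct the scaler only when explicitly requested, instead of
    # enabling it automatically whenever training in half precision
    scaler = torch.cuda.amp.GradScaler() if use_loss_scaling else None

    def step(loss: torch.Tensor, optimizer: torch.optim.Optimizer) -> None:
        if scaler is not None:
            scaler.scale(loss).backward()
            scaler.step(optimizer)  # internally skips the step on inf/NaN grads
            scaler.update()
        else:
            loss.backward()
            optimizer.step()
        optimizer.zero_grad()

    return step
```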