vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	c8f31db1de	default to greedy sample AR (i should probably test this more but it seems to pass my harvard sentences and tongue twisters)	2024-10-18 16:58:56 -05:00
mrq	fc8dfd8617	made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar)	2024-10-18 16:55:00 -05:00
mrq	07f4935a75	more tweaks	2024-10-18 13:19:36 -05:00
mrq	0dfab973e7	oops	2024-10-18 09:40:06 -05:00
mrq	75b90be325	cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified	2024-10-17 17:06:48 -05:00
mrq	8b6095f681	saner defaults, maybe	2024-10-17 14:37:21 -05:00
mrq	f88097ccf6	add config option to set the rate of sampling randomly vs similar speakers during training	2024-10-16 14:27:58 -05:00
mrq	48461833c2	ugh	2024-10-15 19:30:43 -05:00
mrq	eea70f5698	kludge fix for an oversight in the model when trying to train for longer input prompt durations......	2024-10-15 19:25:03 -05:00
mrq	84005c5b00	entropix apparently processes the entire sequence of logits but it falls apart when doing that	2024-10-13 12:01:12 -05:00
mrq	c800d28bb8	respect attention defined in the yaml for web UI (which might explain why theres been a discrepancy in outputs for me)	2024-10-13 11:02:24 -05:00
mrq	ed6b7a690f	ugh.........	2024-10-13 00:26:46 -05:00
mrq	d405f243d4	at wits end in trying to output the right attention scores	2024-10-12 23:53:13 -05:00
mrq	70cf694cfd	output attention scores for SDPA/flash, since naive attention seems broken	2024-10-12 12:09:17 -05:00
mrq	541e45263c	ugh	2024-10-12 11:29:16 -05:00
mrq	04e983b86b	modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now	2024-10-12 11:27:55 -05:00
mrq	666e8038fb	ugh	2024-10-12 10:41:35 -05:00
mrq	3d6ef9666b	overridden naive llama attention to get the right score values that entropix needs	2024-10-12 10:05:47 -05:00
mrq	40b089daf3	lol	2024-10-12 09:57:34 -05:00
mrq	d6f7c86a5c	entropix tweaks (it doesn't output garbage but it loves to go for silence)	2024-10-12 09:46:18 -05:00
mrq	d0ab7d755a	added min-p (really does not seem useful since it's very sensitive), more tweaks to entropix	2024-10-11 22:36:06 -05:00
mrq	bef43a0c18	added experimental entropix sampling support	2024-10-11 21:18:26 -05:00
mrq	85d85c1351	more arg creep for demo page	2024-10-10 19:40:01 -05:00
mrq	301468f519	<<	2024-10-10 19:13:52 -05:00
mrq	75a4c866d6	more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: setup arg flags to handle this, and checkbox in web UI)	2024-10-10 19:04:12 -05:00
mrq	96d05be73c	demo page tweaks	2024-10-10 13:52:37 -05:00
mrq	2ea978f318	added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval	2024-10-10 13:40:25 -05:00
mrq	52299127ab	fix vall_e.emb.process	2024-10-08 20:00:34 -05:00
mrq	0656a762af	fix vall_e.emb.transcriber	2024-10-08 19:24:43 -05:00
mrq	acdce66d4e	readme tweaks, set the (unused) default model download URL back to the base ar+nar-llama-8 model, as ar+nar-tts+stt-llama-8 was renamed back to it since it performs well	2024-10-05 22:53:53 -05:00
mrq	84c7419001	faster	2024-10-04 22:30:47 -05:00
mrq	a507b769a1	sped up inferencing by not doing .tolist() for rep pen / length pen (and a bug fix in the web UI from prev commit)	2024-10-04 22:18:20 -05:00
mrq	4a8e3ccf06	README tweaks, added --input-prompt-prefix as an experiment (its literally better to just not do this, but i'll retain it in case i have a revelation on how to improve it)	2024-10-04 18:57:19 -05:00
mrq	a9fa0898a9	tweaked demo page script to sample speakers instead	2024-09-28 10:50:26 -05:00
mrq	2f1dca3089	added language selection in web UI, tweaked demo script	2024-09-28 09:49:45 -05:00
mrq	10df2ef5f3	fixed oversight where input audio does not resample (lol...)	2024-09-27 20:27:53 -05:00
mrq	039482a48e	don't do eval on stt because it's so slow and I don't even bother doing any metrics against it anyways (to-do: make this a flag)	2024-09-26 18:56:57 -05:00
mrq	ff7a1b4163	coerce into path for other sampler_types (it's required for sampling for similar utterances)	2024-09-26 18:37:56 -05:00
mrq	f24547ad4e	add top_k sampling / offset for prompt similar utterance sampling	2024-09-26 16:26:40 -05:00
mrq	9da630f73a	swap order of demo entries, as the model prioritizes adhering to the speaker prompt more (instead of trying to match the ground truth magically)	2024-09-25 23:31:24 -05:00
mrq	e84d466261	vall_e.plot tweaks	2024-09-24 20:05:10 -05:00
mrq	2266d34818	oops	2024-09-21 16:06:01 -05:00
mrq	c5e9142863	added option to retokenize phonemes for hdf5 (to save having to remake my hdf5 file)	2024-09-21 13:08:01 -05:00
mrq	536c11c4ac	actually validated and fixed sampling similar utterances for the prompt (hopefully nothing else is needed)	2024-09-21 12:59:51 -05:00
mrq	d31f27119a	regex replace out the (lang) markers in espeak, updated tokenizer vocab as lazily as possible to not have unk tokens	2024-09-21 12:29:28 -05:00
mrq	769f67dcfe	actually fix validation of phonemes in the symmap	2024-09-21 12:19:34 -05:00
mrq	c8d4716a9f	ugh	2024-09-18 21:40:57 -05:00
mrq	fe241f6a99	support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder)	2024-09-18 21:34:43 -05:00
mrq	b5bec0c9ce	oops, turns out these are not split by speaker names already........ (also added sampling the dataset in the webui for easy viewing)	2024-09-18 20:19:46 -05:00
mrq	fa9d3f6c06	lang fixes / reworked phoneme symmap validation	2024-09-18 19:36:03 -05:00

507 Commits