vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	f7b8b1e825	dropped subtrain dataloader since its useless to duplicate	2024-11-11 17:00:49 -06:00
mrq	cf9df71f2c	use homwbrewed caching system for dataloader paths / durations (I'm pretty sure I am now triggering OOM killers with my entire dataset used)	2024-11-11 16:32:08 -06:00
mrq	9def34cd66	lol	2024-11-10 12:48:41 -06:00
mrq	9cb0b6901b	unified nar.py into ar_nar.py	2024-11-10 12:19:48 -06:00
mrq	3826f9bae4	saner mask creation? (it doesnt matter, kv cache wont work)	2024-11-02 21:00:21 -05:00
mrq	ef1c17430f	skip step on nan loss (ironically I have not had a nan loss after adding this), throw exception with invalid cfg.dataset.sample_type and sample_order combination (because I was tricked by this in my yaml and had inconsistent vram usage)	2024-11-01 20:54:53 -05:00
mrq	8eb9a4056b	modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling	2024-10-22 18:12:39 -05:00
mrq	0dfab973e7	oops	2024-10-18 09:40:06 -05:00
mrq	75b90be325	cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified	2024-10-17 17:06:48 -05:00
mrq	f88097ccf6	add config option to set the rate of sampling randomly vs similar speakers during training	2024-10-16 14:27:58 -05:00
mrq	bef43a0c18	added experimental entropix sampling support	2024-10-11 21:18:26 -05:00
mrq	85d85c1351	more arg creep for demo page	2024-10-10 19:40:01 -05:00
mrq	75a4c866d6	more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: setup arg flags to handle this, and checkbox in web UI)	2024-10-10 19:04:12 -05:00
mrq	96d05be73c	demo page tweaks	2024-10-10 13:52:37 -05:00
mrq	2ea978f318	added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval	2024-10-10 13:40:25 -05:00
mrq	ff7a1b4163	coerce into path for other sampler_types (it's required for sampling for similar utterances)	2024-09-26 18:37:56 -05:00
mrq	f24547ad4e	add top_k sampling / offset for prompt similar utterance sampling	2024-09-26 16:26:40 -05:00
mrq	c5e9142863	added option to retokenize phonemes for hdf5 (to save having to remake my hdf5 file)	2024-09-21 13:08:01 -05:00
mrq	536c11c4ac	actually validated and fixed sampling similar utterances for the prompt (hopefully nothing else is needed)	2024-09-21 12:59:51 -05:00
mrq	d31f27119a	regex replace out the (lang) markers in espeak, updated tokenizer vocab as lazily as possible to not have unk tokens	2024-09-21 12:29:28 -05:00
mrq	769f67dcfe	actually fix validation of phonemes in the symmap	2024-09-21 12:19:34 -05:00
mrq	fe241f6a99	support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder)	2024-09-18 21:34:43 -05:00
mrq	fa9d3f6c06	lang fixes / reworked phoneme symmap validation	2024-09-18 19:36:03 -05:00
mrq	84647f588a	more tweaks	2024-09-18 16:43:57 -05:00
mrq	ebac1db16c	maybe final tweaks, I really needed to unify my json read/write and orjson is proven to be fast enough for me to try and rely on it more	2024-09-17 22:57:04 -05:00
mrq	a9fbe81f98	oops	2024-09-17 15:25:12 -05:00
mrq	c440c4fe7e	relegated processing similarity data into vall_e.emb.similarity since it's easier, seems to work?	2024-09-17 14:37:21 -05:00
mrq	56f25f7a9b	more stuff for similar-speaker prompt sampling (to-do: actually test if this works...)	2024-09-16 23:10:29 -05:00
mrq	1c615a0f52	helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (iunno how to go about it desu)	2024-09-10 16:34:23 -05:00
mrq	d059f6f56d	added helper script to process Emilia (amphion/Emilia-Dataset), clean up espeak phonemes for non-English transcriptions with English words (because for some reason espeak injects (en){word}(lang) markers and it's annoying)	2024-09-09 09:57:32 -05:00
mrq	31e8b7edb8	tweaks and fixes for lora stuffs	2024-09-08 18:05:21 -05:00
mrq	fa93061b3e	more fixes, moved sampler state dict to a better place, eval works again	2024-09-06 16:59:56 -05:00
mrq	341e19162b	fixes, again	2024-09-06 11:41:41 -05:00
mrq	94cf81d38c	tweak	2024-09-05 23:21:18 -05:00
mrq	54547b74d8	experimental implementation of STT (need to actually test on a model, test trainer seems to work)	2024-09-05 20:43:20 -05:00
mrq	32287710a2	moved prints to use logger, edited readme (fused_attn doesnt seem stable for training)	2024-08-29 13:27:16 -05:00
mrq	d636edd3a2	added flash_attn LlamaAttention (including flash_attn==1.0.9)	2024-08-18 20:51:14 -05:00
mrq	2a1794c084	ughghghhhh	2024-08-09 21:15:01 -05:00
mrq	c658a7b440	make loss scaling opt-in rather than automatically determined (because it seems a DAC-based model really doesnt like loss scaling)	2024-08-09 10:51:36 -05:00
mrq	d04f6911b4	oops	2024-08-08 19:38:55 -05:00
mrq	0aa59e6f3f	uncommented block that writes the metadata on HDF5 creation	2024-08-08 19:21:29 -05:00
mrq	79a6781c9e	fix vall_e.data --action=hdf5 actually transcribing because past me completely forgot it tried to already put the transcribe/process dataset scripts inside the module before	2024-08-08 07:51:42 -05:00
mrq	eac353cd0b	busy work and cleanup while I wait for 1TB of audio to quantize... again.	2024-08-06 20:23:33 -05:00
mrq	c09133d00f	added safetensors support (with metadata) and feed whatever torch.load/torch.save into it	2024-08-03 23:15:20 -05:00
mrq	6a733eb2ed	changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype, some 'optimization' to compile the model but it doesnt seem to do anything useful	2024-08-03 22:10:21 -05:00
mrq	97c5241bef	fixes, throw an exception when using NAR only model with non-unified position IDs, since for some reason it outputs garbage for the NAR	2024-08-02 22:25:49 -05:00
mrq	4456d3172b	that's what I get for testing without hdf5 on my previous machine....	2024-08-02 20:44:01 -05:00
mrq	ce8bb1e4f7	sanity cleanups with weird off-by-one-ness, cleaned up and validated vall_e.models.experimental works again	2024-07-27 15:36:05 -05:00
mrq	682e4387dc	oops (fixed proms being erased from a config oversight)	2024-07-25 12:39:57 -05:00
mrq	75b04686f8	added prom-less training / inferencing, some other things	2024-07-22 19:36:07 -05:00
mrq	491ae2a684	some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...)	2024-07-22 00:30:40 -05:00
mrq	e19aa643a6	cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training	2024-07-21 19:12:03 -05:00
mrq	d87b492295	added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)	2024-07-19 20:49:40 -05:00
mrq	28a674e0f1	fixes...	2024-07-18 23:25:32 -05:00
mrq	39f961abcd	test trainer (vall_e.models.ar_nar) tests some SpeechX features	2024-07-18 18:46:45 -05:00
mrq	83a0954f85	fixes for re-introducing SpeechX tasks (need to actually validate if these all do the right things)	2024-07-18 17:16:32 -05:00
mrq	bccbb77a1a	added option to either naively concat codes to concat audio waveforms (prior behavior) or to decode => concat => encode instead (although this only currently happens for prom sampling if an utternace is too small)	2024-07-18 16:48:41 -05:00
mrq	97e768601c	re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways)	2024-07-18 16:16:14 -05:00
mrq	3acc54df22	allow loading a different model within the web ui (apparently I did not have the web UI in the documentation)	2024-07-15 19:59:48 -05:00
mrq	312a8e3ead	add shuffle to samplers that can support it	2024-06-30 11:36:46 -05:00
mrq	bc2a6fa756	sanity cleanup: moved experimental features under its own thing	2024-06-30 10:37:33 -05:00
mrq	793ccb16fb	ugh	2024-06-29 22:14:35 -05:00
mrq	c4dd523b6f	change from chunk-slicing paths for distributed dataloader to instead interleave	2024-06-29 10:10:35 -05:00
mrq	dd40463803	limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid)	2024-06-29 09:11:28 -05:00
mrq	591d3ac848	have eval dataloader use eval batch size for batchedordersampler	2024-06-28 22:44:00 -05:00
mrq	83075c1505	sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because i didnt know that python dicts can have non-strings as keys), added batching samples based on total duration to ensure best training throughput	2024-06-28 22:28:54 -05:00
mrq	8fffb94964	backport fix from tortoise_tts with local trainer + loading state when training lora	2024-06-25 13:41:29 -05:00
mrq	19410a919e	ugh	2024-06-15 12:29:03 -05:00
mrq	d343bde09b	residual_in_fp32=False for mamba arch backends because it breaks the classifier (output projection / lm head / what-have-you) under AMP	2024-06-15 12:08:03 -05:00
mrq	31f71fa134	sampler update (some brainworm just never actually had a sampler for sample_type=path)	2024-06-14 16:55:40 -05:00
mrq	b3b67f34ac	added option to sort paths by durations to better group equally lengthed sequences together (and there was maybe a logic error from creating the samplers and then interleave-reordering paths, desyncing them, maybe)	2024-06-13 22:37:34 -05:00
mrq	cca542a4c0	ugh	2024-06-11 23:59:28 -05:00
mrq	65a8960305	option to split classifier per-level instead of sharing one (at this point I'm just scrambling to try and cope with training a DAC model, the NAR is being a pain)	2024-06-11 22:28:59 -05:00
mrq	234f9efc6e	ugh	2024-06-09 11:39:43 -05:00
mrq	132a02c48b	sanity cleanup, backup config yaml for each log file	2024-06-09 11:22:52 -05:00
mrq	4ade2b60ee	ugh	2024-06-06 21:57:11 -05:00
mrq	014e565c4b	tweaks	2024-06-04 20:41:13 -05:00
mrq	6d5bd0156a	fixes	2024-06-04 18:50:48 -05:00
mrq	ed3aeaf3a1	copy pasted from test to actual trainer	2024-06-04 18:40:30 -05:00
mrq	0aa01ba31a	forgot one crucial detail (you need the previous RVQ level to keep coherence between all RVQ levels) (experimental deinterleaved is a bit crusty though)	2024-06-04 18:30:30 -05:00
mrq	406ff7bbe1	re-implemented config.model.interleave for the HF-compat experimental method	2024-06-04 14:19:52 -05:00
mrq	c93d5863fd	fixes	2024-06-04 00:07:00 -05:00
mrq	934672252b	feverish cleanup	2024-06-03 21:28:49 -05:00
mrq	8cf176ab46	ugh	2024-06-01 10:46:42 -05:00
mrq	d0ebce6bac	ugh	2024-06-01 10:30:13 -05:00
mrq	74df2f5332	split sampler dict by global_rank, also handle splitting dataset paths by global_rank if sampler_type == path (because I do not trust DistributedSampler) (need to test)	2024-06-01 09:29:49 -05:00
mrq	ddbacde0d1	DAC just doesn't work well enough......	2024-05-25 11:07:52 -05:00
mrq	e3ef89f5aa	100x better for subtrain/eval to be by group instead	2024-05-19 16:40:14 -05:00
mrq	4bc7e5a6d1	fix loading without needing an hdf5 dataset already prepped (and some other incidental speedups during dataloader prep)	2024-05-18 07:14:26 -05:00
mrq	d88a5ca183	ugh	2024-05-16 07:25:33 -05:00
mrq	d9aabfa3ae	final tweaks, hopefully, again	2024-05-15 23:04:19 -05:00
mrq	2437a86efa	ugh	2024-05-12 13:02:15 -05:00
mrq	4f1593c8db	a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge	2024-05-12 10:17:29 -05:00
mrq	14709ac67f	ughh	2024-05-12 07:30:59 -05:00
mrq	3774fcbdee	ugh	2024-05-11 22:58:38 -05:00
mrq	4d93a16ef7	might just be better to explicitly define prompt duration ranges, especially under a "train small contexts then increase it" training paradigm	2024-05-11 09:50:54 -05:00
mrq	0d5d545a40	crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc.	2024-05-09 20:28:20 -05:00
mrq	c6e0f905b5	final tweaks (again) before training restarts	2024-05-08 02:11:38 -05:00
mrq	33b7f81b94	small cleanups	2024-05-04 22:37:22 -05:00
mrq	ffa200eec7	added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources))	2024-05-04 12:05:41 -05:00


214 Commits