mrq/vall-e

Author	SHA1	Message	Date
mrq	fe241f6a99	support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder)	2024-09-18 21:34:43 -05:00
mrq	b5bec0c9ce	oops, turns out these are not split by speaker names already........ (also added sampling the dataset in the webui for easy viewing)	2024-09-18 20:19:46 -05:00
mrq	fa9d3f6c06	lang fixes / reworked phoneme symmap validation	2024-09-18 19:36:03 -05:00
mrq	84647f588a	more tweaks	2024-09-18 16:43:57 -05:00
mrq	ebac1db16c	maybe final tweaks, I really needed to unify my json read/write and orjson is proven to be fast enough for me to try and rely on it more	2024-09-17 22:57:04 -05:00
mrq	6ceed866b5	faster	2024-09-17 22:44:36 -05:00
mrq	f00283440c	faster	2024-09-17 22:26:31 -05:00
mrq	be22b65300	solved my problem	2024-09-17 21:58:44 -05:00
mrq	8f41d1b324	more tweaks	2024-09-17 16:26:30 -05:00
mrq	804ddb5182	optimizations (6 hours to do cosine similarities on a speaker set of just 17k utterances................)	2024-09-17 15:51:45 -05:00
mrq	a9fbe81f98	oops	2024-09-17 15:25:12 -05:00
mrq	c440c4fe7e	relegated processing similarity data into vall_e.emb.similarity since it's easier, seems to work?	2024-09-17 14:37:21 -05:00
mrq	56f25f7a9b	more stuff for similar-speaker prompt sampling (to-do: actually test if this works...)	2024-09-16 23:10:29 -05:00
mrq	69f140ba45	fix oversight with phonemizing french because espeak defines french as fr-fr instead of fr (even though spain spanish is es and not es-sp or some shit, but portugal portuguese is pt-pt)	2024-09-13 12:53:36 -05:00
mrq	4f3c7a37c8	also do text similarities (dont know what use I'll have for this)	2024-09-10 16:45:59 -05:00
mrq	1c615a0f52	helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (iunno how to go about it desu)	2024-09-10 16:34:23 -05:00
mrq	d059f6f56d	added helper script to process Emilia (amphion/Emilia-Dataset), clean up espeak phonemes for non-English transcriptions with English words (because for some reason espeak injects (en){word}(lang) markers and it's annoying)	2024-09-09 09:57:32 -05:00
mrq	31e8b7edb8	tweaks and fixes for lora stuffs	2024-09-08 18:05:21 -05:00
mrq	54203c059d	validated rep pen for STT (sometimes needed to wrangle the model)	2024-09-08 08:30:30 -05:00
mrq	6a967f91b9	oops	2024-09-07 22:13:49 -05:00
mrq	5d66a7db52	webui cleanup, more tweaks, default to safetensors in config	2024-09-07 21:45:05 -05:00
mrq	a6ad0577b8	cleanup the resultant text from STT	2024-09-06 18:44:25 -05:00
mrq	fa93061b3e	more fixes, moved sampler state dict to a better place, eval works again	2024-09-06 16:59:56 -05:00
mrq	4bd9bb39c8	webui for STT (still need to bake the model to handle it better, a few hours so far has it generate what looks like a normal transcription but does not correlate to the audio right now)	2024-09-06 15:13:04 -05:00
mrq	d33a906119	cleanup for AR_NAR inferencing to allow both TTS and STT tasks simultaneously (need to have training eval do this to though)	2024-09-06 14:30:12 -05:00
mrq	341e19162b	fixes, again	2024-09-06 11:41:41 -05:00
mrq	94cf81d38c	tweak	2024-09-05 23:21:18 -05:00
mrq	413097f5f7	fixes	2024-09-05 21:42:59 -05:00
mrq	54547b74d8	experimental implementation of STT (need to actually test on a model, test trainer seems to work)	2024-09-05 20:43:20 -05:00
mrq	d319d33368	haha	2024-09-04 14:52:26 -05:00
mrq	619369236b	ugh	2024-08-30 21:10:57 -05:00
mrq	168e203942	ugh	2024-08-30 14:39:07 -05:00
mrq	685f4faec0	ugh	2024-08-30 10:46:26 -05:00
mrq	32287710a2	moved prints to use logger, edited readme (fused_attn doesnt seem stable for training)	2024-08-29 13:27:16 -05:00
mrq	d423bc03c2	fixed attentions for MoE	2024-08-27 17:02:42 -05:00
mrq	b7b99a25f1	added ability to specify attention backend for CLI and webui (because im tired of editing the yaml)	2024-08-26 19:33:51 -05:00
mrq	0d706ec6a1	added fused_attn (triton-based fused attention) and simply just query for flash_attn under rocm	2024-08-26 19:13:34 -05:00
mrq	6b0891448c	pain (some shit to try and get some flash attention for ROCm (gfx1100) through triton fused attention but no good)	2024-08-25 20:07:27 -05:00
mrq	40e1799adc	fixed xformers and flash_attn to actually work now	2024-08-19 01:03:35 -05:00
mrq	29c35528e5	the sooner I accept there's no FA for V100s the sooner I'll go to bed	2024-08-18 23:54:33 -05:00
mrq	d636edd3a2	added flash_attn LlamaAttention (including flash_attn==1.0.9)	2024-08-18 20:51:14 -05:00
mrq	054d28573a	my DAC dataset again managed to only have some utterances with only 8 of 9 RVQ levels, this fixes an oversight from it	2024-08-09 21:18:01 -05:00
mrq	2a1794c084	ughghghhhh	2024-08-09 21:15:01 -05:00
mrq	ed373957e2	maybe not	2024-08-09 11:38:08 -05:00
mrq	c658a7b440	make loss scaling opt-in rather than automatically determined (because it seems a DAC-based model really doesnt like loss scaling)	2024-08-09 10:51:36 -05:00
mrq	d04f6911b4	oops	2024-08-08 19:38:55 -05:00
mrq	0aa59e6f3f	uncommented block that writes the metadata on HDF5 creation	2024-08-08 19:21:29 -05:00
mrq	79a6781c9e	fix vall_e.data --action=hdf5 actually transcribing because past me completely forgot it tried to already put the transcribe/process dataset scripts inside the module before	2024-08-08 07:51:42 -05:00
mrq	949339a3fa	do not include SDPA attention if there's no available SDPA backends	2024-08-06 20:42:39 -05:00
mrq	613024ec0d	ugh	2024-08-06 20:35:15 -05:00
mrq	eac353cd0b	busy work and cleanup while I wait for 1TB of audio to quantize... again.	2024-08-06 20:23:33 -05:00
mrq	f284c7ea9c	do mixed-precision for AMP inside the compress function itself, because the loudness function gripes when using a float16 (non-power of 2 lengths) or bfloat16 (something about views for bfloat16)	2024-08-06 15:08:37 -05:00
mrq	b6ba2cc8e7	tweaked vall_e.emb.process to instead process audio one file at a time instead of all the files for a given speaker to avoid OOMing on less-memory-filled systems with --low-memory	2024-08-06 14:24:40 -05:00
mrq	9710b06b74	tweaks and things	2024-08-06 08:17:25 -05:00
mrq	134dac8c2b	re-adapted process_libritts.py to a 'better' way (better because it processed without needing to shuffle a bunch of things and adapt to cope or something)	2024-08-05 20:34:58 -05:00
mrq	3f73fcca29	oops	2024-08-05 20:12:13 -05:00
mrq	597441e48b	moved transcribe and process dataset scripts to vall_e/emb within the module itself, argparse-ified transcription script	2024-08-05 19:40:50 -05:00
mrq	7cdfa3dc0c	updated process_datasets.py, added argparsing so I can mostly stop manually editing things, and some other cleanup	2024-08-05 15:59:25 -05:00
mrq	debcc93e7e	add adapted MixtralAttention for when I make a bad decision to actually train a MoE	2024-08-04 22:03:22 -05:00
mrq	10aaf840e7	added export option to convert Llama to MixtralMoE for another dumb experiment	2024-08-04 20:25:06 -05:00
mrq	3a65cc4b22	fix issue with sft and shared tensors...	2024-08-04 19:56:21 -05:00
mrq	23f3b56fda	oops	2024-08-04 08:18:57 -05:00
mrq	d19f93a2c0	documentation update	2024-08-04 00:14:49 -05:00
mrq	2cb465018b	implicitly load either normal pickled weights or safetensors on loading the model	2024-08-03 23:34:18 -05:00
mrq	c09133d00f	added safetensors support (with metadata) and feed whatever torch.load/torch.save into it	2024-08-03 23:15:20 -05:00
mrq	6a733eb2ed	changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype, some 'optimization' to compile the model but it doesnt seem to do anything useful	2024-08-03 22:10:21 -05:00
mrq	ab673e0426	add cap for NAR-len training, to avoid any weird cases in early training where it'll just mess up and generate long lengths	2024-08-03 21:00:32 -05:00
mrq	4d2b88b164	throw exception if training, but no model is set to train (because i ran into this wondering what the hell was happening)	2024-08-03 20:51:23 -05:00
mrq	d0a5c7eca2	more coping with the NAR len	2024-08-03 20:23:36 -05:00
mrq	11fa3da665	some cleanup, fixed the wrapper attention to explicitly use other sdpa backends	2024-08-03 19:51:00 -05:00
mrq	9564ecda43	wrapper attention class for other sdpa backends + xformers seems to have broke...	2024-08-03 15:12:11 -05:00
mrq	9e1989be1b	tweaked initial NAR pass's initial token embeddings to use a different value, or something	2024-08-03 09:01:37 -05:00
mrq	26f74c5739	somehow fixed non-unified position IDs for the NAR-len	2024-08-03 08:43:42 -05:00
mrq	66407e5bdb	tweaks for the NAR-len model, maybe	2024-08-03 08:40:39 -05:00
mrq	97c5241bef	fixes, throw an exception when using NAR only model with non-unified position IDs, since for some reason it outputs garbage for the NAR	2024-08-02 22:25:49 -05:00
mrq	4456d3172b	that's what I get for testing without hdf5 on my previous machine....	2024-08-02 20:44:01 -05:00
mrq	7a77978096	oversight with using resize_modules	2024-08-02 20:28:49 -05:00
mrq	808a79ebaf	oops	2024-08-01 22:56:04 -05:00
mrq	443422ecb5	ugh, finally got some form of offloading working (need to test if it works on different GPUs, but GPU and CPU offloading seems to work in the test trainer)	2024-08-01 22:43:39 -05:00
mrq	c9ec6b28ef	it actually wasn't working because Engines.__init__() automatically moves the entire module to the requested device, which was being called after offloading the model in the test trainer (and it seems I cant do it without injecting a bunch of shit in modeling_llama.py)	2024-08-01 20:56:28 -05:00
mrq	b4c895114c	naive model offloading support (handles automatically splitting parts of the model to requested device per memory constraints, either inferred or requested in the yaml, input tensors are automatically migrated to the right device, it SEEMS to work for training under the test trainer when split between GPU and CPU) (this was specifically only because that Flux imagegen model released so I can test it there)	2024-08-01 20:12:06 -05:00
mrq	387358bc8a	fixes for the NAR-len model, and documentation some config options, and a better way to handle resizing modules on state_dict load	2024-07-31 20:35:09 -05:00
mrq	52d13b321f	I rather have it default to non-strict loading instead so I can clean up YAMLs	2024-07-30 22:24:38 -05:00
mrq	d7c6be6f78	fix weird regression in handling checkpoints when backend is local, but deepspeed checkpoints are in (it was handled with LoRA loading but not real loading...)	2024-07-30 22:15:56 -05:00
mrq	07f8e2ad06	added option to set the causal size (how many tokens to sample per AR step), but requires the model to be trained for this (which explains why recurrent chunk sampling just doesn't work for the retnet tests, obvious in hindsight)	2024-07-30 20:53:51 -05:00
mrq	ebf848d249	possible speedup for samplers that require a list of previous tokens (the DRY sampler made me realize that I should copy the tolist() thing from the rep pen sampler for everything else)	2024-07-29 20:23:26 -05:00
mrq	55b0121b1a	trying (and failing) to nail a weird regression in fancier attentions	2024-07-29 19:53:37 -05:00
mrq	c2f5b916fc	added what I think is DRY sampling	2024-07-29 19:15:07 -05:00
mrq	ce8bb1e4f7	sanity cleanups with weird off-by-one-ness, cleaned up and validated vall_e.models.experimental works again	2024-07-27 15:36:05 -05:00
mrq	06e948aec1	suppress warning on exit about distributed not being cleaned up (because I updated my system)	2024-07-25 16:50:47 -05:00
mrq	682e4387dc	oops (fixed proms being erased from a config oversight)	2024-07-25 12:39:57 -05:00
mrq	1acb0e9c84	added experimental training setting to perform token dropout to MAYBE compensate for errors from the preceding RVQ level (two types: token error offset, token dropout embedding replace)	2024-07-24 19:35:17 -05:00
mrq	611a1c4bdc	might help	2024-07-22 20:57:01 -05:00
mrq	188d116222	some weird fixes for an equally weird regression with LoRA loading	2024-07-22 20:47:24 -05:00
mrq	e33c4b0cb1	oops	2024-07-22 19:38:39 -05:00
mrq	75b04686f8	added prom-less training / inferencing, some other things	2024-07-22 19:36:07 -05:00
mrq	491ae2a684	some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...)	2024-07-22 00:30:40 -05:00
mrq	ad024f400f	actually pass language into dataset process script, fix coercing japanese into hiragana because espeak does not like kanji	2024-07-21 23:21:37 -05:00
mrq	3e5ca3a201	more demo page tweaks	2024-07-21 19:31:13 -05:00
mrq	7366f36f81	oops	2024-07-21 19:17:25 -05:00

493 Commits