vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	31785f4eeb	actually don't default to compute split losses, test bitnet model doesn't seem to be doing things right (despite debug printouts showing they're roughly the same logit/loss sequences, could just be bitnet linears being not up to par on actual models)	2024-06-01 09:12:51 -05:00
mrq	e9c87060df	oops	2024-05-31 22:22:28 -05:00
mrq	b482ca19ff	added model config option to set KV head count for MQA/GQA instead of MHA for llama-based models (I think it's very negligible both ways on such a small model size)	2024-05-31 19:32:37 -05:00
mrq	da473295b7	better way to compute per-segment losses	2024-05-28 19:29:54 -05:00
mrq	5af6f41c94	added loss calcs against prom (requires the right settings for not shit results, disabled by default)	2024-05-27 08:43:00 -05:00
mrq	ddbacde0d1	DAC just doesn't work well enough......	2024-05-25 11:07:52 -05:00
mrq	458b95d196	added option to split between text loss and audio loss (to-do: document this better), because it may or may not be a problem with LLaMA-backed models because my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment	2024-05-19 11:23:56 -05:00
mrq	8d79f78e0a	god I need to replace omegaconf	2024-05-12 14:01:52 -05:00
mrq	2437a86efa	ugh	2024-05-12 13:02:15 -05:00
mrq	3774fcbdee	ugh	2024-05-11 22:58:38 -05:00
mrq	856545f8bb	nan loss detection (should have added it earlier), loss scaling for local backend + fp16	2024-05-11 22:23:29 -05:00
mrq	3337c69e5a	leverage between xformers and `torch.backends.cuda.sdp_kernel` for attention	2024-05-11 17:14:05 -05:00
mrq	0b6499601b	sanitizing	2024-05-11 16:31:05 -05:00
mrq	04a80d6b55	maybe it's better to be more explicit in deepspeed configs	2024-05-11 13:57:43 -05:00
mrq	4d93a16ef7	might just be better to explicitly define prompt duration ranges, especially under a "train small contexts then increase it" training paradigm	2024-05-11 09:50:54 -05:00
mrq	1547de5020	haha...	2024-05-09 23:15:52 -05:00
mrq	b7bd885651	some possible sanity with deepspeed config	2024-05-09 22:48:42 -05:00
mrq	b6131565ad	autotune?	2024-05-09 21:25:40 -05:00
mrq	6ed6ab8c03	a bit more cleanup for deepspeed ds_cfg creation	2024-05-09 21:00:26 -05:00
mrq	0d5d545a40	crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc.	2024-05-09 20:28:20 -05:00
mrq	215800484d	correcting my wrong of assuming I could just use raw 24kHz audio in the 44kHz DAC without too much of an issue (there are issues)	2024-05-04 23:49:15 -05:00
mrq	33b7f81b94	small cleanups	2024-05-04 22:37:22 -05:00
mrq	ffa200eec7	added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources))	2024-05-04 12:05:41 -05:00
mrq	c494894261	simple DDP wrapper (for my NVlink test)	2024-05-04 11:48:26 -05:00
mrq	a7b43b98b5	renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes)	2024-05-02 20:08:59 -05:00
mrq	b5d1456a09	backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not))	2024-04-29 22:14:01 -05:00
mrq	5120ffdda7	god it would be nice to know the best way to handle audio embeddings, because I genuinely don't know without skimming through papers or devoting X amount of GPU hours in training	2024-04-29 18:24:05 -05:00
mrq	caad7ee3c9	final tweaks, hopefully	2024-04-28 22:28:29 -05:00
mrq	071fb97777	dataset preparation script updates, caved and am using HF tokenizer now	2024-04-21 14:49:18 -05:00
mrq	a8ffa88844	it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior	2024-04-19 18:36:54 -05:00
mrq	4f5c9e518a	actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess	2024-04-18 13:32:41 -05:00
mrq	5ff2b4aab5	finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset)	2024-04-17 20:39:35 -05:00
mrq	b0bd88833c	refactor cleanup, had a revelation on how I can handle a batch of varying tasks	2024-04-16 21:04:48 -05:00
mrq	aa1e25fbf5	backwards compat for old YAMLs with `models`, option to set flash attention 2 for Llama (and derivatives), included `syncdoth/RetNet`'s torchscale retnet for shits and grins, etc.	2024-04-16 10:02:31 -05:00
mrq	545162195b	deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things	2024-04-15 19:54:32 -05:00
mrq	789bb5d11b	add an optional label override for model loading (used for easy testing between 12/16/20/24 layered model)	2024-04-13 12:43:35 -05:00
mrq	f0c4baeb25	added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it)	2024-04-09 22:04:01 -05:00
mrq	9d97eb5104	added FP8 support through `NVIDIA/TransformerEngine`, added RetNet_HF through `syncdoth/RetNet` (as an alternative to branch away from torchscale)	2024-04-08 20:14:51 -05:00
mrq	7075c2a5f0	added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml)	2024-04-04 19:11:49 -05:00
mrq	47435207f7	Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model	2024-03-01 19:20:10 -06:00
mrq	0427d8d076	logger broke for some reason, added flag to just tqdm.write instead, make cfg.bitsandbytes.bitnet==True yamls denoted since I'm sure they're not interoperable	2024-03-01 10:32:35 -06:00
mrq	35d78a2bb0	Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares)	2024-02-29 20:29:17 -06:00
mrq	c690aa509d	fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...)	2023-12-25 21:20:32 -06:00
mrq	9c198eb75a	added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works)	2023-12-20 18:45:58 -06:00
mrq	32d4271ca8	fixed issue with training from scratch (oops)	2023-10-21 09:55:38 -05:00
mrq	3195026dba	fixed issue with the 'add another target audio to artificially create longer sequences' for HDF5 just duplicating the utterance initially sampled	2023-10-18 20:38:33 -05:00
mrq	65f500083d	tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work	2023-10-12 22:21:43 -05:00
mrq	8740cdefc6	added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested)	2023-10-11 20:38:40 -05:00
mrq	6045cbce94	added experimental option to append utterances for training target (emphasis on experimental)	2023-10-11 17:32:45 -05:00
mrq	893a610fad	cleanup, use deepspeed inferencing pathway if requested	2023-10-09 15:24:04 -05:00
mrq	63cc9cf37a	added compat flags for torchscale because the maintainer for torchscale broke compat for existing models	2023-10-05 16:39:46 -05:00
mrq	153f8b293c	added min-x and min-y arguments to plot.py, helper script to download from my existing checkpoint	2023-10-04 19:41:37 -05:00
mrq	d12877ee09	added option to set probability of selecting the AR during training under a monolithic AR+NAR, added some more to-dos while I have them in mind	2023-10-02 16:52:42 -05:00
mrq	c0b25541e3	restructured some things with the model to remove dead weights	2023-09-20 19:10:59 -05:00
mrq	d07c63b9d8	unified more things with training the AR+NAR monolithic model	2023-09-12 15:54:41 -05:00
mrq	40ef34e1ca	this embedding class definitely works, and migrating from the previous embedding weights seems to work.	2023-09-11 14:13:42 -05:00
mrq	671dca88ee	throw error when no reference audio is provided in the web UI because someone keeps doing that in the HF space	2023-09-10 15:50:50 -05:00
mrq	c74fe2f718	tweaks to web UI	2023-09-09 22:27:20 -05:00
mrq	f69aad9c65	some day I'll get it right	2023-09-08 15:36:26 -05:00
mrq	8837bc34d7	added option to specify parameters to freeze per-model in YAML (because I need to see about committing atrocities with converting an AR into an AR+NAR)	2023-09-07 18:19:51 -05:00
mrq	c47fc3274e	added backwards compat flag	2023-09-07 17:12:17 -05:00
mrq	e7a67410d1	oops	2023-09-07 09:14:03 -05:00
mrq	100ca6b7d0	added option to use SGD optimizer through the YAML, added option to pass in additional optimizer parameters through the YAML, added experimental unified AR+NAR model (does not seem fruitful in testing)	2023-09-06 18:58:35 -05:00
mrq	451726fdd5	added ability to disable activation checkpointing through the YAML (it is very VRAM intensive at double layer size)	2023-09-05 15:38:21 -05:00
mrq	2f9cd0842f	merged dedicated interleaved AR code with the normal AR code	2023-09-03 22:46:08 -05:00
mrq	8a6c203277	added per-speaker samplers	2023-09-03 21:27:13 -05:00
mrq	57db3ccfa8	shuffled VALL-E continuous as a task tts-c instead, logic fixes for it	2023-09-02 12:23:40 -05:00
mrq	2f06166ddd	cleanups	2023-09-01 21:33:51 -05:00
mrq	e40c0d34a0	somewhat got recurrent forward working (it's as accurate as chunkwise forward: it's not accurate at all), added option to use AMP instead of blanket setting the weight's dtype	2023-09-01 20:58:29 -05:00
mrq	2bc2d08b09	(need to verify) added modifying model size and config bool to align with VALL-E continuous' methodology	2023-09-01 17:19:34 -05:00
mrq	87c4bfedba	added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU)	2023-08-27 12:26:12 -05:00
mrq	165a1154e0	Undo naive=False test flag, this shouldn't have made its way in	2023-08-26 22:00:43 -05:00
mrq	78378ed1ce	overhauled dataloading code to be marginally faster, mostly cleaned up, and can leverage a metadata json to help things out	2023-08-26 19:53:23 -05:00
mrq	00ad4af651	updated draconian requirement for espeak-ng to be installed and the env var set to the dll for Windows	2023-08-24 14:57:01 -05:00
mrq	4585824cd3	tweaks, including exporting on save/quit	2023-08-23 16:43:03 -05:00
mrq	d106598403	do not utilize diskcache if a config yaml is not loaded	2023-08-23 11:02:15 -05:00
mrq	7b1b82e0e5	inferencing cleanup	2023-08-20 21:36:02 -05:00
mrq	736c077282	ops	2023-08-20 13:42:18 -05:00
mrq	2d1a9f10c0	nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trim off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now)	2023-08-19 15:06:33 -05:00
mrq	f7f6d3bf6d	validated that SpeechX tasks cse and nse work, added a method to test each task by invoking `python3 -m vall_e.data --action=tasks --tasks='sr,se,cse,nse'`	2023-08-19 09:50:07 -05:00
mrq	8f42c578c9	setting up for allowing training for a partial amount of the speechx tasks (do NOT try this at home yet without a proper model, as performance is predicated on having a solid base vall-e model for the tasks)	2023-08-19 00:16:08 -05:00
mrq	ae9d38aa31	forgot to have it pull from specified noise to the hdf5 dataset	2023-08-18 23:57:07 -05:00
mrq	77292c42f9	tested the training preparation for tasks ns, sr, and tse (I don't expect it to go well with only 2 RVQ bins)	2023-08-18 23:55:40 -05:00
mrq	bbb0563b3d	pseudocode polyfill stub some other flavor of working on adding the tasks	2023-08-18 22:22:13 -05:00
mrq	fb4e816823	oops	2023-08-18 21:11:19 -05:00
mrq	2a71486cb6	preparing for SpeechX extensions	2023-08-18 20:58:07 -05:00
mrq	ced31fd9b7	removed the sampler as it's very misleading	2023-08-18 14:47:48 -05:00
mrq	ee58db746f	actually make the evaluation dataset shuffled for sample_type=speaker	2023-08-17 15:04:45 -05:00
mrq	d7152fc7b9	added pruning of old checkpoints if specified (cfg.trainer.keep_last_checkpoints)	2023-08-16 20:12:12 -05:00
mrq	44c08d828e	added sample_type that samples from speakers to truly balance an epoch by speakers rather than the entire dataset and a sampler that tries to balance by speakers	2023-08-16 19:39:21 -05:00
mrq	1e3e1d9315	tweaks	2023-08-15 21:58:16 -05:00
mrq	13571380be	made exporter make more sense	2023-08-13 22:56:28 -05:00
mrq	d7deaf6def	distributed training works now (hopefully)	2023-08-13 22:07:45 -05:00
mrq	d89568a96e	some fixes for the local framework	2023-08-05 03:22:15 +00:00
mrq	5970f254e3	some fixes for the local framework	2023-08-05 02:17:30 +00:00
mrq	608c1970eb	ops	2023-08-03 20:36:19 -05:00
mrq	c85101403f	big cleanup	2023-08-03 20:26:36 -05:00
mrq	f6597e2dfe	adjustments	2023-08-02 18:36:26 -05:00
mrq	bf8cedc9dd	Rewrite init	2023-08-02 21:53:35 +00:00


199 Commits