vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	0d5d545a40	crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc.	2024-05-09 20:28:20 -05:00
mrq	277dcec484	apparently I got an error for trying to serialize an errant tensor that made its way into the json, this could be remedied easily with recursively traversing the dict and coercing any objects to primitives, but I'm tired and I just want to start training and nap	2024-05-04 12:33:43 -05:00
mrq	c494894261	simple DDP wrapper (for my NVlink test)	2024-05-04 11:48:26 -05:00
mrq	a7b43b98b5	renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes)	2024-05-02 20:08:59 -05:00
mrq	467fa1c5ee	wrapper fixes	2024-04-16 10:19:02 -05:00
mrq	f0c4baeb25	added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it)	2024-04-09 22:04:01 -05:00
mrq	4d75ee066c	actually do the Linear replacement with TE's Linear	2024-04-09 14:41:13 -05:00
mrq	9d97eb5104	added FP8 support through `NVIDIA/TransformerEngine`, added RetNet_HF through `syncdoth/RetNet` (as an alternative to branch away from torchscale)	2024-04-08 20:14:51 -05:00
mrq	f3c59c3e7e	cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed)	2024-03-01 20:18:43 -06:00
mrq	47435207f7	Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model	2024-03-01 19:20:10 -06:00
mrq	0427d8d076	logger broke for some reason, added flag to just tqdm.write instead, make cfg.bitsandbytes.bitnet==True yamls denoted since I'm sure they're not interoperable	2024-03-01 10:32:35 -06:00
mrq	35d78a2bb0	Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares)	2024-02-29 20:29:17 -06:00
mrq	cce929e136	nasty hotfix for transformer's Mixtral throwing an error when batch sizes > 1	2024-01-26 19:41:12 -06:00
mrq	9c198eb75a	added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works)	2023-12-20 18:45:58 -06:00
mrq	32d4271ca8	fixed issue with training from scratch (oops)	2023-10-21 09:55:38 -05:00
mrq	09cda7d3f9	added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup	2023-10-16 19:30:38 -05:00
mrq	65f500083d	tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work	2023-10-12 22:21:43 -05:00
mrq	893a610fad	cleanup, use deepspeed inferencing pathway if requested	2023-10-09 15:24:04 -05:00
mrq	3db7e7dea1	implicitly load checkpoint if deepspeed checkpoint not found, updated setup script to grab the diskcached dataloader things	2023-10-06 10:02:45 -05:00
mrq	4abd6564d1	fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml	2023-09-23 19:59:00 -05:00
mrq	9384900ce6	revert the frankensteined "train one model but hotload the other" since it kept loading the last exported weights and I'm not supporting this usecase anymore anyways	2023-09-22 13:04:17 -05:00
mrq	c0b25541e3	restructured some things with the model to remove dead weights	2023-09-20 19:10:59 -05:00
mrq	5ac119a6e7	added light web UI (need to port the telemetry disabling bandaids from aivc)	2023-09-09 16:17:20 -05:00
mrq	67617d7d69	also cull frozen_params in the params optimizer receives to reduce VRAM it consumes	2023-09-07 18:27:02 -05:00
mrq	8837bc34d7	added option to specify parameters to freeze per-model in YAML (because I need to see about committing atrocities with converting an AR into an AR+NAR)	2023-09-07 18:19:51 -05:00
mrq	ab5134f385	tweaks and fixes	2023-09-07 17:08:38 -05:00
mrq	e7a67410d1	oops	2023-09-07 09:14:03 -05:00
mrq	712808494f	added support for optional prodigy optimizer (https://github.com/konstmish/prodigy ) although it consumes a lot more VRAM per parameter	2023-09-06 20:33:16 -05:00
mrq	100ca6b7d0	added option to use SGD optimizer through the YAML, added option to pass in additional optimizer parameters through the YAML, added experimental unified AR+NAR model (does not seem fruitful in testing)	2023-09-06 18:58:35 -05:00
mrq	8a6c203277	added per-speaker samplers	2023-09-03 21:27:13 -05:00
mrq	81b05dabb9	accurate epoch metric is now reported (based on samples processed / length of dataset's paths, rather than naive assumptions)	2023-09-03 08:03:36 -05:00
mrq	57db3ccfa8	shuffled VALL-E continuous as a task tts-c instead, logic fixes for it	2023-09-02 12:23:40 -05:00
mrq	7f4388e591	added total samples processed and tokens processed (len of text tokens + len of target response tokens)	2023-08-28 11:02:45 -05:00
mrq	87c4bfedba	added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU)	2023-08-27 12:26:12 -05:00
mrq	0517d620b8	fixes with the local backend	2023-08-24 17:05:56 -05:00
mrq	501a857d5d	ops	2023-08-23 17:03:25 -05:00
mrq	4585824cd3	tweaks, including exporting on save/quit	2023-08-23 16:43:03 -05:00
mrq	b105f6211e	added ability to export weights mid-training to avoid CBT to yank the weights while the training script is running	2023-08-20 13:39:58 -05:00
mrq	2d1a9f10c0	nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trim off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now)	2023-08-19 15:06:33 -05:00
mrq	0b46c1e312	god I am inexperienced with retaining compat from previous weights, I hope no one actually has weights	2023-08-18 21:29:20 -05:00
mrq	2a71486cb6	preparing for SpeechX extensions	2023-08-18 20:58:07 -05:00
mrq	ced31fd9b7	removed the sampler as it's very misleading	2023-08-18 14:47:48 -05:00
mrq	599e47a813	might fix user inputted saving/quitting breaking when distributed	2023-08-15 23:52:20 -05:00
mrq	13571380be	made exporter make more sense	2023-08-13 22:56:28 -05:00
mrq	d7deaf6def	distributed training works now (hopefully)	2023-08-13 22:07:45 -05:00
mrq	2af09d0bef	fixed that mysterious discrepancy between the reported losses (I am so freaking mad, my piss is boiling, I had to interrupt halfway through an epoch)	2023-08-05 15:25:41 -05:00
mrq	d89568a96e	some fixes for the local framework	2023-08-05 03:22:15 +00:00
mrq	c85101403f	big cleanup	2023-08-03 20:26:36 -05:00
mrq	2e03e5ac93	Fixed an issue where having fairseq installed at all would brick logging	2023-08-02 22:57:10 -05:00
mrq	f6597e2dfe	adjustments	2023-08-02 18:36:26 -05:00

52 Commits