Commit Graph

372 Commits

Author SHA1 Message Date
mrq
74e531d391 ugh 2024-05-18 12:02:56 -05:00
mrq
4bc7e5a6d1 fix loading without needing an hdf5 dataset already prepped (and some other incidental speedups during dataloader prep) 2024-05-18 07:14:26 -05:00
mrq
d88a5ca183 ugh 2024-05-16 07:25:33 -05:00
mrq
d9aabfa3ae final tweaks, hopefully, again 2024-05-15 23:04:19 -05:00
mrq
8d79f78e0a god I need to replace omegaconf 2024-05-12 14:01:52 -05:00
mrq
5eb5db7f7f just don't use DAC 24KHz, it's bad 2024-05-12 13:41:17 -05:00
mrq
230da8b559 should be the final things to scramble around for, DAC's 24KHz model is unusable for this, but both encodec's 24KHz and DAC's 44KHz work 2024-05-12 13:22:08 -05:00
mrq
2437a86efa ugh 2024-05-12 13:02:15 -05:00
mrq
4f1593c8db a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge 2024-05-12 10:17:29 -05:00
mrq
917eeb40d2 ughhh 2024-05-12 08:22:39 -05:00
mrq
9910c75d5a checkpointing for bitnet impl 2024-05-12 07:52:54 -05:00
mrq
14709ac67f ughh 2024-05-12 07:30:59 -05:00
mrq
3774fcbdee ugh 2024-05-11 22:58:38 -05:00
mrq
856545f8bb nan loss detection (should have added it earlier), loss scaling for local backend + fp16 2024-05-11 22:23:29 -05:00
mrq
a755eb3c62 ugh 2024-05-11 17:34:45 -05:00
mrq
88e9b9caff local ddp fix 2024-05-11 17:29:01 -05:00
mrq
3337c69e5a leverage between xformers and torch.backends.cuda.sdp_kernel for attention 2024-05-11 17:14:05 -05:00
mrq
d33c7bb7cf ugh 2024-05-11 16:47:19 -05:00
mrq
0b6499601b sanitizing 2024-05-11 16:31:05 -05:00
mrq
71e373064f remove redundant loss, tweak readme 2024-05-11 15:02:47 -05:00
mrq
04a80d6b55 maybe it's better to be more explicit in deepspeed configs 2024-05-11 13:57:43 -05:00
mrq
4d93a16ef7 might just be better to explicitly define prompt duration ranges, especially under a "train small contexts then increase it" training paradigm 2024-05-11 09:50:54 -05:00
mrq
bd0a36ba8d I swear I keep seeing tqdm flicker back a number 2024-05-10 18:36:01 -05:00
mrq
2109712e5b resolve deprecation warning that doesn't show on my old training rig but does on my new one 2024-05-09 23:25:44 -05:00
mrq
1547de5020 haha... 2024-05-09 23:15:52 -05:00
mrq
b7bd885651 some possible sanity with deepspeed config 2024-05-09 22:48:42 -05:00
mrq
c4b696ebeb oops 2024-05-09 22:33:40 -05:00
mrq
c22a177cf8 forgot to pass warmup to schedule free 2024-05-09 22:18:49 -05:00
mrq
b6131565ad autotune? 2024-05-09 21:25:40 -05:00
mrq
6ed6ab8c03 a bit more cleanup for deepspeed ds_cfg creation 2024-05-09 21:00:26 -05:00
mrq
0d5d545a40 crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc. 2024-05-09 20:28:20 -05:00
mrq
c6e0f905b5 final tweaks (again) before training restarts 2024-05-08 02:11:38 -05:00
mrq
215800484d correcting my wrong of assuming I could just use raw 24Khz audio in the 44Khz DAC without too much of an issue (there are issues) 2024-05-04 23:49:15 -05:00
mrq
9f738fbd5b seems I actually don't need RVQ bins 9-32 with the 24Khz DAC model........ (time to requantize my audio...) 2024-05-04 23:09:18 -05:00
mrq
33b7f81b94 small cleanups 2024-05-04 22:37:22 -05:00
mrq
8aa1b2dabf documentation update 2024-05-04 21:03:46 -05:00
mrq
253441b750 forgot to disable verbose flag 2024-05-04 13:13:52 -05:00
mrq
3dca1125f5 implemented xformers in HF's Llama (because there's no flash attention for Volta cards) 2024-05-04 13:07:45 -05:00
mrq
277dcec484 apparently I got an error for trying to serialize an errant tensor that made its way into the json, this could be remedied easily with recursively traversing the dict and coercing any objects to primitives, but I'm tired and I just want to start training and nap 2024-05-04 12:33:43 -05:00
mrq
ffa200eec7 added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources)) 2024-05-04 12:05:41 -05:00
mrq
c494894261 simple DDP wrapper (for my NVlink test) 2024-05-04 11:48:26 -05:00
mrq
a7b43b98b5 renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes) 2024-05-02 20:08:59 -05:00
mrq
b5d1456a09 backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not)) 2024-04-29 22:14:01 -05:00
mrq
5120ffdda7 god it would be nice to know the best way to handle audio embeddings, because I genuinely don't know without skimming through papers or devoting X amount of GPU hours in training 2024-04-29 18:24:05 -05:00
mrq
6a11bc9cb6 update tokenizer because, for some reason, it had the wrong order for the special tokens to where eos = unk 2024-04-29 09:09:26 -05:00
mrq
57810e4ba4 metadata only path (might drop HDF5 since it's giving file sizes twice as large as my actual unpacked dataset) 2024-04-28 23:03:09 -05:00
mrq
caad7ee3c9 final tweaks, hopefully 2024-04-28 22:28:29 -05:00
mrq
ffc334cf58 added dataset transcription helper script (now I don't ever have to touch ai-voice-cloning) (to-do: unify scripts into the module) 2024-04-21 17:43:20 -05:00
mrq
b251669536 forgot to fix up the test trainer 2024-04-21 14:58:04 -05:00
mrq
071fb97777 dataset preparation script updates, caved and am using HF tokenizer now 2024-04-21 14:49:18 -05:00
mrq
a8ffa88844 it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior 2024-04-19 18:36:54 -05:00
mrq
8214aa23d7 converting over to a different intermediary dataset format 2024-04-18 21:24:06 -05:00
mrq
4f5c9e518a actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess 2024-04-18 13:32:41 -05:00
mrq
2e9e6e68f7 Forgot I need to use the DAC's 44K model because 24K model has 32 codebooks instead of 9. 2024-04-17 20:59:25 -05:00
mrq
5ff2b4aab5 finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset) 2024-04-17 20:39:35 -05:00
mrq
b0bd88833c refactor cleanup, had a revelation on how I can handle a batch of varying tasks 2024-04-16 21:04:48 -05:00
mrq
467fa1c5ee wrapper fixes 2024-04-16 10:19:02 -05:00
mrq
aa1e25fbf5 backwards compat for old YAMLs with models, option to set flash attention 2 for Llama (and derivatives), included syncdoth/RetNets torchscale retnet for shits and grins, etc. 2024-04-16 10:02:31 -05:00
mrq
545162195b deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things 2024-04-15 19:54:32 -05:00
mrq
d69a00e389 Properly pass retention_mask for retnet-HF, attempt to fix recurrent forward for retnet (doesn't work still) 2024-04-14 13:12:50 -05:00
mrq
789bb5d11b add an optional label override for model loading (used for easy testing between 12/16/20/24 layered model) 2024-04-13 12:43:35 -05:00
mrq
f0c4baeb25 added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it) 2024-04-09 22:04:01 -05:00
mrq
4d75ee066c actually do the Linear replacement with TE's Linear 2024-04-09 14:41:13 -05:00
mrq
9d97eb5104 added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale) 2024-04-08 20:14:51 -05:00
mrq
7075c2a5f0 added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml) 2024-04-04 19:11:49 -05:00
mrq
91062361af tweaks 2024-03-01 20:38:06 -06:00
mrq
f3c59c3e7e cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed) 2024-03-01 20:18:43 -06:00
mrq
47435207f7 Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model 2024-03-01 19:20:10 -06:00
mrq
0427d8d076 logger broke for some reason, added flag to just tqdm.write instead, make cfg.bitsandbytes.bitnet==True yamls denoted since I'm sure they're not interoperable 2024-03-01 10:32:35 -06:00
mrq
35d78a2bb0 Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares) 2024-02-29 20:29:17 -06:00
mrq
3da1518ace added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work) 2024-01-31 21:48:36 -06:00
mrq
cce929e136 nasty hotfix for transformer's Mixtral throwing an error when batch sizes > 1 2024-01-26 19:41:12 -06:00
mrq
e799665759 experimental weighting of prom/resp embeds 2024-01-25 12:18:48 -06:00
mrq
c690aa509d fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...) 2023-12-25 21:20:32 -06:00
mrq
e513d2ef19 experts weren't forwarded into constructor (wasted a few days of training garbage) 2023-12-23 16:08:17 -06:00
mrq
0db3203b21 added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go) 2023-12-22 19:27:36 -06:00
mrq
9c198eb75a added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works) 2023-12-20 18:45:58 -06:00
mrq
6c51a629cc resetting step count resets the samples processed and other metrics 2023-10-29 12:11:19 -05:00
mrq
0aa2a3cc07 evaluation/validation passes language ID during training (oops) 2023-10-29 12:00:40 -05:00
mrq
ed54f4ebec un 'experimental' the better target sequence preparation 2023-10-22 09:06:59 -05:00
mrq
9a6040383e make validation samplers ignore sampler type 2023-10-22 09:01:47 -05:00
mrq
32d4271ca8 fixed issue with training from scratch (oops) 2023-10-21 09:55:38 -05:00
mrq
3195026dba fixed issue with the 'add another target audio to artificially create longer sequences' for HDF5 just duplicating the utterance initially sampled 2023-10-18 20:38:33 -05:00
mrq
09cda7d3f9 added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup 2023-10-16 19:30:38 -05:00
mrq
a539f6889f mucked around with the loss calculation, this seems better? 2023-10-13 18:22:21 -05:00
mrq
fb467b19ba exposed rolling resp context to the web UI, added passing in language to inferencing command line 2023-10-12 23:21:01 -05:00
mrq
298fd9a5f9 fixed issue with webui 2023-10-12 22:49:25 -05:00
mrq
65f500083d tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work 2023-10-12 22:21:43 -05:00
mrq
08bae355eb actually use langs from the dataloader 2023-10-11 21:21:50 -05:00
mrq
3af19d79fd oops 2023-10-11 20:49:54 -05:00
mrq
8740cdefc6 added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested) 2023-10-11 20:38:40 -05:00
mrq
6045cbce94 added experimental option to append utterances for training target (emphasis on experimental) 2023-10-11 17:32:45 -05:00
mrq
7facacf7c9 separated samplers into its own file, don't bother copying the logits back to the GPU after sampling, it's not necessary 2023-10-11 12:25:31 -05:00
mrq
100dd164e6 apply phoneme cleanup in inferencing as well 2023-10-10 19:21:19 -05:00
mrq
b4405c98ea remove double spaces in the text phonemes (might have caused problems.........) 2023-10-10 19:18:24 -05:00
mrq
47b3077415 fixed mirostat issue 2023-10-10 18:09:49 -05:00
mrq
99e980d323 documentation and more better-er attribution 2023-10-10 17:15:16 -05:00
mrq
e727b6e5c1 changed dynamic temperature trigger to be a min-(n)ar-temp value between [0,(n)ar-temp), flags to set min temp, checkbox in web UI to request it 2023-10-10 17:02:33 -05:00
mrq
ec25f56bd9 using torch.max fixes things, somehow, for dynamic temp sampling 2023-10-10 16:42:24 -05:00
mrq
87db03dd93 trim the input prompt to 3 seconds when training NAR tasks (marked as experimental; the paper mentions doing so, but I don't know how much this would harm the retention heads) 2023-10-09 22:03:58 -05:00
mrq
893a610fad cleanup, use deepspeed inferencing pathway if requested 2023-10-09 15:24:04 -05:00
mrq
26fbb92ec6 reduced dynamic temperature threshold to > 1.0, as it seems to not quite be useful for audio LMs, sped up any sampling that touches logits by copying them to CPU first (as accessing tensors on the GPU is slow as balls) 2023-10-09 14:46:17 -05:00
mrq
29873e6ded extend the max temps in the web UI to actually allow dynamic temp sampling 2023-10-09 13:30:45 -05:00
mrq
27483e56f0 disabled preparing of SpeechX tasks, added dynamic temperature testing (to-do: test it, credited in the function) 2023-10-09 13:01:40 -05:00
mrq
2deb995cc9 updated setup script 2023-10-06 20:08:28 -05:00
mrq
3db7e7dea1 implicitly load checkpoint if deepspeed checkpoint not found, updated setup script to grab the diskcached dataloader things 2023-10-06 10:02:45 -05:00
mrq
82f02ae9b1 oops 2023-10-06 09:26:52 -05:00
mrq
63cc9cf37a added compat flags for torchscale because the maintainer for torchscale broke compat for existing models 2023-10-05 16:39:46 -05:00
mrq
153f8b293c added min-x and min-y arguments to plot.py, helper script to download from my existing checkpoint 2023-10-04 19:41:37 -05:00
mrq
777ba43305 oops 2023-10-03 15:01:37 -05:00
mrq
d12877ee09 added option to set probability of selecting the AR during training under a monolithic AR+NAR, added some more to-dos while I have them in mind 2023-10-02 16:52:42 -05:00
mrq
e85b798fbf set default NAR levels to max for the web UI 2023-09-29 19:14:16 -05:00
mrq
c7fb740d41 do not specify a default dtype for the web UI, let it implicitly load from the yaml instead 2023-09-24 17:54:03 -05:00
mrq
4abd6564d1 fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml 2023-09-23 19:59:00 -05:00
mrq
9384900ce6 revert the frankensteined "train one model but hotload the other" since it kept loading the last exported weights and I'm not supporting this usecase anymore anyways 2023-09-22 13:04:17 -05:00
mrq
e7da1eb90d edge case 2023-09-20 19:20:17 -05:00
mrq
c0b25541e3 restructured some things with the model to remove dead weights 2023-09-20 19:10:59 -05:00
mrq
a6bfe43590 added mirostat sampling (given a partially trained model, it got far more decent output than I expected, need to test on a better trained model) 2023-09-18 18:55:41 -05:00
mrq
2567e082b5 UGH 2023-09-16 00:26:13 -05:00
mrq
22ffaf3a33 have loss for the NAR not-ignore the text prompt, I imagine this should help the NAR and explain why it's always had a bit of an issue with training 2023-09-15 19:08:44 -05:00
mrq
4aef798135 added picking final candidate based on sum of score instead of first candidate (this changes nothing). 2023-09-13 13:19:11 -05:00
mrq
23a5fdd645 implemented a naive beam search (I really should be taking a break) 2023-09-12 21:28:07 -05:00
mrq
a6ae344e5b some comments 2023-09-12 16:04:45 -05:00
mrq
d07c63b9d8 unified more things with training the AR+NAR monolithic model 2023-09-12 15:54:41 -05:00
mrq
40ef34e1ca this embedding class definitely works, and migrating from the previous embedding weights seems to work. 2023-09-11 14:13:42 -05:00
mrq
a1f250ffac set default max_levels for NAR to 0 and implicitly set it to max resps levels because the previous way was implicitly assuming all models were outputting at 1+7 RVQ bins. 2023-09-10 20:33:33 -05:00
mrq
671dca88ee throw error when no reference audio is provided in the web UI because someone keeps doing that in the HF space 2023-09-10 15:50:50 -05:00
mrq
ba71020318 added option to limit (or exceed) inferenced RVQ-bin levels through the NAR 2023-09-10 13:50:13 -05:00
mrq
c74fe2f718 tweaks to web UI 2023-09-09 22:27:20 -05:00
mrq
7f8bd2b936 added printing elapsed inference time 2023-09-09 20:05:03 -05:00
mrq
4f61f5c889 added option to set the trim length for an input prompt 2023-09-09 18:04:44 -05:00
mrq
d10053d11f render README.md markdown for huggingface space 2023-09-09 17:04:51 -05:00
mrq
bc30026377 added advanced sampler parameters to the web UI 2023-09-09 16:51:36 -05:00
mrq
5ac119a6e7 added light web UI (need to port the telemetry disabling bandaids from aivc) 2023-09-09 16:17:20 -05:00
mrq
10c34c5b98 added a length-based decay factor for repetition penalty 2023-09-08 21:02:00 -05:00
mrq
b922f35b6b added documentation on how these new sampling parameters are very iffy and you really need to know what you are doing to use them because this is audio generation and not text generation 2023-09-08 20:43:36 -05:00
mrq
14c78bae39 added lots of sampling options (top-k/top-p, repetition penalty, length penalty) 2023-09-08 20:30:54 -05:00
mrq
f69aad9c65 some day I'll get it right 2023-09-08 15:36:26 -05:00
mrq
b2907ae7e0 seems that my PromEmbedding/RespEmbedding doesn't actually work all that well, naively using dedicated MultiEmbeddings for AR/NAR in the monolithic model is the best way to go 2023-09-08 01:03:24 -05:00
mrq
67617d7d69 also cull frozen_params in the params optimizer receives to reduce VRAM it consumes 2023-09-07 18:27:02 -05:00
mrq
8837bc34d7 added option to specify parameters to freeze per-model in YAML (because I need to see about committing atrocities with converting an AR into an AR+NAR) 2023-09-07 18:19:51 -05:00
mrq
c47fc3274e added backwards compat flag 2023-09-07 17:12:17 -05:00
mrq
ab5134f385 tweaks and fixes 2023-09-07 17:08:38 -05:00
mrq
b2c2dec291 added homebrewed per-RVQ-bin embedding solutions 2023-09-07 16:48:02 -05:00
mrq
e7a67410d1 oops 2023-09-07 09:14:03 -05:00
mrq
712808494f added support for optional prodigy optimizer (https://github.com/konstmish/prodigy) although it consumes a lot more VRAM per parameter 2023-09-06 20:33:16 -05:00
mrq
7ce06432fd fixed the AR+NAR dual model, the resp_emb has to be split up (classifier might too) 2023-09-06 19:33:39 -05:00
mrq
100ca6b7d0 added option to use SGD optimizer through the YAML, added option to pass in additional optimizer parameters through the YAML, added experimental unified AR+NAR model (does not seem fruitful in testing) 2023-09-06 18:58:35 -05:00
mrq
451726fdd5 added ability to disable activation checkpointing through the YAML (it is very VRAM intensive at double layer size) 2023-09-05 15:38:21 -05:00
mrq
143aee7526 removed dedicated interleaved AR code 2023-09-03 22:47:03 -05:00
mrq
2f9cd0842f merged dedicated interleaved AR code with the normal AR code 2023-09-03 22:46:08 -05:00
mrq
3a6bd50322 haha 2023-09-03 21:36:58 -05:00
mrq
c56ce033d9 work on an interleaved AR (spoiler: it does not work) 2023-09-03 21:27:58 -05:00
mrq
8a6c203277 added per-speaker samplers 2023-09-03 21:27:13 -05:00
mrq
81b05dabb9 accurate epoch metric is now reported (based on samples processed / length of dataset's paths, rather than naive assumptions) 2023-09-03 08:03:36 -05:00
mrq
922404285c fixed segfault from tts-c task token being too big (inserted it in the hypothetical svc task token because in reality that is never ever going to be a feasible task to train against) 2023-09-02 19:25:43 -05:00
mrq
4613781e23 integrated plot script, added tts-c task token to help the model be able to mix between normal VALL-E and VALL-E continuous 2023-09-02 16:29:53 -05:00
mrq
71e68a8528 tweaked tts-continuous task 2023-09-02 13:39:17 -05:00
mrq
57db3ccfa8 shuffled VALL-E continuous as a task tts-c instead, logic fixes for it 2023-09-02 12:23:40 -05:00
mrq
2f06166ddd cleanups 2023-09-01 21:33:51 -05:00
mrq
e40c0d34a0 somewhat got recurrent forward working (it's as accurate as chunkwise forward: it's not accurate at all), added option to use AMP instead of blanket setting the weight's dtype 2023-09-01 20:58:29 -05:00
mrq
2bc2d08b09 (need to verify) added modifying model size and config bool to align with VALL-E continuous' methodology 2023-09-01 17:19:34 -05:00
mrq
5c8694db8e nasty bandaid if there's no validation dataset specified during training (for example, during finetunes) 2023-08-30 18:23:05 -05:00
mrq
7f4388e591 added total samples processed and tokens processed (len of text tokens + len of target response tokens) 2023-08-28 11:02:45 -05:00
mrq
87c4bfedba added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU) 2023-08-27 12:26:12 -05:00
mrq
165a1154e0 Undo naive=False test flag, this shouldn't have made its way in 2023-08-26 22:00:43 -05:00
mrq
78378ed1ce overhauled dataloading code to be marginally faster, mostly cleaned up, and can leverage a metadata json to help things out 2023-08-26 19:53:23 -05:00
mrq
16e0020901 disabled chunkwise_recurrent for 2x speed gains (I suppose it has been working the entire time, but I have not been properly grabbing things, and this might explain why the output is bad) 2023-08-25 19:50:19 -05:00
mrq
6455a2f9d7 I think I fixed a bug? 2023-08-24 23:33:36 -05:00
mrq
0517d620b8 fixes with the local backend 2023-08-24 17:05:56 -05:00
mrq
00ad4af651 updated draconian requirement for espeak-ng to be installed and the env var set to the dll for Windows 2023-08-24 14:57:01 -05:00
mrq
22904a8639 more oversights fixed because I've been using a cached dataloader forever now and didn't catch these problems 2023-08-24 10:25:33 -05:00
mrq
5873c27f1a ops 2023-08-24 09:20:47 -05:00
mrq
501a857d5d ops 2023-08-23 17:03:25 -05:00
mrq
4585824cd3 tweaks, including exporting on save/quit 2023-08-23 16:43:03 -05:00
mrq
d106598403 do not utilize diskcache if a config yaml is not loaded 2023-08-23 11:02:15 -05:00
mrq
524d289c9c Forgot to re-add in setting the weight's dtype on model load 2023-08-22 22:57:23 -05:00
mrq
9c5a33bfd2 added repo with my weights so far 2023-08-22 13:09:44 -05:00
mrq
7b1b82e0e5 inferencing cleanup 2023-08-20 21:36:02 -05:00
mrq
a47029065b I don't know if the lack of start/stop tokens being added was causing my inference tests to fail, but it seems better now 2023-08-20 19:21:54 -05:00
mrq
736c077282 ops 2023-08-20 13:42:18 -05:00
mrq
b105f6211e added ability to export weights mid-training to avoid CBT to yank the weights while the training script is running 2023-08-20 13:39:58 -05:00
mrq
fc576010ce wrapped saving the checkpoint in a try/catch so I can stop waking up to the damn trainer crashing because it ran out of disk space; I'd much rather it keep training to give me time to eventually clear up disk space rather than it silently restarting on its own 2023-08-20 06:29:17 -05:00
mrq
2d1a9f10c0 nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trim off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now) 2023-08-19 15:06:33 -05:00
mrq
f7f6d3bf6d validated that SpeechX tasks cse and nse work, added a method to test each task by invoking python3 -m vall_e.data --action=tasks --tasks='sr,se,cse,nse' 2023-08-19 09:50:07 -05:00
mrq
6ca347e1e1 literally had a urethra moment before going to bed with a way to implement cse/nse tasks 2023-08-19 01:16:46 -05:00
mrq
8f42c578c9 setting up for allowing training for a partial amount of the speechx tasks (do NOT try this at home yet without a proper model, as performance is predicated on having a solid base vall-e model for the tasks) 2023-08-19 00:16:08 -05:00
mrq
ae9d38aa31 forgot to have it pull from specified noise to the hdf5 dataset 2023-08-18 23:57:07 -05:00
mrq
77292c42f9 tested the training preparation for tasks ns, sr, and tse (I don't expect it to go well with only 2 RVQ bins) 2023-08-18 23:55:40 -05:00
mrq
bbb0563b3d pseudocode polyfill stub some other flavor of working on adding the tasks 2023-08-18 22:22:13 -05:00
mrq
0b46c1e312 god I am inexperienced with retaining compat from previous weights, I hope no one actually has weights 2023-08-18 21:29:20 -05:00
mrq
508677fcd5 repaired auraloss loss calc during eval/val 2023-08-18 21:19:47 -05:00
mrq
fb4e816823 oops 2023-08-18 21:11:19 -05:00
mrq
2a71486cb6 preparing for SpeechX extensions 2023-08-18 20:58:07 -05:00
mrq
ced31fd9b7 removed the sampler as it's very misleading 2023-08-18 14:47:48 -05:00
mrq
8e7f900210 forgot the = 2023-08-17 19:07:59 -05:00
mrq
3ff7cf8341 maybe fix evaluation dataset not being capped to cfg.evaluation.size 2023-08-17 18:56:37 -05:00
mrq
ee58db746f actually make the evaluation dataset shuffled for sample_type=speaker 2023-08-17 15:04:45 -05:00
mrq
18403a3523 maybe fixes eval dataloader not shuffling under distributed 2023-08-17 13:41:53 -05:00
mrq
03872b823f why did I type rglob, another 10 bucks down the drain... 2023-08-17 00:11:29 -05:00