Commit Graph

149 Commits

Author SHA1 Message Date
mrq
83075c1505 sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because i didnt know that python dicts can have non-strings as keys), added batching samples based on total duration to ensure best training throughput 2024-06-28 22:28:54 -05:00
mrq
8fffb94964 backport fix from tortoise_tts with local trainer + loading state when training lora 2024-06-25 13:41:29 -05:00
mrq
19410a919e ugh 2024-06-15 12:29:03 -05:00
mrq
d343bde09b residual_in_fp32=False for mamba arch backends because it breaks the classifier (output projection / lm head / what-have-you) under AMP 2024-06-15 12:08:03 -05:00
mrq
31f71fa134 sampler update (some brainworm just never actually had a sampler for sample_type=path) 2024-06-14 16:55:40 -05:00
mrq
b3b67f34ac added option to sort paths by durations to better group equally lengthed sequences together (and there was maybe a logic error from creating the samplers and then interleave-reordering paths, desyncing them, maybe) 2024-06-13 22:37:34 -05:00
mrq
cca542a4c0 ugh 2024-06-11 23:59:28 -05:00
mrq
65a8960305 option to split classifier per-level instead of sharing one (at this point I'm just scrambling to try and cope with training a DAC model, the NAR is being a pain) 2024-06-11 22:28:59 -05:00
mrq
234f9efc6e ugh 2024-06-09 11:39:43 -05:00
mrq
132a02c48b sanity cleanup, backup config yaml for each log file 2024-06-09 11:22:52 -05:00
mrq
4ade2b60ee ugh 2024-06-06 21:57:11 -05:00
mrq
014e565c4b tweaks 2024-06-04 20:41:13 -05:00
mrq
6d5bd0156a fixes 2024-06-04 18:50:48 -05:00
mrq
ed3aeaf3a1 copy pasted from test to actual trainer 2024-06-04 18:40:30 -05:00
mrq
0aa01ba31a forgot one crucial detail (you *need* the previous RVQ level to keep coherence between all RVQ levels) (experimental deinterleaved is a bit crusty though) 2024-06-04 18:30:30 -05:00
mrq
406ff7bbe1 re-implemented config.model.interleave for the HF-compat experimental method 2024-06-04 14:19:52 -05:00
mrq
c93d5863fd fixes 2024-06-04 00:07:00 -05:00
mrq
934672252b feverish cleanup 2024-06-03 21:28:49 -05:00
mrq
8cf176ab46 ugh 2024-06-01 10:46:42 -05:00
mrq
d0ebce6bac ugh 2024-06-01 10:30:13 -05:00
mrq
74df2f5332 split sampler dict by global_rank, also handle splitting dataset paths by global_rank if sampler_type == path (because I do not trust DistributedSampler) (need to test) 2024-06-01 09:29:49 -05:00
mrq
ddbacde0d1 DAC just doesn't work well enough...... 2024-05-25 11:07:52 -05:00
mrq
e3ef89f5aa 100x better for subtrain/eval to be by group instead 2024-05-19 16:40:14 -05:00
mrq
4bc7e5a6d1 fix loading without needing an hdf5 dataset already prepped (and some other incidental speedups during dataloader prep) 2024-05-18 07:14:26 -05:00
mrq
d88a5ca183 ugh 2024-05-16 07:25:33 -05:00
mrq
d9aabfa3ae final tweaks, hopefully, again 2024-05-15 23:04:19 -05:00
mrq
2437a86efa ugh 2024-05-12 13:02:15 -05:00
mrq
4f1593c8db a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge 2024-05-12 10:17:29 -05:00
mrq
14709ac67f ughh 2024-05-12 07:30:59 -05:00
mrq
3774fcbdee ugh 2024-05-11 22:58:38 -05:00
mrq
4d93a16ef7 might just be better to explicitly define prompt duration ranges, especially under a "train small contexts then increase it" training paradigm 2024-05-11 09:50:54 -05:00
mrq
0d5d545a40 crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc. 2024-05-09 20:28:20 -05:00
mrq
c6e0f905b5 final tweaks (again) before training restarts 2024-05-08 02:11:38 -05:00
mrq
33b7f81b94 small cleanups 2024-05-04 22:37:22 -05:00
mrq
ffa200eec7 added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources)) 2024-05-04 12:05:41 -05:00
mrq
b5d1456a09 backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not)) 2024-04-29 22:14:01 -05:00
mrq
6a11bc9cb6 update tokenizer because, for some reason, it had the wrong order for the special tokens to where eos = unk 2024-04-29 09:09:26 -05:00
mrq
57810e4ba4 metadata only path (might drop HDF5 since its giving file sizes twice as large as my actual unpacked dataset) 2024-04-28 23:03:09 -05:00
mrq
caad7ee3c9 final tweaks, hopefully 2024-04-28 22:28:29 -05:00
mrq
ffc334cf58 added dataset transcription helper script (now I don't ever have to touch ai-voice-cloning) (to-do: unify scripts into the module) 2024-04-21 17:43:20 -05:00
mrq
071fb97777 dataset preparation script updates, caved and am using HF tokenizer now 2024-04-21 14:49:18 -05:00
mrq
8214aa23d7 converting over to a different intermediary dataset format 2024-04-18 21:24:06 -05:00
mrq
4f5c9e518a actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess 2024-04-18 13:32:41 -05:00
mrq
545162195b deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things 2024-04-15 19:54:32 -05:00
mrq
9c198eb75a added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works) 2023-12-20 18:45:58 -06:00
mrq
0aa2a3cc07 evaluation/validation passes language ID during training (oops) 2023-10-29 12:00:40 -05:00
mrq
9a6040383e make validation samplers ignore sampler type 2023-10-22 09:01:47 -05:00
mrq
3195026dba fixed issue with the 'add another target audio to artificially create longer sequences' for HDF5 just duplicating the utterance initially sampled 2023-10-18 20:38:33 -05:00
mrq
09cda7d3f9 added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup 2023-10-16 19:30:38 -05:00
mrq
65f500083d tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work 2023-10-12 22:21:43 -05:00
mrq
8740cdefc6 added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested) 2023-10-11 20:38:40 -05:00
mrq
6045cbce94 added experimental option to append utterances for training target (emphasis on experimental) 2023-10-11 17:32:45 -05:00
mrq
b4405c98ea remove double spaces in the text phonemes (might have caused problems.........) 2023-10-10 19:18:24 -05:00
mrq
87db03dd93 trim the input prompt to 3 seconds when training NAR tasks (marked as experimental; the paper mentions doing so, but I don't know how much this would harm the retention heads) 2023-10-09 22:03:58 -05:00
mrq
893a610fad cleanup, use deepspeed inferencing pathway if requested 2023-10-09 15:24:04 -05:00
mrq
27483e56f0 disabled preparing of SpeechX tasks, added dynamic temperature testing (to-do: test it, credited in the function) 2023-10-09 13:01:40 -05:00
mrq
82f02ae9b1 oops 2023-10-06 09:26:52 -05:00
mrq
d12877ee09 added option to set probability of selecting the AR during training under a monolithic AR+NAR, added some more to-dos while I have them in mind 2023-10-02 16:52:42 -05:00
mrq
a6bfe43590 added mirostat sampling (given a partially trained model, it got far decent output than I expected, need to test on a better trained model) 2023-09-18 18:55:41 -05:00
mrq
d07c63b9d8 unified more things with training the AR+NAR monolothic model 2023-09-12 15:54:41 -05:00
mrq
40ef34e1ca this embedding class definitely works, and migrating from the previous embedding weights seems to work. 2023-09-11 14:13:42 -05:00
mrq
8a6c203277 added per-speaker samplers 2023-09-03 21:27:13 -05:00
mrq
922404285c fixed segfault from tts-c task token exceeding being too big (inserted it in the hypothetical svc task token because in reality that is never ever going to be a feasible task to train against) 2023-09-02 19:25:43 -05:00
mrq
4613781e23 integrated plot script, added tts-c task token to help the model be able to mix between normal VALL-E and VALL-E continuous 2023-09-02 16:29:53 -05:00
mrq
71e68a8528 tweaked tts-continuous task 2023-09-02 13:39:17 -05:00
mrq
57db3ccfa8 shuffled VALL-E continuous as a task tts-c instead, logic fixes for it 2023-09-02 12:23:40 -05:00
mrq
2bc2d08b09 (need to verify) added modifying model size and config bool to align with VALL-E continuous' methodology 2023-09-01 17:19:34 -05:00
mrq
5c8694db8e nasty bandaid if there's no validation dataset specified during training (for example, during finetunes) 2023-08-30 18:23:05 -05:00
mrq
7f4388e591 added total samples processed and tokens processed (len of text tokens + len of target response tokens) 2023-08-28 11:02:45 -05:00
mrq
87c4bfedba added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU) 2023-08-27 12:26:12 -05:00
mrq
165a1154e0 Undo naive=False test flag, this shouldn't have made its way in 2023-08-26 22:00:43 -05:00
mrq
78378ed1ce overhauled dataloading code to be marginally faster, mostly cleaned up, and can leverage a metadata json to help things out 2023-08-26 19:53:23 -05:00
mrq
0517d620b8 fixes with the local backend 2023-08-24 17:05:56 -05:00
mrq
22904a8639 more oversights fixed because I've been using a cached dataloader forever now and didn't catch these problems 2023-08-24 10:25:33 -05:00
mrq
5873c27f1a ops 2023-08-24 09:20:47 -05:00
mrq
4585824cd3 tweaks, including exporting on save/quit 2023-08-23 16:43:03 -05:00
mrq
9c5a33bfd2 added repo with my weights so far 2023-08-22 13:09:44 -05:00
mrq
7b1b82e0e5 inferencing cleanup 2023-08-20 21:36:02 -05:00
mrq
a47029065b I don't know if the lack of start/stop tokens being added was causing my inference tests to fail, but it seems better now 2023-08-20 19:21:54 -05:00
mrq
fc576010ce wrapped saving the checkpoint in a try/catch so I can stop waking up to the damn trainer crashing because it ran out of disk space; I'd much rather it keep training to give me time to eventually clear up disk space rather than it silently restarting on its own 2023-08-20 06:29:17 -05:00
mrq
2d1a9f10c0 nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trim off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now) 2023-08-19 15:06:33 -05:00
mrq
f7f6d3bf6d validated that SpeechX tasks cse and nse works, added a method to test each task by invoking python3 -m vall_e.data --action=tasks --tasks='sr,se,cse,nse' 2023-08-19 09:50:07 -05:00
mrq
6ca347e1e1 literally had a urethra moment before going to bed with a way to implement cse/nse tasks 2023-08-19 01:16:46 -05:00
mrq
8f42c578c9 setting up for allowing training for a partial amount of the speechx tasks (do NOT try this at home yet without a proper model, as performance is predecated on having a solid base vall-e model for the tasks 2023-08-19 00:16:08 -05:00
mrq
ae9d38aa31 forgot to have it pull from specified noise to the hdf5 dataset 2023-08-18 23:57:07 -05:00
mrq
77292c42f9 tested the training preparation for tasks ns, sr, and tse (I don't expect it to go well with only 2 RVQ bins) 2023-08-18 23:55:40 -05:00
mrq
bbb0563b3d pseudocode polyfill stub some other flavor of working on adding the tasks 2023-08-18 22:22:13 -05:00
mrq
2a71486cb6 preparing for SpeechX extensions 2023-08-18 20:58:07 -05:00
mrq
ced31fd9b7 removed the sampler as it's very misleading 2023-08-18 14:47:48 -05:00
mrq
8e7f900210 forgot the = 2023-08-17 19:07:59 -05:00
mrq
3ff7cf8341 maybe fix evaluation dataset not being capped to cfg.evaluation.size 2023-08-17 18:56:37 -05:00
mrq
ee58db746f actually make the evaluation dataset shuffled for sample_type=speaker 2023-08-17 15:04:45 -05:00
mrq
18403a3523 maybe fixes eval dataloader not shuffling under distributed 2023-08-17 13:41:53 -05:00
mrq
b5f247aa11 just nuked about 9 hours of progress because I didn't make sure it pruned only on the global leader 2023-08-16 23:37:52 -05:00
mrq
44c08d828e added sample_type that samples from speakers to truly balance an epoch by speakers rather than the entire dataset and a sampler that tries to balance by speakers 2023-08-16 19:39:21 -05:00
mrq
277c759ab1 fixed issue with non-distributed training, oops 2023-08-14 21:42:35 -05:00
mrq
5fa86182b5 oops 2023-08-14 10:50:40 -05:00
mrq
d7deaf6def distributed training works now (hopefully) 2023-08-13 22:07:45 -05:00
mrq
bf8cedc9dd Rewrite init 2023-08-02 21:53:35 +00:00