Commit Graph

105 Commits

Author SHA1 Message Date
mrq
312a8e3ead add shuffle to samplers that can support it 2024-06-30 11:36:46 -05:00
mrq
bc2a6fa756 sanity cleanup: moved experimental features under its own thing 2024-06-30 10:37:33 -05:00
mrq
793ccb16fb ugh 2024-06-29 22:14:35 -05:00
mrq
c4dd523b6f change from chunk-slicing paths for distributed dataloader to instead interleave 2024-06-29 10:10:35 -05:00
mrq
dd40463803 limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid) 2024-06-29 09:11:28 -05:00
mrq
591d3ac848 have eval dataloader use eval batch size for batchedordersampler 2024-06-28 22:44:00 -05:00
mrq
83075c1505 sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because i didnt know that python dicts can have non-strings as keys), added batching samples based on total duration to ensure best training throughput 2024-06-28 22:28:54 -05:00
mrq
8fffb94964 backport fix from tortoise_tts with local trainer + loading state when training lora 2024-06-25 13:41:29 -05:00
mrq
19410a919e ugh 2024-06-15 12:29:03 -05:00
mrq
d343bde09b residual_in_fp32=False for mamba arch backends because it breaks the classifier (output projection / lm head / what-have-you) under AMP 2024-06-15 12:08:03 -05:00
mrq
31f71fa134 sampler update (some brainworm just never actually had a sampler for sample_type=path) 2024-06-14 16:55:40 -05:00
mrq
b3b67f34ac added option to sort paths by durations to better group equally lengthed sequences together (and there was maybe a logic error from creating the samplers and then interleave-reordering paths, desyncing them, maybe) 2024-06-13 22:37:34 -05:00
mrq
cca542a4c0 ugh 2024-06-11 23:59:28 -05:00
mrq
65a8960305 option to split classifier per-level instead of sharing one (at this point I'm just scrambling to try and cope with training a DAC model, the NAR is being a pain) 2024-06-11 22:28:59 -05:00
mrq
234f9efc6e ugh 2024-06-09 11:39:43 -05:00
mrq
132a02c48b sanity cleanup, backup config yaml for each log file 2024-06-09 11:22:52 -05:00
mrq
4ade2b60ee ugh 2024-06-06 21:57:11 -05:00
mrq
014e565c4b tweaks 2024-06-04 20:41:13 -05:00
mrq
6d5bd0156a fixes 2024-06-04 18:50:48 -05:00
mrq
ed3aeaf3a1 copy pasted from test to actual trainer 2024-06-04 18:40:30 -05:00
mrq
0aa01ba31a forgot one crucial detail (you *need* the previous RVQ level to keep coherence between all RVQ levels) (experimental deinterleaved is a bit crusty though) 2024-06-04 18:30:30 -05:00
mrq
406ff7bbe1 re-implemented config.model.interleave for the HF-compat experimental method 2024-06-04 14:19:52 -05:00
mrq
c93d5863fd fixes 2024-06-04 00:07:00 -05:00
mrq
934672252b feverish cleanup 2024-06-03 21:28:49 -05:00
mrq
8cf176ab46 ugh 2024-06-01 10:46:42 -05:00
mrq
d0ebce6bac ugh 2024-06-01 10:30:13 -05:00
mrq
74df2f5332 split sampler dict by global_rank, also handle splitting dataset paths by global_rank if sampler_type == path (because I do not trust DistributedSampler) (need to test) 2024-06-01 09:29:49 -05:00
mrq
ddbacde0d1 DAC just doesn't work well enough...... 2024-05-25 11:07:52 -05:00
mrq
e3ef89f5aa 100x better for subtrain/eval to be by group instead 2024-05-19 16:40:14 -05:00
mrq
4bc7e5a6d1 fix loading without needing an hdf5 dataset already prepped (and some other incidental speedups during dataloader prep) 2024-05-18 07:14:26 -05:00
mrq
d88a5ca183 ugh 2024-05-16 07:25:33 -05:00
mrq
d9aabfa3ae final tweaks, hopefully, again 2024-05-15 23:04:19 -05:00
mrq
2437a86efa ugh 2024-05-12 13:02:15 -05:00
mrq
4f1593c8db a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge 2024-05-12 10:17:29 -05:00
mrq
14709ac67f ughh 2024-05-12 07:30:59 -05:00
mrq
3774fcbdee ugh 2024-05-11 22:58:38 -05:00
mrq
4d93a16ef7 might just be better to explicitly define prompt duration ranges, especially under a "train small contexts then increase it" training paradigm 2024-05-11 09:50:54 -05:00
mrq
0d5d545a40 crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc. 2024-05-09 20:28:20 -05:00
mrq
c6e0f905b5 final tweaks (again) before training restarts 2024-05-08 02:11:38 -05:00
mrq
33b7f81b94 small cleanups 2024-05-04 22:37:22 -05:00
mrq
ffa200eec7 added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources)) 2024-05-04 12:05:41 -05:00
mrq
b5d1456a09 backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not)) 2024-04-29 22:14:01 -05:00
mrq
6a11bc9cb6 update tokenizer because, for some reason, it had the wrong order for the special tokens to where eos = unk 2024-04-29 09:09:26 -05:00
mrq
57810e4ba4 metadata only path (might drop HDF5 since its giving file sizes twice as large as my actual unpacked dataset) 2024-04-28 23:03:09 -05:00
mrq
caad7ee3c9 final tweaks, hopefully 2024-04-28 22:28:29 -05:00
mrq
ffc334cf58 added dataset transcription helper script (now I don't ever have to touch ai-voice-cloning) (to-do: unify scripts into the module) 2024-04-21 17:43:20 -05:00
mrq
071fb97777 dataset preparation script updates, caved and am using HF tokenizer now 2024-04-21 14:49:18 -05:00
mrq
8214aa23d7 converting over to a different intermediary dataset format 2024-04-18 21:24:06 -05:00
mrq
4f5c9e518a actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess 2024-04-18 13:32:41 -05:00
mrq
545162195b deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things 2024-04-15 19:54:32 -05:00