|
74e531d391
|
ugh
|
2024-05-18 12:02:56 -05:00 |
|
|
4bc7e5a6d1
|
fix loading without needing an hdf5 dataset already prepped (and some other incidental speedups during dataloader prep)
|
2024-05-18 07:14:26 -05:00 |
|
|
d88a5ca183
|
ugh
|
2024-05-16 07:25:33 -05:00 |
|
|
d9aabfa3ae
|
final tweaks, hopefully, again
|
2024-05-15 23:04:19 -05:00 |
|
|
8d79f78e0a
|
god I need to replace omegaconf
|
2024-05-12 14:01:52 -05:00 |
|
|
5eb5db7f7f
|
just don't use DAC 24Khz, it's bad
|
2024-05-12 13:41:17 -05:00 |
|
|
230da8b559
|
should be the final things to scramble around for, DAC's 24KHz model is unusable for this, but both encodec's 24KHz and DAC's 44KHz work
|
2024-05-12 13:22:08 -05:00 |
|
|
2437a86efa
|
ugh
|
2024-05-12 13:02:15 -05:00 |
|
|
4f1593c8db
|
a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge
|
2024-05-12 10:17:29 -05:00 |
|
|
917eeb40d2
|
ughhh
|
2024-05-12 08:22:39 -05:00 |
|
|
9910c75d5a
|
checkpointing for bitnet impl
|
2024-05-12 07:52:54 -05:00 |
|
|
14709ac67f
|
ughh
|
2024-05-12 07:30:59 -05:00 |
|
|
3774fcbdee
|
ugh
|
2024-05-11 22:58:38 -05:00 |
|
|
856545f8bb
|
nan loss detection (should have added it earlier), loss scaling for local backend + fp16
|
2024-05-11 22:23:29 -05:00 |
|
|
a755eb3c62
|
ugh
|
2024-05-11 17:34:45 -05:00 |
|
|
88e9b9caff
|
local ddp fix
|
2024-05-11 17:29:01 -05:00 |
|
|
3337c69e5a
|
leverage between xformers and torch.backends.cuda.sdp_kernel for attention
|
2024-05-11 17:14:05 -05:00 |
|
|
d33c7bb7cf
|
ugh
|
2024-05-11 16:47:19 -05:00 |
|
|
0b6499601b
|
sanitizing
|
2024-05-11 16:31:05 -05:00 |
|
|
71e373064f
|
remove redundant loss, tweak readme
|
2024-05-11 15:02:47 -05:00 |
|
|
04a80d6b55
|
maybe it's better to be more explicit in deepspeed configs
|
2024-05-11 13:57:43 -05:00 |
|
|
4d93a16ef7
|
might just be better to explicitly define prompt duration ranges, especially under a "train small contexts then increase it" training paradigm
|
2024-05-11 09:50:54 -05:00 |
|
|
bd0a36ba8d
|
I swear I keep seeing tqdm flicker back a number
|
2024-05-10 18:36:01 -05:00 |
|
|
2109712e5b
|
resolve deprecation warning that doesn't show on my old training rig but does on my new one
|
2024-05-09 23:25:44 -05:00 |
|
|
1547de5020
|
haha...
|
2024-05-09 23:15:52 -05:00 |
|
|
b7bd885651
|
some possible sanity with deepspeed config
|
2024-05-09 22:48:42 -05:00 |
|
|
c4b696ebeb
|
oops
|
2024-05-09 22:33:40 -05:00 |
|
|
c22a177cf8
|
forgot to pass warmup to schedule free
|
2024-05-09 22:18:49 -05:00 |
|
|
b6131565ad
|
autotune?
|
2024-05-09 21:25:40 -05:00 |
|
|
6ed6ab8c03
|
a bit more cleanup for deepspeed ds_cfg creation
|
2024-05-09 21:00:26 -05:00 |
|
|
0d5d545a40
|
crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc.
|
2024-05-09 20:28:20 -05:00 |
|
|
c6e0f905b5
|
final tweaks (again) before training restarts
|
2024-05-08 02:11:38 -05:00 |
|
|
215800484d
|
correcting my wrong of assuming I could just use raw 24Khz audio in the 44Khz DAC without too much of an issue (there are issues)
|
2024-05-04 23:49:15 -05:00 |
|
|
9f738fbd5b
|
seems I actually don't need RVQ bins 9-32 with the 24Khz DAC model........ (time to requantize my audio...)
|
2024-05-04 23:09:18 -05:00 |
|
|
33b7f81b94
|
small cleanups
|
2024-05-04 22:37:22 -05:00 |
|
|
8aa1b2dabf
|
documentation update
|
2024-05-04 21:03:46 -05:00 |
|
|
253441b750
|
forgot to disable verbose flag
|
2024-05-04 13:13:52 -05:00 |
|
|
3dca1125f5
|
implemented xformers in HF's Llama (because theres no flash attention for Volta cards)
|
2024-05-04 13:07:45 -05:00 |
|
|
277dcec484
|
apparently I got an error for trying to serialize an errant tensor that made its way into the json, this could be remedied easily with recursively traversing the dict and coercing any objects to primitives, but I'm tired and I just want to start training and nap
|
2024-05-04 12:33:43 -05:00 |
|
|
ffa200eec7
|
added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources))
|
2024-05-04 12:05:41 -05:00 |
|
|
c494894261
|
simple DDP wrapper (for my NVlink test)
|
2024-05-04 11:48:26 -05:00 |
|
|
a7b43b98b5
|
renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes)
|
2024-05-02 20:08:59 -05:00 |
|
|
b5d1456a09
|
backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not))
|
2024-04-29 22:14:01 -05:00 |
|
|
5120ffdda7
|
god it would be nice to know the best way to handle audio embeddings, because I genuinely don't know without skimming through papers or devoting X amount of GPU hours in training
|
2024-04-29 18:24:05 -05:00 |
|
|
6a11bc9cb6
|
update tokenizer because, for some reason, it had the wrong order for the special tokens to where eos = unk
|
2024-04-29 09:09:26 -05:00 |
|
|
57810e4ba4
|
metadata only path (might drop HDF5 since its giving file sizes twice as large as my actual unpacked dataset)
|
2024-04-28 23:03:09 -05:00 |
|
|
caad7ee3c9
|
final tweaks, hopefully
|
2024-04-28 22:28:29 -05:00 |
|
|
ffc334cf58
|
added dataset transcription helper script (now I don't ever have to touch ai-voice-cloning) (to-do: unify scripts into the module)
|
2024-04-21 17:43:20 -05:00 |
|
|
b251669536
|
forgot to fix up the test trainer
|
2024-04-21 14:58:04 -05:00 |
|
|
071fb97777
|
dataset preparation script updates, caved and am using HF tokenizer now
|
2024-04-21 14:49:18 -05:00 |
|
|
a8ffa88844
|
it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior
|
2024-04-19 18:36:54 -05:00 |
|
|
8214aa23d7
|
converting over to a different intermediary dataset format
|
2024-04-18 21:24:06 -05:00 |
|
|
4f5c9e518a
|
actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess
|
2024-04-18 13:32:41 -05:00 |
|
|
2e9e6e68f7
|
Forgot I need to use the DAC's 44K model because 24K model has 32 codebooks instead of 9.
|
2024-04-17 20:59:25 -05:00 |
|
|
5ff2b4aab5
|
finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset)
|
2024-04-17 20:39:35 -05:00 |
|
|
b0bd88833c
|
refractor cleanup, had a revelation on how I can handle a batch of varying tasks
|
2024-04-16 21:04:48 -05:00 |
|
|
467fa1c5ee
|
wrapper fixes
|
2024-04-16 10:19:02 -05:00 |
|
|
aa1e25fbf5
|
backwards compat for old YAMLs with models , option to set flash attention 2 for Llama (and derivatives), included syncdoth/RetNet s torchscale retnet for shits and grins, etc.
|
2024-04-16 10:02:31 -05:00 |
|
|
545162195b
|
deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things
|
2024-04-15 19:54:32 -05:00 |
|
|
d69a00e389
|
Properly pass retention_mask for retnet-HF, attempt to fix recurrent forward for retnet (doesn't work still)
|
2024-04-14 13:12:50 -05:00 |
|
|
789bb5d11b
|
add an optional label override for model loading (used for easy testing between 12/16/20/24 layered model)
|
2024-04-13 12:43:35 -05:00 |
|
|
f0c4baeb25
|
added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it)
|
2024-04-09 22:04:01 -05:00 |
|
|
4d75ee066c
|
actually do the Linear replacement with TE's Linear
|
2024-04-09 14:41:13 -05:00 |
|
|
9d97eb5104
|
added FP8 support through NVIDIA/TransformerEngine , added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale)
|
2024-04-08 20:14:51 -05:00 |
|
|
7075c2a5f0
|
added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml)
|
2024-04-04 19:11:49 -05:00 |
|
|
91062361af
|
tweaks
|
2024-03-01 20:38:06 -06:00 |
|
|
f3c59c3e7e
|
cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed)
|
2024-03-01 20:18:43 -06:00 |
|
|
47435207f7
|
Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model
|
2024-03-01 19:20:10 -06:00 |
|
|
0427d8d076
|
logger broke for some reason, added flag to just tqdm.write instead, make cfg.bitsandbytes.bitnet==True yamls denoted since I'm sure they're not interoperable
|
2024-03-01 10:32:35 -06:00 |
|
|
35d78a2bb0
|
Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares)
|
2024-02-29 20:29:17 -06:00 |
|
|
3da1518ace
|
added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work)
|
2024-01-31 21:48:36 -06:00 |
|
|
cce929e136
|
nasty hotfix for transformer's Mixtral throwing an error when batch sizes > 1
|
2024-01-26 19:41:12 -06:00 |
|
|
e799665759
|
experimental weighting of prom/resp embeds
|
2024-01-25 12:18:48 -06:00 |
|
|
c690aa509d
|
fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...)
|
2023-12-25 21:20:32 -06:00 |
|
|
e513d2ef19
|
experts weren't forwarded into constructer (wasted a few days of training garbage)
|
2023-12-23 16:08:17 -06:00 |
|
|
0db3203b21
|
added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go)
|
2023-12-22 19:27:36 -06:00 |
|
|
9c198eb75a
|
added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works)
|
2023-12-20 18:45:58 -06:00 |
|
|
6c51a629cc
|
resetting step count resets the samples processed and other metrics
|
2023-10-29 12:11:19 -05:00 |
|
|
0aa2a3cc07
|
evaluation/validation passes language ID during training (oops)
|
2023-10-29 12:00:40 -05:00 |
|
|
ed54f4ebec
|
un 'experimental' the better target sequence preparation
|
2023-10-22 09:06:59 -05:00 |
|
|
9a6040383e
|
make validation samplers ignore sampler type
|
2023-10-22 09:01:47 -05:00 |
|
|
32d4271ca8
|
fixed issue with training from scratch (oops)
|
2023-10-21 09:55:38 -05:00 |
|
|
3195026dba
|
fixed issue with the 'add another target audio to artificially create longer sequences' for HDF5 just duplicating the utterance initially sampled
|
2023-10-18 20:38:33 -05:00 |
|
|
09cda7d3f9
|
added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup
|
2023-10-16 19:30:38 -05:00 |
|
|
a539f6889f
|
mucked around with the loss calculation, this seems better?
|
2023-10-13 18:22:21 -05:00 |
|
|
fb467b19ba
|
exposed rolling resp context to the web UI, added passing in language to inferencing command line
|
2023-10-12 23:21:01 -05:00 |
|
|
298fd9a5f9
|
fixed issue with webui
|
2023-10-12 22:49:25 -05:00 |
|
|
65f500083d
|
tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work
|
2023-10-12 22:21:43 -05:00 |
|
|
08bae355eb
|
actually use langs from the dataloader
|
2023-10-11 21:21:50 -05:00 |
|
|
3af19d79fd
|
oops
|
2023-10-11 20:49:54 -05:00 |
|
|
8740cdefc6
|
added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested)
|
2023-10-11 20:38:40 -05:00 |
|
|
6045cbce94
|
added experimental option to append utterances for training target (emphasis on experimental)
|
2023-10-11 17:32:45 -05:00 |
|
|
7facacf7c9
|
separated samplers into its own file, don't bother copying the logits back to the GPU after sampling, it's not necessary
|
2023-10-11 12:25:31 -05:00 |
|
|
100dd164e6
|
apply phoneme cleanup in inferencing as well
|
2023-10-10 19:21:19 -05:00 |
|
|
b4405c98ea
|
remove double spaces in the text phonemes (might have caused problems.........)
|
2023-10-10 19:18:24 -05:00 |
|
|
47b3077415
|
fixed mirostat issue
|
2023-10-10 18:09:49 -05:00 |
|
|
99e980d323
|
documentation and more better-er attribution
|
2023-10-10 17:15:16 -05:00 |
|
|
e727b6e5c1
|
changed dynamic temperature trigger to be a min-(n)ar-temp value between [0,(n)ar-temp), flags to set min temp, checkbox in web UI to request it
|
2023-10-10 17:02:33 -05:00 |
|
|
ec25f56bd9
|
used torch.max fixes things, somehow, for dynamic temp sampling
|
2023-10-10 16:42:24 -05:00 |
|
|
87db03dd93
|
trim the input prompt to 3 seconds when training NAR tasks (marked as experimental; the paper mentions doing so, but I don't know how much this would harm the retention heads)
|
2023-10-09 22:03:58 -05:00 |
|
|
893a610fad
|
cleanup, use deepspeed inferencing pathway if requested
|
2023-10-09 15:24:04 -05:00 |
|
|
26fbb92ec6
|
reduced dynamic temperature threshold to > 1.0, as it seems to not quite be useful for audio LMs, sped up any sampling that touches logits by copying them to CPU first, as accessing tensors on the GPU is slow as balls)
|
2023-10-09 14:46:17 -05:00 |
|
|
29873e6ded
|
extend the max temps in the web UI to actually allow dynamic temp sampling
|
2023-10-09 13:30:45 -05:00 |
|
|
27483e56f0
|
disabled preparing of SpeechX tasks, added dynamic temperature testing (to-do: test it, credited in the function)
|
2023-10-09 13:01:40 -05:00 |
|
|
2deb995cc9
|
updated setup script
|
2023-10-06 20:08:28 -05:00 |
|
|
3db7e7dea1
|
implicitly load checkpoint if deepspeed checkpoint not found, updated setup script to grab the diskcached dataloader things
|
2023-10-06 10:02:45 -05:00 |
|
|
82f02ae9b1
|
oops
|
2023-10-06 09:26:52 -05:00 |
|
|
63cc9cf37a
|
added compat flags for torchscale because the maintainer for torchscale broke compat for existing models
|
2023-10-05 16:39:46 -05:00 |
|
|
153f8b293c
|
added min-x and min-y arguments to plot.py, helper script to download from my existing checkpoint
|
2023-10-04 19:41:37 -05:00 |
|
|
777ba43305
|
oops
|
2023-10-03 15:01:37 -05:00 |
|
|
d12877ee09
|
added option to set probability of selecting the AR during training under a monolithic AR+NAR, added some more to-dos while I have them in mind
|
2023-10-02 16:52:42 -05:00 |
|
|
e85b798fbf
|
set default NAR levels to max for the web UI
|
2023-09-29 19:14:16 -05:00 |
|
|
c7fb740d41
|
do not specify a default dtype for the web UI, let it implicitly load from the yaml instead
|
2023-09-24 17:54:03 -05:00 |
|
|
4abd6564d1
|
fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml
|
2023-09-23 19:59:00 -05:00 |
|
|
9384900ce6
|
revert the frankensteined "train one model but hotload the other" since it kept loading the last exported weights and I'm not supporting this usecase anymore anyways
|
2023-09-22 13:04:17 -05:00 |
|
|
e7da1eb90d
|
edge case
|
2023-09-20 19:20:17 -05:00 |
|
|
c0b25541e3
|
restructured some things with the model to remove dead weights
|
2023-09-20 19:10:59 -05:00 |
|
|
a6bfe43590
|
added mirostat sampling (given a partially trained model, it got far decent output than I expected, need to test on a better trained model)
|
2023-09-18 18:55:41 -05:00 |
|
|
2567e082b5
|
UGH
|
2023-09-16 00:26:13 -05:00 |
|
|
22ffaf3a33
|
have loss for the NAR not-ignore the text prompt, I imagine this should help the NAR and explain why it's always had a bit of an issue with training
|
2023-09-15 19:08:44 -05:00 |
|
|
4aef798135
|
added picking final candidate based on sum of score instead of first candidate (this changes nothing).
|
2023-09-13 13:19:11 -05:00 |
|
|
23a5fdd645
|
implemented a naive beam search (I really should be taking a break)
|
2023-09-12 21:28:07 -05:00 |
|
|
a6ae344e5b
|
some comments
|
2023-09-12 16:04:45 -05:00 |
|
|
d07c63b9d8
|
unified more things with training the AR+NAR monolothic model
|
2023-09-12 15:54:41 -05:00 |
|
|
40ef34e1ca
|
this embedding class definitely works, and migrating from the previous embedding weights seems to work.
|
2023-09-11 14:13:42 -05:00 |
|
|
a1f250ffac
|
set default max_levels for NAR to 0 and implicitly set it to max resps levels because the previous way was implicitly assuming all models were outputting at 1+7 RVQ bins.
|
2023-09-10 20:33:33 -05:00 |
|
|
671dca88ee
|
throw error when no reference audio is provided in the web UI because someone keeps doing that in the HF space
|
2023-09-10 15:50:50 -05:00 |
|
|
ba71020318
|
added option to limit (or exceed) inferenced RVQ-bin levels through the NAR
|
2023-09-10 13:50:13 -05:00 |
|
|
c74fe2f718
|
tweaks to web UI
|
2023-09-09 22:27:20 -05:00 |
|
|
7f8bd2b936
|
added printing elasped inference time
|
2023-09-09 20:05:03 -05:00 |
|
|
4f61f5c889
|
added option to set the trim length for an input prompt
|
2023-09-09 18:04:44 -05:00 |
|
|
d10053d11f
|
render README.md markdown for huggingface space
|
2023-09-09 17:04:51 -05:00 |
|
|
bc30026377
|
added advanced sampler parameters to the web UI
|
2023-09-09 16:51:36 -05:00 |
|
|
5ac119a6e7
|
added light web UI (need to port the telemetry disabling bandaids from aivc)
|
2023-09-09 16:17:20 -05:00 |
|
|
10c34c5b98
|
added a length-based decay factor for repetition penalty
|
2023-09-08 21:02:00 -05:00 |
|
|
b922f35b6b
|
added documentation on how these new sampling parameters are very iffy and you really need to know what you are doing to use them because this is audio generation and not text generation
|
2023-09-08 20:43:36 -05:00 |
|
|
14c78bae39
|
added lots of sampling options (top-k/top-p, repetition penalty, length penalty)
|
2023-09-08 20:30:54 -05:00 |
|
|
f69aad9c65
|
some day I'll get it right
|
2023-09-08 15:36:26 -05:00 |
|
|
b2907ae7e0
|
seems that my PromEmbedding/RespEmbedding doesn't actually work all that well, naively using dedicated MultiEmbeddings for AR/NAR in the monolithic model is the best way to go
|
2023-09-08 01:03:24 -05:00 |
|
|
67617d7d69
|
also cull frozen_params in the params optimizer receives to reduce VRAM it consumes
|
2023-09-07 18:27:02 -05:00 |
|
|
8837bc34d7
|
added option to specify parameters to freeze per-model in YAML (because I need to see about committing atrocities with convering an AR into an AR+NAR)
|
2023-09-07 18:19:51 -05:00 |
|
|
c47fc3274e
|
added backwards compat flag
|
2023-09-07 17:12:17 -05:00 |
|
|
ab5134f385
|
tweaks and fixes
|
2023-09-07 17:08:38 -05:00 |
|
|
b2c2dec291
|
added homebrewed per-RVQ-bin embedding solutions
|
2023-09-07 16:48:02 -05:00 |
|
|
e7a67410d1
|
oops
|
2023-09-07 09:14:03 -05:00 |
|
|
712808494f
|
added support for optional prodigy optimizer (https://github.com/konstmish/prodigy) although it consumes a lot more VRAM per parameter
|
2023-09-06 20:33:16 -05:00 |
|
|
7ce06432fd
|
fixed the AR+NAR dual model, the resp_emb has to be split up (classifier might too)
|
2023-09-06 19:33:39 -05:00 |
|
|
100ca6b7d0
|
added option to use SGD optimizer through the YAML, added option to pass in additional optimizer parameters through the YAML, added experimental unified AR+NAR model (does not seem fruitful in testing)
|
2023-09-06 18:58:35 -05:00 |
|
|
451726fdd5
|
added ability to disable activation checkpointing through the YAML (it is very VRAM intensive at double layer size)
|
2023-09-05 15:38:21 -05:00 |
|
|
143aee7526
|
removed dedicated interleaved AR code
|
2023-09-03 22:47:03 -05:00 |
|
|
2f9cd0842f
|
merged dedicated interleaved AR code with the normal AR code
|
2023-09-03 22:46:08 -05:00 |
|
|
3a6bd50322
|
haha
|
2023-09-03 21:36:58 -05:00 |
|
|
c56ce033d9
|
work on an interleaved AR (spoiler: it does not work)
|
2023-09-03 21:27:58 -05:00 |
|
|
8a6c203277
|
added per-speaker samplers
|
2023-09-03 21:27:13 -05:00 |
|
|
81b05dabb9
|
accurate epoch metric is now reported (based on samples processed / length of dataset's paths, rather than naive assumptions)
|
2023-09-03 08:03:36 -05:00 |
|
|
922404285c
|
fixed segfault from tts-c task token exceeding being too big (inserted it in the hypothetical svc task token because in reality that is never ever going to be a feasible task to train against)
|
2023-09-02 19:25:43 -05:00 |
|
|
4613781e23
|
integrated plot script, added tts-c task token to help the model be able to mix between normal VALL-E and VALL-E continuous
|
2023-09-02 16:29:53 -05:00 |
|
|
71e68a8528
|
tweaked tts-continuous task
|
2023-09-02 13:39:17 -05:00 |
|
|
57db3ccfa8
|
shuffled VALL-E continuous as a task tts-c instead, logic fixes for it
|
2023-09-02 12:23:40 -05:00 |
|
|
2f06166ddd
|
cleanups
|
2023-09-01 21:33:51 -05:00 |
|
|
e40c0d34a0
|
somewhat got recurrent forward working (it's as accurate as chunkwise forward: it's not accurate at all), added option to use AMP instead of blanket setting the weight's dtype
|
2023-09-01 20:58:29 -05:00 |
|
|
2bc2d08b09
|
(need to verify) added modifying model size and config bool to align with VALL-E continuous' methodology
|
2023-09-01 17:19:34 -05:00 |
|
|
5c8694db8e
|
nasty bandaid if there's no validation dataset specified during training (for example, during finetunes)
|
2023-08-30 18:23:05 -05:00 |
|
|
7f4388e591
|
added total samples processed and tokens processed (len of text tokens + len of target response tokens)
|
2023-08-28 11:02:45 -05:00 |
|
|
87c4bfedba
|
added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU)
|
2023-08-27 12:26:12 -05:00 |
|
|
165a1154e0
|
Undo naive=False test flag, this shouldn't have made its way in
|
2023-08-26 22:00:43 -05:00 |
|
|
78378ed1ce
|
overhauled dataloading code to be marginally faster, mostly cleaned up, and can leverage a metadata json to help things out
|
2023-08-26 19:53:23 -05:00 |
|
|
16e0020901
|
disabled chunkwise_recurrent for 2x speed gains (I suppose it has been working the entire time, but I have not been properly grabbing things, and this might explain why the output is bad)
|
2023-08-25 19:50:19 -05:00 |
|
|
6455a2f9d7
|
I think I fixed a bug?
|
2023-08-24 23:33:36 -05:00 |
|
|
0517d620b8
|
fixes with the local backend
|
2023-08-24 17:05:56 -05:00 |
|
|
00ad4af651
|
updated draconian requirement for espeak-ng to be installed and the env var set to the dll for Windows
|
2023-08-24 14:57:01 -05:00 |
|
|
22904a8639
|
more oversights fixed because I've been using a cached dataloader forever now and didn't catch these problems
|
2023-08-24 10:25:33 -05:00 |
|
|
5873c27f1a
|
ops
|
2023-08-24 09:20:47 -05:00 |
|
|
501a857d5d
|
ops
|
2023-08-23 17:03:25 -05:00 |
|
|
4585824cd3
|
tweaks, including exporting on save/quit
|
2023-08-23 16:43:03 -05:00 |
|
|
d106598403
|
do not utilize diskcache if a config yaml is not loaded
|
2023-08-23 11:02:15 -05:00 |
|
|
524d289c9c
|
Forgot to re-add in setting the weight's dtype on model load
|
2023-08-22 22:57:23 -05:00 |
|
|
9c5a33bfd2
|
added repo with my weights so far
|
2023-08-22 13:09:44 -05:00 |
|
|
7b1b82e0e5
|
inferencing cleanup
|
2023-08-20 21:36:02 -05:00 |
|
|
a47029065b
|
I don't know if the lack of start/stop tokens being added was causing my inference tests to fail, but it seems better now
|
2023-08-20 19:21:54 -05:00 |
|
|
736c077282
|
ops
|
2023-08-20 13:42:18 -05:00 |
|
|
b105f6211e
|
added ability to export weights mid-training to avoid CBT to yank the weights while the training script is running
|
2023-08-20 13:39:58 -05:00 |
|
|
fc576010ce
|
wrapped saving the checkpoint in a try/catch so I can stop waking up to the damn trainer crashing because it ran out of disk space; I'd much rather it keep training to give me time to eventually clear up disk space rather than it silently restarting on its own
|
2023-08-20 06:29:17 -05:00 |
|
|
2d1a9f10c0
|
nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trim off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now)
|
2023-08-19 15:06:33 -05:00 |
|
|
f7f6d3bf6d
|
validated that SpeechX tasks cse and nse works, added a method to test each task by invoking python3 -m vall_e.data --action=tasks --tasks='sr,se,cse,nse'
|
2023-08-19 09:50:07 -05:00 |
|
|
6ca347e1e1
|
literally had a urethra moment before going to bed with a way to implement cse/nse tasks
|
2023-08-19 01:16:46 -05:00 |
|
|
8f42c578c9
|
setting up for allowing training for a partial amount of the speechx tasks (do NOT try this at home yet without a proper model, as performance is predecated on having a solid base vall-e model for the tasks
|
2023-08-19 00:16:08 -05:00 |
|
|
ae9d38aa31
|
forgot to have it pull from specified noise to the hdf5 dataset
|
2023-08-18 23:57:07 -05:00 |
|
|
77292c42f9
|
tested the training preparation for tasks ns, sr, and tse (I don't expect it to go well with only 2 RVQ bins)
|
2023-08-18 23:55:40 -05:00 |
|
|
bbb0563b3d
|
pseudocode polyfill stub some other flavor of working on adding the tasks
|
2023-08-18 22:22:13 -05:00 |
|
|
0b46c1e312
|
god I am inexperienced with retaining compat from previous weights, I hope no one actually has weights
|
2023-08-18 21:29:20 -05:00 |
|
|
508677fcd5
|
repaired auraloss loss calc during eval/val
|
2023-08-18 21:19:47 -05:00 |
|
|
fb4e816823
|
oops
|
2023-08-18 21:11:19 -05:00 |
|
|
2a71486cb6
|
preparing for SpeechX extensions
|
2023-08-18 20:58:07 -05:00 |
|
|
ced31fd9b7
|
removed the sampler as it's very misleading
|
2023-08-18 14:47:48 -05:00 |
|
|
8e7f900210
|
forgot the =
|
2023-08-17 19:07:59 -05:00 |
|
|
3ff7cf8341
|
maybe fix evaluation dataset not being capped to cfg.evaluation.size
|
2023-08-17 18:56:37 -05:00 |
|
|
ee58db746f
|
actually make the evaluation dataset shuffled for sample_type=speaker
|
2023-08-17 15:04:45 -05:00 |
|
|
18403a3523
|
maybe fixes eval dataloader not shuffling under distributed
|
2023-08-17 13:41:53 -05:00 |
|
|
03872b823f
|
why did I type rglob, another 10 bucks down the drain...
|
2023-08-17 00:11:29 -05:00 |
|