188d116222 | some weird fixes for an equally weird regression with LoRA loading | 2024-07-22 20:47:24 -05:00
e33c4b0cb1 | oops | 2024-07-22 19:38:39 -05:00
75b04686f8 | added prom-less training / inferencing, some other things | 2024-07-22 19:36:07 -05:00
491ae2a684 | some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...) | 2024-07-22 00:30:40 -05:00
ad024f400f | actually pass language into dataset process script, fix coercing japanese into hiragana because espeak does not like kanji | 2024-07-21 23:21:37 -05:00
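The kanji-to-hiragana coercion above isn't spelled out in the log; as an illustration of the idea, here is a minimal sketch assuming the pykakasi package (the commit does not name which converter is actually used):

```python
# Illustrative sketch (not the repo's code): coerce Japanese text into hiragana
# before handing it to espeak, assuming the pykakasi package is available.
import pykakasi

_kks = pykakasi.kakasi()

def to_hiragana(text: str) -> str:
    # pykakasi segments the text and reports a hiragana reading per chunk;
    # joining the 'hira' fields yields a kanji-free string espeak can phonemize.
    return "".join(chunk["hira"] for chunk in _kks.convert(text))

print(to_hiragana("漢字が苦手"))  # roughly -> "かんじがにがて"
```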
3e5ca3a201 | more demo page tweaks | 2024-07-21 19:31:13 -05:00
7366f36f81 | oops | 2024-07-21 19:17:25 -05:00
e19aa643a6 | cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training | 2024-07-21 19:12:03 -05:00
ba7ee8c0ee | added demo link to readme | 2024-07-19 21:22:30 -05:00
9ec88d9444 | validated passing URI path for assets instead of base64 encoding them | 2024-07-19 21:07:17 -05:00
d87b492295 | added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that) | 2024-07-19 20:49:40 -05:00
d53038a9e4 | actually have split classifiers working | 2024-07-19 15:33:31 -05:00
692d09f9c1 | eval/validation fix for SpeechX tasks | 2024-07-19 09:16:37 -05:00
28a674e0f1 | fixes... | 2024-07-18 23:25:32 -05:00
39f961abcd | test trainer (vall_e.models.ar_nar) tests some SpeechX features | 2024-07-18 18:46:45 -05:00
83a0954f85 | fixes for re-introducing SpeechX tasks (need to actually validate if these all do the right things) | 2024-07-18 17:16:32 -05:00
bccbb77a1a | added option to either naively concat codes to concat audio waveforms (prior behavior) or to decode => concat => encode instead (although this currently only happens for prom sampling if an utterance is too small) | 2024-07-18 16:48:41 -05:00
97e768601c | re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways) | 2024-07-18 16:16:14 -05:00
c2b8035e74 | oops, kept forgetting to actually pass in lang/tone tokens (despite not really using these at the moment) | 2024-07-18 14:18:34 -05:00
22fe53508c | added experimental disjointed position IDs (because I *think* this might help, since technically a sequence is made up of several parts and the position embeddings shouldn't be unified) | 2024-07-16 19:52:41 -05:00
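A minimal sketch of what "disjointed" position IDs could look like: each part of the concatenated sequence restarts its positions at zero instead of sharing one unified range. The segment layout here is an assumption, not the repo's actual wiring.

```python
# Illustrative sketch: per-segment position IDs (e.g. text / prompt / response)
# instead of one continuous 0..N-1 range over the whole concatenated sequence.
import torch

def disjointed_position_ids(segment_lengths: list[int]) -> torch.Tensor:
    # each segment gets its own 0..len-1 range
    return torch.cat([torch.arange(length) for length in segment_lengths])

print(disjointed_position_ids([4, 3, 5]))
# tensor([0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3, 4])
```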
fe0f235335 | mechanism to store the model config inside the weights and load them, some other things to allow LoRA training on the RetNet (gradient checkpointing will gripe about inputs not having requires_grad and nothing seems to remedy it) | 2024-07-16 18:23:13 -05:00
3acc54df22 | allow loading a different model within the web UI (apparently I did not have the web UI in the documentation) | 2024-07-15 19:59:48 -05:00
7b210d9738 | sanity cleanup | 2024-07-04 15:58:08 -05:00
1ecf2793f4 | (commented-out) support for facebookresearch/AudioDec, but it really didn't wow me (so I commented it out until I figure out why my output audio is super crusty with AudioDec) | 2024-07-04 15:40:51 -05:00
f770467eb3 | stuff | 2024-07-01 18:13:29 -05:00
312a8e3ead | add shuffle to samplers that can support it | 2024-06-30 11:36:46 -05:00
396af541c5 | ugh | 2024-06-30 11:11:58 -05:00
dced595391 | more cleanup | 2024-06-30 11:00:12 -05:00
bc2a6fa756 | sanity cleanup: moved experimental features under their own thing | 2024-06-30 10:37:33 -05:00
b21f74a5c5 | added summing of external embeddings (at this point I don't think any amount of cope bandaids will get DAC to train nicely; I think the RVQ levels the NAR handles add too much noise if they're not accurate) | 2024-06-29 23:42:30 -05:00
793ccb16fb | ugh | 2024-06-29 22:14:35 -05:00
2808f881c8 | cleaned up subjugated audio embedding into a flag; the flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive) | 2024-06-29 21:46:35 -05:00
ec5eaebcbc | experimental method of using DAC's quantizer "embeddings" to see if it helps with model quality | 2024-06-29 19:46:11 -05:00
a8718d35a4 | nasty bandaid because some of my DAC dataset only has 8 RVQ levels instead of the full 9 | 2024-06-29 10:16:37 -05:00
c4dd523b6f | change from chunk-slicing paths for the distributed dataloader to interleaving them instead | 2024-06-29 10:10:35 -05:00
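A sketch of the interleaving idea for sharding paths across ranks; the helper name is hypothetical, not a function from the repo.

```python
# Illustrative sketch: split dataset paths across ranks by interleaving
# (paths[rank::world_size]) rather than handing each rank a contiguous chunk,
# so every rank sees a similar spread of the (duration-sorted) data.
def shard_paths(paths, global_rank: int, world_size: int):
    # old behavior: contiguous chunk per rank
    #   chunk = len(paths) // world_size
    #   return paths[global_rank * chunk : (global_rank + 1) * chunk]
    # new behavior: interleave
    return paths[global_rank::world_size]

paths = [f"utt_{i}" for i in range(10)]
print(shard_paths(paths, global_rank=1, world_size=4))  # ['utt_1', 'utt_5', 'utt_9']
```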
dd40463803 | limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid) | 2024-06-29 09:11:28 -05:00
591d3ac848 | have eval dataloader use eval batch size for batchedordersampler | 2024-06-28 22:44:00 -05:00
1a392b69f6 | local training backend should be a bit more aware of variable batch sizes, maybe | 2024-06-28 22:39:05 -05:00
83075c1505 | sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because I didn't know that python dicts can have non-strings as keys), added batching samples based on total duration to ensure best training throughput | 2024-06-28 22:28:54 -05:00
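A sketch of the two ideas in that last commit: sort bucket keys numerically (so 2 < 10 holds, unlike string keys), then form batches capped by total audio duration instead of a fixed sample count. The helper name and the cap value are hypothetical.

```python
# Illustrative sketch: numerically sorted duration buckets + batching by total duration.
def batch_by_total_duration(buckets: dict[int, list[str]], max_seconds: float):
    batches, batch, total = [], [], 0.0
    # sort bucket keys as numbers, not strings
    for duration in sorted(buckets.keys()):
        for path in buckets[duration]:
            if batch and total + duration > max_seconds:
                batches.append(batch)
                batch, total = [], 0.0
            batch.append(path)
            total += duration
    if batch:
        batches.append(batch)
    return batches

buckets = {2: ["a", "b"], 10: ["c"], 4: ["d"]}
print(batch_by_total_duration(buckets, max_seconds=8))  # [['a', 'b', 'd'], ['c']]
```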
8fffb94964 | backport fix from tortoise_tts with local trainer + loading state when training a LoRA | 2024-06-25 13:41:29 -05:00
62a53eed64 | fixed deducing tokenizer path, added option to default to naive tokenizer (for old models, like ar+nar-retnet-8) | 2024-06-18 22:11:14 -05:00
8a986eb480 | load exported LoRA weights if they exist (to-do: make a better LoRA loading mechanism) | 2024-06-18 21:45:46 -05:00
2bfe786ebd | ban stop token for NAR levels (because sometimes it gets sampled and causes problems) | 2024-06-17 22:14:43 -05:00
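A sketch of what banning the stop token on NAR levels can look like; the function and argument names are hypothetical, not the repo's.

```python
# Illustrative sketch: mask out the stop token's logit before sampling on NAR
# levels, since the stop token is only meaningful on the AR (level 0) pass.
import torch

def ban_stop_token(logits: torch.Tensor, stop_token: int, quant_level: int) -> torch.Tensor:
    if quant_level > 0:  # NAR levels should never emit the stop token
        logits[..., stop_token] = float("-inf")  # modified in place
    return logits
```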
7cfb78fa64 | enable LoRA for targeted RVQ levels (to experiment with, seems to help) | 2024-06-17 21:45:03 -05:00
7047fcc6e2 | actually make deepspeed work with LoRAs | 2024-06-17 13:55:37 -05:00
1d159b1476 | updated export routine to split LoRA weights from the state dict (should work with deepspeed) | 2024-06-17 13:28:18 -05:00
726a4b613f | naive, rudimentary DeepSpeed support (just live with the LoRA weights living with the original weights, they can be split later) | 2024-06-17 13:17:24 -05:00
bd0bc10ec0 | added LoRA policy to decide what layer of the model gets adapted based on simple inclusion/exclusion terms | 2024-06-17 13:05:06 -05:00
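A sketch of a name-based inclusion/exclusion policy for picking LoRA targets; the default terms here are illustrative, not the repo's defaults.

```python
# Illustrative sketch: decide which linear layers get a LoRA adapter from
# simple substring include/exclude terms matched against the module name.
import torch.nn as nn

def lora_targets(model: nn.Module, include=("attn", "mlp"), exclude=("embed", "classifier")):
    targets = []
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue
        if exclude and any(term in name for term in exclude):
            continue
        if not include or any(term in name for term in include):
            targets.append(name)
    return targets
```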
be051d9544 | added another LoRA method using parametrization rather than linear injection | 2024-06-17 09:58:34 -05:00
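A sketch of the parametrization-based approach using torch.nn.utils.parametrize (a real PyTorch API); the rank/alpha values and class shape are chosen arbitrarily here, not taken from the repo.

```python
# Illustrative sketch: LoRA as a reparametrization of the existing weight,
# instead of injecting a wrapper module around the linear layer.
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class LoRAParametrization(nn.Module):
    def __init__(self, out_features: int, in_features: int, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))  # zero-init: no-op at start
        self.scale = alpha / rank

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        # the returned tensor replaces the original weight during forward passes
        return weight + self.scale * (self.lora_B @ self.lora_A)

linear = nn.Linear(64, 64)
parametrize.register_parametrization(
    linear, "weight", LoRAParametrization(linear.out_features, linear.in_features)
)
```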
45a39fb79f | very rudimentary LoRA support (no deepspeed support, tested training and saving but not loading yet) | 2024-06-17 00:09:16 -05:00
19410a919e | ugh | 2024-06-15 12:29:03 -05:00
d343bde09b | residual_in_fp32=False for mamba arch backends because it breaks the classifier (output projection / lm head / what-have-you) under AMP | 2024-06-15 12:08:03 -05:00
ccb14c06ef | mamba2-hf using vasqu/mamba2-torch because it lets me use mamba2 without triton ops (my 4xV100s are not happy training mamba2 because of triton) | 2024-06-14 19:42:17 -05:00
31f71fa134 | sampler update (some brainworm meant I just never actually had a sampler for sample_type=path) | 2024-06-14 16:55:40 -05:00
b3b67f34ac | added option to sort paths by duration to better group equal-length sequences together (and there was maybe a logic error from creating the samplers and then interleave-reordering paths, desyncing them) | 2024-06-13 22:37:34 -05:00
83eab4fa59 | actually, going for the suggested "2x layers, no intermediate scaling" is wrong for VALL-E; directly copying the normal transformer structure fixes mamba2 performance in the test trainer | 2024-06-13 20:08:22 -05:00
26da24fd8d | mamba updated to fix that pesky NaN error during training | 2024-06-13 12:38:33 -05:00
bcf3910a17 | the NAR-only dream is dead (it just won't work) | 2024-06-12 19:49:47 -05:00
a9353cf9fa | ugh | 2024-06-12 00:14:29 -05:00
cca542a4c0 | ugh | 2024-06-11 23:59:28 -05:00
65a8960305 | option to split classifier per-level instead of sharing one (at this point I'm just scrambling to try and cope with training a DAC model, the NAR is being a pain) | 2024-06-11 22:28:59 -05:00
a7a6e0ac76 | validated that inferencing works, changed some defaults (NAR benefits from greedy sampling) | 2024-06-09 17:11:38 -05:00
234f9efc6e | ugh | 2024-06-09 11:39:43 -05:00
132a02c48b | sanity cleanup, backup config yaml for each log file | 2024-06-09 11:22:52 -05:00
8d92dac829 | forgot I renamed this | 2024-06-09 11:12:30 -05:00
80f9530840 | ugh | 2024-06-09 01:43:44 -05:00
5c732b72ee | ugh | 2024-06-08 20:34:00 -05:00
8d068fa3f9 | reticulating splines | 2024-06-08 20:30:15 -05:00
ead3e2f0cb | ugh | 2024-06-08 16:14:57 -05:00
b072f9b96b | fixes | 2024-06-08 16:01:34 -05:00
58fb0a84db | added experimental NAR-only model (inferences text length, need more experimenting), AudioEmbedding logic cleanup (I still think it's being done wrong) | 2024-06-08 15:42:02 -05:00
e35a91c67a | ugh | 2024-06-07 21:56:14 -05:00
7d6fff24f9 | un-tensor'd quant_level marker since it doesn't need to be one (I forgot why I had it as one, but nothing seems to need it as a tensor that didn't already make it one) | 2024-06-07 20:46:22 -05:00
b0158a61d5 | fixed some logic errors with training (grabbing wrong quant level...) | 2024-06-07 20:34:36 -05:00
eafa622be2 | I forgot the actual reason I was cleaning things up was to re-include prom loss calculation (I realized the reason I removed it was a prom embedding oversight; it seems to work now) | 2024-06-07 20:29:25 -05:00
da8242d086 | finally got around to removing omegaconf | 2024-06-07 20:23:53 -05:00
4ade2b60ee | ugh | 2024-06-06 21:57:11 -05:00
f9f309281a | ugh | 2024-06-06 20:55:27 -05:00
a5c90348d9 | head hurt | 2024-06-06 20:51:31 -05:00
516b0894d7 | m | 2024-06-06 19:41:26 -05:00
ee25d2e62e | removed the need to supply targ_list + different AudioEmbedding + other things | 2024-06-06 18:52:41 -05:00
fcac9503e2 | cleanup | 2024-06-06 13:08:02 -05:00
b2194b859a | re-added loading multiple models because I'm now entertaining having split AR/NAR models again (and need a way to load both at once) | 2024-06-06 09:48:43 -05:00
b05a905b95 | ugh | 2024-06-05 21:02:05 -05:00
4073656293 | oops | 2024-06-05 20:53:10 -05:00
ff6fe6f1bc | cleanup | 2024-06-05 20:30:43 -05:00
880b4ecd1b | cleanup, putting some thoughts in comments before I forget about them | 2024-06-05 19:50:06 -05:00
3cfc8a96bb | oops | 2024-06-05 10:30:04 -05:00
48cd1054f9 | madness | 2024-06-04 23:48:51 -05:00
9e3f2e300f | experimental "just have a token for what RVQ level we're on" that seems to help all models (mamba almost works, but it might just have to be relegated as a pure AR model) | 2024-06-04 23:23:31 -05:00
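A sketch of a learned RVQ-level indicator. The log doesn't say whether the real model prepends it as a token or sums it into the embeddings, so the prepending variant below, along with the sizes, is an assumption.

```python
# Illustrative sketch: a learned token that tells the model which RVQ level it
# is currently predicting, prepended to the input embedding sequence.
import torch
import torch.nn as nn

class RVQLevelToken(nn.Module):
    def __init__(self, n_levels: int = 8, d_model: int = 1024):
        super().__init__()
        self.embedding = nn.Embedding(n_levels, d_model)

    def forward(self, x: torch.Tensor, quant_level: int) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        token = self.embedding.weight[quant_level].expand(x.shape[0], 1, -1)
        return torch.cat([token, x], dim=1)  # (batch, seq_len + 1, d_model)
```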
e0886c5a78 | re-added mamba as a possible non-experimental arch backend (test trainer will set it as AR only, doing any NAR tasks lobotomizes it) | 2024-06-04 22:41:22 -05:00
687c71e028 | disable accuracy calc because it breaks with actual batched training even though it shouldn't | 2024-06-04 22:13:44 -05:00
d005e24953 | oops | 2024-06-04 22:10:04 -05:00
0f7f3ae754 | added loss calc split and acc for experimental model | 2024-06-04 22:04:40 -05:00
014e565c4b | tweaks | 2024-06-04 20:41:13 -05:00
6d5bd0156a | fixes | 2024-06-04 18:50:48 -05:00
ed3aeaf3a1 | copy pasted from test to actual trainer | 2024-06-04 18:40:30 -05:00
0aa01ba31a | forgot one crucial detail (you *need* the previous RVQ level to keep coherence between all RVQ levels) (experimental deinterleaved is a bit crusty though) | 2024-06-04 18:30:30 -05:00
2ffad5cb6f | typo | 2024-06-04 14:20:57 -05:00
406ff7bbe1 | re-implemented config.model.interleave for the HF-compat experimental method | 2024-06-04 14:19:52 -05:00
c93d5863fd | fixes | 2024-06-04 00:07:00 -05:00
186b93a77e | oops | 2024-06-03 22:35:55 -05:00
e50edc3b48 | added a flag to convert to an HF-compatible model on export by stitching things | 2024-06-03 22:34:47 -05:00
934672252b | feverish cleanup | 2024-06-03 21:28:49 -05:00
7feeb944a0 | probably insane for even entertaining going this route | 2024-06-03 20:26:27 -05:00
c2a436d368 | somehow between training sessions grad_norm = None, even though it worked before | 2024-06-02 08:29:27 -05:00
c1fcd889d5 | reverted automatically disabling split loss calc, since it seems it's actually calculating loss on the prom that causes the oddities, maybe | 2024-06-01 12:34:59 -05:00
8cf176ab46 | ugh | 2024-06-01 10:46:42 -05:00
827cf632e7 | report current loss scale and adjust grad norm by loss scale (for deepspeed) | 2024-06-01 10:44:32 -05:00
d0ebce6bac | ugh | 2024-06-01 10:30:13 -05:00
39bc019142 | actually save per-rank sampler states | 2024-06-01 09:46:32 -05:00
74df2f5332 | split sampler dict by global_rank, also handle splitting dataset paths by global_rank if sampler_type == path (because I do not trust DistributedSampler) (need to test) | 2024-06-01 09:29:49 -05:00
31785f4eeb | actually don't default to computing split losses; test bitnet model doesn't seem to be doing things right (despite debug printouts showing they're roughly the same logit/loss sequences, could just be bitnet linears being not up to par on actual models) | 2024-06-01 09:12:51 -05:00
e9c87060df | oops | 2024-05-31 22:22:28 -05:00
b482ca19ff | added model config option to set KV head count for MQA/GQA instead of MHA for llama-based models (I think it's very negligible either way at such a small model size) | 2024-05-31 19:32:37 -05:00
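For llama-based backends built on HF transformers, the KV head count corresponds to LlamaConfig's num_key_value_heads; a sketch with arbitrary sizes (not the repo's actual config values):

```python
# Illustrative sketch: num_key_value_heads < num_attention_heads gives GQA,
# num_key_value_heads == 1 gives MQA, and equal counts give plain MHA.
from transformers import LlamaConfig, LlamaModel

config = LlamaConfig(
    hidden_size=1024,
    num_hidden_layers=12,
    num_attention_heads=16,
    num_key_value_heads=4,   # 16 query heads sharing 4 KV heads -> GQA
    vocab_size=1024,
)
model = LlamaModel(config)
```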
e15c6c74c3 | correctness | 2024-05-30 20:50:45 -05:00
da473295b7 | better way to compute per-segment losses | 2024-05-28 19:29:54 -05:00
6c49ad06a3 | forgot to reinclude mult by loss factors | 2024-05-27 20:40:21 -05:00
b82f0d5c0c | finally nailed the issue that caused logging to break on one machine but not another (bitnet includes zetascale, which is a parasite that will break logging) | 2024-05-27 19:47:58 -05:00
c0ac84c795 | uh | 2024-05-27 19:05:56 -05:00
197d517181 | ugh | 2024-05-27 17:09:35 -05:00
5af6f41c94 | added loss calcs against prom (requires the right settings for not shit results, disabled by default) | 2024-05-27 08:43:00 -05:00
05cd8b797e | nevermind, it breaks training | 2024-05-25 18:03:43 -05:00
85f9684720 | some cleanup | 2024-05-25 17:46:52 -05:00
d760924719 | added a kludgy eval-only mode so I don't have to start training, type eval, stop training, then delete the logs for that session | 2024-05-25 17:39:51 -05:00
ddbacde0d1 | DAC just doesn't work well enough...... | 2024-05-25 11:07:52 -05:00
e3ef89f5aa | 100x better for subtrain/eval to be by group instead | 2024-05-19 16:40:14 -05:00
458b95d196 | added option to split between text loss and audio loss (to-do: document this better), because it may or may not be a problem with LLaMA-backed models, since my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment | 2024-05-19 11:23:56 -05:00
74e531d391 | ugh | 2024-05-18 12:02:56 -05:00
4bc7e5a6d1 | fix loading without needing an hdf5 dataset already prepped (and some other incidental speedups during dataloader prep) | 2024-05-18 07:14:26 -05:00
d88a5ca183 | ugh | 2024-05-16 07:25:33 -05:00
d9aabfa3ae | final tweaks, hopefully, again | 2024-05-15 23:04:19 -05:00
8d79f78e0a | god I need to replace omegaconf | 2024-05-12 14:01:52 -05:00
5eb5db7f7f | just don't use DAC 24kHz, it's bad | 2024-05-12 13:41:17 -05:00
230da8b559 | should be the final things to scramble around for; DAC's 24kHz model is unusable for this, but both encodec's 24kHz and DAC's 44kHz work | 2024-05-12 13:22:08 -05:00
2437a86efa | ugh | 2024-05-12 13:02:15 -05:00
4f1593c8db | a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge | 2024-05-12 10:17:29 -05:00
917eeb40d2 | ughhh | 2024-05-12 08:22:39 -05:00
9910c75d5a | checkpointing for bitnet impl | 2024-05-12 07:52:54 -05:00
14709ac67f | ughh | 2024-05-12 07:30:59 -05:00
3774fcbdee | ugh | 2024-05-11 22:58:38 -05:00
856545f8bb | nan loss detection (should have added it earlier), loss scaling for local backend + fp16 | 2024-05-11 22:23:29 -05:00
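A sketch of NaN-loss skipping plus fp16 loss scaling with PyTorch's GradScaler; the training-step shape (a model that directly returns its loss) is an assumption for illustration, not the repo's trainer.

```python
# Illustrative sketch: drop steps whose loss went non-finite, and use GradScaler
# for fp16 loss scaling on a local (non-DeepSpeed) training backend.
import torch

scaler = torch.cuda.amp.GradScaler()

def training_step(model, batch, optimizer):
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(**batch)  # assumes the model returns its loss
    if not torch.isfinite(loss):
        # NaN/inf loss: skip the step instead of corrupting the weights
        optimizer.zero_grad(set_to_none=True)
        return None
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad(set_to_none=True)
    return loss.item()
```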
a755eb3c62 | ugh | 2024-05-11 17:34:45 -05:00
88e9b9caff | local ddp fix | 2024-05-11 17:29:01 -05:00
3337c69e5a | leverage either xformers or torch.backends.cuda.sdp_kernel for attention | 2024-05-11 17:14:05 -05:00
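A sketch of dispatching between xformers' memory-efficient attention and torch's fused SDPA; the tensor layout and the "prefer xformers, fall back to SDPA" order are assumptions, not the repo's exact behavior.

```python
# Illustrative sketch: use xformers when installed, else torch's SDPA kernels.
import torch
import torch.nn.functional as F

try:
    from xformers.ops import memory_efficient_attention
    HAS_XFORMERS = True
except ImportError:
    HAS_XFORMERS = False

def attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    if HAS_XFORMERS:
        # xformers expects (batch, seq_len, heads, head_dim)
        out = memory_efficient_attention(q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2))
        return out.transpose(1, 2)
    with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True, enable_mem_efficient=True):
        return F.scaled_dot_product_attention(q, k, v)
```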
d33c7bb7cf | ugh | 2024-05-11 16:47:19 -05:00
0b6499601b | sanitizing | 2024-05-11 16:31:05 -05:00
71e373064f | remove redundant loss, tweak readme | 2024-05-11 15:02:47 -05:00
04a80d6b55 | maybe it's better to be more explicit in deepspeed configs | 2024-05-11 13:57:43 -05:00
4d93a16ef7 | might just be better to explicitly define prompt duration ranges, especially under a "train small contexts then increase it" training paradigm | 2024-05-11 09:50:54 -05:00