|
0f7f3ae754
|
added loss calc split and acc for experimental model
|
2024-06-04 22:04:40 -05:00 |
|
|
014e565c4b
|
tweaks
|
2024-06-04 20:41:13 -05:00 |
|
|
6d5bd0156a
|
fixes
|
2024-06-04 18:50:48 -05:00 |
|
|
ed3aeaf3a1
|
copy pasted from test to actual trainer
|
2024-06-04 18:40:30 -05:00 |
|
|
0aa01ba31a
|
forgot one crucial detail (you *need* the previous RVQ level to keep coherence between all RVQ levels) (experimental deinterleaved is a bit crusty though)
|
2024-06-04 18:30:30 -05:00 |
|
|
2ffad5cb6f
|
typo
|
2024-06-04 14:20:57 -05:00 |
|
|
406ff7bbe1
|
re-implemented config.model.interleave for the HF-compat experimental method
|
2024-06-04 14:19:52 -05:00 |
|
|
c93d5863fd
|
fixes
|
2024-06-04 00:07:00 -05:00 |
|
|
186b93a77e
|
oops
|
2024-06-03 22:35:55 -05:00 |
|
|
e50edc3b48
|
added a flag to convert to a HF compatible model on export by stitching things
|
2024-06-03 22:34:47 -05:00 |
|
|
934672252b
|
feverish cleanup
|
2024-06-03 21:28:49 -05:00 |
|
|
7feeb944a0
|
probably insane for even entertaining going this route
|
2024-06-03 20:26:27 -05:00 |
|
|
c2a436d368
|
somehow between training sessions grad_norm = None even though it worked before
|
2024-06-02 08:29:27 -05:00 |
|
|
c1fcd889d5
|
reverted automatically disabling split loss calc, since it seems that actually calculating loss on prom is what causes the oddities, maybe
|
2024-06-01 12:34:59 -05:00 |
|
|
8cf176ab46
|
ugh
|
2024-06-01 10:46:42 -05:00 |
|
|
827cf632e7
|
report current loss scale and adjust grad norm by loss scale (for deepspeed)
|
2024-06-01 10:44:32 -05:00 |
|
|
d0ebce6bac
|
ugh
|
2024-06-01 10:30:13 -05:00 |
|
|
39bc019142
|
actually save per-rank sampler states
|
2024-06-01 09:46:32 -05:00 |
|
|
74df2f5332
|
split sampler dict by global_rank, also handle splitting dataset paths by global_rank if sampler_type == path (because I do not trust DistributedSampler) (need to test)
|
2024-06-01 09:29:49 -05:00 |
|
|
31785f4eeb
|
actually don't default to computing split losses, test bitnet model doesn't seem to be doing things right (despite debug printouts showing they're roughly the same logit/loss sequences, could just be bitnet linears not being up to par on actual models)
|
2024-06-01 09:12:51 -05:00 |
|
|
e9c87060df
|
oops
|
2024-05-31 22:22:28 -05:00 |
|
|
b482ca19ff
|
added model config option to set KV head count for MQA/GQA instead of MHA for llama-based models (I think it's negligible both ways at such a small model size)
|
2024-05-31 19:32:37 -05:00 |
|
|
e15c6c74c3
|
correctness
|
2024-05-30 20:50:45 -05:00 |
|
|
da473295b7
|
better way to compute per-segment losses
|
2024-05-28 19:29:54 -05:00 |
|
|
6c49ad06a3
|
forgot to reinclude mult by loss factors
|
2024-05-27 20:40:21 -05:00 |
|
|
b82f0d5c0c
|
finally nailed the issue that caused logging to break on one machine but not another (bitnet includes zetascale which is a parasite that will break logging)
|
2024-05-27 19:47:58 -05:00 |
|
|
c0ac84c795
|
uh
|
2024-05-27 19:05:56 -05:00 |
|
|
197d517181
|
ugh
|
2024-05-27 17:09:35 -05:00 |
|
|
5af6f41c94
|
added loss calcs against prom (requires the right settings for not shit results, disabled by default)
|
2024-05-27 08:43:00 -05:00 |
|
|
05cd8b797e
|
nevermind it breaks training
|
2024-05-25 18:03:43 -05:00 |
|
|
85f9684720
|
some cleanup
|
2024-05-25 17:46:52 -05:00 |
|
|
d760924719
|
added kludgy eval only so I don't have to start training, type eval, stop training, then delete the logs for that session
|
2024-05-25 17:39:51 -05:00 |
|
|
ddbacde0d1
|
DAC just doesn't work well enough......
|
2024-05-25 11:07:52 -05:00 |
|
|
e3ef89f5aa
|
100x better for subtrain/eval to be by group instead
|
2024-05-19 16:40:14 -05:00 |
|
|
458b95d196
|
added option to split between text loss and audio loss (to-do: document this better), because it may or may not be a problem with LLaMA-backed models because my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment
|
2024-05-19 11:23:56 -05:00 |
|
|
74e531d391
|
ugh
|
2024-05-18 12:02:56 -05:00 |
|
|
4bc7e5a6d1
|
fix loading without needing an hdf5 dataset already prepped (and some other incidental speedups during dataloader prep)
|
2024-05-18 07:14:26 -05:00 |
|
|
d88a5ca183
|
ugh
|
2024-05-16 07:25:33 -05:00 |
|
|
d9aabfa3ae
|
final tweaks, hopefully, again
|
2024-05-15 23:04:19 -05:00 |
|
|
8d79f78e0a
|
god I need to replace omegaconf
|
2024-05-12 14:01:52 -05:00 |
|
|
5eb5db7f7f
|
just don't use DAC 24Khz, it's bad
|
2024-05-12 13:41:17 -05:00 |
|
|
230da8b559
|
should be the final things to scramble around for, DAC's 24KHz model is unusable for this, but both encodec's 24KHz and DAC's 44KHz work
|
2024-05-12 13:22:08 -05:00 |
|
|
2437a86efa
|
ugh
|
2024-05-12 13:02:15 -05:00 |
|
|
4f1593c8db
|
a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge
|
2024-05-12 10:17:29 -05:00 |
|
|
917eeb40d2
|
ughhh
|
2024-05-12 08:22:39 -05:00 |
|
|
9910c75d5a
|
checkpointing for bitnet impl
|
2024-05-12 07:52:54 -05:00 |
|
|
14709ac67f
|
ughh
|
2024-05-12 07:30:59 -05:00 |
|
|
3774fcbdee
|
ugh
|
2024-05-11 22:58:38 -05:00 |
|
|
856545f8bb
|
nan loss detection (should have added it earlier), loss scaling for local backend + fp16
|
2024-05-11 22:23:29 -05:00 |
|
|
a755eb3c62
|
ugh
|
2024-05-11 17:34:45 -05:00 |
|
|
88e9b9caff
|
local ddp fix
|
2024-05-11 17:29:01 -05:00 |
|
|
3337c69e5a
|
leverage between xformers and torch.backends.cuda.sdp_kernel for attention
|
2024-05-11 17:14:05 -05:00 |
|
|
d33c7bb7cf
|
ugh
|
2024-05-11 16:47:19 -05:00 |
|
|
0b6499601b
|
sanitizing
|
2024-05-11 16:31:05 -05:00 |
|
|
71e373064f
|
remove redundant loss, tweak readme
|
2024-05-11 15:02:47 -05:00 |
|
|
04a80d6b55
|
maybe it's better to be more explicit in deepspeed configs
|
2024-05-11 13:57:43 -05:00 |
|
|
4d93a16ef7
|
might just be better to explicitly define prompt duration ranges, especially under a "train small contexts then increase it" training paradigm
|
2024-05-11 09:50:54 -05:00 |
|
|
bd0a36ba8d
|
I swear I keep seeing tqdm flicker back a number
|
2024-05-10 18:36:01 -05:00 |
|
|
2109712e5b
|
resolve deprecation warning that doesn't show on my old training rig but does on my new one
|
2024-05-09 23:25:44 -05:00 |
|
|
1547de5020
|
haha...
|
2024-05-09 23:15:52 -05:00 |
|
|
b7bd885651
|
some possible sanity with deepspeed config
|
2024-05-09 22:48:42 -05:00 |
|
|
c4b696ebeb
|
oops
|
2024-05-09 22:33:40 -05:00 |
|
|
c22a177cf8
|
forgot to pass warmup to schedule free
|
2024-05-09 22:18:49 -05:00 |
|
|
b6131565ad
|
autotune?
|
2024-05-09 21:25:40 -05:00 |
|
|
6ed6ab8c03
|
a bit more cleanup for deepspeed ds_cfg creation
|
2024-05-09 21:00:26 -05:00 |
|
|
0d5d545a40
|
crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc.
|
2024-05-09 20:28:20 -05:00 |
|
|
c6e0f905b5
|
final tweaks (again) before training restarts
|
2024-05-08 02:11:38 -05:00 |
|
|
215800484d
|
correcting my mistake of assuming I could just use raw 24Khz audio in the 44Khz DAC without too much of an issue (there are issues)
|
2024-05-04 23:49:15 -05:00 |
|
|
9f738fbd5b
|
seems I actually don't need RVQ bins 9-32 with the 24Khz DAC model........ (time to requantize my audio...)
|
2024-05-04 23:09:18 -05:00 |
|
|
33b7f81b94
|
small cleanups
|
2024-05-04 22:37:22 -05:00 |
|
|
8aa1b2dabf
|
documentation update
|
2024-05-04 21:03:46 -05:00 |
|
|
253441b750
|
forgot to disable verbose flag
|
2024-05-04 13:13:52 -05:00 |
|
|
3dca1125f5
|
implemented xformers in HF's Llama (because there's no flash attention for Volta cards)
|
2024-05-04 13:07:45 -05:00 |
|
|
277dcec484
|
apparently I got an error for trying to serialize an errant tensor that made its way into the JSON; this could be remedied easily by recursively traversing the dict and coercing any objects to primitives, but I'm tired and I just want to start training and nap
|
2024-05-04 12:33:43 -05:00 |
|
|
ffa200eec7
|
added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources))
|
2024-05-04 12:05:41 -05:00 |
|
|
c494894261
|
simple DDP wrapper (for my NVlink test)
|
2024-05-04 11:48:26 -05:00 |
|
|
a7b43b98b5
|
renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes)
|
2024-05-02 20:08:59 -05:00 |
|
|
b5d1456a09
|
backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not))
|
2024-04-29 22:14:01 -05:00 |
|
|
5120ffdda7
|
god it would be nice to know the best way to handle audio embeddings, because I genuinely don't know without skimming through papers or devoting X amount of GPU hours in training
|
2024-04-29 18:24:05 -05:00 |
|
|
6a11bc9cb6
|
update tokenizer because, for some reason, it had the wrong order for the special tokens to where eos = unk
|
2024-04-29 09:09:26 -05:00 |
|
|
57810e4ba4
|
metadata only path (might drop HDF5 since it's giving file sizes twice as large as my actual unpacked dataset)
|
2024-04-28 23:03:09 -05:00 |
|
|
caad7ee3c9
|
final tweaks, hopefully
|
2024-04-28 22:28:29 -05:00 |
|
|
ffc334cf58
|
added dataset transcription helper script (now I don't ever have to touch ai-voice-cloning) (to-do: unify scripts into the module)
|
2024-04-21 17:43:20 -05:00 |
|
|
b251669536
|
forgot to fix up the test trainer
|
2024-04-21 14:58:04 -05:00 |
|
|
071fb97777
|
dataset preparation script updates, caved and am using HF tokenizer now
|
2024-04-21 14:49:18 -05:00 |
|
|
a8ffa88844
|
it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior
|
2024-04-19 18:36:54 -05:00 |
|
|
8214aa23d7
|
converting over to a different intermediary dataset format
|
2024-04-18 21:24:06 -05:00 |
|
|
4f5c9e518a
|
actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess
|
2024-04-18 13:32:41 -05:00 |
|
|
2e9e6e68f7
|
Forgot I need to use the DAC's 44K model because the 24K model has 32 codebooks instead of 9.
|
2024-04-17 20:59:25 -05:00 |
|
|
5ff2b4aab5
|
finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset)
|
2024-04-17 20:39:35 -05:00 |
|
|
b0bd88833c
|
refactor cleanup, had a revelation on how I can handle a batch of varying tasks
|
2024-04-16 21:04:48 -05:00 |
|
|
467fa1c5ee
|
wrapper fixes
|
2024-04-16 10:19:02 -05:00 |
|
|
aa1e25fbf5
|
backwards compat for old YAMLs with models, option to set flash attention 2 for Llama (and derivatives), included syncdoth/RetNet's torchscale retnet for shits and grins, etc.
|
2024-04-16 10:02:31 -05:00 |
|
|
545162195b
|
deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things
|
2024-04-15 19:54:32 -05:00 |
|
|
d69a00e389
|
Properly pass retention_mask for retnet-HF, attempt to fix recurrent forward for retnet (doesn't work still)
|
2024-04-14 13:12:50 -05:00 |
|
|
789bb5d11b
|
add an optional label override for model loading (used for easy testing between 12/16/20/24 layered model)
|
2024-04-13 12:43:35 -05:00 |
|
|
f0c4baeb25
|
added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it)
|
2024-04-09 22:04:01 -05:00 |
|
|
4d75ee066c
|
actually do the Linear replacement with TE's Linear
|
2024-04-09 14:41:13 -05:00 |
|
|
9d97eb5104
|
added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale)
|
2024-04-08 20:14:51 -05:00 |
|
|
7075c2a5f0
|
added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml)
|
2024-04-04 19:11:49 -05:00 |
|