9e3f2e300f  experimental "just have a token for what RVQ level we're on" that seems to help all models (mamba almost works, but it might just have to be relegated as a pure AR model) (mrq, 2024-06-04 23:23:31 -0500)
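A minimal sketch of the idea in 9e3f2e300f, assuming a learned per-level token prepended to the sequence; the class and dimension names here are illustrative, not the repo's actual implementation:

```python
import torch
import torch.nn as nn

NUM_RVQ_LEVELS = 8   # assumption; EnCodec-style codebook count
D_MODEL = 1024       # assumption

class RVQLevelToken(nn.Module):
    """Prepend a learned token telling the model which RVQ level it targets."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(NUM_RVQ_LEVELS, D_MODEL)

    def forward(self, x: torch.Tensor, level: int) -> torch.Tensor:
        # x: (batch, seq, d_model); prepend the level token along the sequence axis
        tok = self.emb(torch.tensor([level], device=x.device))  # (1, d_model)
        tok = tok.expand(x.shape[0], 1, D_MODEL)                # (batch, 1, d_model)
        return torch.cat([tok, x], dim=1)
```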
e0886c5a78  re-added mamba as a possible non-experimental arch backend (test trainer will set it as AR only, doing any NAR tasks lobotomizes it) (mrq, 2024-06-04 22:41:22 -0500)
687c71e028  disable accuracy calc because it breaks with actual batched training even though it shouldn't (mrq, 2024-06-04 22:13:44 -0500)
ed3aeaf3a1  copy pasted from test to actual trainer (mrq, 2024-06-04 18:40:30 -0500)
0aa01ba31a  forgot one crucial detail (you *need* the previous RVQ level to keep coherence between all RVQ levels) (experimental deinterleaved is a bit crusty though) (mrq, 2024-06-04 18:30:30 -0500)
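The coherence point in 0aa01ba31a, sketched: when predicting codes at level n, the input embedding sums the embeddings of the codes already fixed at levels 0..n-1, so each NAR level refines the coarser ones beneath it. All names here are illustrative, and the repo's actual conditioning may differ:

```python
import torch
import torch.nn as nn

class SummedAudioEmbedding(nn.Module):
    def __init__(self, n_levels: int, n_codes: int, d_model: int):
        super().__init__()
        # one embedding table per RVQ level
        self.embs = nn.ModuleList(nn.Embedding(n_codes, d_model) for _ in range(n_levels))

    def forward(self, codes: torch.Tensor, target_level: int) -> torch.Tensor:
        # codes: (batch, seq, n_levels) int64; only levels *below* the target contribute
        bsz, seq, _ = codes.shape
        x = codes.new_zeros(bsz, seq, self.embs[0].embedding_dim, dtype=torch.float)
        for lvl in range(target_level):
            x = x + self.embs[lvl](codes[..., lvl])
        return x
```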
7feeb944a0  probably insane for even entertaining going this route (mrq, 2024-06-03 20:26:27 -0500)
c2a436d368  somehow between training sessions grad_norm = None even though it worked before (mrq, 2024-06-02 08:29:27 -0500)
c1fcd889d5  reverted automatically disabling split loss calc, since it seems that calculating loss on the prom is what actually causes the oddities, maybe (mrq, 2024-06-01 12:34:59 -0500)
39bc019142  actually save per-rank sampler states (mrq, 2024-06-01 09:46:32 -0500)
74df2f5332  split sampler dict by global_rank, also handle splitting dataset paths by global_rank if sampler_type == path (because I do not trust DistributedSampler) (need to test) (mrq, 2024-06-01 09:29:49 -0500)
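What 74df2f5332's path splitting might look like when done by hand instead of via DistributedSampler; a hedged sketch, with the repo's actual logic possibly differing:

```python
import torch.distributed as dist

def shard_paths_by_rank(paths: list[str]) -> list[str]:
    """Give each rank a disjoint, strided slice of the dataset paths."""
    if not dist.is_initialized():
        return paths
    rank, world_size = dist.get_rank(), dist.get_world_size()
    # sort first so every rank agrees on ordering, then stride so slices never overlap
    return sorted(paths)[rank::world_size]
```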
31785f4eeb  actually don't default to computing split losses; the test bitnet model doesn't seem to be doing things right (despite debug printouts showing they're roughly the same logit/loss sequences, it could just be bitnet linears not being up to par on actual models) (mrq, 2024-06-01 09:12:51 -0500)
b482ca19ff  added model config option to set KV head count for MQA/GQA instead of MHA for llama-based models (I think it's very negligible both ways at such a small model size) (mrq, 2024-05-31 19:32:37 -0500)
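For llama-based models this maps onto a real HF knob, LlamaConfig.num_key_value_heads: equal to num_attention_heads gives plain MHA, a divisor gives GQA, and 1 gives MQA. The head counts below are examples only:

```python
from transformers import LlamaConfig

mha = LlamaConfig(num_attention_heads=16, num_key_value_heads=16)  # vanilla multi-head
gqa = LlamaConfig(num_attention_heads=16, num_key_value_heads=4)   # grouped-query (4 KV groups)
mqa = LlamaConfig(num_attention_heads=16, num_key_value_heads=1)   # multi-query (shared KV)
```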
da473295b7  better way to compute per-segment losses (mrq, 2024-05-28 19:29:54 -0500)
6c49ad06a3  forgot to reinclude mult by loss factors (mrq, 2024-05-27 20:40:21 -0500)
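A sketch of per-segment losses with loss factors, as da473295b7 / 6c49ad06a3 gesture at: cross-entropy computed separately over the text span and the audio span of the flattened sequence, each scaled by its own factor. Names and factor values are hypothetical:

```python
import torch
import torch.nn.functional as F

def segmented_loss(logits, targets, text_mask, audio_mask,
                   text_factor: float = 0.1, audio_factor: float = 1.0):
    # logits: (seq, vocab); targets: (seq,); masks: (seq,) bool, assumed non-empty
    text_loss  = F.cross_entropy(logits[text_mask],  targets[text_mask])
    audio_loss = F.cross_entropy(logits[audio_mask], targets[audio_mask])
    return text_factor * text_loss + audio_factor * audio_loss
```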
b82f0d5c0c  finally nailed the issue that caused logging to break on one machine but not another (bitnet includes zetascale, which is a parasite that will break logging) (mrq, 2024-05-27 19:47:58 -0500)
d760924719  added a kludgy eval-only mode so I don't have to start training, type eval, stop training, then delete the logs for that session (mrq, 2024-05-25 17:39:51 -0500)
ddbacde0d1  DAC just doesn't work well enough... (mrq, 2024-05-25 11:07:52 -0500)
e3ef89f5aa  100x better for subtrain/eval to be by group instead (mrq, 2024-05-19 16:40:14 -0500)
458b95d196  added option to split between text loss and audio loss (to-do: document this better); it may or may not be a problem with LLaMA-backed models, since my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment (mrq, 2024-05-19 11:23:56 -0500)
4bc7e5a6d1  fix loading without needing an hdf5 dataset already prepped (and some other incidental speedups during dataloader prep) (mrq, 2024-05-18 07:14:26 -0500)
8d79f78e0a  god I need to replace omegaconf (mrq, 2024-05-12 14:01:52 -0500)
5eb5db7f7f  just don't use DAC 24kHz, it's bad (mrq, 2024-05-12 13:41:17 -0500)
230da8b559  should be the final things to scramble around for; DAC's 24kHz model is unusable for this, but both EnCodec's 24kHz and DAC's 44kHz work (mrq, 2024-05-12 13:22:08 -0500)
4f1593c8db  a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge (mrq, 2024-05-12 10:17:29 -0500)
04a80d6b55  maybe it's better to be more explicit in deepspeed configs (mrq, 2024-05-11 13:57:43 -0500)
4d93a16ef7  might just be better to explicitly define prompt duration ranges, especially under a "train small contexts then increase it" training paradigm (mrq, 2024-05-11 09:50:54 -0500)
bd0a36ba8d  I swear I keep seeing tqdm flicker back a number (mrq, 2024-05-10 18:36:01 -0500)
2109712e5b  resolve deprecation warning that doesn't show on my old training rig but does on my new one (mrq, 2024-05-09 23:25:44 -0500)
6ed6ab8c03  a bit more cleanup for deepspeed ds_cfg creation (mrq, 2024-05-09 21:00:26 -0500)
0d5d545a40  crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to try it weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc. (mrq, 2024-05-09 20:28:20 -0500)
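ScheduleFree here is the schedulefree package's optimizer; per its documentation the optimizer itself must be toggled between train and eval modes. A minimal, self-contained usage sketch (the model and learning rate are placeholders):

```python
import torch
import schedulefree

model = torch.nn.Linear(8, 1)                                       # placeholder model
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=2.5e-4)

optimizer.train()                                  # required before training steps
x, y = torch.randn(4, 8), torch.randn(4, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()

optimizer.eval()                                   # required before eval/checkpointing
```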
c6e0f905b5  final tweaks (again) before training restarts (mrq, 2024-05-08 02:11:38 -0500)
215800484d  correcting my wrong assumption that I could just use raw 24kHz audio with the 44kHz DAC without too much of an issue (there are issues) (mrq, 2024-05-04 23:49:15 -0500)
9f738fbd5b  seems I actually don't need RVQ bins 9-32 with the 24kHz DAC model... (time to requantize my audio) (mrq, 2024-05-04 23:09:18 -0500)
253441b750  forgot to disable verbose flag (mrq, 2024-05-04 13:13:52 -0500)
3dca1125f5  implemented xformers in HF's Llama (because there's no flash attention for Volta cards) (mrq, 2024-05-04 13:07:45 -0500)
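The substitution 3dca1125f5 describes, sketched against xformers' public op (the repo's actual monkey-patch of HF's attention will differ): memory_efficient_attention runs on Volta (sm_70), which the flash-attention kernels do not support.

```python
from xformers.ops import memory_efficient_attention

def xformers_attention(q, k, v, attn_bias=None):
    # q, k, v: (batch, seq, heads, head_dim), the layout xformers expects;
    # HF Llama keeps (batch, heads, seq, head_dim), so a real patch transposes first
    return memory_efficient_attention(q, k, v, attn_bias=attn_bias)
```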
277dcec484  apparently I got an error for trying to serialize an errant tensor that made its way into the json; this could be remedied easily by recursively traversing the dict and coercing any objects to primitives, but I'm tired and I just want to start training and nap (mrq, 2024-05-04 12:33:43 -0500)
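The remedy 277dcec484 describes but defers, sketched: recursively walk the dict and coerce anything non-JSON-serializable (like a stray tensor) down to primitives.

```python
import torch

def to_primitives(obj):
    if isinstance(obj, dict):
        return {k: to_primitives(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_primitives(v) for v in obj]
    if isinstance(obj, torch.Tensor):
        return obj.item() if obj.numel() == 1 else obj.tolist()
    if isinstance(obj, (str, int, float, bool)) or obj is None:
        return obj
    return str(obj)  # last resort: stringify anything json.dumps would choke on
```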
ffa200eec7  added option to specify frames per second for the given audio representation (EnCodec is 75Hz, DAC is 41Hz at 24kHz sources) (mrq, 2024-05-04 12:05:41 -0500)
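The arithmetic that option feeds, using the frame rates the message gives; illustrative helpers, not the repo's API:

```python
def duration_to_frames(seconds: float, fps: int) -> int:
    return int(seconds * fps)

def frames_to_duration(frames: int, fps: int) -> float:
    return frames / fps

assert duration_to_frames(2.0, fps=75) == 150  # EnCodec: 75 frames per second
assert duration_to_frames(2.0, fps=41) == 82   # DAC @ 24kHz sources: 41 frames per second
```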
c494894261  simple DDP wrapper (for my NVLink test) (mrq, 2024-05-04 11:48:26 -0500)
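A "simple DDP wrapper" in the stock PyTorch idiom (the repo's wrapper may differ); assumes launch via torchrun, which sets LOCAL_RANK:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def ddp_wrap(model: torch.nn.Module) -> torch.nn.Module:
    dist.init_process_group(backend="nccl")          # NCCL for NVLink'd GPUs
    local_rank = int(os.environ["LOCAL_RANK"])       # set by torchrun
    torch.cuda.set_device(local_rank)
    return DDP(model.to(local_rank), device_ids=[local_rank])
```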
783db3d2c5  forgot to commit the DAC test utterance (mrq, 2024-05-04 09:46:51 -0500)
a7b43b98b5  renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes) (mrq, 2024-05-02 20:08:59 -0500)
b5d1456a09  backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not)) (mrq, 2024-04-29 22:14:01 -0500)
5120ffdda7  god it would be nice to know the best way to handle audio embeddings, because I genuinely don't know without skimming through papers or devoting X amount of GPU hours in training (mrq, 2024-04-29 18:24:05 -0500)
6a11bc9cb6  update tokenizer because, for some reason, it had the wrong order for the special tokens to where eos = unk (mrq, 2024-04-29 09:09:26 -0500)
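A sanity check for the class of bug in 6a11bc9cb6 (special tokens ordered so that eos collided with unk), using the HF tokenizers API; the token strings and file path are assumptions:

```python
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
eos_id = tokenizer.token_to_id("</s>")   # assumed eos token string
unk_id = tokenizer.token_to_id("<unk>")  # assumed unk token string
assert eos_id != unk_id, f"special token collision: eos={eos_id} unk={unk_id}"
```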
57810e4ba4  metadata-only path (might drop HDF5 since it's giving file sizes twice as large as my actual unpacked dataset) (mrq, 2024-04-28 23:03:09 -0500)
ffc334cf58  added dataset transcription helper script (now I don't ever have to touch ai-voice-cloning) (to-do: unify scripts into the module) (mrq, 2024-04-21 17:43:20 -0500)
b251669536  forgot to fix up the test trainer (mrq, 2024-04-21 14:58:04 -0500)
071fb97777  dataset preparation script updates; caved and am using the HF tokenizer now (mrq, 2024-04-21 14:49:18 -0500)
a8ffa88844  it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior (mrq, 2024-04-19 18:36:54 -0500)
00804a47e9  forgot to copy the intermediary dataset conversion script (mrq, 2024-04-18 21:34:28 -0500)
8214aa23d7  converting over to a different intermediary dataset format (mrq, 2024-04-18 21:24:06 -0500)
4f5c9e518a  actually use the passed-through sample rate from encode for DAC, because it does its own resampling I guess (mrq, 2024-04-18 13:32:41 -0500)
2e9e6e68f7  forgot I need to use DAC's 44kHz model because the 24kHz model has 32 codebooks instead of 9 (mrq, 2024-04-17 20:59:25 -0500)
5ff2b4aab5  finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset) (mrq, 2024-04-17 20:39:35 -0500)