|
31785f4eeb
|
actually don't default to computing split losses; the test bitnet model doesn't seem to be doing things right (despite debug printouts showing they're roughly the same logit/loss sequences; could just be bitnet linears not being up to par on actual models)
|
2024-06-01 09:12:51 -05:00 |
|
|
e9c87060df
|
oops
|
2024-05-31 22:22:28 -05:00 |
|
|
b482ca19ff
|
added model config option to set KV head count for MQA/GQA instead of MHA for llama-based models (I think it's negligible either way at such a small model size)
|
2024-05-31 19:32:37 -05:00 |
|
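A minimal sketch of what setting the KV head count means for commit b482ca19ff, assuming the usual grouped-query layout in which consecutive query heads share one key/value head. The function name `kv_head_for` and its parameters are hypothetical and are not taken from the repository.

```python
def kv_head_for(query_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Map a query head index to the KV head it shares (hypothetical helper).

    n_kv_heads == n_heads  -> MHA (every query head has its own KV head)
    n_kv_heads == 1        -> MQA (all query heads share one KV head)
    anything in between    -> GQA (query heads are grouped)
    """
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    group_size = n_heads // n_kv_heads  # query heads per KV head
    return query_head // group_size


def kv_mapping(n_heads: int, n_kv_heads: int) -> list[int]:
    """Return the KV head index used by each query head, in order."""
    return [kv_head_for(q, n_heads, n_kv_heads) for q in range(n_heads)]
```

With 8 query heads, `kv_mapping(8, 2)` groups the heads four to a KV head, while `kv_mapping(8, 1)` and `kv_mapping(8, 8)` give the MQA and MHA extremes.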
|
da473295b7
|
better way to compute per-segment losses
|
2024-05-28 19:29:54 -05:00 |
|
|
5af6f41c94
|
added loss calcs against prom (requires the right settings for not shit results, disabled by default)
|
2024-05-27 08:43:00 -05:00 |
|
|
ddbacde0d1
|
DAC just doesn't work well enough......
|
2024-05-25 11:07:52 -05:00 |
|
|
458b95d196
|
added option to split between text loss and audio loss (to-do: document this better), because it may or may not be a problem with LLaMA-backed models because my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment
|
2024-05-19 11:23:56 -05:00 |
|
|
8d79f78e0a
|
god I need to replace omegaconf
|
2024-05-12 14:01:52 -05:00 |
|
|
2437a86efa
|
ugh
|
2024-05-12 13:02:15 -05:00 |
|
|
3774fcbdee
|
ugh
|
2024-05-11 22:58:38 -05:00 |
|
|
856545f8bb
|
nan loss detection (should have added it earlier), loss scaling for local backend + fp16
|
2024-05-11 22:23:29 -05:00 |
|
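A rough illustration of the NaN loss detection mentioned in commit 856545f8bb, assuming the simplest policy of skipping any step whose loss is not finite. The names `should_skip_step` and `run_steps` are hypothetical, and the actual commit may handle bad steps differently.

```python
import math


def should_skip_step(loss_value: float) -> bool:
    """True when the loss is NaN or infinite and the step should not be applied."""
    return math.isnan(loss_value) or math.isinf(loss_value)


def run_steps(loss_values):
    """Walk a sequence of per-step losses, separating usable steps from bad ones.

    Returns (applied, skipped) as lists of step indices. A real trainer would
    run the optimizer on the applied steps; this sketch only does bookkeeping.
    """
    applied, skipped = [], []
    for step, value in enumerate(loss_values):
        if should_skip_step(value):
            skipped.append(step)  # do not backprop or step the optimizer
            continue
        applied.append(step)
    return applied, skipped
```

For example, `run_steps([3.9, float("nan"), 2.1, float("inf")])` keeps steps 0 and 2 and flags steps 1 and 3.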
|
3337c69e5a
|
leverage between xformers and torch.backends.cuda.sdp_kernel for attention
|
2024-05-11 17:14:05 -05:00 |
|
|
0b6499601b
|
sanitizing
|
2024-05-11 16:31:05 -05:00 |
|
|
04a80d6b55
|
maybe it's better to be more explicit in deepspeed configs
|
2024-05-11 13:57:43 -05:00 |
|
|
4d93a16ef7
|
might just be better to explicitly define prompt duration ranges, especially under a "train small contexts then increase it" training paradigm
|
2024-05-11 09:50:54 -05:00 |
|
|
1547de5020
|
haha...
|
2024-05-09 23:15:52 -05:00 |
|
|
b7bd885651
|
some possible sanity with deepspeed config
|
2024-05-09 22:48:42 -05:00 |
|
|
b6131565ad
|
autotune?
|
2024-05-09 21:25:40 -05:00 |
|
|
6ed6ab8c03
|
a bit more cleanup for deepspeed ds_cfg creation
|
2024-05-09 21:00:26 -05:00 |
|
|
0d5d545a40
|
crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc.
|
2024-05-09 20:28:20 -05:00 |
|
|
215800484d
|
correcting my mistake of assuming I could just use raw 24kHz audio in the 44kHz DAC without too much of an issue (there are issues)
|
2024-05-04 23:49:15 -05:00 |
|
|
33b7f81b94
|
small cleanups
|
2024-05-04 22:37:22 -05:00 |
|
|
ffa200eec7
|
added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources))
|
2024-05-04 12:05:41 -05:00 |
|
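To show why the frame rate from commit ffa200eec7 matters, here is a small sketch converting a duration into a number of codec frames. The 75 Hz and 41 Hz figures are the ones quoted in the commit message; the table name and `frames_for` are hypothetical.

```python
# Frame rates as stated in the commit message (DAC figure is for 24K sources).
FRAMES_PER_SECOND = {"encodec": 75, "dac": 41}


def frames_for(seconds: float, codec: str) -> int:
    """Number of whole codec frames covering `seconds` of audio."""
    return int(seconds * FRAMES_PER_SECOND[codec])
```

Under these figures, a 3-second prompt is 225 frames with Encodec but only 123 with DAC, so any duration limit expressed in frames has to know which representation is in use.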
|
c494894261
|
simple DDP wrapper (for my NVlink test)
|
2024-05-04 11:48:26 -05:00 |
|
|
a7b43b98b5
|
renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes)
|
2024-05-02 20:08:59 -05:00 |
|
|
b5d1456a09
|
backwards compat for my shitty old weights (was testing if disabling AudioEmbedding summing magically made things better (it did not))
|
2024-04-29 22:14:01 -05:00 |
|
|
5120ffdda7
|
god it would be nice to know the best way to handle audio embeddings, because I genuinely don't know without skimming through papers or devoting X amount of GPU hours in training
|
2024-04-29 18:24:05 -05:00 |
|
|
caad7ee3c9
|
final tweaks, hopefully
|
2024-04-28 22:28:29 -05:00 |
|
|
071fb97777
|
dataset preparation script updates, caved and am using HF tokenizer now
|
2024-04-21 14:49:18 -05:00 |
|
|
a8ffa88844
|
it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior
|
2024-04-19 18:36:54 -05:00 |
|
|
4f5c9e518a
|
actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess
|
2024-04-18 13:32:41 -05:00 |
|
|
5ff2b4aab5
|
finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset)
|
2024-04-17 20:39:35 -05:00 |
|
|
b0bd88833c
|
refactor cleanup, had a revelation on how I can handle a batch of varying tasks
|
2024-04-16 21:04:48 -05:00 |
|
|
aa1e25fbf5
|
backwards compat for old YAMLs with models, option to set flash attention 2 for Llama (and derivatives), included syncdoth/RetNet's torchscale retnet for shits and grins, etc.
|
2024-04-16 10:02:31 -05:00 |
|
|
545162195b
|
deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things
|
2024-04-15 19:54:32 -05:00 |
|
|
789bb5d11b
|
add an optional label override for model loading (used for easy testing between 12/16/20/24 layered model)
|
2024-04-13 12:43:35 -05:00 |
|
|
f0c4baeb25
|
added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it)
|
2024-04-09 22:04:01 -05:00 |
|
|
9d97eb5104
|
added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale)
|
2024-04-08 20:14:51 -05:00 |
|
|
7075c2a5f0
|
added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml)
|
2024-04-04 19:11:49 -05:00 |
|
|
47435207f7
|
Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model
|
2024-03-01 19:20:10 -06:00 |
|
|
0427d8d076
|
logger broke for some reason, added flag to just tqdm.write instead, make cfg.bitsandbytes.bitnet==True yamls denoted since I'm sure they're not interoperable
|
2024-03-01 10:32:35 -06:00 |
|
|
35d78a2bb0
|
Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares)
|
2024-02-29 20:29:17 -06:00 |
|
|
c690aa509d
|
fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...)
|
2023-12-25 21:20:32 -06:00 |
|
|
9c198eb75a
|
added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works)
|
2023-12-20 18:45:58 -06:00 |
|
|
32d4271ca8
|
fixed issue with training from scratch (oops)
|
2023-10-21 09:55:38 -05:00 |
|
|
3195026dba
|
fixed issue with the 'add another target audio to artificially create longer sequences' for HDF5 just duplicating the utterance initially sampled
|
2023-10-18 20:38:33 -05:00 |
|
|
65f500083d
|
tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work
|
2023-10-12 22:21:43 -05:00 |
|
|
8740cdefc6
|
added initial support for languages (still testing, marked as model version 3), added experimental 'context extend by limiting the resp context' (untested)
|
2023-10-11 20:38:40 -05:00 |
|
|
6045cbce94
|
added experimental option to append utterances for training target (emphasis on experimental)
|
2023-10-11 17:32:45 -05:00 |
|
|
893a610fad
|
cleanup, use deepspeed inferencing pathway if requested
|
2023-10-09 15:24:04 -05:00 |
|
|
63cc9cf37a
|
added compat flags for torchscale because the maintainer for torchscale broke compat for existing models
|
2023-10-05 16:39:46 -05:00 |
|
|
153f8b293c
|
added min-x and min-y arguments to plot.py, helper script to download from my existing checkpoint
|
2023-10-04 19:41:37 -05:00 |
|
|
d12877ee09
|
added option to set probability of selecting the AR during training under a monolithic AR+NAR, added some more to-dos while I have them in mind
|
2023-10-02 16:52:42 -05:00 |
|
|
c0b25541e3
|
restructured some things with the model to remove dead weights
|
2023-09-20 19:10:59 -05:00 |
|
|
d07c63b9d8
|
unified more things with training the AR+NAR monolithic model
|
2023-09-12 15:54:41 -05:00 |
|
|
40ef34e1ca
|
this embedding class definitely works, and migrating from the previous embedding weights seems to work.
|
2023-09-11 14:13:42 -05:00 |
|
|
671dca88ee
|
throw error when no reference audio is provided in the web UI because someone keeps doing that in the HF space
|
2023-09-10 15:50:50 -05:00 |
|
|
c74fe2f718
|
tweaks to web UI
|
2023-09-09 22:27:20 -05:00 |
|
|
f69aad9c65
|
some day I'll get it right
|
2023-09-08 15:36:26 -05:00 |
|
|
8837bc34d7
|
added option to specify parameters to freeze per-model in YAML (because I need to see about committing atrocities with converting an AR into an AR+NAR)
|
2023-09-07 18:19:51 -05:00 |
|
|
c47fc3274e
|
added backwards compat flag
|
2023-09-07 17:12:17 -05:00 |
|
|
e7a67410d1
|
oops
|
2023-09-07 09:14:03 -05:00 |
|
|
100ca6b7d0
|
added option to use SGD optimizer through the YAML, added option to pass in additional optimizer parameters through the YAML, added experimental unified AR+NAR model (does not seem fruitful in testing)
|
2023-09-06 18:58:35 -05:00 |
|
|
451726fdd5
|
added ability to disable activation checkpointing through the YAML (it is very VRAM intensive at double layer size)
|
2023-09-05 15:38:21 -05:00 |
|
|
2f9cd0842f
|
merged dedicated interleaved AR code with the normal AR code
|
2023-09-03 22:46:08 -05:00 |
|
|
8a6c203277
|
added per-speaker samplers
|
2023-09-03 21:27:13 -05:00 |
|
|
57db3ccfa8
|
shuffled VALL-E continuous as a task tts-c instead, logic fixes for it
|
2023-09-02 12:23:40 -05:00 |
|
|
2f06166ddd
|
cleanups
|
2023-09-01 21:33:51 -05:00 |
|
|
e40c0d34a0
|
somewhat got recurrent forward working (it's as accurate as chunkwise forward: it's not accurate at all), added option to use AMP instead of blanket setting the weight's dtype
|
2023-09-01 20:58:29 -05:00 |
|
|
2bc2d08b09
|
(need to verify) added modifying model size and config bool to align with VALL-E continuous' methodology
|
2023-09-01 17:19:34 -05:00 |
|
|
87c4bfedba
|
added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU)
|
2023-08-27 12:26:12 -05:00 |
|
|
165a1154e0
|
Undo naive=False test flag, this shouldn't have made its way in
|
2023-08-26 22:00:43 -05:00 |
|
|
78378ed1ce
|
overhauled dataloading code to be marginally faster, mostly cleaned up, and can leverage a metadata json to help things out
|
2023-08-26 19:53:23 -05:00 |
|
|
00ad4af651
|
updated draconian requirement for espeak-ng to be installed and the env var set to the dll for Windows
|
2023-08-24 14:57:01 -05:00 |
|
|
4585824cd3
|
tweaks, including exporting on save/quit
|
2023-08-23 16:43:03 -05:00 |
|
|
d106598403
|
do not utilize diskcache if a config yaml is not loaded
|
2023-08-23 11:02:15 -05:00 |
|
|
7b1b82e0e5
|
inferencing cleanup
|
2023-08-20 21:36:02 -05:00 |
|
|
736c077282
|
oops
|
2023-08-20 13:42:18 -05:00 |
|
|
2d1a9f10c0
|
nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trims off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now)
|
2023-08-19 15:06:33 -05:00 |
|
|
f7f6d3bf6d
|
validated that SpeechX tasks cse and nse work, added a method to test each task by invoking python3 -m vall_e.data --action=tasks --tasks='sr,se,cse,nse'
|
2023-08-19 09:50:07 -05:00 |
|
|
8f42c578c9
|
setting up for allowing training for a partial amount of the speechx tasks (do NOT try this at home yet without a proper model, as performance is predicated on having a solid base vall-e model for the tasks)
|
2023-08-19 00:16:08 -05:00 |
|
|
ae9d38aa31
|
forgot to have it pull from specified noise to the hdf5 dataset
|
2023-08-18 23:57:07 -05:00 |
|
|
77292c42f9
|
tested the training preparation for tasks ns, sr, and tse (I don't expect it to go well with only 2 RVQ bins)
|
2023-08-18 23:55:40 -05:00 |
|
|
bbb0563b3d
|
pseudocode polyfill stub some other flavor of working on adding the tasks
|
2023-08-18 22:22:13 -05:00 |
|
|
fb4e816823
|
oops
|
2023-08-18 21:11:19 -05:00 |
|
|
2a71486cb6
|
preparing for SpeechX extensions
|
2023-08-18 20:58:07 -05:00 |
|
|
ced31fd9b7
|
removed the sampler as it's very misleading
|
2023-08-18 14:47:48 -05:00 |
|
|
ee58db746f
|
actually make the evaluation dataset shuffled for sample_type=speaker
|
2023-08-17 15:04:45 -05:00 |
|
|
d7152fc7b9
|
added pruning of old checkpoints if specified (cfg.trainer.keep_last_checkpoints)
|
2023-08-16 20:12:12 -05:00 |
|
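A minimal sketch of the pruning behind `cfg.trainer.keep_last_checkpoints` from commit d7152fc7b9, assuming checkpoints are identified by their step number and that a value of zero or less disables pruning (that convention is an assumption). The function `checkpoints_to_prune` is hypothetical and only selects what to delete; it touches no files.

```python
def checkpoints_to_prune(steps, keep_last):
    """Return the checkpoint steps to delete, oldest first.

    steps     -- iterable of checkpoint step numbers found on disk
    keep_last -- how many of the newest checkpoints to keep;
                 <= 0 is treated as "pruning disabled" (assumption)
    """
    if keep_last <= 0:
        return []
    ordered = sorted(steps)
    if len(ordered) <= keep_last:
        return []  # nothing to remove yet
    return ordered[:-keep_last]
```

Called as `checkpoints_to_prune([500, 1500, 1000, 2000], 2)`, it marks steps 500 and 1000 for removal and leaves the two newest in place.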
|
44c08d828e
|
added sample_type that samples from speakers to truly balance an epoch by speakers rather than the entire dataset and a sampler that tries to balance by speakers
|
2023-08-16 19:39:21 -05:00 |
|
|
1e3e1d9315
|
tweaks
|
2023-08-15 21:58:16 -05:00 |
|
|
13571380be
|
made exporter make more sense
|
2023-08-13 22:56:28 -05:00 |
|
|
d7deaf6def
|
distributed training works now (hopefully)
|
2023-08-13 22:07:45 -05:00 |
|
|
d89568a96e
|
some fixes for the local framework
|
2023-08-05 03:22:15 +00:00 |
|
|
5970f254e3
|
some fixes for the local framework
|
2023-08-05 02:17:30 +00:00 |
|
|
608c1970eb
|
oops
|
2023-08-03 20:36:19 -05:00 |
|
|
c85101403f
|
big cleanup
|
2023-08-03 20:26:36 -05:00 |
|
|
f6597e2dfe
|
adjustments
|
2023-08-02 18:36:26 -05:00 |
|
|
bf8cedc9dd
|
Rewrite init
|
2023-08-02 21:53:35 +00:00 |
|