|
96d05be73c
|
demo page tweaks
|
2024-10-10 13:52:37 -05:00 |
|
|
2ea978f318
|
added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval
|
2024-10-10 13:40:25 -05:00 |
|
|
52299127ab
|
fix vall_e.emb.process
|
2024-10-08 20:00:34 -05:00 |
|
|
0656a762af
|
fix vall_e.emb.transcriber
|
2024-10-08 19:24:43 -05:00 |
|
|
acdce66d4e
|
readme tweaks, set the (unused) default model download URL back to the base ar+nar-llama-8 model, as ar+nar-tts+stt-llama-8 was renamed back to it since it performs well
|
2024-10-05 22:53:53 -05:00 |
|
|
84c7419001
|
faster
|
2024-10-04 22:30:47 -05:00 |
|
|
a507b769a1
|
sped up inferencing by not doing .tolist() for rep pen / length pen (and a bug fix in the web UI from prev commit)
|
2024-10-04 22:18:20 -05:00 |
|
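As a rough illustration of the kind of change this entry describes (not the repo's actual sampler code; the function name and penalty convention are assumptions), keeping the repetition penalty entirely on-tensor avoids the `.tolist()` round-trip through Python on every sampling step:

```python
import torch

def apply_repetition_penalty(logits: torch.Tensor, previous: torch.Tensor, penalty: float = 1.25) -> torch.Tensor:
    # stay on-device: no .tolist() round-trip through Python every sampling step
    seen = previous.unique()
    picked = logits[seen]
    # usual convention: divide positive logits, multiply negative ones
    logits[seen] = torch.where(picked > 0, picked / penalty, picked * penalty)
    return logits

# usage: logits is [vocab], previous holds the token ids sampled so far
logits = apply_repetition_penalty(torch.randn(1024), torch.randint(0, 1024, (32,)))
```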
|
4a8e3ccf06
|
README tweaks, added --input-prompt-prefix as an experiment (it's literally better to just not do this, but I'll retain it in case I have a revelation on how to improve it)
|
2024-10-04 18:57:19 -05:00 |
|
|
a9fa0898a9
|
tweaked demo page script to sample speakers instead
|
2024-09-28 10:50:26 -05:00 |
|
|
2f1dca3089
|
added language selection in web UI, tweaked demo script
|
2024-09-28 09:49:45 -05:00 |
|
|
10df2ef5f3
|
fixed oversight where input audio was not being resampled (lol...)
|
2024-09-27 20:27:53 -05:00 |
|
|
039482a48e
|
don't do eval on stt because it's so slow and I don't even bother doing any metrics against it anyways (to-do: make this a flag)
|
2024-09-26 18:56:57 -05:00 |
|
|
ff7a1b4163
|
coerce into a path for other sampler_types (it's required for sampling similar utterances)
|
2024-09-26 18:37:56 -05:00 |
|
|
f24547ad4e
|
add top_k sampling / offset for prompt similar utterance sampling
|
2024-09-26 16:26:40 -05:00 |
|
|
9da630f73a
|
swap order of demo entries, as the model prioritizes adhering to the speaker prompt more (instead of trying to match the ground truth magically)
|
2024-09-25 23:31:24 -05:00 |
|
|
e84d466261
|
vall_e.plot tweaks
|
2024-09-24 20:05:10 -05:00 |
|
|
c5e9142863
|
added option to retokenize phonemes for hdf5 (to save having to remake my hdf5 file)
|
2024-09-21 13:08:01 -05:00 |
|
|
536c11c4ac
|
actually validated and fixed sampling similar utterances for the prompt (hopefully nothing else is needed)
|
2024-09-21 12:59:51 -05:00 |
|
|
d31f27119a
|
regex replace out the (lang) markers in espeak, updated tokenizer vocab as lazily as possible to not have unk tokens
|
2024-09-21 12:29:28 -05:00 |
|
|
769f67dcfe
|
actually fix validation of phonemes in the symmap
|
2024-09-21 12:19:34 -05:00 |
|
|
c8d4716a9f
|
ugh
|
2024-09-18 21:40:57 -05:00 |
|
|
fe241f6a99
|
support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder)
|
2024-09-18 21:34:43 -05:00 |
|
|
b5bec0c9ce
|
oops, turns out these are not split by speaker names already........ (also added sampling the dataset in the webui for easy viewing)
|
2024-09-18 20:19:46 -05:00 |
|
|
fa9d3f6c06
|
lang fixes / reworked phoneme symmap validation
|
2024-09-18 19:36:03 -05:00 |
|
|
84647f588a
|
more tweaks
|
2024-09-18 16:43:57 -05:00 |
|
|
ebac1db16c
|
maybe final tweaks; I really needed to unify my JSON read/write, and orjson has proven fast enough for me to try and rely on it more
|
2024-09-17 22:57:04 -05:00 |
|
|
6ceed866b5
|
*faster*
|
2024-09-17 22:44:36 -05:00 |
|
|
f00283440c
|
faster
|
2024-09-17 22:26:31 -05:00 |
|
|
be22b65300
|
solved my problem
|
2024-09-17 21:58:44 -05:00 |
|
|
8f41d1b324
|
more tweaks
|
2024-09-17 16:26:30 -05:00 |
|
|
804ddb5182
|
optimizations (6 hours to do cosine similarities on a speaker set of just 17k utterances................)
|
2024-09-17 15:51:45 -05:00 |
|
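For context on why this was slow: pairwise cosine similarity done as a Python double-loop is quadratic in utterance count, and the usual fix is a chunked matmul over normalized embeddings. A minimal sketch of that idea (dimensions and names are illustrative, not the repo's vall_e.emb.similar code):

```python
import torch
import torch.nn.functional as F

def pairwise_cosine_similarity(embs: torch.Tensor, chunk: int = 1024) -> torch.Tensor:
    # normalize once, then do chunked matmuls instead of a Python double-loop
    embs = F.normalize(embs, dim=-1)
    out = torch.empty(embs.shape[0], embs.shape[0], dtype=embs.dtype)
    for i in range(0, embs.shape[0], chunk):
        out[i:i + chunk] = embs[i:i + chunk] @ embs.T
    return out

# e.g. 17k utterance embeddings of dim 1024 (the full matrix is ~1.1 GiB in fp32)
sims = pairwise_cosine_similarity(torch.randn(17000, 1024))
top_scores, top_indices = sims.topk(k=8, dim=-1)  # most similar utterances per row
```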
|
a9fbe81f98
|
oops
|
2024-09-17 15:25:12 -05:00 |
|
|
c440c4fe7e
|
relegated processing similarity data into vall_e.emb.similarity since it's easier, seems to work?
|
2024-09-17 14:37:21 -05:00 |
|
|
56f25f7a9b
|
more stuff for similar-speaker prompt sampling (to-do: actually test if this works...)
|
2024-09-16 23:10:29 -05:00 |
|
|
69f140ba45
|
fix oversight with phonemizing french because espeak defines french as fr-fr instead of fr (even though spain spanish is es and not es-sp or some shit, but portugal portuguese is pt-pt)
|
2024-09-13 12:53:36 -05:00 |
|
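A small remap table is the typical workaround for this quirk; the entries below only reflect the cases named in the commit message and are otherwise assumptions, not the repo's actual mapping:

```python
# espeak-ng voice IDs don't always match bare ISO-639-1 codes; a tiny remap
# keeps the rest of the pipeline on plain two-letter codes. Only the cases
# named in the commit message are listed; anything else passes through.
ESPEAK_LANGUAGE_MAP = {
    "fr": "fr-fr",  # espeak calls French "fr-fr"
    "pt": "pt-pt",  # ...and Portuguese "pt-pt", while Spanish stays "es"
}

def to_espeak_language(code: str) -> str:
    return ESPEAK_LANGUAGE_MAP.get(code, code)
```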
|
4f3c7a37c8
|
also do text similarities (don't know what use I'll have for this)
|
2024-09-10 16:45:59 -05:00 |
|
|
1c615a0f52
|
helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (iunno how to go about it desu)
|
2024-09-10 16:34:23 -05:00 |
|
|
d059f6f56d
|
added helper script to process Emilia (amphion/Emilia-Dataset), clean up espeak phonemes for non-English transcriptions with English words (because for some reason espeak injects (en){word}(lang) markers and it's annoying)
|
2024-09-09 09:57:32 -05:00 |
|
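The cleanup presumably boils down to a regex pass over the phonemized text; a sketch of the idea, with the exact marker shape inferred from the commit message rather than from espeak's documentation:

```python
import re

# espeak-ng marks language switches like "(en)pizza(ja)"; strip the markers,
# keep the word. The exact marker shape is inferred from the commit message.
_LANG_MARKER = re.compile(r"\([a-z]{2,3}(?:-[a-z]{2,3})?\)")

def strip_lang_markers(phonemes: str) -> str:
    return _LANG_MARKER.sub("", phonemes)

print(strip_lang_markers("watashi wa (en)pizza(ja) ga suki"))
# -> "watashi wa pizza ga suki"
```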
|
31e8b7edb8
|
tweaks and fixes for lora stuffs
|
2024-09-08 18:05:21 -05:00 |
|
|
54203c059d
|
validated rep pen for STT (sometimes needed to wrangle the model)
|
2024-09-08 08:30:30 -05:00 |
|
|
6a967f91b9
|
oops
|
2024-09-07 22:13:49 -05:00 |
|
|
5d66a7db52
|
webui cleanup, more tweaks, default to safetensors in config
|
2024-09-07 21:45:05 -05:00 |
|
|
a6ad0577b8
|
cleanup the resultant text from STT
|
2024-09-06 18:44:25 -05:00 |
|
|
fa93061b3e
|
more fixes, moved sampler state dict to a better place, eval works again
|
2024-09-06 16:59:56 -05:00 |
|
|
4bd9bb39c8
|
webui for STT (still need to bake the model to handle it better; a few hours of training so far has it generating what looks like a normal transcription, but it doesn't correlate to the audio right now)
|
2024-09-06 15:13:04 -05:00 |
|
|
d33a906119
|
cleanup for AR_NAR inferencing to allow both TTS and STT tasks simultaneously (need to have training eval do this too, though)
|
2024-09-06 14:30:12 -05:00 |
|
|
341e19162b
|
fixes, again
|
2024-09-06 11:41:41 -05:00 |
|
|
94cf81d38c
|
tweak
|
2024-09-05 23:21:18 -05:00 |
|
|
413097f5f7
|
fixes
|
2024-09-05 21:42:59 -05:00 |
|
|
54547b74d8
|
experimental implementation of STT (need to actually test on a model, test trainer seems to work)
|
2024-09-05 20:43:20 -05:00 |
|
|
d319d33368
|
haha
|
2024-09-04 14:52:26 -05:00 |
|
|
619369236b
|
ugh
|
2024-08-30 21:10:57 -05:00 |
|
|
168e203942
|
ugh
|
2024-08-30 14:39:07 -05:00 |
|
|
685f4faec0
|
ugh
|
2024-08-30 10:46:26 -05:00 |
|
|
32287710a2
|
moved prints to use logger, edited readme (fused_attn doesn't seem stable for training)
|
2024-08-29 13:27:16 -05:00 |
|
|
d423bc03c2
|
fixed attentions for MoE
|
2024-08-27 17:02:42 -05:00 |
|
|
b7b99a25f1
|
added ability to specify attention backend for CLI and webui (because I'm tired of editing the yaml)
|
2024-08-26 19:33:51 -05:00 |
|
|
0d706ec6a1
|
added fused_attn (triton-based fused attention) and simply just query for flash_attn under rocm
|
2024-08-26 19:13:34 -05:00 |
|
|
6b0891448c
|
pain (some shit to try and get some flash attention for ROCm (gfx1100) through triton fused attention but no good)
|
2024-08-25 20:07:27 -05:00 |
|
|
40e1799adc
|
fixed xformers and flash_attn to actually work now
|
2024-08-19 01:03:35 -05:00 |
|
|
29c35528e5
|
the sooner I accept there's no FA for V100s the sooner I'll go to bed
|
2024-08-18 23:54:33 -05:00 |
|
|
d636edd3a2
|
added flash_attn LlamaAttention (including flash_attn==1.0.9)
|
2024-08-18 20:51:14 -05:00 |
|
|
054d28573a
|
my DAC dataset again managed to have some utterances with only 8 of 9 RVQ levels; this fixes an oversight from that
|
2024-08-09 21:18:01 -05:00 |
|
|
2a1794c084
|
ughghghhhh
|
2024-08-09 21:15:01 -05:00 |
|
|
ed373957e2
|
maybe not
|
2024-08-09 11:38:08 -05:00 |
|
|
c658a7b440
|
make loss scaling opt-in rather than automatically determined (because it seems a DAC-based model really doesn't like loss scaling)
|
2024-08-09 10:51:36 -05:00 |
|
|
d04f6911b4
|
oops
|
2024-08-08 19:38:55 -05:00 |
|
|
0aa59e6f3f
|
uncommented block that writes the metadata on HDF5 creation
|
2024-08-08 19:21:29 -05:00 |
|
|
79a6781c9e
|
fix vall_e.data --action=hdf5 actually transcribing, because past me completely forgot it already tried to put the transcribe/process dataset scripts inside the module before
|
2024-08-08 07:51:42 -05:00 |
|
|
949339a3fa
|
do not include SDPA attention if there are no available SDPA backends
|
2024-08-06 20:42:39 -05:00 |
|
|
613024ec0d
|
ugh
|
2024-08-06 20:35:15 -05:00 |
|
|
eac353cd0b
|
busy work and cleanup while I wait for 1TB of audio to quantize... again.
|
2024-08-06 20:23:33 -05:00 |
|
|
f284c7ea9c
|
do mixed-precision for AMP inside the compress function itself, because the loudness function gripes when using a float16 (non-power of 2 lengths) or bfloat16 (something about views for bfloat16)
|
2024-08-06 15:08:37 -05:00 |
|
|
b6ba2cc8e7
|
tweaked vall_e.emb.process to process audio one file at a time instead of all the files for a given speaker, to avoid OOMing on lower-memory systems with --low-memory
|
2024-08-06 14:24:40 -05:00 |
|
|
9710b06b74
|
tweaks and things
|
2024-08-06 08:17:25 -05:00 |
|
|
134dac8c2b
|
re-adapted process_libritts.py to a 'better' way (better because it processed without needing to shuffle a bunch of things and adapt to cope or something)
|
2024-08-05 20:34:58 -05:00 |
|
|
3f73fcca29
|
oops
|
2024-08-05 20:12:13 -05:00 |
|
|
597441e48b
|
moved transcribe and process dataset scripts to vall_e/emb within the module itself, argparse-ified transcription script
|
2024-08-05 19:40:50 -05:00 |
|
|
7cdfa3dc0c
|
updated process_datasets.py, added argparsing so I can mostly stop manually editing things, and some other cleanup
|
2024-08-05 15:59:25 -05:00 |
|
|
debcc93e7e
|
add adapted MixtralAttention for when I make a bad decision to actually train a MoE
|
2024-08-04 22:03:22 -05:00 |
|
|
10aaf840e7
|
added export option to convert Llama to MixtralMoE for another dumb experiment
|
2024-08-04 20:25:06 -05:00 |
|
|
3a65cc4b22
|
fix issue with sft and shared tensors...
|
2024-08-04 19:56:21 -05:00 |
|
|
23f3b56fda
|
oops
|
2024-08-04 08:18:57 -05:00 |
|
|
d19f93a2c0
|
documentation update
|
2024-08-04 00:14:49 -05:00 |
|
|
2cb465018b
|
implicitly load either normal pickled weights or safetensors on loading the model
|
2024-08-03 23:34:18 -05:00 |
|
|
c09133d00f
|
added safetensors support (with metadata) and feed whatever torch.load/torch.save into it
|
2024-08-03 23:15:20 -05:00 |
|
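A minimal sketch of routing torch checkpoints through safetensors with metadata, assuming a flat tensor-only state dict and an extension check deciding which format to use (names are illustrative, not the repo's actual I/O wrapper):

```python
import torch
from safetensors.torch import save_file, load_file

def save_state_dict(state_dict, path, metadata=None):
    # safetensors wants a flat dict of tensors and str -> str metadata
    if path.endswith((".sft", ".safetensors")):
        metadata = {k: str(v) for k, v in (metadata or {}).items()}
        save_file(state_dict, path, metadata=metadata)
    else:
        torch.save(state_dict, path)

def load_state_dict(path):
    if path.endswith((".sft", ".safetensors")):
        return load_file(path)
    return torch.load(path, map_location="cpu")
```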
|
6a733eb2ed
|
changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype; some 'optimization' to compile the model, but it doesn't seem to do anything useful
|
2024-08-03 22:10:21 -05:00 |
|
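The difference in question, illustrated with placeholder values:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16

# before: allocates a default-dtype CPU tensor, then copies/casts it over
x = torch.tensor([1, 2, 3]).to(device, dtype)
# after: one allocation, already on the right device with the right dtype
y = torch.tensor([1, 2, 3], device=device, dtype=dtype)
```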
|
ab673e0426
|
add cap for NAR-len training, to avoid any weird cases in early training where it'll just mess up and generate long lengths
|
2024-08-03 21:00:32 -05:00 |
|
|
4d2b88b164
|
throw exception if training, but no model is set to train (because I ran into this wondering what the hell was happening)
|
2024-08-03 20:51:23 -05:00 |
|
|
d0a5c7eca2
|
more coping with the NAR len
|
2024-08-03 20:23:36 -05:00 |
|
|
11fa3da665
|
some cleanup, fixed the wrapper attention to explicitly use other sdpa backends
|
2024-08-03 19:51:00 -05:00 |
|
|
9564ecda43
|
wrapper attention class for other sdpa backends + xformers seems to have broken...
|
2024-08-03 15:12:11 -05:00 |
|
|
9e1989be1b
|
tweaked initial NAR pass's initial token embeddings to use a different value, or something
|
2024-08-03 09:01:37 -05:00 |
|
|
26f74c5739
|
somehow fixed non-unified position IDs for the NAR-len
|
2024-08-03 08:43:42 -05:00 |
|
|
66407e5bdb
|
tweaks for the NAR-len model, maybe
|
2024-08-03 08:40:39 -05:00 |
|
|
97c5241bef
|
fixes, throw an exception when using NAR only model with non-unified position IDs, since for some reason it outputs garbage for the NAR
|
2024-08-02 22:25:49 -05:00 |
|
|
4456d3172b
|
that's what I get for testing without hdf5 on my previous machine....
|
2024-08-02 20:44:01 -05:00 |
|
|
7a77978096
|
oversight with using resize_modules
|
2024-08-02 20:28:49 -05:00 |
|
|
808a79ebaf
|
oops
|
2024-08-01 22:56:04 -05:00 |
|
|
443422ecb5
|
ugh, finally got some form of offloading working (need to test if it works on different GPUs, but GPU and CPU offloading seems to work in the test trainer)
|
2024-08-01 22:43:39 -05:00 |
|
|
c9ec6b28ef
|
it actually wasn't working because Engines.__init__() automatically moves the entire module to the requested device, which was being called after offloading the model in the test trainer (and it seems I can't do it without injecting a bunch of shit in modeling_llama.py)
|
2024-08-01 20:56:28 -05:00 |
|
|
b4c895114c
|
naive model offloading support (handles automatically splitting parts of the model to requested device per memory constraints, either inferred or requested in the yaml, input tensors are automatically migrated to the right device, it SEEMS to work for training under the test trainer when split between GPU and CPU) (this was specifically only because that Flux imagegen model released so I can test it there)
|
2024-08-01 20:12:06 -05:00 |
|
|
387358bc8a
|
fixes for the NAR-len model, documented some config options, and a better way to handle resizing modules on state_dict load
|
2024-07-31 20:35:09 -05:00 |
|
|
52d13b321f
|
I'd rather have it default to non-strict loading so I can clean up YAMLs
|
2024-07-30 22:24:38 -05:00 |
|
|
d7c6be6f78
|
fix weird regression in handling checkpoints when the backend is local but deepspeed checkpoints are present (it was handled with LoRA loading but not real loading...)
|
2024-07-30 22:15:56 -05:00 |
|
|
07f8e2ad06
|
added option to set the causal size (how many tokens to sample per AR step), but requires the model to be trained for this (which explains why recurrent chunk sampling just doesn't work for the retnet tests, obvious in hindsight)
|
2024-07-30 20:53:51 -05:00 |
|
|
ebf848d249
|
possible speedup for samplers that require a list of previous tokens (the DRY sampler made me realize that I should copy the tolist() thing from the rep pen sampler for everything else)
|
2024-07-29 20:23:26 -05:00 |
|
|
55b0121b1a
|
trying (and failing) to nail a weird regression in fancier attentions
|
2024-07-29 19:53:37 -05:00 |
|
|
c2f5b916fc
|
added what I think is DRY sampling
|
2024-07-29 19:15:07 -05:00 |
|
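DRY ("don't repeat yourself") sampling penalizes the token that would extend a repetition already present in the context, with the penalty growing with the length of the matched suffix. A rough sketch of that idea only; parameter names and defaults here are assumptions, not the repo's implementation:

```python
import torch

def dry_penalty(logits, prev, multiplier=0.8, base=1.75, allowed_length=2):
    seq = prev.tolist() if isinstance(prev, torch.Tensor) else list(prev)
    n = len(seq)
    for i in range(n - 1):
        if seq[i] != seq[-1]:
            continue
        # how far back does the suffix ending at i match the current suffix?
        match = 1
        while match <= i and seq[i - match] == seq[-1 - match]:
            match += 1
        if match >= allowed_length:
            candidate = seq[i + 1]  # the token that followed this suffix before
            logits[candidate] -= multiplier * base ** (match - allowed_length)
    return logits
```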
|
ce8bb1e4f7
|
sanity cleanups with weird off-by-one-ness, cleaned up and validated vall_e.models.experimental works again
|
2024-07-27 15:36:05 -05:00 |
|
|
06e948aec1
|
suppress warning on exit about distributed not being cleaned up (because I updated my system)
|
2024-07-25 16:50:47 -05:00 |
|
|
682e4387dc
|
oops (fixed proms being erased from a config oversight)
|
2024-07-25 12:39:57 -05:00 |
|
|
1acb0e9c84
|
added experimental training setting to perform token dropout to MAYBE compensate for errors from the preceding RVQ level (two types: token error offset, token dropout embedding replace)
|
2024-07-24 19:35:17 -05:00 |
|
|
611a1c4bdc
|
might help
|
2024-07-22 20:57:01 -05:00 |
|
|
188d116222
|
some weird fixes for an equally weird regression with LoRA loading
|
2024-07-22 20:47:24 -05:00 |
|
|
e33c4b0cb1
|
oops
|
2024-07-22 19:38:39 -05:00 |
|
|
75b04686f8
|
added prom-less training / inferencing, some other things
|
2024-07-22 19:36:07 -05:00 |
|
|
491ae2a684
|
some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...)
|
2024-07-22 00:30:40 -05:00 |
|
|
ad024f400f
|
actually pass language into dataset process script, fix coercing japanese into hiragana because espeak does not like kanji
|
2024-07-21 23:21:37 -05:00 |
|
|
3e5ca3a201
|
more demo page tweaks
|
2024-07-21 19:31:13 -05:00 |
|
|
7366f36f81
|
oops
|
2024-07-21 19:17:25 -05:00 |
|
|
e19aa643a6
|
cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training
|
2024-07-21 19:12:03 -05:00 |
|
|
ba7ee8c0ee
|
added demo link to readme
|
2024-07-19 21:22:30 -05:00 |
|
|
9ec88d9444
|
validated passing URI path for assets instead of base64 encoding them
|
2024-07-19 21:07:17 -05:00 |
|
|
d87b492295
|
added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)
|
2024-07-19 20:49:40 -05:00 |
|
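For reference, the two approaches being weighed (inlining the wav as base64 versus just referencing the file) look roughly like this; the helper is illustrative, not the repo's demo-page code:

```python
import base64

def audio_tag(wav_path: str, embed: bool = False) -> str:
    # either reference the wav by path, or inline it as a base64 data URI
    # (inlining makes the page self-contained but enormous)
    if not embed:
        return f'<audio controls src="{wav_path}"></audio>'
    with open(wav_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f'<audio controls src="data:audio/wav;base64,{b64}"></audio>'
```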
|
d53038a9e4
|
actually have split classifiers working
|
2024-07-19 15:33:31 -05:00 |
|
|
692d09f9c1
|
eval/validation fix for SpeechX tasks
|
2024-07-19 09:16:37 -05:00 |
|
|
28a674e0f1
|
fixes...
|
2024-07-18 23:25:32 -05:00 |
|
|
39f961abcd
|
test trainer (vall_e.models.ar_nar) tests some SpeechX features
|
2024-07-18 18:46:45 -05:00 |
|
|
83a0954f85
|
fixes for re-introducing SpeechX tasks (need to actually validate if these all do the right things)
|
2024-07-18 17:16:32 -05:00 |
|
|
bccbb77a1a
|
added option to either naively concat codes to concat audio waveforms (prior behavior) or to decode => concat => encode instead (although this only currently happens for prom sampling if an utterance is too small)
|
2024-07-18 16:48:41 -05:00 |
|
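The two joining strategies, sketched with hypothetical `decode`/`encode` callables standing in for the codec and an assumed [time, rvq_levels] code layout:

```python
import torch

def concat_utterances(codes, naive=True, decode=None, encode=None):
    # naive: just join the quantized code sequences along the time axis
    if naive:
        return torch.cat(codes, dim=0)  # assumes [time, rvq_levels] layout
    # otherwise: decode each to a waveform, join the waveforms, re-encode
    waveform = torch.cat([decode(c) for c in codes], dim=-1)
    return encode(waveform)
```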
|
97e768601c
|
re-introducing SpeechX tasks (need to validate them all, everything works with base tts anyways)
|
2024-07-18 16:16:14 -05:00 |
|
|
c2b8035e74
|
oops, kept forgetting to actually pass in lang/tone tokens (despite not really using these at the moment)
|
2024-07-18 14:18:34 -05:00 |
|
|
22fe53508c
|
added experimental disjointed position IDs (because I *think* this might help because technically a sequence is made up of several parts, and the position embeddings shouldn't be unified)
|
2024-07-16 19:52:41 -05:00 |
|
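Restarting the position count per segment is simple to express; a sketch of the idea, with illustrative segment lengths:

```python
import torch

def disjoint_position_ids(segment_lengths):
    # restart the position count at each segment (e.g. text / prompt / response)
    # instead of numbering the concatenated sequence 0..N-1
    return torch.cat([torch.arange(length) for length in segment_lengths])

print(disjoint_position_ids([4, 3, 5]))
# tensor([0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3, 4])
```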
|
fe0f235335
|
mechanism to store the model config inside the weights and load them, some other things to allow LoRA training on the RetNet (gradient checkpointing will gripe about inputs not having requires_grad and nothing seems to remedy it)
|
2024-07-16 18:23:13 -05:00 |
|
|
3acc54df22
|
allow loading a different model within the web ui (apparently I did not have the web UI in the documentation)
|
2024-07-15 19:59:48 -05:00 |
|
|
7b210d9738
|
sanity cleanup
|
2024-07-04 15:58:08 -05:00 |
|
|
1ecf2793f4
|
(commented-out) support for facebookresearch/AudioDec, but support really didn't wow me (so I commented it out until I figure out why my output audio is super crusty with AudioDec)
|
2024-07-04 15:40:51 -05:00 |
|
|
f770467eb3
|
stuff
|
2024-07-01 18:13:29 -05:00 |
|
|
312a8e3ead
|
add shuffle to samplers that can support it
|
2024-06-30 11:36:46 -05:00 |
|
|
396af541c5
|
ugh
|
2024-06-30 11:11:58 -05:00 |
|
|
dced595391
|
more cleanup
|
2024-06-30 11:00:12 -05:00 |
|
|
bc2a6fa756
|
sanity cleanup: moved experimental features under its own thing
|
2024-06-30 10:37:33 -05:00 |
|
|
b21f74a5c5
|
added summing of external embeddings (at this point I don't think any amount of cope bandaids will get DAC to train nicely; I think the RVQ levels the NAR tends to generate add too much noise if they're not accurate)
|
2024-06-29 23:42:30 -05:00 |
|
|
793ccb16fb
|
ugh
|
2024-06-29 22:14:35 -05:00 |
|
|
2808f881c8
|
cleaned up subjugated audio embedding into a flag, flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive)
|
2024-06-29 21:46:35 -05:00 |
|
|
ec5eaebcbc
|
experimental method of using DAC's quantizer ""embeddings"" to see if it helps with model quality
|
2024-06-29 19:46:11 -05:00 |
|
|
a8718d35a4
|
nasty bandaid because some of my DAC dataset only has 8 RVQ levels instead of the full 9
|
2024-06-29 10:16:37 -05:00 |
|
|
c4dd523b6f
|
change from chunk-slicing paths for distributed dataloader to instead interleave
|
2024-06-29 10:10:35 -05:00 |
|
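Interleaving instead of chunk-slicing just means each rank takes every world_size-th path; a sketch (function name is illustrative):

```python
def shard_paths(paths, global_rank, world_size):
    # each rank takes every world_size-th path instead of one contiguous chunk,
    # so ranks see a comparable spread of speakers/durations
    return paths[global_rank::world_size]
```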
|
dd40463803
|
limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid)
|
2024-06-29 09:11:28 -05:00 |
|
|
591d3ac848
|
have eval dataloader use eval batch size for batchedordersampler
|
2024-06-28 22:44:00 -05:00 |
|
|
1a392b69f6
|
local training backend should be a bit more aware of variable batch sizes, maybe
|
2024-06-28 22:39:05 -05:00 |
|
|
83075c1505
|
sort duration buckets to ensure that paths sorted-by-duration are actually sorted by duration (because I didn't know that python dicts can have non-strings as keys), added batching samples based on total duration to ensure best training throughput
|
2024-06-28 22:28:54 -05:00 |
|
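Batching by a total-duration budget rather than a fixed sample count looks roughly like this; names and the budget value are illustrative, not the repo's batchedordersampler:

```python
def batch_by_total_duration(paths, durations, max_duration=120.0):
    # sort so similar-length samples land together, then fill each batch
    # up to a total-duration budget instead of a fixed sample count
    ordered = sorted(paths, key=lambda p: durations[p])
    batches, current, total = [], [], 0.0
    for path in ordered:
        if current and total + durations[path] > max_duration:
            batches.append(current)
            current, total = [], 0.0
        current.append(path)
        total += durations[path]
    if current:
        batches.append(current)
    return batches
```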
|
8fffb94964
|
backport fix from tortoise_tts with local trainer + loading state when training lora
|
2024-06-25 13:41:29 -05:00 |
|
|
62a53eed64
|
fixed deducing tokenizer path, added option to default to naive tokenizer (for old models, like ar+nar-retnet-8)
|
2024-06-18 22:11:14 -05:00 |
|
|
8a986eb480
|
load exported LoRA weights if they exist (to-do: make a better LoRA loading mechanism)
|
2024-06-18 21:45:46 -05:00 |
|
|
2bfe786ebd
|
ban stop token for NAR levels (because sometimes it gets sampled and causes problems)
|
2024-06-17 22:14:43 -05:00 |
|
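Banning a token during the NAR passes amounts to masking its logit before sampling; a trivial sketch:

```python
import torch

def ban_tokens(logits: torch.Tensor, banned) -> torch.Tensor:
    # make the banned ids (e.g. the AR stop token during NAR passes) unsampleable
    logits[..., banned] = float("-inf")
    return logits
```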
|
7cfb78fa64
|
enable LoRA for targeted RVQ levels (to experiment with, seems to help)
|
2024-06-17 21:45:03 -05:00 |
|
|
7047fcc6e2
|
actually make deepspeed work with LoRAs
|
2024-06-17 13:55:37 -05:00 |
|
|
1d159b1476
|
updated export routine to split LoRA weights from the state dict (should work with deepspeed)
|
2024-06-17 13:28:18 -05:00 |
|
|
726a4b613f
|
naive, rudimentary DeepSpeed support (just live with the LoRA weights living with the original weights, they can be split later)
|
2024-06-17 13:17:24 -05:00 |
|
|
bd0bc10ec0
|
added LoRA policy to decide what layer of the model gets adapted based on simple inclusion/exclusion terms
|
2024-06-17 13:05:06 -05:00 |
|
|
be051d9544
|
added other LoRA method using parametrization rather than linear injection
|
2024-06-17 09:58:34 -05:00 |
|
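A minimal sketch of the parametrization-based approach using torch.nn.utils.parametrize, as opposed to swapping in a wrapped Linear; rank/alpha values are placeholders, not the repo's defaults:

```python
import torch
from torch import nn
from torch.nn.utils import parametrize

class LoRAParametrization(nn.Module):
    def __init__(self, out_features, in_features, rank=8, alpha=16):
        super().__init__()
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, weight):
        # the frozen weight comes in, the low-rank delta is added on the fly
        return weight + self.scale * (self.lora_B @ self.lora_A)

linear = nn.Linear(1024, 1024)
linear.weight.requires_grad_(False)  # only the LoRA factors get trained
parametrize.register_parametrization(
    linear, "weight", LoRAParametrization(*linear.weight.shape)
)
```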
|
45a39fb79f
|
very rudimentary lora support (no deepspeed support, tested training and saving but not loading yet)
|
2024-06-17 00:09:16 -05:00 |
|
|
19410a919e
|
ugh
|
2024-06-15 12:29:03 -05:00 |
|
|
d343bde09b
|
residual_in_fp32=False for mamba arch backends because it breaks the classifier (output projection / lm head / what-have-you) under AMP
|
2024-06-15 12:08:03 -05:00 |
|
|
ccb14c06ef
|
mamba2-hf using vasqu/mamba2-torch because it lets me use mamba2 without triton ops (my 4xV100s are not happy training mamba2 because of triton)
|
2024-06-14 19:42:17 -05:00 |
|
|
31f71fa134
|
sampler update (some brainworm just never actually had a sampler for sample_type=path)
|
2024-06-14 16:55:40 -05:00 |
|
|
b3b67f34ac
|
added option to sort paths by duration to better group equal-length sequences together (and there was maybe a logic error from creating the samplers and then interleave-reordering paths, desyncing them, maybe)
|
2024-06-13 22:37:34 -05:00 |
|
|
83eab4fa59
|
actually going for the suggested "2x layers, no intermediate scaling" is wrong for VALL-E, directly copying the normal transformer structure fixes mamba2 performance in the test trainer
|
2024-06-13 20:08:22 -05:00 |
|
|
26da24fd8d
|
mamba updated to fix that pesky NaN error during training
|
2024-06-13 12:38:33 -05:00 |
|
|
bcf3910a17
|
the NAR only dream is dead (it just won't work)
|
2024-06-12 19:49:47 -05:00 |
|
|
a9353cf9fa
|
ugh
|
2024-06-12 00:14:29 -05:00 |
|
|
cca542a4c0
|
ugh
|
2024-06-11 23:59:28 -05:00 |
|
|
65a8960305
|
option to split classifier per-level instead of sharing one (at this point I'm just scrambling to try and cope with training a DAC model, the NAR is being a pain)
|
2024-06-11 22:28:59 -05:00 |
|
|
a7a6e0ac76
|
validated that inferencing works, changed some defaults (NAR benefits from greedy sampling)
|
2024-06-09 17:11:38 -05:00 |
|
|
234f9efc6e
|
ugh
|
2024-06-09 11:39:43 -05:00 |
|
|
132a02c48b
|
sanity cleanup, backup config yaml for each log file
|
2024-06-09 11:22:52 -05:00 |
|
|
8d92dac829
|
forgot I renamed this
|
2024-06-09 11:12:30 -05:00 |
|
|
80f9530840
|
ugh
|
2024-06-09 01:43:44 -05:00 |
|
|
5c732b72ee
|
ugh
|
2024-06-08 20:34:00 -05:00 |
|
|
8d068fa3f9
|
reticulating splines
|
2024-06-08 20:30:15 -05:00 |
|
|
ead3e2f0cb
|
ugh
|
2024-06-08 16:14:57 -05:00 |
|
|
b072f9b96b
|
fixes
|
2024-06-08 16:01:34 -05:00 |
|
|
58fb0a84db
|
added experimental NAR only model (inferences text length, need more experimenting), AudioEmbedding logic cleanup (I still think it's being done wrong)
|
2024-06-08 15:42:02 -05:00 |
|
|
e35a91c67a
|
ugh
|
2024-06-07 21:56:14 -05:00 |
|
|
7d6fff24f9
|
un-tensor'd quant_level marker since it doesn't need to be one (I forgot why I had it as one but nothing seems to need it as a tensor that didn't already make it one)
|
2024-06-07 20:46:22 -05:00 |
|
|
b0158a61d5
|
fixed some logic errors with training (grabbing wrong quant level...)
|
2024-06-07 20:34:36 -05:00 |
|
|
eafa622be2
|
I forgot the actual reason I was cleaning things up was to re-include prom loss calculation (I realized the reason I did this was because of a prom embedding oversight, it seems to work now)
|
2024-06-07 20:29:25 -05:00 |
|
|
da8242d086
|
finally got around to removing omegaconf
|
2024-06-07 20:23:53 -05:00 |
|
|
4ade2b60ee
|
ugh
|
2024-06-06 21:57:11 -05:00 |
|
|
f9f309281a
|
ugh
|
2024-06-06 20:55:27 -05:00 |
|
|
a5c90348d9
|
head hurt
|
2024-06-06 20:51:31 -05:00 |
|
|
516b0894d7
|
m
|
2024-06-06 19:41:26 -05:00 |
|
|
ee25d2e62e
|
removed the need to supply targ_list + different AudioEmbedding + other things
|
2024-06-06 18:52:41 -05:00 |
|
|
fcac9503e2
|
cleanup
|
2024-06-06 13:08:02 -05:00 |
|
|
b2194b859a
|
re-added loading multiple models because I'm now entertaining having split AR/NAR models again (and need a way to load both at once)
|
2024-06-06 09:48:43 -05:00 |
|
|
b05a905b95
|
ugh
|
2024-06-05 21:02:05 -05:00 |
|
|
4073656293
|
oops
|
2024-06-05 20:53:10 -05:00 |
|
|
ff6fe6f1bc
|
cleanup
|
2024-06-05 20:30:43 -05:00 |
|
|
880b4ecd1b
|
cleanup, putting some thoughts in comments before I forget about them
|
2024-06-05 19:50:06 -05:00 |
|
|
3cfc8a96bb
|
oops
|
2024-06-05 10:30:04 -05:00 |
|
|
48cd1054f9
|
madness
|
2024-06-04 23:48:51 -05:00 |
|
|
9e3f2e300f
|
experimental "just have a token for what rvq level we're on" that seems to help all models (mamba almost works, but it might just have to be relegated as a pure AR model)
|
2024-06-04 23:23:31 -05:00 |
|
|
e0886c5a78
|
re-added mamba as a possible non-experimental arch backend (test trainer will set it as AR only, doing any NAR tasks lobotomizes it)
|
2024-06-04 22:41:22 -05:00 |
|
|
687c71e028
|
disable accuracy calc because it breaks with actual batched training even though it shouldn't
|
2024-06-04 22:13:44 -05:00 |
|
|
d005e24953
|
oops
|
2024-06-04 22:10:04 -05:00 |
|
|
0f7f3ae754
|
added loss calc split and acc for experimental model
|
2024-06-04 22:04:40 -05:00 |
|
|
014e565c4b
|
tweaks
|
2024-06-04 20:41:13 -05:00 |
|
|
6d5bd0156a
|
fixes
|
2024-06-04 18:50:48 -05:00 |
|
|
ed3aeaf3a1
|
copy pasted from test to actual trainer
|
2024-06-04 18:40:30 -05:00 |
|
|
0aa01ba31a
|
forgot one crucial detail (you *need* the previous RVQ level to keep coherence between all RVQ levels) (experimental deinterleaved is a bit crusty though)
|
2024-06-04 18:30:30 -05:00 |
|
|
2ffad5cb6f
|
typo
|
2024-06-04 14:20:57 -05:00 |
|
|
406ff7bbe1
|
re-implemented config.model.interleave for the HF-compat experimental method
|
2024-06-04 14:19:52 -05:00 |
|
|
c93d5863fd
|
fixes
|
2024-06-04 00:07:00 -05:00 |
|
|
186b93a77e
|
oops
|
2024-06-03 22:35:55 -05:00 |
|
|
e50edc3b48
|
added a flag to convert to an HF-compatible model on export by stitching things
|
2024-06-03 22:34:47 -05:00 |
|
|
934672252b
|
feverish cleanup
|
2024-06-03 21:28:49 -05:00 |
|
|
7feeb944a0
|
probably insane for even entertaining going this route
|
2024-06-03 20:26:27 -05:00 |
|
|
c2a436d368
|
somehow between training sessions grad_norm = None even though it worked before
|
2024-06-02 08:29:27 -05:00 |
|
|
c1fcd889d5
|
reverted automatically disabling split loss calc, since it seems that calculating loss on the prom is actually what causes the oddities, maybe
|
2024-06-01 12:34:59 -05:00 |
|
|
8cf176ab46
|
ugh
|
2024-06-01 10:46:42 -05:00 |
|
|
827cf632e7
|
report current loss scale and adjust grad norm by loss scale (for deepspeed)
|
2024-06-01 10:44:32 -05:00 |
|
|
d0ebce6bac
|
ugh
|
2024-06-01 10:30:13 -05:00 |
|
|
39bc019142
|
actually save per-rank sampler states
|
2024-06-01 09:46:32 -05:00 |
|
|
74df2f5332
|
split sampler dict by global_rank, also handle splitting dataset paths by global_rank if sampler_type == path (because I do not trust DistributedSampler) (need to test)
|
2024-06-01 09:29:49 -05:00 |
|
|
31785f4eeb
|
actually don't default to computing split losses, test bitnet model doesn't seem to be doing things right (despite debug printouts showing they're roughly the same logit/loss sequences, could just be bitnet linears being not up to par on actual models)
|
2024-06-01 09:12:51 -05:00 |
|
|
e9c87060df
|
oops
|
2024-05-31 22:22:28 -05:00 |
|
|
b482ca19ff
|
added model config option to set KV head count for MQA/GQA instead of MHA for llama-based models (I think it's very negligible both ways on such a small model size)
|
2024-05-31 19:32:37 -05:00 |
|
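For context, with HF-style llama configs this is just the num_key_value_heads knob; the sizes below are placeholders, not the model's actual dimensions:

```python
from transformers import LlamaConfig

# GQA: fewer key/value heads than query heads; num_key_value_heads=1 gives MQA,
# and setting it equal to num_attention_heads falls back to plain MHA
config = LlamaConfig(
    hidden_size=1024,
    num_attention_heads=16,
    num_key_value_heads=4,
)
```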
|
e15c6c74c3
|
correctness
|
2024-05-30 20:50:45 -05:00 |
|
|
da473295b7
|
better way to compute per-segment losses
|
2024-05-28 19:29:54 -05:00 |
|
|
6c49ad06a3
|
forgot to reinclude mult by loss factors
|
2024-05-27 20:40:21 -05:00 |
|
|
b82f0d5c0c
|
finally nailed the issue that caused logging to break on one machine but not another (bitnet includes zetascale which is a parasite that will break logging)
|
2024-05-27 19:47:58 -05:00 |
|
|
c0ac84c795
|
uh
|
2024-05-27 19:05:56 -05:00 |
|
|
197d517181
|
ugh
|
2024-05-27 17:09:35 -05:00 |
|
|
5af6f41c94
|
added loss calcs against prom (requires the right settings for not shit results, disabled by default)
|
2024-05-27 08:43:00 -05:00 |
|
|
05cd8b797e
|
nevermind it breaks training
|
2024-05-25 18:03:43 -05:00 |
|
|
85f9684720
|
some cleanup
|
2024-05-25 17:46:52 -05:00 |
|
|
d760924719
|
added kludgy eval-only option so I don't have to start training, type eval, stop training, then delete the logs for that session
|
2024-05-25 17:39:51 -05:00 |
|
|
ddbacde0d1
|
DAC just doesn't work well enough......
|
2024-05-25 11:07:52 -05:00 |
|
|
e3ef89f5aa
|
100x better for subtrain/eval to be by group instead
|
2024-05-19 16:40:14 -05:00 |
|
|
458b95d196
|
added option to split between text loss and audio loss (to-do: document this better), since it may or may not be a problem with LLaMA-backed models: my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment
|
2024-05-19 11:23:56 -05:00 |
|
|
74e531d391
|
ugh
|
2024-05-18 12:02:56 -05:00 |
|
|
4bc7e5a6d1
|
fix loading without needing an hdf5 dataset already prepped (and some other incidental speedups during dataloader prep)
|
2024-05-18 07:14:26 -05:00 |
|
|
d88a5ca183
|
ugh
|
2024-05-16 07:25:33 -05:00 |
|
|
d9aabfa3ae
|
final tweaks, hopefully, again
|
2024-05-15 23:04:19 -05:00 |
|
|
8d79f78e0a
|
god I need to replace omegaconf
|
2024-05-12 14:01:52 -05:00 |
|
|
5eb5db7f7f
|
just don't use DAC 24kHz, it's bad
|
2024-05-12 13:41:17 -05:00 |
|
|
230da8b559
|
should be the final things to scramble around for, DAC's 24kHz model is unusable for this, but both encodec's 24kHz and DAC's 44kHz work
|
2024-05-12 13:22:08 -05:00 |
|
|
2437a86efa
|
ugh
|
2024-05-12 13:02:15 -05:00 |
|