|
a6ad0577b8
|
cleanup the resultant text from STT
|
2024-09-06 18:44:25 -05:00 |
|
|
fa93061b3e
|
more fixes, moved sampler state dict to a better place, eval works again
|
2024-09-06 16:59:56 -05:00 |
|
|
4bd9bb39c8
|
webui for STT (still need to bake the model to handle it better, a few hours so far has it generate what looks like a normal transcription but does not correlate to the audio right now)
|
2024-09-06 15:13:04 -05:00 |
|
|
d33a906119
|
cleanup for AR_NAR inferencing to allow both TTS and STT tasks simultaneously (need to have training eval do this to though)
|
2024-09-06 14:30:12 -05:00 |
|
|
341e19162b
|
fixes, again
|
2024-09-06 11:41:41 -05:00 |
|
|
94cf81d38c
|
tweak
|
2024-09-05 23:21:18 -05:00 |
|
|
413097f5f7
|
fixes
|
2024-09-05 21:42:59 -05:00 |
|
|
54547b74d8
|
experimental implementation of STT (need to actually test on a model, test trainer seems to work)
|
2024-09-05 20:43:20 -05:00 |
|
|
d319d33368
|
haha
|
2024-09-04 14:52:26 -05:00 |
|
|
619369236b
|
ugh
|
2024-08-30 21:10:57 -05:00 |
|
|
168e203942
|
ugh
|
2024-08-30 14:39:07 -05:00 |
|
|
685f4faec0
|
ugh
|
2024-08-30 10:46:26 -05:00 |
|
|
32287710a2
|
moved prints to use logger, edited readme (fused_attn doesnt seem stable for training)
|
2024-08-29 13:27:16 -05:00 |
|
|
d423bc03c2
|
fixed attentions for MoE
|
2024-08-27 17:02:42 -05:00 |
|
|
b7b99a25f1
|
added ability to specify attention backend for CLI and webui (because im tired of editing the yaml)
|
2024-08-26 19:33:51 -05:00 |
|
|
0d706ec6a1
|
added fused_attn (triton-based fused attention) and simply just query for flash_attn under rocm
|
2024-08-26 19:13:34 -05:00 |
|
|
6b0891448c
|
pain (some shit to try and get some flash attention for ROCm (gfx1100) through triton fused attention but no good)
|
2024-08-25 20:07:27 -05:00 |
|
|
40e1799adc
|
fixed xformers and flash_attn to actually work now
|
2024-08-19 01:03:35 -05:00 |
|
|
29c35528e5
|
the sooner I accept there's no FA for V100s the sooner I'll go to bed
|
2024-08-18 23:54:33 -05:00 |
|
|
d636edd3a2
|
added flash_attn LlamaAttention (including flash_attn==1.0.9)
|
2024-08-18 20:51:14 -05:00 |
|
|
054d28573a
|
my DAC dataset again managed to only have some utterances with only 8 of 9 RVQ levels, this fixes an oversight from it
|
2024-08-09 21:18:01 -05:00 |
|
|
2a1794c084
|
ughghghhhh
|
2024-08-09 21:15:01 -05:00 |
|
|
ed373957e2
|
maybe not
|
2024-08-09 11:38:08 -05:00 |
|
|
c658a7b440
|
make loss scaling opt-in rather than automatically determined (because it seems a DAC-based model really doesnt like loss scaling)
|
2024-08-09 10:51:36 -05:00 |
|
|
d04f6911b4
|
oops
|
2024-08-08 19:38:55 -05:00 |
|
|
0aa59e6f3f
|
uncommented block that writes the metadata on HDF5 creation
|
2024-08-08 19:21:29 -05:00 |
|
|
79a6781c9e
|
fix vall_e.data --action=hdf5 actually transcribing because past me completely forgot it tried to already put the transcribe/process dataset scripts inside the module before
|
2024-08-08 07:51:42 -05:00 |
|
|
949339a3fa
|
do not include SDPA attention if there's no available SDPA backends
|
2024-08-06 20:42:39 -05:00 |
|
|
613024ec0d
|
ugh
|
2024-08-06 20:35:15 -05:00 |
|
|
eac353cd0b
|
busy work and cleanup while I wait for 1TB of audio to quantize... again.
|
2024-08-06 20:23:33 -05:00 |
|
|
f284c7ea9c
|
do mixed-precision for AMP inside the compress function itself, because the loudness function gripes when using a float16 (non-power of 2 lengths) or bfloat16 (something about views for bfloat16)
|
2024-08-06 15:08:37 -05:00 |
|
|
b6ba2cc8e7
|
tweaked vall_e.emb.process to instead process audio one file at a time instead of all the files for a given speaker to avoid OOMing on less-memory-filled systems with --low-memory
|
2024-08-06 14:24:40 -05:00 |
|
|
9710b06b74
|
tweaks and things
|
2024-08-06 08:17:25 -05:00 |
|
|
8bac8fe902
|
oops
|
2024-08-05 20:38:29 -05:00 |
|
|
134dac8c2b
|
re-adapted process_libritts.py to a 'better' way (better because it processed without needing to shuffle a bunch of things and adapt to cope or something)
|
2024-08-05 20:34:58 -05:00 |
|
|
3f73fcca29
|
oops
|
2024-08-05 20:12:13 -05:00 |
|
|
597441e48b
|
moved transcribe and process dataset scripts to vall_e/emb within the module itself, argparse-ified transcription script
|
2024-08-05 19:40:50 -05:00 |
|
|
7cdfa3dc0c
|
updated process_datasets.py, added argparsing so I can mostly stop manually editing things, and some other cleanup
|
2024-08-05 15:59:25 -05:00 |
|
|
debcc93e7e
|
add adapted MixtralAttention for when I make a bad decision to actually train a MoE
|
2024-08-04 22:03:22 -05:00 |
|
|
10aaf840e7
|
added export option to convert Llama to MixtralMoE for another dumb experiment
|
2024-08-04 20:25:06 -05:00 |
|
|
3a65cc4b22
|
fix issue with sft and shared tensors...
|
2024-08-04 19:56:21 -05:00 |
|
|
23f3b56fda
|
oops
|
2024-08-04 08:18:57 -05:00 |
|
|
d19f93a2c0
|
documentation update
|
2024-08-04 00:14:49 -05:00 |
|
|
2cb465018b
|
implicitly load either normal pickled weights or safetensors on loading the model
|
2024-08-03 23:34:18 -05:00 |
|
|
c09133d00f
|
added safetensors support (with metadata) and feed whatever torch.load/torch.save into it
|
2024-08-03 23:15:20 -05:00 |
|
|
6a733eb2ed
|
changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype, some 'optimization' to compile the model but it doesnt seem to do anything useful
|
2024-08-03 22:10:21 -05:00 |
|
|
ab673e0426
|
add cap for NAR-len training, to avoid any weird cases in early training where it'll just mess up and generate long lengths
|
2024-08-03 21:00:32 -05:00 |
|
|
4d2b88b164
|
throw exception if training, but no model is set to train (because i ran into this wondering what the hell was happening)
|
2024-08-03 20:51:23 -05:00 |
|
|
d0a5c7eca2
|
more coping with the NAR len
|
2024-08-03 20:23:36 -05:00 |
|
|
11fa3da665
|
some cleanup, fixed the wrapper attention to explicitly use other sdpa backends
|
2024-08-03 19:51:00 -05:00 |
|