a507b769a1 sped up inference by not calling .tolist() for the repetition/length penalties (plus a web UI bug fix from the previous commit) (mrq, 2024-10-04 22:18:20 -0500)
4a8e3ccf06 README tweaks; added --input-prompt-prefix as an experiment (it's honestly better to just not do this, but I'll retain it in case I have a revelation on how to improve it) (mrq, 2024-10-04 18:57:19 -0500)
a9fa0898a9 tweaked demo page script to sample speakers instead (mrq, 2024-09-28 10:50:26 -0500)
2f1dca3089 added language selection in the web UI, tweaked demo script (mrq, 2024-09-28 09:49:45 -0500)
10df2ef5f3 fixed oversight where input audio was not resampled (lol...) (mrq, 2024-09-27 20:27:53 -0500)
039482a48e don't run eval on STT, because it's so slow and no metrics are computed against it anyway (to-do: make this a flag) (mrq, 2024-09-26 18:56:57 -0500)
ff7a1b4163 coerce into a path for other sampler_types (required for sampling similar utterances) (mrq, 2024-09-26 18:37:56 -0500)
f24547ad4e added top_k sampling / offset for similar-utterance prompt sampling (mrq, 2024-09-26 16:26:40 -0500)
9da630f73a swapped the order of demo entries, as the model prioritizes adhering to the speaker prompt (instead of magically matching the ground truth) (mrq, 2024-09-25 23:31:24 -0500)
c5e9142863 added option to re-tokenize phonemes for HDF5 (to save having to remake my HDF5 file) (mrq, 2024-09-21 13:08:01 -0500)
536c11c4ac actually validated and fixed sampling of similar utterances for the prompt (hopefully nothing else is needed) (mrq, 2024-09-21 12:59:51 -0500)
d31f27119a regex-replace the (lang) markers out of espeak output; updated the tokenizer vocab as lazily as possible to avoid unk tokens (mrq, 2024-09-21 12:29:28 -0500)
769f67dcfe actually fixed validation of phonemes in the symmap (mrq, 2024-09-21 12:19:34 -0500)
fe241f6a99 support for wildcards in the training/validation/noise dataset arrays (to-do: a better way to query between the metadata folder and the data folder) (mrq, 2024-09-18 21:34:43 -0500)
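The wildcard support mentioned in fe241f6a99 could be sketched roughly as below; `expand_speakers`, the entry syntax, and the shape of the speaker list are assumptions for illustration, not the repo's actual API:

```python
from fnmatch import fnmatch

def expand_speakers(entries, known_speakers):
    """Expand wildcard entries (e.g. "LibriTTS/*") in a dataset array
    against the list of speaker folders actually present on disk."""
    expanded = []
    for entry in entries:
        if "*" in entry or "?" in entry:
            # wildcard entry: match it against every known speaker path
            expanded.extend(s for s in known_speakers if fnmatch(s, entry))
        else:
            # literal entry: keep as-is
            expanded.append(entry)
    return expanded
```

For example, `expand_speakers(["LibriTTS/*"], speakers)` would pull in every LibriTTS speaker without listing each one in the YAML.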
b5bec0c9ce oops, turns out these were not already split by speaker name... (also added sampling the dataset in the web UI for easy viewing) (mrq, 2024-09-18 20:19:46 -0500)
ebac1db16c maybe-final tweaks; I really needed to unify my JSON reads/writes, and orjson has proven fast enough for me to rely on it more (mrq, 2024-09-17 22:57:04 -0500)
c440c4fe7e relegated similarity-data processing to vall_e.emb.similarity since it's easier; seems to work? (mrq, 2024-09-17 14:37:21 -0500)
56f25f7a9b more work on similar-speaker prompt sampling (to-do: actually test whether this works...) (mrq, 2024-09-16 23:10:29 -0500)
69f140ba45 fixed oversight when phonemizing French, because espeak defines French as fr-fr instead of fr (even though Spain's Spanish is just es, while Portugal's Portuguese is pt-pt) (mrq, 2024-09-13 12:53:36 -0500)
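The fr vs. fr-fr inconsistency in 69f140ba45 is the kind of thing a small alias table handles; the mapping and function below are illustrative (only the fr-fr and pt-pt quirks are confirmed by the commit message):

```python
# espeak-ng is inconsistent about region suffixes: French is "fr-fr"
# and Portuguese is "pt-pt", but Spanish is just "es". Normalize the
# bare ISO code to whatever espeak-ng actually expects.
ESPEAK_LANGUAGE_ALIASES = {
    "fr": "fr-fr",
    "pt": "pt-pt",
}

def to_espeak_language(code: str) -> str:
    # fall through unchanged for languages espeak names by bare code
    return ESPEAK_LANGUAGE_ALIASES.get(code, code)
```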
4f3c7a37c8 also compute text similarities (not sure what use I'll have for this) (mrq, 2024-09-10 16:45:59 -0500)
1c615a0f52 helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (still unsure how best to go about it) (mrq, 2024-09-10 16:34:23 -0500)
17487ad70a weird quirk in process_emilia.py where the language gets mutated, somehow (I hate Python) (mrq, 2024-09-10 14:00:27 -0500)
d059f6f56d added helper script to process Emilia (amphion/Emilia-Dataset); clean up espeak phonemes for non-English transcriptions containing English words (because for some reason espeak injects (en){word}(lang) markers, and it's annoying) (mrq, 2024-09-09 09:57:32 -0500)
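The marker cleanup described in d059f6f56d amounts to stripping espeak's language-switch annotations from the phonemized string. A minimal sketch, assuming the markers always look like `(xx)` or `(xx-yy)` (the exact grammar is inferred from the commit message, not verified against espeak's source):

```python
import re

# espeak-ng wraps language-switched words in markers like "(en)word(fr)".
# Strip any "(xx)" / "(xx-yy)" language marker from a phonemized string.
_LANG_MARKER = re.compile(r"\([a-z]{2,3}(?:-[a-z]{2,3})?\)")

def strip_lang_markers(phonemes: str) -> str:
    return _LANG_MARKER.sub("", phonemes)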
31e8b7edb8 tweaks and fixes for LoRA stuff (mrq, 2024-09-08 18:05:21 -0500)
54203c059d validated repetition penalty for STT (sometimes needed to wrangle the model) (mrq, 2024-09-08 08:30:30 -0500)
5d66a7db52 web UI cleanup, more tweaks, default to safetensors in the config (mrq, 2024-09-07 21:45:05 -0500)
a6ad0577b8 clean up the resulting text from STT (mrq, 2024-09-06 18:44:25 -0500)
fa93061b3e more fixes; moved the sampler state dict to a better place; eval works again (mrq, 2024-09-06 16:59:56 -0500)
4bd9bb39c8 web UI for STT (still need to bake the model to handle it better; a few hours of training so far has it generating what looks like a normal transcription, but it does not correlate to the audio yet) (mrq, 2024-09-06 15:13:04 -0500)
d33a906119 cleanup for AR_NAR inferencing to allow both TTS and STT tasks simultaneously (need to have training eval do this too) (mrq, 2024-09-06 14:30:12 -0500)
32287710a2 moved prints to use the logger; edited README (fused_attn doesn't seem stable for training) (mrq, 2024-08-29 13:27:16 -0500)
d423bc03c2 fixed attentions for MoE (mrq, 2024-08-27 17:02:42 -0500)
b7b99a25f1 added ability to specify the attention backend from the CLI and web UI (because I'm tired of editing the YAML) (mrq, 2024-08-26 19:33:51 -0500)
0d706ec6a1 added fused_attn (Triton-based fused attention) and simply query for flash_attn under ROCm (mrq, 2024-08-26 19:13:34 -0500)
6b0891448c pain (tried to get some form of flash attention for ROCm (gfx1100) through Triton fused attention, but no good) (mrq, 2024-08-25 20:07:27 -0500)
40e1799adc fixed xformers and flash_attn to actually work now (mrq, 2024-08-19 01:03:35 -0500)
29c35528e5 the sooner I accept there's no flash attention for V100s, the sooner I'll go to bed (mrq, 2024-08-18 23:54:33 -0500)
054d28573a my DAC dataset again managed to have some utterances with only 8 of 9 RVQ levels; this fixes an oversight from that (mrq, 2024-08-09 21:18:01 -0500)
c658a7b440 make loss scaling opt-in rather than automatically determined (because a DAC-based model really doesn't seem to like loss scaling) (mrq, 2024-08-09 10:51:36 -0500)
0aa59e6f3f uncommented the block that writes the metadata on HDF5 creation (mrq, 2024-08-08 19:21:29 -0500)
79a6781c9e fix vall_e.data --action=hdf5 actually transcribing, because past me completely forgot the transcribe/process dataset scripts were already moved inside the module (mrq, 2024-08-08 07:51:42 -0500)
949339a3fa do not include SDPA attention if there are no available SDPA backends (mrq, 2024-08-06 20:42:39 -0500)
eac353cd0b busy work and cleanup while I wait for 1TB of audio to quantize... again (mrq, 2024-08-06 20:23:33 -0500)
f284c7ea9c do mixed precision for AMP inside the compress function itself, because the loudness function gripes when using float16 (non-power-of-2 lengths) or bfloat16 (something about views for bfloat16) (mrq, 2024-08-06 15:08:37 -0500)
b6ba2cc8e7 tweaked vall_e.emb.process to process audio one file at a time instead of all the files for a given speaker, to avoid OOMing on lower-memory systems with --low-memory (mrq, 2024-08-06 14:24:40 -0500)
9710b06b74 tweaks and things (mrq, 2024-08-06 08:17:25 -0500)
134dac8c2b re-adapted process_libritts.py to a 'better' way (better because it processed without needing to shuffle a bunch of things around to cope) (mrq, 2024-08-05 20:34:58 -0500)
597441e48b moved the transcribe and process dataset scripts to vall_e/emb within the module itself; argparse-ified the transcription script (mrq, 2024-08-05 19:40:50 -0500)
7cdfa3dc0c updated process_datasets.py, added argparsing so I can mostly stop manually editing things, and some other cleanup (mrq, 2024-08-05 15:59:25 -0500)
debcc93e7e add adapted MixtralAttention for when I make the bad decision to actually train a MoE (mrq, 2024-08-04 22:03:22 -0500)
10aaf840e7 added export option to convert Llama to MixtralMoE for another dumb experiment (mrq, 2024-08-04 20:25:06 -0500)
3a65cc4b22 fix issue with safetensors and shared tensors... (mrq, 2024-08-04 19:56:21 -0500)
2cb465018b implicitly load either normal pickled weights or safetensors when loading the model (mrq, 2024-08-03 23:34:18 -0500)
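The implicit pickled-vs-safetensors dispatch in 2cb465018b presumably keys off the checkpoint's file extension; a minimal sketch of that decision (the function name is hypothetical, and in practice the branches would call safetensors.torch.load_file vs torch.load):

```python
from pathlib import Path

def weights_format_for(path) -> str:
    """Pick a checkpoint format from the file extension.
    Returns a tag rather than loading anything, so the sketch
    stays free of torch/safetensors dependencies."""
    suffix = Path(path).suffix.lower()
    return "safetensors" if suffix == ".safetensors" else "pickle"
```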
c09133d00f added safetensors support (with metadata), and feed whatever goes through torch.load/torch.save into it (mrq, 2024-08-03 23:15:20 -0500)
6a733eb2ed changed torch.Tensor().to(device, dtype) to torch.tensor(..., device=..., dtype=...), because it's been bothering me that I was creating tensors and then converting rather than creating with the right device/dtype; also some 'optimization' to compile the model, but it doesn't seem to do anything useful (mrq, 2024-08-03 22:10:21 -0500)
ab673e0426 add a cap for NAR-len training, to avoid weird cases in early training where it messes up and generates long lengths (mrq, 2024-08-03 21:00:32 -0500)
4d2b88b164 throw an exception if training but no model is set to train (because I ran into this wondering what the hell was happening) (mrq, 2024-08-03 20:51:23 -0500)
d0a5c7eca2 more coping with the NAR-len (mrq, 2024-08-03 20:23:36 -0500)
11fa3da665 some cleanup; fixed the wrapper attention to explicitly use other SDPA backends (mrq, 2024-08-03 19:51:00 -0500)
9564ecda43 wrapper attention class for other SDPA backends; also, xformers seems to have broken... (mrq, 2024-08-03 15:12:11 -0500)
9e1989be1b tweaked the initial NAR pass's initial token embeddings to use a different value, or something (mrq, 2024-08-03 09:01:37 -0500)
26f74c5739 somehow fixed non-unified position IDs for the NAR-len (mrq, 2024-08-03 08:43:42 -0500)
66407e5bdb tweaks for the NAR-len model, maybe (mrq, 2024-08-03 08:40:39 -0500)
97c5241bef fixes; throw an exception when using a NAR-only model with non-unified position IDs, since for some reason it outputs garbage for the NAR (mrq, 2024-08-02 22:25:49 -0500)
4456d3172b that's what I get for testing without HDF5 on my previous machine... (mrq, 2024-08-02 20:44:01 -0500)
7a77978096 oversight with using resize_modules (mrq, 2024-08-02 20:28:49 -0500)
443422ecb5 ugh, finally got some form of offloading working (need to test whether it works across different GPUs, but GPU and CPU offloading seems to work in the test trainer) (mrq, 2024-08-01 22:43:39 -0500)
c9ec6b28ef it actually wasn't working because Engines.__init__() automatically moves the entire module to the requested device, and that was being called after offloading the model in the test trainer (and it seems I can't avoid it without injecting a bunch of stuff into modeling_llama.py) (mrq, 2024-08-01 20:56:28 -0500)
b4c895114c naive model offloading support (automatically splits parts of the model across the requested devices per memory constraints, either inferred or set in the YAML; input tensors are automatically migrated to the right device; it SEEMS to work for training under the test trainer when split between GPU and CPU) (this was specifically because that Flux imagegen model released, so I can test it there) (mrq, 2024-08-01 20:12:06 -0500)
387358bc8a fixes for the NAR-len model, documented some config options, and a better way to handle resizing modules on state_dict load (mrq, 2024-07-31 20:35:09 -0500)