Commit Graph

175 Commits

Author SHA1 Message Date
mrq  6845c447c9  added more harvard sentences to load from a text file  2024-11-21 13:18:11 -06:00
mrq  2b29790173  oops  2024-11-18 14:12:26 -06:00
mrq  4a71981456  normalize sampler index by batch size (if not using batched sampler), add option to cap out utterances for a speaker, some other things  2024-11-18 12:46:50 -06:00
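A minimal sketch of that index normalization, with hypothetical names (the repo's actual trainer code likely differs):

```python
def resume_index(stored_index: int, batch_size: int, batched_sampler: bool) -> int:
    # a batched sampler already counts in batches; a per-sample sampler's
    # stored index counts individual samples, so divide by the batch size
    # to resume at the correct batch
    return stored_index if batched_sampler else stored_index // batch_size
```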
mrq  39096f8ff3  redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint...)  2024-11-14 22:17:47 -06:00
mrq  e412e98125  ugh  2024-11-14 07:34:22 -06:00
mrq  c00fc18b62  actually use the right embedding for nar-len  2024-11-13 18:04:04 -06:00
mrq  976ee87f6f  resume iteration step in tqdm trainer, warn to logger if the sampler state dict was invalidated  2024-11-13 09:09:28 -06:00
mrq  0f2584eba7  new meme sampler PogChamp (it sort of helps?)  2024-11-12 22:30:09 -06:00
mrq  2495a7ef67  Fixed STT in the web UI  2024-11-12 12:49:53 -06:00
mrq  2f56696506  overhauled inference/sampler kwargs to stop being a bloated mess  2024-11-11 20:21:16 -06:00
mrq  354f8e059d  store dataset hash alongside state dict so it can be ignored if mismatched  2024-11-11 18:16:56 -06:00
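A hedged sketch of that hash-guarded state dict idea; `dataset_hash`, `save_sampler`, and `load_sampler` are hypothetical helpers, not the repo's actual API:

```python
import hashlib
import json

def dataset_hash(paths: list[str]) -> str:
    # stable digest of the dataset's file list; adding, removing, or
    # renaming files changes the hash
    return hashlib.sha256(json.dumps(sorted(paths)).encode()).hexdigest()

def save_sampler(sampler_state: dict, paths: list[str]) -> dict:
    return {"dataset_hash": dataset_hash(paths), "sampler": sampler_state}

def load_sampler(checkpoint: dict, paths: list[str]) -> dict | None:
    # ignore the saved sampler state if the dataset changed underneath it
    if checkpoint.get("dataset_hash") != dataset_hash(paths):
        return None
    return checkpoint["sampler"]
```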
mrq  f7b8b1e825  dropped subtrain dataloader since it's useless to duplicate  2024-11-11 17:00:49 -06:00
mrq  cf9df71f2c  use homebrewed caching system for dataloader paths / durations (I'm pretty sure I am now triggering OOM killers with my entire dataset used)  2024-11-11 16:32:08 -06:00
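One plausible shape for such a cache, assuming a JSON file of path-to-duration pairs; `probe_duration` is a stand-in for whatever actually measures the audio:

```python
import json
from pathlib import Path

def cached_durations(paths: list[Path], cache_file: Path) -> dict[str, float]:
    # load previously computed durations so startup doesn't re-probe
    # every audio file in the dataset
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    for path in paths:
        key = str(path)
        if key not in cache:
            cache[key] = probe_duration(path)  # hypothetical, e.g. via torchaudio.info
    cache_file.write_text(json.dumps(cache))
    return cache
```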
mrq  9def34cd66  lol  2024-11-10 12:48:41 -06:00
mrq  9cb0b6901b  unified nar.py into ar_nar.py  2024-11-10 12:19:48 -06:00
mrq  3826f9bae4  saner mask creation? (it doesn't matter, kv cache won't work)  2024-11-02 21:00:21 -05:00
mrq  ef1c17430f  skip step on nan loss (ironically I have not had a nan loss after adding this), throw exception with invalid cfg.dataset.sample_type and sample_order combination (because I was tricked by this in my yaml and had inconsistent VRAM usage)  2024-11-01 20:54:53 -05:00
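The NaN-skip guard is a common pattern; a minimal sketch, assuming a plain PyTorch training step (hypothetical names, not the repo's trainer):

```python
import torch

def training_step(model, batch, optimizer) -> float | None:
    loss = model(batch)  # hypothetical forward pass returning a scalar loss
    # skip the update entirely on a non-finite loss instead of letting
    # NaN gradients corrupt the weights
    if not torch.isfinite(loss):
        optimizer.zero_grad()
        return None
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```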
mrq  8eb9a4056b  modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things I tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling  2024-10-22 18:12:39 -05:00
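Those defaults amount to greedy decoding with a repetition penalty; a standalone sketch of that combination (hypothetical, not the repo's sampler):

```python
import torch

def sample(logits: torch.Tensor, prev_tokens: torch.Tensor,
           temperature: float = 0.0, rep_pen: float = 1.125) -> torch.Tensor:
    # CTRL-style repetition penalty: dampen logits of already-emitted tokens
    scores = logits.index_select(-1, prev_tokens)
    scores = torch.where(scores > 0, scores / rep_pen, scores * rep_pen)
    logits = logits.index_copy(-1, prev_tokens, scores)
    # temperature 0 means pure argmax, where top-k/top-p/min-p are no-ops
    if temperature <= 0:
        return logits.argmax(dim=-1)
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```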
mrq  0dfab973e7  oops  2024-10-18 09:40:06 -05:00
mrq  75b90be325  cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified  2024-10-17 17:06:48 -05:00
mrq  f88097ccf6  add config option to set the rate of sampling randomly vs similar speakers during training  2024-10-16 14:27:58 -05:00
mrq  bef43a0c18  added experimental entropix sampling support  2024-10-11 21:18:26 -05:00
mrq  85d85c1351  more arg creep for demo page  2024-10-10 19:40:01 -05:00
mrq  75a4c866d6  more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: setup arg flags to handle this, and checkbox in web UI)  2024-10-10 19:04:12 -05:00
mrq  96d05be73c  demo page tweaks  2024-10-10 13:52:37 -05:00
mrq  2ea978f318  added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval  2024-10-10 13:40:25 -05:00
mrq  ff7a1b4163  coerce into path for other sampler_types (it's required for sampling for similar utterances)  2024-09-26 18:37:56 -05:00
mrq  f24547ad4e  add top_k sampling / offset for prompt similar utterance sampling  2024-09-26 16:26:40 -05:00
mrq  c5e9142863  added option to retokenize phonemes for hdf5 (to save having to remake my hdf5 file)  2024-09-21 13:08:01 -05:00
mrq  536c11c4ac  actually validated and fixed sampling similar utterances for the prompt (hopefully nothing else is needed)  2024-09-21 12:59:51 -05:00
mrq  d31f27119a  regex replace out the (lang) markers in espeak, updated tokenizer vocab as lazily as possible to not have unk tokens  2024-09-21 12:29:28 -05:00
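A minimal sketch of that cleanup, assuming the markers look like the `(en)…(lang)` switches described in commit d059f6f56d below:

```python
import re

# espeak injects language-switch markers such as "(en)" or "(fr)" around
# words it phonemizes in another language; strip any parenthesized
# two-to-three letter language code (optionally with a region suffix)
LANG_MARKER = re.compile(r"\([a-z]{2,3}(?:-[a-z0-9]+)?\)")

def strip_lang_markers(phonemes: str) -> str:
    return LANG_MARKER.sub("", phonemes)

assert strip_lang_markers("(en)wˈɜːd(fr)") == "wˈɜːd"
```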
mrq  769f67dcfe  actually fix validation of phonemes in the symmap  2024-09-21 12:19:34 -05:00
mrq  fe241f6a99  support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder)  2024-09-18 21:34:43 -05:00
mrq  fa9d3f6c06  lang fixes / reworked phoneme symmap validation  2024-09-18 19:36:03 -05:00
mrq  84647f588a  more tweaks  2024-09-18 16:43:57 -05:00
mrq  ebac1db16c  maybe final tweaks; I really needed to unify my json read/write, and orjson has proven fast enough for me to try and rely on it more  2024-09-17 22:57:04 -05:00
mrq  a9fbe81f98  oops  2024-09-17 15:25:12 -05:00
mrq  c440c4fe7e  relegated processing similarity data into vall_e.emb.similarity since it's easier, seems to work?  2024-09-17 14:37:21 -05:00
mrq  56f25f7a9b  more stuff for similar-speaker prompt sampling (to-do: actually test if this works...)  2024-09-16 23:10:29 -05:00
mrq  1c615a0f52  helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (iunno how to go about it desu)  2024-09-10 16:34:23 -05:00
mrq  d059f6f56d  added helper script to process Emilia (amphion/Emilia-Dataset), clean up espeak phonemes for non-English transcriptions with English words (because for some reason espeak injects (en){word}(lang) markers and it's annoying)  2024-09-09 09:57:32 -05:00
mrq  31e8b7edb8  tweaks and fixes for LoRA stuff  2024-09-08 18:05:21 -05:00
mrq  fa93061b3e  more fixes, moved sampler state dict to a better place, eval works again  2024-09-06 16:59:56 -05:00
mrq  341e19162b  fixes, again  2024-09-06 11:41:41 -05:00
mrq  94cf81d38c  tweak  2024-09-05 23:21:18 -05:00
mrq  54547b74d8  experimental implementation of STT (need to actually test on a model, test trainer seems to work)  2024-09-05 20:43:20 -05:00
mrq  32287710a2  moved prints to use logger, edited readme (fused_attn doesn't seem stable for training)  2024-08-29 13:27:16 -05:00
mrq  d636edd3a2  added flash_attn LlamaAttention (including flash_attn==1.0.9)  2024-08-18 20:51:14 -05:00
mrq  2a1794c084  ughghghhhh  2024-08-09 21:15:01 -05:00
mrq  c658a7b440  make loss scaling opt-in rather than automatically determined (because it seems a DAC-based model really doesn't like loss scaling)  2024-08-09 10:51:36 -05:00
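In stock PyTorch terms, opt-in loss scaling looks roughly like gating a `GradScaler` behind a config flag; a hedged sketch (the repo's own mechanism may differ):

```python
import torch

def make_step_fn(use_loss_scaling: bool):
    # construct the scaler only when explicitly requested, instead of
    # enabling it automatically whenever training in half precision
    scaler = torch.cuda.amp.GradScaler() if use_loss_scaling else None

    def step(loss: torch.Tensor, optimizer: torch.optim.Optimizer) -> None:
        if scaler is not None:
            scaler.scale(loss).backward()
            scaler.step(optimizer)  # internally skips the step on inf/NaN grads
            scaler.update()
        else:
            loss.backward()
            optimizer.step()
        optimizer.zero_grad()

    return step
```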