|
3826f9bae4
|
saner mask creation? (it doesnt matter, kv cache wont work)
|
2024-11-02 21:00:21 -05:00 |
|
|
ef1c17430f
|
skip step on nan loss (ironically I have not had a nan loss after adding this), throw exception with invalid cfg.dataset.sample_type and sample_order combination (because I was tricked by this in my yaml and had inconsistent vram usage)
|
2024-11-01 20:54:53 -05:00 |
|
|
8eb9a4056b
|
modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling
|
2024-10-22 18:12:39 -05:00 |
|
|
0dfab973e7
|
oops
|
2024-10-18 09:40:06 -05:00 |
|
|
75b90be325
|
cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified
|
2024-10-17 17:06:48 -05:00 |
|
|
f88097ccf6
|
add config option to set the rate of sampling randomly vs similar speakers during training
|
2024-10-16 14:27:58 -05:00 |
|
|
bef43a0c18
|
added experimental entropix sampling support
|
2024-10-11 21:18:26 -05:00 |
|
|
85d85c1351
|
more arg creep for demo page
|
2024-10-10 19:40:01 -05:00 |
|
|
75a4c866d6
|
more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: setup arg flags to handle this, and checkbox in web UI)
|
2024-10-10 19:04:12 -05:00 |
|
|
96d05be73c
|
demo page tweaks
|
2024-10-10 13:52:37 -05:00 |
|
|
2ea978f318
|
added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval
|
2024-10-10 13:40:25 -05:00 |
|
|
ff7a1b4163
|
coerce into path for other sampler_types (it's required for sampling for similar utterances)
|
2024-09-26 18:37:56 -05:00 |
|
|
f24547ad4e
|
add top_k sampling / offset for prompt similar utterance sampling
|
2024-09-26 16:26:40 -05:00 |
|
|
c5e9142863
|
added option to retokenize phonemes for hdf5 (to save having to remake my hdf5 file)
|
2024-09-21 13:08:01 -05:00 |
|
|
536c11c4ac
|
actually validated and fixed sampling similar utterances for the prompt (hopefully nothing else is needed)
|
2024-09-21 12:59:51 -05:00 |
|
|
d31f27119a
|
regex replace out the (lang) markers in espeak, updated tokenizer vocab as lazily as possible to not have unk tokens
|
2024-09-21 12:29:28 -05:00 |
|
|
769f67dcfe
|
actually fix validation of phonemes in the symmap
|
2024-09-21 12:19:34 -05:00 |
|
|
fe241f6a99
|
support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder)
|
2024-09-18 21:34:43 -05:00 |
|
|
fa9d3f6c06
|
lang fixes / reworked phoneme symmap validation
|
2024-09-18 19:36:03 -05:00 |
|
|
84647f588a
|
more tweaks
|
2024-09-18 16:43:57 -05:00 |
|
|
ebac1db16c
|
maybe final tweaks, I really needed to unify my json read/write and orjson is proven to be fast enough for me to try and rely on it more
|
2024-09-17 22:57:04 -05:00 |
|
|
a9fbe81f98
|
oops
|
2024-09-17 15:25:12 -05:00 |
|
|
c440c4fe7e
|
relegated processing similarity data into vall_e.emb.similarity since it's easier, seems to work?
|
2024-09-17 14:37:21 -05:00 |
|
|
56f25f7a9b
|
more stuff for similar-speaker prompt sampling (to-do: actually test if this works...)
|
2024-09-16 23:10:29 -05:00 |
|
|
1c615a0f52
|
helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (iunno how to go about it desu)
|
2024-09-10 16:34:23 -05:00 |
|
|
d059f6f56d
|
added helper script to process Emilia (amphion/Emilia-Dataset), clean up espeak phonemes for non-English transcriptions with English words (because for some reason espeak injects (en){word}(lang) markers and it's annoying)
|
2024-09-09 09:57:32 -05:00 |
|
|
31e8b7edb8
|
tweaks and fixes for lora stuffs
|
2024-09-08 18:05:21 -05:00 |
|
|
fa93061b3e
|
more fixes, moved sampler state dict to a better place, eval works again
|
2024-09-06 16:59:56 -05:00 |
|
|
341e19162b
|
fixes, again
|
2024-09-06 11:41:41 -05:00 |
|
|
94cf81d38c
|
tweak
|
2024-09-05 23:21:18 -05:00 |
|
|
54547b74d8
|
experimental implementation of STT (need to actually test on a model, test trainer seems to work)
|
2024-09-05 20:43:20 -05:00 |
|
|
32287710a2
|
moved prints to use logger, edited readme (fused_attn doesnt seem stable for training)
|
2024-08-29 13:27:16 -05:00 |
|
|
d636edd3a2
|
added flash_attn LlamaAttention (including flash_attn==1.0.9)
|
2024-08-18 20:51:14 -05:00 |
|
|
2a1794c084
|
ughghghhhh
|
2024-08-09 21:15:01 -05:00 |
|
|
c658a7b440
|
make loss scaling opt-in rather than automatically determined (because it seems a DAC-based model really doesnt like loss scaling)
|
2024-08-09 10:51:36 -05:00 |
|
|
d04f6911b4
|
oops
|
2024-08-08 19:38:55 -05:00 |
|
|
0aa59e6f3f
|
uncommented block that writes the metadata on HDF5 creation
|
2024-08-08 19:21:29 -05:00 |
|
|
79a6781c9e
|
fix vall_e.data --action=hdf5 actually transcribing because past me completely forgot it tried to already put the transcribe/process dataset scripts inside the module before
|
2024-08-08 07:51:42 -05:00 |
|
|
eac353cd0b
|
busy work and cleanup while I wait for 1TB of audio to quantize... again.
|
2024-08-06 20:23:33 -05:00 |
|
|
c09133d00f
|
added safetensors support (with metadata) and feed whatever torch.load/torch.save into it
|
2024-08-03 23:15:20 -05:00 |
|
|
6a733eb2ed
|
changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype, some 'optimization' to compile the model but it doesnt seem to do anything useful
|
2024-08-03 22:10:21 -05:00 |
|
|
97c5241bef
|
fixes, throw an exception when using NAR only model with non-unified position IDs, since for some reason it outputs garbage for the NAR
|
2024-08-02 22:25:49 -05:00 |
|
|
4456d3172b
|
that's what I get for testing without hdf5 on my previous machine....
|
2024-08-02 20:44:01 -05:00 |
|
|
ce8bb1e4f7
|
sanity cleanups with weird off-by-one-ness, cleaned up and validated vall_e.models.experimental works again
|
2024-07-27 15:36:05 -05:00 |
|
|
682e4387dc
|
oops (fixed proms being erased from a config oversight)
|
2024-07-25 12:39:57 -05:00 |
|
|
75b04686f8
|
added prom-less training / inferencing, some other things
|
2024-07-22 19:36:07 -05:00 |
|
|
491ae2a684
|
some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...)
|
2024-07-22 00:30:40 -05:00 |
|
|
e19aa643a6
|
cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training
|
2024-07-21 19:12:03 -05:00 |
|
|
d87b492295
|
added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)
|
2024-07-19 20:49:40 -05:00 |
|
|
28a674e0f1
|
fixes...
|
2024-07-18 23:25:32 -05:00 |
|