78 Commits

Author SHA1 Message Date
mrq
48490757da fixes 2024-11-10 20:37:50 -06:00
mrq
bbc2de3713 ugh 2024-11-05 11:50:05 -06:00
mrq
3826f9bae4 saner mask creation? (it doesnt matter, kv cache wont work) 2024-11-02 21:00:21 -05:00
mrq
bef43a0c18 added experimental entropix sampling support 2024-10-11 21:18:26 -05:00
mrq
2ea978f318 added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval 2024-10-10 13:40:25 -05:00
mrq
52299127ab fix vall_e.emb.process 2024-10-08 20:00:34 -05:00
mrq
0656a762af fix vall_e.emb.transcriber 2024-10-08 19:24:43 -05:00
mrq
10df2ef5f3 fixed oversight where input audio does not resample (lol...) 2024-09-27 20:27:53 -05:00
mrq
c8d4716a9f ugh 2024-09-18 21:40:57 -05:00
mrq
fa9d3f6c06 lang fixes / reworked phoneme symmap validation 2024-09-18 19:36:03 -05:00
mrq
84647f588a more tweaks 2024-09-18 16:43:57 -05:00
mrq
ebac1db16c maybe final tweaks, I really needed to unify my json read/write and orjson is proven to be fast enough for me to try and rely on it more 2024-09-17 22:57:04 -05:00
mrq
6ceed866b5 *faster* 2024-09-17 22:44:36 -05:00
mrq
f00283440c faster 2024-09-17 22:26:31 -05:00
mrq
be22b65300 solved my problem 2024-09-17 21:58:44 -05:00
mrq
8f41d1b324 more tweaks 2024-09-17 16:26:30 -05:00
mrq
804ddb5182 optimizations (6 hours to do cosine similarities on a speaker set of just 17k utterances................) 2024-09-17 15:51:45 -05:00
mrq
a9fbe81f98 oops 2024-09-17 15:25:12 -05:00
mrq
c440c4fe7e relegated processing similarity data into vall_e.emb.similarity since it's easier, seems to work? 2024-09-17 14:37:21 -05:00
mrq
56f25f7a9b more stuff for similar-speaker prompt sampling (to-do: actually test if this works...) 2024-09-16 23:10:29 -05:00
mrq
69f140ba45 fix oversight with phonemizing french because espeak defines french as fr-fr instead of fr (even though spain spanish is es and not es-sp or some shit, but portugal portuguese is pt-pt) 2024-09-13 12:53:36 -05:00
mrq
4f3c7a37c8 also do text similarities (dont know what use I'll have for this) 2024-09-10 16:45:59 -05:00
mrq
1c615a0f52 helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (iunno how to go about it desu) 2024-09-10 16:34:23 -05:00
mrq
32287710a2 moved prints to use logger, edited readme (fused_attn doesnt seem stable for training) 2024-08-29 13:27:16 -05:00
mrq
054d28573a my DAC dataset again managed to only have some utterances with only 8 of 9 RVQ levels, this fixes an oversight from it 2024-08-09 21:18:01 -05:00
mrq
79a6781c9e fix vall_e.data --action=hdf5 actually transcribing because past me completely forgot it tried to already put the transcribe/process dataset scripts inside the module before 2024-08-08 07:51:42 -05:00
mrq
613024ec0d ugh 2024-08-06 20:35:15 -05:00
mrq
eac353cd0b busy work and cleanup while I wait for 1TB of audio to quantize... again. 2024-08-06 20:23:33 -05:00
mrq
f284c7ea9c do mixed-precision for AMP inside the compress function itself, because the loudness function gripes when using a float16 (non-power of 2 lengths) or bfloat16 (something about views for bfloat16) 2024-08-06 15:08:37 -05:00
mrq
b6ba2cc8e7 tweaked vall_e.emb.process to instead process audio one file at a time instead of all the files for a given speaker to avoid OOMing on less-memory-filled systems with --low-memory 2024-08-06 14:24:40 -05:00
mrq
9710b06b74 tweaks and things 2024-08-06 08:17:25 -05:00
mrq
134dac8c2b re-adapted process_libritts.py to a 'better' way (better because it processed without needing to shuffle a bunch of things and adapt to cope or something) 2024-08-05 20:34:58 -05:00
mrq
3f73fcca29 oops 2024-08-05 20:12:13 -05:00
mrq
597441e48b moved transcribe and process dataset scripts to vall_e/emb within the module itself, argparse-ified transcription script 2024-08-05 19:40:50 -05:00
mrq
75b04686f8 added prom-less training / inferencing, some other things 2024-07-22 19:36:07 -05:00
mrq
491ae2a684 some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...) 2024-07-22 00:30:40 -05:00
mrq
ad024f400f actually pass language into dataset process script, fix coercing japanese into hiragana because espeak does not like kanji 2024-07-21 23:21:37 -05:00
mrq
28a674e0f1 fixes... 2024-07-18 23:25:32 -05:00
mrq
bccbb77a1a added option to either naively concat codes to concat audio waveforms (prior behavior) or to decode => concat => encode instead (although this only currently happens for prom sampling if an utterance is too small) 2024-07-18 16:48:41 -05:00
mrq
7b210d9738 sanity cleanup 2024-07-04 15:58:08 -05:00
mrq
1ecf2793f4 (commented-out) support for facebookresearch/AudioDec, but support really didn't wow me (so I commented it out until I figure out why my output audio is super crusty with AudioDec) 2024-07-04 15:40:51 -05:00
mrq
b21f74a5c5 added summing of external embeddings (at this point i dont think any amount of cope bandaids will get DAC to train nicely, I think the RVQ levels the NAR tends add too much noise if they're not accurate) 2024-06-29 23:42:30 -05:00
mrq
793ccb16fb ugh 2024-06-29 22:14:35 -05:00
mrq
2808f881c8 cleaned up subjugated audio embedding into a flag, flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive) 2024-06-29 21:46:35 -05:00
mrq
ec5eaebcbc experimental method of using DACs quantizer "embeddings" to see if it helps with model quality 2024-06-29 19:46:11 -05:00
mrq
234f9efc6e ugh 2024-06-09 11:39:43 -05:00
mrq
ddbacde0d1 DAC just doesn't work well enough...... 2024-05-25 11:07:52 -05:00
mrq
74e531d391 ugh 2024-05-18 12:02:56 -05:00
mrq
5eb5db7f7f just don't use DAC 24Khz, it's bad 2024-05-12 13:41:17 -05:00
mrq
230da8b559 should be the final things to scramble around for, DAC's 24KHz model is unusable for this, but both encodec's 24KHz and DAC's 44KHz work 2024-05-12 13:22:08 -05:00