|
bef43a0c18
|
added experimental entropix sampling support
|
2024-10-11 21:18:26 -05:00 |
|
|
2ea978f318
|
added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval
|
2024-10-10 13:40:25 -05:00 |
|
|
52299127ab
|
fix vall_e.emb.process
|
2024-10-08 20:00:34 -05:00 |
|
|
0656a762af
|
fix vall_e.emb.transcriber
|
2024-10-08 19:24:43 -05:00 |
|
|
10df2ef5f3
|
fixed oversight where input audio does not resample (lol...)
|
2024-09-27 20:27:53 -05:00 |
|
|
c8d4716a9f
|
ugh
|
2024-09-18 21:40:57 -05:00 |
|
|
fa9d3f6c06
|
lang fixes / reworked phoneme symmap validation
|
2024-09-18 19:36:03 -05:00 |
|
|
84647f588a
|
more tweaks
|
2024-09-18 16:43:57 -05:00 |
|
|
ebac1db16c
|
maybe final tweaks, I really needed to unify my json read/write and orjson is proven to be fast enough for me to try and rely on it more
|
2024-09-17 22:57:04 -05:00 |
|
|
6ceed866b5
|
*faster*
|
2024-09-17 22:44:36 -05:00 |
|
|
f00283440c
|
faster
|
2024-09-17 22:26:31 -05:00 |
|
|
be22b65300
|
solved my problem
|
2024-09-17 21:58:44 -05:00 |
|
|
8f41d1b324
|
more tweaks
|
2024-09-17 16:26:30 -05:00 |
|
|
804ddb5182
|
optimizations (6 hours to do cosine similarities on a speaker set of just 17k utterances................)
|
2024-09-17 15:51:45 -05:00 |
|
|
a9fbe81f98
|
oops
|
2024-09-17 15:25:12 -05:00 |
|
|
c440c4fe7e
|
relegated processing similarity data into vall_e.emb.similarity since it's easier, seems to work?
|
2024-09-17 14:37:21 -05:00 |
|
|
56f25f7a9b
|
more stuff for similar-speaker prompt sampling (to-do: actually test if this works...)
|
2024-09-16 23:10:29 -05:00 |
|
|
69f140ba45
|
fix oversight with phonemizing french because espeak defines french as fr-fr instead of fr (even though spain spanish is es and not es-sp or some shit, but portugal portuguese is pt-pt)
|
2024-09-13 12:53:36 -05:00 |
|
|
4f3c7a37c8
|
also do text similarities (don't know what use I'll have for this)
|
2024-09-10 16:45:59 -05:00 |
|
|
1c615a0f52
|
helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (iunno how to go about it desu)
|
2024-09-10 16:34:23 -05:00 |
|
|
32287710a2
|
moved prints to use logger, edited readme (fused_attn doesn't seem stable for training)
|
2024-08-29 13:27:16 -05:00 |
|
|
054d28573a
|
my DAC dataset again managed to only have some utterances with only 8 of 9 RVQ levels, this fixes an oversight from it
|
2024-08-09 21:18:01 -05:00 |
|
|
79a6781c9e
|
fix vall_e.data --action=hdf5 actually transcribing because past me completely forgot it tried to already put the transcribe/process dataset scripts inside the module before
|
2024-08-08 07:51:42 -05:00 |
|
|
613024ec0d
|
ugh
|
2024-08-06 20:35:15 -05:00 |
|
|
eac353cd0b
|
busy work and cleanup while I wait for 1TB of audio to quantize... again.
|
2024-08-06 20:23:33 -05:00 |
|
|
f284c7ea9c
|
do mixed-precision for AMP inside the compress function itself, because the loudness function gripes when using a float16 (non-power of 2 lengths) or bfloat16 (something about views for bfloat16)
|
2024-08-06 15:08:37 -05:00 |
|
|
b6ba2cc8e7
|
tweaked vall_e.emb.process to process audio one file at a time, instead of all the files for a given speaker, to avoid OOMing on less-memory-filled systems with --low-memory
|
2024-08-06 14:24:40 -05:00 |
|
|
9710b06b74
|
tweaks and things
|
2024-08-06 08:17:25 -05:00 |
|
|
134dac8c2b
|
re-adapted process_libritts.py to a 'better' way (better because it processed without needing to shuffle a bunch of things and adapt to cope or something)
|
2024-08-05 20:34:58 -05:00 |
|
|
3f73fcca29
|
oops
|
2024-08-05 20:12:13 -05:00 |
|
|
597441e48b
|
moved transcribe and process dataset scripts to vall_e/emb within the module itself, argparse-ified transcription script
|
2024-08-05 19:40:50 -05:00 |
|
|
75b04686f8
|
added prom-less training / inferencing, some other things
|
2024-07-22 19:36:07 -05:00 |
|
|
491ae2a684
|
some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...)
|
2024-07-22 00:30:40 -05:00 |
|
|
ad024f400f
|
actually pass language into dataset process script, fix coercing japanese into hiragana because espeak does not like kanji
|
2024-07-21 23:21:37 -05:00 |
|
|
28a674e0f1
|
fixes...
|
2024-07-18 23:25:32 -05:00 |
|
|
bccbb77a1a
|
added option to either naively concat codes to concat audio waveforms (prior behavior) or to decode => concat => encode instead (although this only currently happens for prom sampling if an utterance is too small)
|
2024-07-18 16:48:41 -05:00 |
|
|
7b210d9738
|
sanity cleanup
|
2024-07-04 15:58:08 -05:00 |
|
|
1ecf2793f4
|
(commented-out) support for facebookresearch/AudioDec, but support really didn't wow me (so I commented it out until I figure out why my output audio is super crusty with AudioDec)
|
2024-07-04 15:40:51 -05:00 |
|
|
b21f74a5c5
|
added summing of external embeddings (at this point I don't think any amount of cope bandaids will get DAC to train nicely, I think the RVQ levels the NAR tends to add too much noise if they're not accurate)
|
2024-06-29 23:42:30 -05:00 |
|
|
793ccb16fb
|
ugh
|
2024-06-29 22:14:35 -05:00 |
|
|
2808f881c8
|
cleaned up subjugated audio embedding into a flag, flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive)
|
2024-06-29 21:46:35 -05:00 |
|
|
ec5eaebcbc
|
experimental method of using DAC's quantizer "embeddings" to see if it helps with model quality
|
2024-06-29 19:46:11 -05:00 |
|
|
234f9efc6e
|
ugh
|
2024-06-09 11:39:43 -05:00 |
|
|
ddbacde0d1
|
DAC just doesn't work well enough......
|
2024-05-25 11:07:52 -05:00 |
|
|
74e531d391
|
ugh
|
2024-05-18 12:02:56 -05:00 |
|
|
5eb5db7f7f
|
just don't use DAC 24KHz, it's bad
|
2024-05-12 13:41:17 -05:00 |
|
|
230da8b559
|
should be the final things to scramble around for, DAC's 24KHz model is unusable for this, but both encodec's 24KHz and DAC's 44KHz work
|
2024-05-12 13:22:08 -05:00 |
|
|
2437a86efa
|
ugh
|
2024-05-12 13:02:15 -05:00 |
|
|
4f1593c8db
|
a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge
|
2024-05-12 10:17:29 -05:00 |
|
|
14709ac67f
|
ughh
|
2024-05-12 07:30:59 -05:00 |
|