59f56ad099 | 2024-12-24 23:14:32 -06:00 | cleaup
82e8592f2a | 2024-12-24 17:54:48 -06:00 | working vall_e.cpp
497bdfc67b | 2024-12-22 20:11:31 -06:00 | more work (the wall is non-causal decoding......)
353e478e68 | 2024-12-21 22:52:10 -06:00 | agony
8838babcba | 2024-12-19 19:08:57 -06:00 | sanity checks (and I realized that the model actually had langs set to 4 in the yaml for KO/ZH so................
9f2bd7f6e4 | 2024-12-17 23:17:12 -06:00 | ugh
9090c34f10 | 2024-12-17 22:47:12 -06:00 | cringe script to process seed-tts-eval's eval dataset into something i can easily use
fc5e6d8599 | 2024-12-09 14:38:09 -06:00 | fixes to process_emilia.py script
fe241f6a99 | 2024-09-18 21:34:43 -05:00 | support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder)
b5bec0c9ce | 2024-09-18 20:19:46 -05:00 | oops, turns out these are not split by speaker names already........ (also added sampling the dataset in the webui for easy viewing)
56f25f7a9b | 2024-09-16 23:10:29 -05:00 | more stuff for similar-speaker prompt sampling (to-do: actually test if this works...)
17487ad70a | 2024-09-10 14:00:27 -05:00 | weird quirk in process_emilia.py where language gets mutated, somehow (I hate python)
d059f6f56d | 2024-09-09 09:57:32 -05:00 | added helper script to process Emilia (amphion/Emilia-Dataset), clean up espeak phonemes for non-English transcriptions with English words (because for some reason espeak injects (en){word}(lang) markers and it's annoying)
9710b06b74 | 2024-08-06 08:17:25 -05:00 | tweaks and things
8bac8fe902 | 2024-08-05 20:38:29 -05:00 | oops
134dac8c2b | 2024-08-05 20:34:58 -05:00 | re-adapted process_libritts.py to a 'better' way (better because it processed without needing to shuffle a bunch of things and adapt to cope or something)
597441e48b | 2024-08-05 19:40:50 -05:00 | moved transcribe and process dataset scripts to vall_e/emb within the module itself, argparse-ified transcription script
7cdfa3dc0c | 2024-08-05 15:59:25 -05:00 | updated process_datasets.py, added argparsing so I can mostly stop manually editing things, and some other cleanup
d19f93a2c0 | 2024-08-04 00:14:49 -05:00 | documentation update
11fa3da665 | 2024-08-03 19:51:00 -05:00 | some cleanup, fixed the wrapper attention to explicitly use other sdpa backends
9564ecda43 | 2024-08-03 15:12:11 -05:00 | wrapper attention class for other sdpa backends + xformers seems to have broke...
ad024f400f | 2024-07-21 23:21:37 -05:00 | actually pass language into dataset process script, fix coercing japanese into hiragana because espeak does not like kanji
7b210d9738 | 2024-07-04 15:58:08 -05:00 | sanity cleanup
db62e55a38 | 2024-07-04 14:54:11 -05:00 | oops, I forgot to use the new thing for audio_backend
7feeb944a0 | 2024-06-03 20:26:27 -05:00 | probably insane with even entertaining going this route
ddbacde0d1 | 2024-05-25 11:07:52 -05:00 | DAC just doesn't work well enough......
74e531d391 | 2024-05-18 12:02:56 -05:00 | ugh
59ef9461f8 | 2024-05-18 10:13:58 -05:00 | ugh
d9aabfa3ae | 2024-05-15 23:04:19 -05:00 | final tweaks, hopefully, again
2437a86efa | 2024-05-12 13:02:15 -05:00 | ugh
4f1593c8db | 2024-05-12 10:17:29 -05:00 | a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge
c6e0f905b5 | 2024-05-08 02:11:38 -05:00 | final tweaks (again) before training restarts
8aa1b2dabf | 2024-05-04 21:03:46 -05:00 | documentation update
caad7ee3c9 | 2024-04-28 22:28:29 -05:00 | final tweaks, hopefully
ffc334cf58 | 2024-04-21 17:43:20 -05:00 | added dataset transcription helper script (now I don't ever have to touch ai-voice-cloning) (to-do: unify scripts into the module)
071fb97777 | 2024-04-21 14:49:18 -05:00 | dataset preparation script updates, caved and am using HF tokenizer now
a8ffa88844 | 2024-04-19 18:36:54 -05:00 | it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior
00804a47e9 | 2024-04-18 21:34:28 -05:00 | Forgot to copy intermediary dataset conversion script
4f5c9e518a | 2024-04-18 13:32:41 -05:00 | actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess
09cda7d3f9 | 2023-10-16 19:30:38 -05:00 | added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup
2deb995cc9 | 2023-10-06 20:08:28 -05:00 | updated setup script
1fd91b6437 | 2023-10-06 10:13:54 -05:00 | cleanup
3db7e7dea1 | 2023-10-06 10:02:45 -05:00 | implicitly load checkpoint if deepspeed checkpoint not found, updated setup script to grab the diskcached dataloader things
2f2505b12f | 2023-10-06 08:08:28 -05:00 | updated setup script
153f8b293c | 2023-10-04 19:41:37 -05:00 | added min-x and min-y arguments to plot.py, helper script to download from my existing checkpoint
5ac119a6e7 | 2023-09-09 16:17:20 -05:00 | added light web UI (need to port the telemetry disabling bandaids from aivc)
4613781e23 | 2023-09-02 16:29:53 -05:00 | integrated plot script, added tts-c task token to help the model be able to mix between normal VALL-E and VALL-E continuous
f7e942ec99 | 2023-09-02 13:59:43 -05:00 | modified plotting script to be more agnostic to X
21e5d250cc | 2023-09-02 13:31:04 -05:00 | fixed up plot script that I forgot about
5c8694db8e | 2023-08-30 18:23:05 -05:00 | nasty bandaid if there's no validation dataset specified during training (for example, during finetunes)