9657c1d4ce | oops | 2023-03-21 20:31:01 +00:00
0c2a9168f8 | DLAS is PIPified (but I'm still cloning it as a submodule to make updating it easier) | 2023-03-21 15:46:53 +00:00
34ef0467b9 | VALL-E config edits | 2023-03-20 01:22:53 +00:00
2e33bf071a | forgot to not require it to be relative | 2023-03-19 22:05:33 +00:00
5cb86106ce | option to set results folder location | 2023-03-19 22:03:41 +00:00
74510e8623 | doing what I do best: sourcing other configs and banging until it works (it doesn't work) | 2023-03-18 15:16:15 +00:00
da9b4b5fb5 | tweaks | 2023-03-18 15:14:22 +00:00
f44895978d | brain worms | 2023-03-17 20:08:08 +00:00
b17260cddf | added Japanese tokenizer (experimental) | 2023-03-17 20:04:40 +00:00
f34cc382c5 | yammed | 2023-03-17 18:57:36 +00:00
96b7f9d2cc | yammed | 2023-03-17 13:08:34 +00:00
249c6019af | cleanup, metrics are grabbed for the VALL-E trainer | 2023-03-17 05:33:49 +00:00
1b72d0bba0 | forgot to separate phonemes by spaces for [redacted] | 2023-03-17 02:08:07 +00:00
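The phoneme-separation fix above is worth a one-line illustration: a tokenizer that expects space-delimited phoneme strings will silently mis-tokenize a concatenated sequence. A minimal sketch of the idea (the phoneme list and token format here are assumptions, not the repo's actual code):

```python
# Hypothetical sketch: join phonemes with spaces so a whitespace-based
# tokenizer sees each phoneme as its own token.
phonemes = ["DH", "IH1", "S", "IH1", "Z", "AH0", "T", "EH1", "S", "T"]

broken = "".join(phonemes)   # "DHIH1SIH1Z..." -- one unsplittable blob
fixed  = " ".join(phonemes)  # "DH IH1 S IH1 Z ..." -- one token per phoneme

print(fixed)
```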
d4c50967a6 | cleaned up some prepare-dataset code | 2023-03-17 01:24:02 +00:00
0b62ccc112 | set up bnb on Windows as needed | 2023-03-16 20:48:48 +00:00
c4edfb7d5e | unbump ROCm 5.4.2 because it does not work for me desu | 2023-03-16 15:33:23 +00:00
520fbcd163 | bumped torch up (CUDA: 11.8, ROCm: 5.4.2) | 2023-03-16 15:09:11 +00:00
1a8c5de517 | unk hunting | 2023-03-16 14:59:12 +00:00
46ff3c476a | fixes v2 | 2023-03-16 14:41:40 +00:00
0408d44602 | fixed reload tts being broken due to being as untouched as I am | 2023-03-16 14:24:44 +00:00
aeb904a800 | yammed | 2023-03-16 14:23:47 +00:00
f9154c4db1 | fixes | 2023-03-16 14:19:56 +00:00
54f2fc792a | oops | 2023-03-16 05:14:15 +00:00
0a7d6f02a7 | oops | 2023-03-16 04:54:17 +00:00
4ac43fa3a3 | I forgot I undid the thing in DLAS | 2023-03-16 04:51:35 +00:00
da4f92681e | oops | 2023-03-16 04:35:12 +00:00
ee8270bdfb | preparations for training an IPA-based finetune | 2023-03-16 04:25:33 +00:00
7b80f7a42f | fixed not cleaning up states while training (oops) | 2023-03-15 02:48:05 +00:00
b31bf1206e | oops | 2023-03-15 01:51:04 +00:00
d752a22331 | print a warning if the automatically deduced batch size returns 1 | 2023-03-15 01:20:15 +00:00
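The intent of the batch-size warning above is presumably along these lines; the function and variable names are hypothetical, and only the warn-on-1 behavior comes from the commit message:

```python
def deduce_batch_size(dataset_size: int, requested: int = 0) -> int:
    """Hypothetical sketch: pick the largest divisor of the dataset size
    that does not exceed the requested batch size, warning when the only
    fit is a batch size of 1 (which trains very slowly)."""
    if requested <= 0:
        requested = dataset_size
    batch_size = next(
        (n for n in range(min(requested, dataset_size), 0, -1)
         if dataset_size % n == 0),
        1,
    )
    if batch_size == 1:
        print("Warning: automatically deduced batch size is 1; "
              "consider adding more samples or padding the dataset.")
    return batch_size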
f6d34e1dd3 | and maybe I should have actually tested with ./models/tokenizers/ made | 2023-03-15 01:09:20 +00:00
5e4f6808ce | I guess I didn't test on a blank-ish slate | 2023-03-15 00:54:27 +00:00
363d0b09b1 | added options to pick the tokenizer JSON and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training) | 2023-03-15 00:37:38 +00:00
07b684c4e7 | removed redundant training data (it exists within tortoise itself anyway), added utility: view tokenized text | 2023-03-14 21:51:27 +00:00
469dd47a44 | fixes #131 | 2023-03-14 18:58:03 +00:00
84b7383428 | fixes #134 | 2023-03-14 18:52:56 +00:00
4b952ea52a | fixes #132 | 2023-03-14 18:46:20 +00:00
fe03ae5839 | fixes | 2023-03-14 17:42:42 +00:00
9d2c7fb942 | cleanup | 2023-03-14 16:23:29 +00:00
65fe304267 | fixed broken graph displaying | 2023-03-14 16:04:56 +00:00
7b16b3e88a | ;) | 2023-03-14 15:48:09 +00:00
c85e32ff53 | (: | 2023-03-14 14:08:35 +00:00
54036fd780 | :) | 2023-03-14 05:02:14 +00:00
92a05d3c4c | added PYTHONUTF8 to the start/train bats | 2023-03-14 02:29:11 +00:00
dadb1fca6b | multichannel audio now reports the correct duration (surprised it took this long for me to source multichannel audio) | 2023-03-13 21:24:51 +00:00
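A plausible shape of the multichannel-duration fix, assuming torchaudio-style loading (the repo's actual code may differ): duration should come from the frame count alone, since `torchaudio.load` returns a `[channels, frames]` tensor, and counting every element would inflate the duration by a factor of the channel count.

```python
import torchaudio

def audio_duration(path: str) -> float:
    # waveform has shape [channels, frames]; sample_rate is in Hz
    waveform, sample_rate = torchaudio.load(path)

    # Buggy pattern: waveform.numel() / sample_rate over-reports duration
    # for multichannel files. Correct: use the frame count only.
    return waveform.shape[-1] / sample_rate
```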
32d968a8cd | added additional transcription text normalization, disabled by default until I validate it works (something else I'm experimenting with requires it) | 2023-03-13 19:07:23 +00:00
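The commit doesn't say what the added normalization does; as a loud assumption, a typical transcription-normalization pass looks something like this sketch (NFKC folding, whitespace collapsing, and stripping symbols outside a basic set are my choices, not the repo's documented rules):

```python
import re
import unicodedata

def normalize_transcription(text: str) -> str:
    """Hypothetical normalization pass; the repo's actual rules are not
    specified in the commit message."""
    text = unicodedata.normalize("NFKC", text)   # fold full-width forms, etc.
    text = re.sub(r"\s+", " ", text).strip()     # collapse runs of whitespace
    text = re.sub(r"[^\w\s'.,?!-]", "", text)    # drop symbols outside a basic set
    return text.lower()
```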
66ac8ba766 | added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation | 2023-03-13 18:51:53 +00:00
ee1b048d07 | when creating the train/validation datasets, use segments if the main audio's duration is too long, and slice the audio to create the segments if they don't exist | 2023-03-13 04:26:00 +00:00
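The segmenting flow described above can be sketched as follows; the helper names, segment timestamps, and duration cutoff are assumptions, and only the "use segments when too long, slice them if missing" logic comes from the commit message:

```python
import os
import torchaudio

MAX_DURATION = 11.6  # hypothetical cutoff in seconds

def gather_training_clips(path: str, segments: list[tuple[float, float]]):
    """Return paths to clips short enough to train on, slicing the source
    audio into its transcribed segments when the full file is too long."""
    waveform, sr = torchaudio.load(path)
    if waveform.shape[-1] / sr <= MAX_DURATION:
        return [path]

    clips = []
    stem, ext = os.path.splitext(path)
    for i, (start, end) in enumerate(segments):
        clip_path = f"{stem}_{i}{ext}"
        if not os.path.exists(clip_path):  # slice only if the segment is missing
            torchaudio.save(clip_path,
                            waveform[:, int(start * sr):int(end * sr)], sr)
        clips.append(clip_path)
    return clips
```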
0cf9db5e69 | oops | 2023-03-13 01:33:45 +00:00
050bcefd73 | resample to 22.5K when creating training inputs (to avoid redundant downsampling when loaded for training, even though most of my inputs are already at 22.5K), generalized the resampler function to cache and reuse resamplers, and do not unload whisper when done transcribing, since it gets unloaded anyway for any other non-transcription task | 2023-03-13 01:20:55 +00:00
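The "cache and reuse" part of the resampler commit maps naturally onto memoizing `torchaudio.transforms.Resample` instances per (source, target) rate pair, since building the resampling kernel is the expensive step. A minimal sketch under that assumption (the names are mine, not the repo's, and 22050 Hz is a guess at what "22.5K" means):

```python
import torchaudio

# Cache one Resample transform per (orig_sr, target_sr) pair so the
# kernel is built once and reused across every file at those rates.
_resamplers: dict[tuple[int, int], torchaudio.transforms.Resample] = {}

def resample(waveform, orig_sr: int, target_sr: int = 22050):
    """Resample once at dataset-creation time; 22050 Hz is an assumption
    for the '22.5K' in the commit message."""
    if orig_sr == target_sr:
        return waveform  # already at the target rate; nothing to do
    key = (orig_sr, target_sr)
    if key not in _resamplers:
        _resamplers[key] = torchaudio.transforms.Resample(orig_sr, target_sr)
    return _resamplers[key](waveform)
```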