|
faa8da12d7
|
modified logic to determine valid voice folders, also allows subdirs within the folder (for example: ./voices/SH/james/ will be named SH/james)
|
2023-04-13 21:10:38 +00:00 |
|
|
02beb1dd8e
|
should fix #203
|
2023-04-13 03:14:06 +00:00 |
|
|
8f3e9447ba
|
disable diarize button
|
2023-04-12 20:03:54 +00:00 |
|
|
d8b996911c
|
a bunch of shit I had uncommitted over the past while pertaining to VALL-E
|
2023-04-12 20:02:46 +00:00 |
|
|
0440eac2bc
|
#185
|
2023-03-31 06:55:52 +00:00 |
|
|
9f64153a28
|
fixes #185
|
2023-03-31 06:03:56 +00:00 |
|
|
4744120be2
|
added VALL-E inference support (very rudimentary, gimped, but it will load a model trained on a config generated through the web UI)
|
2023-03-31 03:26:00 +00:00 |
|
|
9b01377667
|
only include auto in the list of models under setting, nothing else
|
2023-03-29 19:53:23 +00:00 |
|
|
f66281f10c
|
added mixing models (shamelessly inspired from voldy's web ui)
|
2023-03-29 19:29:13 +00:00 |
|
|
c89c648b4a
|
fixes #176
|
2023-03-26 11:05:50 +00:00 |
|
|
41d47c7c2a
|
for real this time show those new vall-e metrics
|
2023-03-26 04:31:50 +00:00 |
|
|
c4ca04cc92
|
added showing reported training accuracy and eval/validation metrics to graph
|
2023-03-26 04:08:45 +00:00 |
|
|
8c647c889d
|
now there should be feature parity between trainers
|
2023-03-25 04:12:03 +00:00 |
|
|
fd9b2e082c
|
x_lim and y_lim for graph
|
2023-03-25 02:34:14 +00:00 |
|
|
9856db5900
|
actually make parsing VALL-E metrics work
|
2023-03-23 15:42:51 +00:00 |
|
|
69d84bb9e0
|
I forget
|
2023-03-23 04:53:31 +00:00 |
|
|
444bcdaf62
|
my sanitizer actually did work, it was just batch sizes leading to problems when transcribing
|
2023-03-23 04:41:56 +00:00 |
|
|
a6daf289bc
|
when the sanitizer thingy works in testing but it doesn't outside of testing, and you have to retranscribe for the fourth time today
|
2023-03-23 02:37:44 +00:00 |
|
|
86589fff91
|
why does this keep happening to me
|
2023-03-23 01:55:16 +00:00 |
|
|
0ea93a7f40
|
more cleanup, use 24kHz for preparing for VALL-E (encodec will resample to 24kHz anyways, makes audio a little nicer), some other things
|
2023-03-23 01:52:26 +00:00 |
|
|
d2a9ab9e41
|
remove redundant phonemize for vall-e (oops), quantize all files and then phonemize all files for cope optimization, load alignment model once instead of for every transcription (speedup with whisperx)
|
2023-03-23 00:22:25 +00:00 |
|
|
19c0854e6a
|
do not write current whisper.json if there are no changes
|
2023-03-22 22:24:07 +00:00 |
|
|
932eaccdf5
|
added whisper transcription 'sanitizing' (collapse very short transcriptions to the previous segment) (I really have to stop having several copies spanning several machines for AIVC, I keep reverting shit)
|
2023-03-22 22:10:01 +00:00 |
|
|
736cdc8926
|
disable diarization for whisperx as it's just a useless performance hit (I don't have anything that's multispeaker within the same audio file at the moment)
|
2023-03-22 20:38:58 +00:00 |
|
|
aa5bdafb06
|
ugh
|
2023-03-22 20:26:28 +00:00 |
|
|
13605f980c
|
now whisperx should output json that aligns with what's expected
|
2023-03-22 20:01:30 +00:00 |
|
|
8877960062
|
fixes for whisperx batching
|
2023-03-22 19:53:42 +00:00 |
|
|
4056a27bcb
|
begrudgingly added back whisperx integration (VAD/Diarization testing, I really, really need accurate timestamps before dumping mondo amounts of time on training a dataset)
|
2023-03-22 19:24:53 +00:00 |
|
|
b8c3c4cfe2
|
Fixed #167
|
2023-03-22 18:21:37 +00:00 |
|
|
f822c87344
|
cleanups, realigning vall-e training
|
2023-03-22 17:47:23 +00:00 |
|
|
909325bb5a
|
ugh
|
2023-03-21 22:18:57 +00:00 |
|
|
5a5fd9ca87
|
Added option to unsqueeze sample batches after sampling
|
2023-03-21 21:34:26 +00:00 |
|
|
9657c1d4ce
|
oops
|
2023-03-21 20:31:01 +00:00 |
|
|
0c2a9168f8
|
DLAS is PIPified (but I'm still cloning it as a submodule to make updating it easier)
|
2023-03-21 15:46:53 +00:00 |
|
|
34ef0467b9
|
VALL-E config edits
|
2023-03-20 01:22:53 +00:00 |
|
|
2e33bf071a
|
forgot to not require it to be relative
|
2023-03-19 22:05:33 +00:00 |
|
|
5cb86106ce
|
option to set results folder location
|
2023-03-19 22:03:41 +00:00 |
|
|
da9b4b5fb5
|
tweaks
|
2023-03-18 15:14:22 +00:00 |
|
|
f44895978d
|
brain worms
|
2023-03-17 20:08:08 +00:00 |
|
|
f34cc382c5
|
yammed
|
2023-03-17 18:57:36 +00:00 |
|
|
96b7f9d2cc
|
yammed
|
2023-03-17 13:08:34 +00:00 |
|
|
249c6019af
|
cleanup, metrics are grabbed for vall-e trainer
|
2023-03-17 05:33:49 +00:00 |
|
|
1b72d0bba0
|
forgot to separate phonemes by spaces for [redacted]
|
2023-03-17 02:08:07 +00:00 |
|
|
d4c50967a6
|
cleaned up some prepare dataset code
|
2023-03-17 01:24:02 +00:00 |
|
|
0b62ccc112
|
setup bnb on windows as needed
|
2023-03-16 20:48:48 +00:00 |
|
|
1a8c5de517
|
unk hunting
|
2023-03-16 14:59:12 +00:00 |
|
|
46ff3c476a
|
fixes v2
|
2023-03-16 14:41:40 +00:00 |
|
|
0408d44602
|
fixed reload tts being broken due to being as untouched as I am
|
2023-03-16 14:24:44 +00:00 |
|
|
aeb904a800
|
yammed
|
2023-03-16 14:23:47 +00:00 |
|
|
f9154c4db1
|
fixes
|
2023-03-16 14:19:56 +00:00 |
|
|
54f2fc792a
|
ops
|
2023-03-16 05:14:15 +00:00 |
|
|
0a7d6f02a7
|
ops
|
2023-03-16 04:54:17 +00:00 |
|
|
4ac43fa3a3
|
I forgot I undid the thing in DLAS
|
2023-03-16 04:51:35 +00:00 |
|
|
da4f92681e
|
oops
|
2023-03-16 04:35:12 +00:00 |
|
|
ee8270bdfb
|
preparations for training an IPA-based finetune
|
2023-03-16 04:25:33 +00:00 |
|
|
7b80f7a42f
|
fixed not cleaning up states while training (oops)
|
2023-03-15 02:48:05 +00:00 |
|
|
b31bf1206e
|
oops
|
2023-03-15 01:51:04 +00:00 |
|
|
d752a22331
|
print a warning if automatically deduced batch size returns 1
|
2023-03-15 01:20:15 +00:00 |
|
|
f6d34e1dd3
|
and maybe I should have actually tested with ./models/tokenizers/ made
|
2023-03-15 01:09:20 +00:00 |
|
|
5e4f6808ce
|
I guess I didn't test on a blank-ish slate
|
2023-03-15 00:54:27 +00:00 |
|
|
363d0b09b1
|
added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training)
|
2023-03-15 00:37:38 +00:00 |
|
|
07b684c4e7
|
removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text
|
2023-03-14 21:51:27 +00:00 |
|
|
469dd47a44
|
fixes #131
|
2023-03-14 18:58:03 +00:00 |
|
|
84b7383428
|
fixes #134
|
2023-03-14 18:52:56 +00:00 |
|
|
4b952ea52a
|
fixes #132
|
2023-03-14 18:46:20 +00:00 |
|
|
fe03ae5839
|
fixes
|
2023-03-14 17:42:42 +00:00 |
|
|
9d2c7fb942
|
cleanup
|
2023-03-14 16:23:29 +00:00 |
|
|
65fe304267
|
fixed broken graph displaying
|
2023-03-14 16:04:56 +00:00 |
|
|
7b16b3e88a
|
;)
|
2023-03-14 15:48:09 +00:00 |
|
|
54036fd780
|
:)
|
2023-03-14 05:02:14 +00:00 |
|
|
92a05d3c4c
|
added PYTHONUTF8 to start/train bats
|
2023-03-14 02:29:11 +00:00 |
|
|
dadb1fca6b
|
multichannel audio now reports correct duration (surprised it took this long for me to source multichannel audio)
|
2023-03-13 21:24:51 +00:00 |
|
|
32d968a8cd
|
(disabled by default until I validate it working) added additional transcription text normalization (something else I'm experimenting with requires it)
|
2023-03-13 19:07:23 +00:00 |
|
|
66ac8ba766
|
added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation
|
2023-03-13 18:51:53 +00:00 |
|
|
ee1b048d07
|
when creating the train/validation datasets, use segments if the main audio's duration is too long, and slice to make the segments if they don't exist
|
2023-03-13 04:26:00 +00:00 |
|
|
0cf9db5e69
|
oops
|
2023-03-13 01:33:45 +00:00 |
|
|
050bcefd73
|
resample to 22.5K when creating training inputs (to avoid redundant downsampling when loaded for training, even though most of my inputs are already at 22.5K), generalized resampler function to cache and reuse them, do not unload whisper when done transcribing since it gets unloaded anyways for any other non-transcription task
|
2023-03-13 01:20:55 +00:00 |
|
|
7c9c0dc584
|
forgot to clean up debug prints
|
2023-03-13 00:44:37 +00:00 |
|
|
239c984850
|
move validating audio to creating the text files instead, consider audio longer than 11 seconds invalid, consider text lengths over 200 invalid
|
2023-03-12 23:39:00 +00:00 |
|
|
51ddc205cd
|
update submodules
|
2023-03-12 18:14:36 +00:00 |
|
|
ccbf2e6aff
|
blame #122
|
2023-03-12 17:51:52 +00:00 |
|
|
9238df0b03
|
fixed last generation settings not actually loading because brain worms
|
2023-03-12 15:49:50 +00:00 |
|
|
9594a960b0
|
Disable loss ETA for now until I fix it
|
2023-03-12 15:39:54 +00:00 |
|
mrq
|
be8b290a1a
|
Merge branch 'master' into save_more_user_config
|
2023-03-12 15:38:08 +00:00 |
|
|
296129ba9c
|
output fixes, I'm not sure why ETA wasn't working but it works in testing
|
2023-03-12 15:17:07 +00:00 |
|
|
098d7ad635
|
uh I don't remember, small things
|
2023-03-12 14:47:48 +00:00 |
|
|
233baa4e45
|
updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested.
|
2023-03-12 16:08:02 +02:00 |
|
|
29b3d1ae1d
|
Fixed Keep X Previous States
|
2023-03-12 08:01:08 +02:00 |
|
|
9e320a34c8
|
Fixed Keep X Previous States
|
2023-03-12 08:00:03 +02:00 |
|
|
61500107ab
|
Catch OOM and run whisper on cpu automatically.
|
2023-03-12 06:48:28 +02:00 |
|
|
ede9804b76
|
added option to trim silence using torchaudio's VAD
|
2023-03-11 21:41:35 +00:00 |
|
|
dea2fa9caf
|
added fields to offset start/end slices to apply in bulk when slicing
|
2023-03-11 21:34:29 +00:00 |
|
|
89bb3d4419
|
rename transcribe button since it does more than transcribe
|
2023-03-11 21:18:04 +00:00 |
|
|
382a3e4104
|
rely on the whisper.json for handling a lot more things
|
2023-03-11 21:17:11 +00:00 |
|
|
9b376c381f
|
brain worm
|
2023-03-11 18:14:32 +00:00 |
|
|
94551fb9ac
|
split slicing dataset routine so it can be done after the fact
|
2023-03-11 17:27:01 +00:00 |
|
|
e3fdb79b49
|
rocm5.2 works for me desu so I bumped it back up
|
2023-03-11 17:02:56 +00:00 |
|
|
cf41492f76
|
fall back to normal behavior if there are actually no audio files loaded from the dataset when using it for computing latents
|
2023-03-11 16:46:03 +00:00 |
|
|
b90c164778
|
Farewell, parasite
|
2023-03-11 16:40:34 +00:00 |
|
|
2424c455cb
|
added option to not slice audio when transcribing, added option to prepare validation dataset on audio duration, added a warning if you're using whisperx and you're slicing audio
|
2023-03-11 16:32:35 +00:00 |
|
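
Commit 239c984850 above describes a validation rule applied while creating the text files: audio longer than 11 seconds is invalid, and text lengths over 200 are invalid. A minimal sketch of that rule follows. The function name `validate_entry`, the constant names, and the reading of "text length" as a character count are assumptions for illustration only; the repository's actual code may differ.

```python
# Thresholds taken from the commit message; the names are hypothetical.
MAX_AUDIO_SECONDS = 11.0
MAX_TEXT_LENGTH = 200  # assumed to be a character count


def validate_entry(duration_seconds: float, text: str) -> list:
    """Return the reasons an entry is invalid; an empty list means it is valid."""
    reasons = []
    if duration_seconds > MAX_AUDIO_SECONDS:
        reasons.append(
            f"audio is {duration_seconds:.2f}s, longer than {MAX_AUDIO_SECONDS:.0f}s"
        )
    if len(text) > MAX_TEXT_LENGTH:
        reasons.append(
            f"text is {len(text)} characters, longer than {MAX_TEXT_LENGTH}"
        )
    return reasons


if __name__ == "__main__":
    # A short clip with short text passes; a long clip is rejected.
    print(validate_entry(4.2, "A short line of dialogue."))
    print(validate_entry(12.5, "A short line of dialogue."))
```

Returning a list of reasons instead of a bare boolean makes it straightforward to report why a line was skipped, which matches the later commit (ee1b048d07) that falls back to segments when the main audio is too long.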