|
b17260cddf
|
added japanese tokenizer (experimental)
|
2023-03-17 20:04:40 +00:00 |
|
|
f34cc382c5
|
yammed
|
2023-03-17 18:57:36 +00:00 |
|
|
96b7f9d2cc
|
yammed
|
2023-03-17 13:08:34 +00:00 |
|
|
249c6019af
|
cleanup, metrics are grabbed for vall-e trainer
|
2023-03-17 05:33:49 +00:00 |
|
|
1b72d0bba0
|
forgot to separate phonemes by spaces for [redacted]
|
2023-03-17 02:08:07 +00:00 |
|
|
d4c50967a6
|
cleaned up some prepare dataset code
|
2023-03-17 01:24:02 +00:00 |
|
|
0b62ccc112
|
setup bnb on windows as needed
|
2023-03-16 20:48:48 +00:00 |
|
|
c4edfb7d5e
|
unbump rocm5.4.2 because it does not work for me desu
|
2023-03-16 15:33:23 +00:00 |
|
|
520fbcd163
|
bumped torch up (CUDA: 11.8, ROCm: 5.4.2)
|
2023-03-16 15:09:11 +00:00 |
|
|
1a8c5de517
|
unk hunting
|
2023-03-16 14:59:12 +00:00 |
|
|
46ff3c476a
|
fixes v2
|
2023-03-16 14:41:40 +00:00 |
|
|
0408d44602
|
fixed reload tts being broken due to being as untouched as I am
|
2023-03-16 14:24:44 +00:00 |
|
|
aeb904a800
|
yammed
|
2023-03-16 14:23:47 +00:00 |
|
|
f9154c4db1
|
fixes
|
2023-03-16 14:19:56 +00:00 |
|
|
54f2fc792a
|
ops
|
2023-03-16 05:14:15 +00:00 |
|
|
0a7d6f02a7
|
ops
|
2023-03-16 04:54:17 +00:00 |
|
|
4ac43fa3a3
|
I forgot I undid the thing in DLAS
|
2023-03-16 04:51:35 +00:00 |
|
|
da4f92681e
|
oops
|
2023-03-16 04:35:12 +00:00 |
|
|
ee8270bdfb
|
preparations for training an IPA-based finetune
|
2023-03-16 04:25:33 +00:00 |
|
|
7b80f7a42f
|
fixed not cleaning up states while training (oops)
|
2023-03-15 02:48:05 +00:00 |
|
|
b31bf1206e
|
oops
|
2023-03-15 01:51:04 +00:00 |
|
|
d752a22331
|
print a warning if automatically deduced batch size returns 1
|
2023-03-15 01:20:15 +00:00 |
|
|
f6d34e1dd3
|
and maybe I should have actually tested with ./models/tokenizers/ made
|
2023-03-15 01:09:20 +00:00 |
|
|
5e4f6808ce
|
I guess I didn't test on a blank-ish slate
|
2023-03-15 00:54:27 +00:00 |
|
|
363d0b09b1
|
added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training)
|
2023-03-15 00:37:38 +00:00 |
|
|
07b684c4e7
|
removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text
|
2023-03-14 21:51:27 +00:00 |
|
|
469dd47a44
|
fixes #131
|
2023-03-14 18:58:03 +00:00 |
|
|
84b7383428
|
fixes #134
|
2023-03-14 18:52:56 +00:00 |
|
|
4b952ea52a
|
fixes #132
|
2023-03-14 18:46:20 +00:00 |
|
|
fe03ae5839
|
fixes
|
2023-03-14 17:42:42 +00:00 |
|
|
9d2c7fb942
|
cleanup
|
2023-03-14 16:23:29 +00:00 |
|
|
65fe304267
|
fixed broken graph displaying
|
2023-03-14 16:04:56 +00:00 |
|
|
7b16b3e88a
|
;)
|
2023-03-14 15:48:09 +00:00 |
|
|
c85e32ff53
|
(:
|
2023-03-14 14:08:35 +00:00 |
|
|
54036fd780
|
:)
|
2023-03-14 05:02:14 +00:00 |
|
|
92a05d3c4c
|
added PYTHONUTF8 to start/train bats
|
2023-03-14 02:29:11 +00:00 |
|
|
dadb1fca6b
|
multichannel audio now reports correct duration (surprised it took this long for me to source multichannel audio)
|
2023-03-13 21:24:51 +00:00 |
|
|
32d968a8cd
|
(disabled by default until I validate that it works) added additional transcription text normalization (something else I'm experimenting with requires it)
|
2023-03-13 19:07:23 +00:00 |
|
|
66ac8ba766
|
added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation
|
2023-03-13 18:51:53 +00:00 |
|
|
ee1b048d07
|
when creating the train/validation datasets, use segments if the main audio's duration is too long, and slice to make the segments if they don't exist
|
2023-03-13 04:26:00 +00:00 |
|
|
0cf9db5e69
|
oops
|
2023-03-13 01:33:45 +00:00 |
|
|
050bcefd73
|
resample to 22.5K when creating training inputs (to avoid redundant downsampling when loaded for training, even though most of my inputs are already at 22.5K), generalized resampler function to cache and reuse them, do not unload whisper when done transcribing since it gets unloaded anyways for any other non-transcription task
|
2023-03-13 01:20:55 +00:00 |
|
|
7c9c0dc584
|
forgot to clean up debug prints
|
2023-03-13 00:44:37 +00:00 |
|
|
239c984850
|
move validating audio to creating the text files instead, consider audio longer than 11 seconds invalid, consider text lengths over 200 invalid
|
2023-03-12 23:39:00 +00:00 |
|
|
51ddc205cd
|
update submodules
|
2023-03-12 18:14:36 +00:00 |
|
|
ccbf2e6aff
|
blame mrq/ai-voice-cloning#122
|
2023-03-12 17:51:52 +00:00 |
|
|
9238df0b03
|
fixed last generation settings not actually loading because brain worms
|
2023-03-12 15:49:50 +00:00 |
|
|
9594a960b0
|
Disable loss ETA for now until I fix it
|
2023-03-12 15:39:54 +00:00 |
|
mrq
|
51f6c347fe
|
Merge pull request 'updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested.' (#122) from zim33/ai-voice-cloning:save_more_user_config into master
Reviewed-on: mrq/ai-voice-cloning#122
|
2023-03-12 15:38:34 +00:00 |
|
mrq
|
be8b290a1a
|
Merge branch 'master' into save_more_user_config
|
2023-03-12 15:38:08 +00:00 |
|