ee8270bdfb
preparations for training an IPA-based finetune
2023-03-16 04:25:33 +00:00
|
7b80f7a42f
fixed not cleaning up states while training (oops)
2023-03-15 02:48:05 +00:00
|
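A minimal sketch of what cleaning up old training states can look like. It assumes (hypothetically) that each state is saved as `<step>.state` in a single directory, and that a keep-count of 0 or less means "keep everything". `prune_states` and the file naming are illustrative, not this repository's actual code.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical naming scheme: one file per saved state, named after its step.
STATE_NAME = re.compile(r"^(\d+)\.state$")


def prune_states(state_dir, keep):
    """Delete all but the `keep` newest "<step>.state" files in state_dir.

    Files are ordered by the step number in their name, not by mtime.
    A `keep` of 0 or less is treated as "keep everything" (an assumption).
    Returns the deleted paths, oldest first.
    """
    if keep <= 0:
        return []
    states = []
    for path in Path(state_dir).iterdir():
        match = STATE_NAME.match(path.name)
        if match and path.is_file():
            states.append((int(match.group(1)), path))
    states.sort(key=lambda item: item[0])  # oldest step first
    stale = [path for _, path in states[:-keep]]
    for path in stale:
        path.unlink()
    return stale


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for step in (100, 200, 300, 400):
            Path(tmp, f"{step}.state").touch()
        prune_states(tmp, keep=2)
        print(sorted(p.name for p in Path(tmp).iterdir()))
        # ['300.state', '400.state']
```

Sorting on the parsed step number rather than the file name keeps `1000.state` from being treated as older than `200.state`.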
b31bf1206e
oops
2023-03-15 01:51:04 +00:00
|
d752a22331
print a warning if automatically deduced batch size returns 1
2023-03-15 01:20:15 +00:00
|
f6d34e1dd3
and maybe I should have actually tested with ./models/tokenizers/ made
2023-03-15 01:09:20 +00:00
|
5e4f6808ce
I guess I didn't test on a blank-ish slate
2023-03-15 00:54:27 +00:00
|
363d0b09b1
added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training)
2023-03-15 00:37:38 +00:00
|
07b684c4e7
removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text
2023-03-14 21:51:27 +00:00
|
469dd47a44
fixes #131
2023-03-14 18:58:03 +00:00
|
84b7383428
fixes #134
2023-03-14 18:52:56 +00:00
|
4b952ea52a
fixes #132
2023-03-14 18:46:20 +00:00
|
fe03ae5839
fixes
2023-03-14 17:42:42 +00:00
|
9d2c7fb942
cleanup
2023-03-14 16:23:29 +00:00
|
65fe304267
fixed broken graph displaying
2023-03-14 16:04:56 +00:00
|
7b16b3e88a
;)
2023-03-14 15:48:09 +00:00
|
c85e32ff53
(:
2023-03-14 14:08:35 +00:00
|
54036fd780
:)
2023-03-14 05:02:14 +00:00
|
92a05d3c4c
added PYTHONUTF8 to start/train bats
2023-03-14 02:29:11 +00:00
|
dadb1fca6b
multichannel audio now reports the correct duration (surprised it took this long for me to source multichannel audio)
2023-03-13 21:24:51 +00:00
|
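The commit doesn't say how duration was computed before. One common way to get it wrong is to divide the total sample count by the sample rate, which doubles the reported length of a stereo file. A minimal sketch using Python's standard-library `wave` module (an assumption: it only reads PCM WAV, and the project may load audio differently), with a hypothetical `write_silent_wav` helper for the demo:

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds, correct for any channel count.

    A WAV "frame" holds one sample per channel, so duration is
    frames / sample_rate. Dividing the total *sample* count by the sample
    rate instead would report a stereo file as twice its real length.
    """
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_silent_wav(path, seconds, sample_rate=22050, channels=2):
    """Hypothetical demo helper: write `seconds` of 16-bit PCM silence."""
    frames = int(seconds * sample_rate)
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit PCM: 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * (frames * channels * 2))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stereo = os.path.join(tmp, "stereo.wav")
        write_silent_wav(stereo, seconds=1.5, channels=2)
        print(wav_duration_seconds(stereo))  # 1.5
```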
32d968a8cd
(disabled by default until I validate that it works) added additional transcription text normalization (something else I'm experimenting with requires it)
2023-03-13 19:07:23 +00:00
|
66ac8ba766
added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation
2023-03-13 18:51:53 +00:00
|
ee1b048d07
when creating the train/validation datasets, use segments if the main audio's duration is too long, and slice to make the segments if they don't exist
2023-03-13 04:26:00 +00:00
|
0cf9db5e69
oops
2023-03-13 01:33:45 +00:00
|
050bcefd73
resample to 22.5K when creating training inputs (to avoid redundant downsampling when loaded for training, even though most of my inputs are already at 22.5K), generalized resampler function to cache and reuse them, do not unload whisper when done transcribing since it gets unloaded anyways for any other non-transcription task
2023-03-13 01:20:55 +00:00
|
7c9c0dc584
forgot to clean up debug prints
2023-03-13 00:44:37 +00:00
|
239c984850
move validating audio to creating the text files instead, consider audio longer than 11 seconds invalid, consider text lengths over 200 invalid
2023-03-12 23:39:00 +00:00
|
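A minimal sketch of the two validation rules from 239c984850 applied while building the dataset text files. The `Segment` record, the helper names, and the `path|text` line layout are assumptions for illustration; counting text length in characters rather than tokens is also an assumption.

```python
from dataclasses import dataclass

# Thresholds from the commit message; measuring text in characters
# (not tokens) is an assumption.
MAX_AUDIO_SECONDS = 11.0
MAX_TEXT_LENGTH = 200


@dataclass
class Segment:
    """Hypothetical record for one transcribed clip."""
    path: str
    text: str
    duration: float  # seconds


def is_valid(segment: Segment) -> bool:
    # "longer than 11 seconds" and "over 200" are strict comparisons,
    # so a clip exactly at either limit is still accepted.
    return (segment.duration <= MAX_AUDIO_SECONDS
            and len(segment.text) <= MAX_TEXT_LENGTH)


def split_valid(segments):
    """Partition segments into (valid, invalid) lists, preserving order."""
    valid, invalid = [], []
    for segment in segments:
        (valid if is_valid(segment) else invalid).append(segment)
    return valid, invalid


def dataset_lines(segments):
    """Format only the valid segments as "path|text" lines (assumed layout)."""
    valid, _ = split_valid(segments)
    return [f"{s.path}|{s.text}" for s in valid]
```

Filtering at text-file creation time means a rejected clip never reaches the train/validation lists, rather than being skipped later when the dataset is loaded.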
51ddc205cd
update submodules
2023-03-12 18:14:36 +00:00
|
ccbf2e6aff
blame mrq/ai-voice-cloning#122
2023-03-12 17:51:52 +00:00
|
9238df0b03
fixed last generation settings not actually loading because of brain worms
2023-03-12 15:49:50 +00:00
|
9594a960b0
Disable loss ETA for now until I fix it
2023-03-12 15:39:54 +00:00

mrq
51f6c347fe
Merge pull request 'updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested.' (#122) from zim33/ai-voice-cloning:save_more_user_config into master
Reviewed-on: mrq/ai-voice-cloning#122
2023-03-12 15:38:34 +00:00

mrq
be8b290a1a
Merge branch 'master' into save_more_user_config
2023-03-12 15:38:08 +00:00
|
296129ba9c
output fixes, I'm not sure why ETA wasn't working but it works in testing
2023-03-12 15:17:07 +00:00
|
098d7ad635
uh I don't remember, small things
2023-03-12 14:47:48 +00:00
|
233baa4e45
updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested.
2023-03-12 16:08:02 +02:00

mrq
1ac278e885
Merge pull request 'keep_training' (#118) from zim33/ai-voice-cloning:keep_training into master
Reviewed-on: mrq/ai-voice-cloning#118
2023-03-12 06:47:01 +00:00
|
29b3d1ae1d
Fixed Keep X Previous States
2023-03-12 08:01:08 +02:00
|
9e320a34c8
Fixed Keep X Previous States
2023-03-12 08:00:03 +02:00

mrq
8ed09f9b87
Merge pull request 'Catch OOM and run whisper on cpu automatically.' (#117) from zim33/ai-voice-cloning:vram into master
Reviewed-on: mrq/ai-voice-cloning#117
2023-03-12 05:09:53 +00:00
|
61500107ab
Catch OOM and run whisper on cpu automatically.
2023-03-12 06:48:28 +02:00
|
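A minimal sketch of the OOM fallback idea from 61500107ab, assuming the Whisper load/transcribe step is wrapped in a callable that takes a device string. `run_with_cpu_fallback` and `task` are hypothetical names, not the PR's code.

```python
def run_with_cpu_fallback(task, device="cuda"):
    """Call task(device); on an out-of-memory error, retry once with "cpu".

    PyTorch raises CUDA OOM as a RuntimeError (or a subclass of it) whose
    message contains "out of memory", so matching on the message keeps
    this sketch free of a torch import. Any other error is re-raised.
    """
    try:
        return task(device)
    except RuntimeError as err:
        if device == "cpu" or "out of memory" not in str(err).lower():
            raise
        return task("cpu")
```

Matching on the message keeps unrelated failures (a bad file path, a shape mismatch) from being silently retried on the CPU.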
ede9804b76
added option to trim silence using torchaudio's VAD
2023-03-11 21:41:35 +00:00
|
dea2fa9caf
added fields to offset start/end slices to apply in bulk when slicing
2023-03-11 21:34:29 +00:00
|
89bb3d4419
rename transcribe button since it does more than transcribe
2023-03-11 21:18:04 +00:00
|
382a3e4104
rely on the whisper.json for handling a lot more things
2023-03-11 21:17:11 +00:00
|
9b376c381f
brain worm
2023-03-11 18:14:32 +00:00
|
94551fb9ac
split slicing dataset routine so it can be done after the fact
2023-03-11 17:27:01 +00:00
|
e3fdb79b49
rocm5.2 works for me desu so I bumped it back up
2023-03-11 17:02:56 +00:00
|
e680d84a13
removed the hotfix pip installs that whisperx requires now that whisperx is gone
2023-03-11 16:55:19 +00:00
|
cf41492f76
fall back to normal behavior if there are actually no audio files loaded from the dataset when using it for computing latents
2023-03-11 16:46:03 +00:00
|
b90c164778
Farewell, parasite
2023-03-11 16:40:34 +00:00