ai-voice-cloning

Jarod/ai-voice-cloning

forked from mrq/ai-voice-cloning

909325bb5a ugh mrq 2023-03-21 22:18:57 +0000
5a5fd9ca87 Added option to unsqueeze sample batches after sampling mrq 2023-03-21 21:34:26 +0000
9657c1d4ce oops mrq 2023-03-21 20:31:01 +0000
0c2a9168f8 DLAS is PIPified (but I'm still cloning it as a submodule to make updating it easier) mrq 2023-03-21 15:46:53 +0000
34ef0467b9 VALL-E config edits mrq 2023-03-20 01:22:53 +0000
2e33bf071a forgot to not require it to be relative mrq 2023-03-19 22:05:33 +0000
5cb86106ce option to set results folder location mrq 2023-03-19 22:03:41 +0000
74510e8623 doing what I do best: sourcing other configs and banging until it works (it doesnt work) mrq 2023-03-18 15:16:15 +0000
da9b4b5fb5 tweaks mrq 2023-03-18 15:14:22 +0000
f44895978d brain worms mrq 2023-03-17 20:08:08 +0000
b17260cddf added japanese tokenizer (experimental) mrq 2023-03-17 20:04:40 +0000
f34cc382c5 yammed mrq 2023-03-17 18:57:36 +0000
96b7f9d2cc yammed mrq 2023-03-17 13:08:34 +0000
249c6019af cleanup, metrics are grabbed for vall-e trainer mrq 2023-03-17 05:33:49 +0000
1b72d0bba0 forgot to separate phonemes by spaces for [redacted] mrq 2023-03-17 02:08:07 +0000
d4c50967a6 cleaned up some prepare dataset code mrq 2023-03-17 01:24:02 +0000
0b62ccc112 setup bnb on windows as needed mrq 2023-03-16 20:48:48 +0000
c4edfb7d5e unbump rocm5.4.2 because it does not work for me desu mrq 2023-03-16 15:33:23 +0000
520fbcd163 bumped torch up (CUDA: 11.8, ROCm, 5.4.2) mrq 2023-03-16 15:09:11 +0000
1a8c5de517 unk hunting mrq 2023-03-16 14:59:12 +0000
46ff3c476a fixes v2 mrq 2023-03-16 14:41:40 +0000
0408d44602 fixed reload tts being broken due to being as untouched as I am mrq 2023-03-16 14:24:44 +0000
aeb904a800 yammed mrq 2023-03-16 14:23:47 +0000
f9154c4db1 fixes mrq 2023-03-16 14:19:56 +0000
54f2fc792a ops mrq 2023-03-16 05:14:15 +0000
0a7d6f02a7 ops mrq 2023-03-16 04:54:17 +0000
4ac43fa3a3 I forgot I undid the thing in DLAS mrq 2023-03-16 04:51:35 +0000
da4f92681e oops mrq 2023-03-16 04:35:12 +0000
ee8270bdfb preparations for training an IPA-based finetune mrq 2023-03-16 04:25:33 +0000
7b80f7a42f fixed not cleaning up states while training (oops) mrq 2023-03-15 02:48:05 +0000
b31bf1206e oops mrq 2023-03-15 01:51:04 +0000
d752a22331 print a warning if automatically deduced batch size returns 1 mrq 2023-03-15 01:20:15 +0000
f6d34e1dd3 and maybe I should have actually tested with ./models/tokenizers/ made mrq 2023-03-15 01:09:20 +0000
5e4f6808ce I guess I didn't test on a blank-ish slate mrq 2023-03-15 00:54:27 +0000
363d0b09b1 added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training) mrq 2023-03-15 00:37:38 +0000
07b684c4e7 removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text mrq 2023-03-14 21:51:27 +0000
469dd47a44 fixes #131 mrq 2023-03-14 18:58:03 +0000
84b7383428 fixes #134 mrq 2023-03-14 18:52:56 +0000
4b952ea52a fixes #132 mrq 2023-03-14 18:46:20 +0000
fe03ae5839 fixes mrq 2023-03-14 17:42:42 +0000
9d2c7fb942 cleanup mrq 2023-03-14 16:23:29 +0000
65fe304267 fixed broken graph displaying mrq 2023-03-14 16:04:56 +0000
7b16b3e88a ;) mrq 2023-03-14 15:48:09 +0000
c85e32ff53 (: mrq 2023-03-14 14:08:35 +0000
54036fd780 :) mrq 2023-03-14 05:02:14 +0000
92a05d3c4c added PYTHONUTF8 to start/train bats mrq 2023-03-14 02:29:11 +0000
dadb1fca6b multichannel audio now report correct duration (surprised it took this long for me to source multichannel audio) mrq 2023-03-13 21:24:51 +0000
32d968a8cd (disabled by default until i validate it working) added additional transcription text normalization (something else I'm experimenting with requires it) mrq 2023-03-13 19:07:23 +0000
66ac8ba766 added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation mrq 2023-03-13 18:51:53 +0000
ee1b048d07 when creating the train/validatio datasets, use segments if the main audio's duration is too long, and slice to make the segments if they don't exist mrq 2023-03-13 04:26:00 +0000
0cf9db5e69 oops mrq 2023-03-13 01:33:45 +0000
050bcefd73 resample to 22.5K when creating training inputs (to avoid redundant downsampling when loaded for training, even though most of my inputs are already at 22.5K), generalized resampler function to cache and reuse them, do not unload whisper when done transcribing since it gets unloaded anyways for any other non-transcription task mrq 2023-03-13 01:20:55 +0000
7c9c0dc584 forgot to clean up debug prints mrq 2023-03-13 00:44:37 +0000
239c984850 move validating audio to creating the text files instead, consider audio longer than 11 seconds invalid, consider text lengths over 200 invalid mrq 2023-03-12 23:39:00 +0000
51ddc205cd update submodules mrq 2023-03-12 18:14:36 +0000
ccbf2e6aff blame mrq/ai-voice-cloning#122 mrq 2023-03-12 17:51:52 +0000
9238df0b03 fixed last generation settings not actually load because brain worms mrq 2023-03-12 15:49:50 +0000
9594a960b0 Disable loss ETA for now until I fix it mrq 2023-03-12 15:39:54 +0000
51f6c347fe Merge pull request 'updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested.' (#122) from zim33/ai-voice-cloning:save_more_user_config into master mrq 2023-03-12 15:38:34 +0000
be8b290a1a Merge branch 'master' into save_more_user_config mrq 2023-03-12 15:38:08 +0000
296129ba9c output fixes, I'm not sure why ETA wasn't working but it works in testing mrq 2023-03-12 15:17:07 +0000
098d7ad635 uh I don't remember, small things mrq 2023-03-12 14:47:48 +0000
233baa4e45 updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested. tigi6346 2023-03-12 16:08:02 +0200
1ac278e885 Merge pull request 'keep_training' (#118) from zim33/ai-voice-cloning:keep_training into master mrq 2023-03-12 06:47:01 +0000
29b3d1ae1d Fixed Keep X Previous States tigi6346 2023-03-12 08:01:08 +0200
9e320a34c8 Fixed Keep X Previous States tigi6346 2023-03-12 08:00:03 +0200
8ed09f9b87 Merge pull request 'Catch OOM and run whisper on cpu automatically.' (#117) from zim33/ai-voice-cloning:vram into master mrq 2023-03-12 05:09:53 +0000
61500107ab Catch OOM and run whisper on cpu automatically. tigi6346 2023-03-12 06:48:28 +0200
ede9804b76 added option to trim silence using torchaudio's VAD mrq 2023-03-11 21:41:35 +0000
dea2fa9caf added fields to offset start/end slices to apply in bulk when slicing mrq 2023-03-11 21:34:29 +0000
89bb3d4419 rename transcribe button since it does more than transcribe mrq 2023-03-11 21:18:04 +0000
382a3e4104 rely on the whisper.json for handling a lot more things mrq 2023-03-11 21:17:11 +0000
9b376c381f brain worm mrq 2023-03-11 18:14:32 +0000
94551fb9ac split slicing dataset routine so it can be done after the fact mrq 2023-03-11 17:27:01 +0000
e3fdb79b49 rocm5.2 works for me desu so I bumped it back up mrq 2023-03-11 17:02:56 +0000
e680d84a13 removed the hotfix pip installs that whisperx requires now that whisperx is gone mrq 2023-03-11 16:55:19 +0000
cf41492f76 fall back to normal behavior if theres actually no audiofiles loaded from the dataset when using it for computing latents mrq 2023-03-11 16:46:03 +0000
b90c164778 Farewell, parasite mrq 2023-03-11 16:40:34 +0000
2424c455cb added option to not slice audio when transcribing, added option to prepare validation dataset on audio duration, added a warning if youre using whisperx and you're slicing audio mrq 2023-03-11 16:32:35 +0000
dcdcf8516c master (#112) tigi6346 2023-03-11 03:28:04 +0000
008a1f5f8f simplified spawning the training process by having it spawn the distributed training processes in the train.py script, so it should work on Windows too mrq 2023-03-11 01:37:00 +0000
2feb6da0c0 cleanups and fixes, fix DLAS throwing errors from '''too short of sound files''' by just culling them during transcription mrq 2023-03-11 01:19:49 +0000
7f2da0f5fb rewrote how AIVC gets training metrics (need to clean up later) mrq 2023-03-10 22:35:32 +0000
df0edacc60 fix the cleanup actually only doing 2 despite requesting more than 2, surprised no one has pointed it out mrq 2023-03-10 14:04:07 +0000
8e890d3023 forgot to fix reset settings to use the new arg-agnostic way mrq 2023-03-10 13:49:39 +0000
d250e0ec17 brain fried mrq 2023-03-10 04:27:34 +0000
0b364b590e maybe don't --force-reinstall to try and force downgrading, it just forces everything to uninstall then reinstall mrq 2023-03-10 04:22:47 +0000
c231d842aa make dependencies after the one in this repo force reinstall to downgrade, i hope, I hav eother things to do than validate this works mrq 2023-03-10 03:53:21 +0000
c92b006129 I really hate YAML mrq 2023-03-10 03:48:46 +0000
d3184004fd only God knows why the YAML spec lets you specify string values without quotes mrq 2023-03-10 01:58:30 +0000
eb1551ee92 what I thought was an override and not a ternary mrq 2023-03-09 23:04:02 +0000
c3b43d2429 today I learned adamw_zero actually negates ANY LR schemes mrq 2023-03-09 19:42:31 +0000
cb273b8428 cleanup mrq 2023-03-09 18:34:52 +0000
7c71f7239c expose options for CosineAnnealingLR_Restart (seems to be able to train very quickly due to the restarts mrq 2023-03-09 14:17:01 +0000
2f6dd9c076 some cleanup mrq 2023-03-09 06:20:05 +0000
5460e191b0 added loss graph, because I'm going to experiment with cosine annealing LR and I need to view my loss mrq 2023-03-09 05:54:08 +0000
a182df8f4e is mrq 2023-03-09 04:33:12 +0000
a01eb10960 (try to) unload voicefixer if it raises an error during loading voicefixer mrq 2023-03-09 04:28:14 +0000
dc1902b91c cleanup block that makes embedding latents for random/microphone happen, remove builtin voice options from voice list to avoid duplicates mrq 2023-03-09 04:23:36 +0000
797882336b maybe remedy an issue that crops up if you have a non-wav and non-json file in a results folder (assuming) mrq 2023-03-09 04:06:07 +0000