Commit Graph

  • 909325bb5a ugh mrq 2023-03-21 22:18:57 +0000
  • 5a5fd9ca87 Added option to unsqueeze sample batches after sampling mrq 2023-03-21 21:34:26 +0000
  • 9657c1d4ce oops mrq 2023-03-21 20:31:01 +0000
  • 0c2a9168f8 DLAS is PIPified (but I'm still cloning it as a submodule to make updating it easier) mrq 2023-03-21 15:46:53 +0000
  • 34ef0467b9 VALL-E config edits mrq 2023-03-20 01:22:53 +0000
  • 2e33bf071a forgot to not require it to be relative mrq 2023-03-19 22:05:33 +0000
  • 5cb86106ce option to set results folder location mrq 2023-03-19 22:03:41 +0000
  • 74510e8623 doing what I do best: sourcing other configs and banging until it works (it doesnt work) mrq 2023-03-18 15:16:15 +0000
  • da9b4b5fb5 tweaks mrq 2023-03-18 15:14:22 +0000
  • f44895978d brain worms mrq 2023-03-17 20:08:08 +0000
  • b17260cddf added japanese tokenizer (experimental) mrq 2023-03-17 20:04:40 +0000
  • f34cc382c5 yammed mrq 2023-03-17 18:57:36 +0000
  • 96b7f9d2cc yammed mrq 2023-03-17 13:08:34 +0000
  • 249c6019af cleanup, metrics are grabbed for vall-e trainer mrq 2023-03-17 05:33:49 +0000
  • 1b72d0bba0 forgot to separate phonemes by spaces for [redacted] mrq 2023-03-17 02:08:07 +0000
  • d4c50967a6 cleaned up some prepare dataset code mrq 2023-03-17 01:24:02 +0000
  • 0b62ccc112 setup bnb on windows as needed mrq 2023-03-16 20:48:48 +0000
  • c4edfb7d5e unbump rocm5.4.2 because it does not work for me desu mrq 2023-03-16 15:33:23 +0000
  • 520fbcd163 bumped torch up (CUDA: 11.8, ROCm: 5.4.2) mrq 2023-03-16 15:09:11 +0000
  • 1a8c5de517 unk hunting mrq 2023-03-16 14:59:12 +0000
  • 46ff3c476a fixes v2 mrq 2023-03-16 14:41:40 +0000
  • 0408d44602 fixed reload tts being broken due to being as untouched as I am mrq 2023-03-16 14:24:44 +0000
  • aeb904a800 yammed mrq 2023-03-16 14:23:47 +0000
  • f9154c4db1 fixes mrq 2023-03-16 14:19:56 +0000
  • 54f2fc792a ops mrq 2023-03-16 05:14:15 +0000
  • 0a7d6f02a7 ops mrq 2023-03-16 04:54:17 +0000
  • 4ac43fa3a3 I forgot I undid the thing in DLAS mrq 2023-03-16 04:51:35 +0000
  • da4f92681e oops mrq 2023-03-16 04:35:12 +0000
  • ee8270bdfb preparations for training an IPA-based finetune mrq 2023-03-16 04:25:33 +0000
  • 7b80f7a42f fixed not cleaning up states while training (oops) mrq 2023-03-15 02:48:05 +0000
  • b31bf1206e oops mrq 2023-03-15 01:51:04 +0000
  • d752a22331 print a warning if automatically deduced batch size returns 1 mrq 2023-03-15 01:20:15 +0000
  • f6d34e1dd3 and maybe I should have actually tested with ./models/tokenizers/ made mrq 2023-03-15 01:09:20 +0000
  • 5e4f6808ce I guess I didn't test on a blank-ish slate mrq 2023-03-15 00:54:27 +0000
  • 363d0b09b1 added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training) mrq 2023-03-15 00:37:38 +0000
  • 07b684c4e7 removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text mrq 2023-03-14 21:51:27 +0000
  • 469dd47a44 fixes #131 mrq 2023-03-14 18:58:03 +0000
  • 84b7383428 fixes #134 mrq 2023-03-14 18:52:56 +0000
  • 4b952ea52a fixes #132 mrq 2023-03-14 18:46:20 +0000
  • fe03ae5839 fixes mrq 2023-03-14 17:42:42 +0000
  • 9d2c7fb942 cleanup mrq 2023-03-14 16:23:29 +0000
  • 65fe304267 fixed broken graph displaying mrq 2023-03-14 16:04:56 +0000
  • 7b16b3e88a ;) mrq 2023-03-14 15:48:09 +0000
  • c85e32ff53 (: mrq 2023-03-14 14:08:35 +0000
  • 54036fd780 :) mrq 2023-03-14 05:02:14 +0000
  • 92a05d3c4c added PYTHONUTF8 to start/train bats mrq 2023-03-14 02:29:11 +0000
  • dadb1fca6b multichannel audio now reports correct duration (surprised it took this long for me to source multichannel audio) mrq 2023-03-13 21:24:51 +0000
  • 32d968a8cd (disabled by default until i validate it working) added additional transcription text normalization (something else I'm experimenting with requires it) mrq 2023-03-13 19:07:23 +0000
  • 66ac8ba766 added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation mrq 2023-03-13 18:51:53 +0000
  • ee1b048d07 when creating the train/validation datasets, use segments if the main audio's duration is too long, and slice to make the segments if they don't exist mrq 2023-03-13 04:26:00 +0000
  • 0cf9db5e69 oops mrq 2023-03-13 01:33:45 +0000
  • 050bcefd73 resample to 22.5K when creating training inputs (to avoid redundant downsampling when loaded for training, even though most of my inputs are already at 22.5K), generalized resampler function to cache and reuse them, do not unload whisper when done transcribing since it gets unloaded anyways for any other non-transcription task mrq 2023-03-13 01:20:55 +0000
  • 7c9c0dc584 forgot to clean up debug prints mrq 2023-03-13 00:44:37 +0000
  • 239c984850 move validating audio to creating the text files instead, consider audio longer than 11 seconds invalid, consider text lengths over 200 invalid mrq 2023-03-12 23:39:00 +0000
  • 51ddc205cd update submodules mrq 2023-03-12 18:14:36 +0000
  • ccbf2e6aff blame mrq/ai-voice-cloning#122 mrq 2023-03-12 17:51:52 +0000
  • 9238df0b03 fixed last generation settings not actually loading because brain worms mrq 2023-03-12 15:49:50 +0000
  • 9594a960b0 Disable loss ETA for now until I fix it mrq 2023-03-12 15:39:54 +0000
  • 51f6c347fe Merge pull request 'updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested.' (#122) from zim33/ai-voice-cloning:save_more_user_config into master mrq 2023-03-12 15:38:34 +0000
  • be8b290a1a Merge branch 'master' into save_more_user_config mrq 2023-03-12 15:38:08 +0000
  • 296129ba9c output fixes, I'm not sure why ETA wasn't working but it works in testing mrq 2023-03-12 15:17:07 +0000
  • 098d7ad635 uh I don't remember, small things mrq 2023-03-12 14:47:48 +0000
  • 233baa4e45 updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested. tigi6346 2023-03-12 16:08:02 +0200
  • 1ac278e885 Merge pull request 'keep_training' (#118) from zim33/ai-voice-cloning:keep_training into master mrq 2023-03-12 06:47:01 +0000
  • 29b3d1ae1d Fixed Keep X Previous States tigi6346 2023-03-12 08:01:08 +0200
  • 9e320a34c8 Fixed Keep X Previous States tigi6346 2023-03-12 08:00:03 +0200
  • 8ed09f9b87 Merge pull request 'Catch OOM and run whisper on cpu automatically.' (#117) from zim33/ai-voice-cloning:vram into master mrq 2023-03-12 05:09:53 +0000
  • 61500107ab Catch OOM and run whisper on cpu automatically. tigi6346 2023-03-12 06:48:28 +0200
  • ede9804b76 added option to trim silence using torchaudio's VAD mrq 2023-03-11 21:41:35 +0000
  • dea2fa9caf added fields to offset start/end slices to apply in bulk when slicing mrq 2023-03-11 21:34:29 +0000
  • 89bb3d4419 rename transcribe button since it does more than transcribe mrq 2023-03-11 21:18:04 +0000
  • 382a3e4104 rely on the whisper.json for handling a lot more things mrq 2023-03-11 21:17:11 +0000
  • 9b376c381f brain worm mrq 2023-03-11 18:14:32 +0000
  • 94551fb9ac split slicing dataset routine so it can be done after the fact mrq 2023-03-11 17:27:01 +0000
  • e3fdb79b49 rocm5.2 works for me desu so I bumped it back up mrq 2023-03-11 17:02:56 +0000
  • e680d84a13 removed the hotfix pip installs that whisperx requires now that whisperx is gone mrq 2023-03-11 16:55:19 +0000
  • cf41492f76 fall back to normal behavior if there are actually no audio files loaded from the dataset when using it for computing latents mrq 2023-03-11 16:46:03 +0000
  • b90c164778 Farewell, parasite mrq 2023-03-11 16:40:34 +0000
  • 2424c455cb added option to not slice audio when transcribing, added option to prepare validation dataset on audio duration, added a warning if you're using whisperx and you're slicing audio mrq 2023-03-11 16:32:35 +0000
  • dcdcf8516c master (#112) tigi6346 2023-03-11 03:28:04 +0000
  • 008a1f5f8f simplified spawning the training process by having it spawn the distributed training processes in the train.py script, so it should work on Windows too mrq 2023-03-11 01:37:00 +0000
  • 2feb6da0c0 cleanups and fixes, fix DLAS throwing errors from '''too short of sound files''' by just culling them during transcription mrq 2023-03-11 01:19:49 +0000
  • 7f2da0f5fb rewrote how AIVC gets training metrics (need to clean up later) mrq 2023-03-10 22:35:32 +0000
  • df0edacc60 fix the cleanup actually only doing 2 despite requesting more than 2, surprised no one has pointed it out mrq 2023-03-10 14:04:07 +0000
  • 8e890d3023 forgot to fix reset settings to use the new arg-agnostic way mrq 2023-03-10 13:49:39 +0000
  • d250e0ec17 brain fried mrq 2023-03-10 04:27:34 +0000
  • 0b364b590e maybe don't --force-reinstall to try and force downgrading, it just forces everything to uninstall then reinstall mrq 2023-03-10 04:22:47 +0000
  • c231d842aa make dependencies after the one in this repo force reinstall to downgrade, i hope, I have other things to do than validate this works mrq 2023-03-10 03:53:21 +0000
  • c92b006129 I really hate YAML mrq 2023-03-10 03:48:46 +0000
  • d3184004fd only God knows why the YAML spec lets you specify string values without quotes mrq 2023-03-10 01:58:30 +0000
  • eb1551ee92 what I thought was an override and not a ternary mrq 2023-03-09 23:04:02 +0000
  • c3b43d2429 today I learned adamw_zero actually negates ANY LR schemes mrq 2023-03-09 19:42:31 +0000
  • cb273b8428 cleanup mrq 2023-03-09 18:34:52 +0000
  • 7c71f7239c expose options for CosineAnnealingLR_Restart (seems to be able to train very quickly due to the restarts) mrq 2023-03-09 14:17:01 +0000
  • 2f6dd9c076 some cleanup mrq 2023-03-09 06:20:05 +0000
  • 5460e191b0 added loss graph, because I'm going to experiment with cosine annealing LR and I need to view my loss mrq 2023-03-09 05:54:08 +0000
  • a182df8f4e is mrq 2023-03-09 04:33:12 +0000
  • a01eb10960 (try to) unload voicefixer if it raises an error during loading voicefixer mrq 2023-03-09 04:28:14 +0000
  • dc1902b91c cleanup block that makes embedding latents for random/microphone happen, remove builtin voice options from voice list to avoid duplicates mrq 2023-03-09 04:23:36 +0000
  • 797882336b maybe remedy an issue that crops up if you have a non-wav and non-json file in a results folder (assuming) mrq 2023-03-09 04:06:07 +0000