ai-voice-cloning

mrq/ai-voice-cloning

Commit Graph

b6440091fb Very, very, VERY, barebones integration with Bark (documentation soon) mrq 2023-04-26 04:48:09 +0000
faa8da12d7 modified logic to determine valid voice folders, also allows subdirs within the folder (for example: ./voices/SH/james/ will be named SH/james) mrq 2023-04-13 21:10:38 +0000
02beb1dd8e should fix #203 mrq 2023-04-13 03:14:06 +0000
8f3e9447ba disable diarize button mrq 2023-04-12 20:03:54 +0000
d8b996911c a bunch of shit i had uncommited over the past while pertaining to VALL-E mrq 2023-04-12 20:02:46 +0000
b785192dfc Merge pull request 'Make convenient to use with Docker' (#191) from psr/ai-voice-cloning:docker into master mrq 2023-04-08 14:04:45 +0000
9afafc69c1 docker: add training script psr 2023-04-07 23:15:13 +0000
c018bfca9c docker: add ffmpeg for whisper and general cleanup psr 2023-04-07 23:14:05 +0000
d64cba667f docker support psr 2023-04-05 22:38:53 +0000
0440eac2bc #185 mrq 2023-03-31 06:55:52 +0000
9f64153a28 fixes #185 mrq 2023-03-31 06:03:56 +0000
4744120be2 added VALL-E inference support (very rudimentary, gimped, but it will load a model trained on a config generated through the web UI) mrq 2023-03-31 03:26:00 +0000
9b01377667 only include auto in the list of models under setting, nothing else mrq 2023-03-29 19:53:23 +0000
f66281f10c added mixing models (shamelessly inspired from voldy's web ui) mrq 2023-03-29 19:29:13 +0000
c89c648b4a fixes #176 mrq 2023-03-26 11:05:50 +0000
41d47c7c2a for real this time show those new vall-e metrics mrq 2023-03-26 04:31:50 +0000
c4ca04cc92 added showing reported training accuracy and eval/validation metrics to graph mrq 2023-03-26 04:08:45 +0000
8c647c889d now there should be feature parity between trainers mrq 2023-03-25 04:12:03 +0000
fd9b2e082c x_lim and y_lim for graph mrq 2023-03-25 02:34:14 +0000
9856db5900 actually make parsing VALL-E metrics work mrq 2023-03-23 15:42:51 +0000
69d84bb9e0 I forget mrq 2023-03-23 04:53:31 +0000
444bcdaf62 my sanitizer actually did work, it was just batch sizes leading to problems when transcribing mrq 2023-03-23 04:41:56 +0000
a6daf289bc when the sanitizer thingy works in testing but it doesn't outside of testing, and you have to retranscribe for the fourth time today mrq 2023-03-23 02:37:44 +0000
86589fff91 why does this keep happening to me mrq 2023-03-23 01:55:16 +0000
0ea93a7f40 more cleanup, use 24KHz for preparing for VALL-E (encodec will resample to 24Khz anyways, makes audio a little nicer), some other things mrq 2023-03-23 01:52:26 +0000
d2a9ab9e41 remove redundant phonemize for vall-e (oops), quantize all files and then phonemize all files for cope optimization, load alignment model once instead of for every transcription (speedup with whisperx) mrq 2023-03-23 00:22:25 +0000
19c0854e6a do not write current whisper.json if there's no changes mrq 2023-03-22 22:24:07 +0000
932eaccdf5 added whisper transcription 'sanitizing' (collapse very short transcriptions to the previous segment) (I really have to stop having several copies spanning several machines for AIVC, I keep reverting shit) mrq 2023-03-22 22:10:01 +0000
736cdc8926 disable diarization for whisperx as it's just a useless performance hit (I don't have anything that's multispeaker within the same audio file at the moment) mrq 2023-03-22 20:38:58 +0000
aa5bdafb06 ugh mrq 2023-03-22 20:26:28 +0000
13605f980c now whisperx should output json that aligns with what's expected mrq 2023-03-22 20:01:30 +0000
8877960062 fixes for whisperx batching mrq 2023-03-22 19:53:42 +0000
4056a27bcb begrudgingly added back whisperx integration (VAD/Diarization testing, I really, really need accurate timestamps before dumping mondo amounts of time on training a dataset) mrq 2023-03-22 19:24:53 +0000
b8c3c4cfe2 Fixed #167 mrq 2023-03-22 18:21:37 +0000
da96161aaa oops mrq 2023-03-22 18:07:46 +0000
f822c87344 cleanups, realigning vall-e training mrq 2023-03-22 17:47:23 +0000
909325bb5a ugh mrq 2023-03-21 22:18:57 +0000
5a5fd9ca87 Added option to unsqueeze sample batches after sampling mrq 2023-03-21 21:34:26 +0000
9657c1d4ce oops mrq 2023-03-21 20:31:01 +0000
0c2a9168f8 DLAS is PIPified (but I'm still cloning it as a submodule to make updating it easier) mrq 2023-03-21 15:46:53 +0000
34ef0467b9 VALL-E config edits mrq 2023-03-20 01:22:53 +0000
2e33bf071a forgot to not require it to be relative mrq 2023-03-19 22:05:33 +0000
5cb86106ce option to set results folder location mrq 2023-03-19 22:03:41 +0000
74510e8623 doing what I do best: sourcing other configs and banging until it works (it doesnt work) mrq 2023-03-18 15:16:15 +0000
da9b4b5fb5 tweaks mrq 2023-03-18 15:14:22 +0000
f44895978d brain worms mrq 2023-03-17 20:08:08 +0000
b17260cddf added japanese tokenizer (experimental) mrq 2023-03-17 20:04:40 +0000
f34cc382c5 yammed mrq 2023-03-17 18:57:36 +0000
96b7f9d2cc yammed mrq 2023-03-17 13:08:34 +0000
249c6019af cleanup, metrics are grabbed for vall-e trainer mrq 2023-03-17 05:33:49 +0000
1b72d0bba0 forgot to separate phonemes by spaces for [redacted] mrq 2023-03-17 02:08:07 +0000
d4c50967a6 cleaned up some prepare dataset code mrq 2023-03-17 01:24:02 +0000
0b62ccc112 setup bnb on windows as needed mrq 2023-03-16 20:48:48 +0000
c4edfb7d5e unbump rocm5.4.2 because it does not work for me desu mrq 2023-03-16 15:33:23 +0000
520fbcd163 bumped torch up (CUDA: 11.8, ROCm, 5.4.2) mrq 2023-03-16 15:09:11 +0000
1a8c5de517 unk hunting mrq 2023-03-16 14:59:12 +0000
46ff3c476a fixes v2 mrq 2023-03-16 14:41:40 +0000
0408d44602 fixed reload tts being broken due to being as untouched as I am mrq 2023-03-16 14:24:44 +0000
aeb904a800 yammed mrq 2023-03-16 14:23:47 +0000
f9154c4db1 fixes mrq 2023-03-16 14:19:56 +0000
54f2fc792a ops mrq 2023-03-16 05:14:15 +0000
0a7d6f02a7 ops mrq 2023-03-16 04:54:17 +0000
4ac43fa3a3 I forgot I undid the thing in DLAS mrq 2023-03-16 04:51:35 +0000
da4f92681e oops mrq 2023-03-16 04:35:12 +0000
ee8270bdfb preparations for training an IPA-based finetune mrq 2023-03-16 04:25:33 +0000
7b80f7a42f fixed not cleaning up states while training (oops) mrq 2023-03-15 02:48:05 +0000
b31bf1206e oops mrq 2023-03-15 01:51:04 +0000
d752a22331 print a warning if automatically deduced batch size returns 1 mrq 2023-03-15 01:20:15 +0000
f6d34e1dd3 and maybe I should have actually tested with ./models/tokenizers/ made mrq 2023-03-15 01:09:20 +0000
5e4f6808ce I guess I didn't test on a blank-ish slate mrq 2023-03-15 00:54:27 +0000
363d0b09b1 added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training) mrq 2023-03-15 00:37:38 +0000
07b684c4e7 removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text mrq 2023-03-14 21:51:27 +0000
469dd47a44 fixes #131 mrq 2023-03-14 18:58:03 +0000
84b7383428 fixes #134 mrq 2023-03-14 18:52:56 +0000
4b952ea52a fixes #132 mrq 2023-03-14 18:46:20 +0000
fe03ae5839 fixes mrq 2023-03-14 17:42:42 +0000
9d2c7fb942 cleanup mrq 2023-03-14 16:23:29 +0000
65fe304267 fixed broken graph displaying mrq 2023-03-14 16:04:56 +0000
7b16b3e88a ;) mrq 2023-03-14 15:48:09 +0000
c85e32ff53 (: mrq 2023-03-14 14:08:35 +0000
54036fd780 :) mrq 2023-03-14 05:02:14 +0000
92a05d3c4c added PYTHONUTF8 to start/train bats mrq 2023-03-14 02:29:11 +0000
dadb1fca6b multichannel audio now report correct duration (surprised it took this long for me to source multichannel audio) mrq 2023-03-13 21:24:51 +0000
32d968a8cd (disabled by default until i validate it working) added additional transcription text normalization (something else I'm experimenting with requires it) mrq 2023-03-13 19:07:23 +0000
66ac8ba766 added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation mrq 2023-03-13 18:51:53 +0000
ee1b048d07 when creating the train/validation datasets, use segments if the main audio's duration is too long, and slice to make the segments if they don't exist mrq 2023-03-13 04:26:00 +0000
0cf9db5e69 oops mrq 2023-03-13 01:33:45 +0000
050bcefd73 resample to 22.5K when creating training inputs (to avoid redundant downsampling when loaded for training, even though most of my inputs are already at 22.5K), generalized resampler function to cache and reuse them, do not unload whisper when done transcribing since it gets unloaded anyways for any other non-transcription task mrq 2023-03-13 01:20:55 +0000
7c9c0dc584 forgot to clean up debug prints mrq 2023-03-13 00:44:37 +0000
239c984850 move validating audio to creating the text files instead, consider audio longer than 11 seconds invalid, consider text lengths over 200 invalid mrq 2023-03-12 23:39:00 +0000
51ddc205cd update submodules mrq 2023-03-12 18:14:36 +0000
ccbf2e6aff blame #122 mrq 2023-03-12 17:51:52 +0000
478ed46e3b fixed empty training list prevent starting program tigi6346 2023-03-12 19:47:29 +0200
9238df0b03 fixed last generation settings not actually load because brain worms mrq 2023-03-12 15:49:50 +0000
9594a960b0 Disable loss ETA for now until I fix it mrq 2023-03-12 15:39:54 +0000
51f6c347fe Merge pull request 'updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested.' (#122) from zim33/ai-voice-cloning:save_more_user_config into master mrq 2023-03-12 15:38:34 +0000
be8b290a1a Merge branch 'master' into save_more_user_config mrq 2023-03-12 15:38:08 +0000
296129ba9c output fixes, I'm not sure why ETA wasn't working but it works in testing mrq 2023-03-12 15:17:07 +0000
098d7ad635 uh I don't remember, small things mrq 2023-03-12 14:47:48 +0000
233baa4e45 updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested. tigi6346 2023-03-12 16:08:02 +0200
