Commit Graph

379 Commits (2fae5008fcdc8a6f80b28d944e327056b0a8c366)
 

Author SHA1 Message Date
mrq 5003bc89d3 cleaned up brain worms with wrapping around gradio progress by instead just using tqdm directly (slight regressions with some messages not getting pushed) 2023-05-04 23:40:33 +07:00
mrq 09d849a78f quick hotfix if it actually is a problem in the repo itself 2023-05-04 23:01:47 +07:00
mrq 853c7fdccf forgot to uncomment the block to transcribe and slice when using transcribe all because I was piece-processing a huge batch of LibriTTS and somehow that leaked over to the repo 2023-05-03 21:31:37 +07:00
mrq fd306d850d updated setup-directml.bat to not hard require torch version because it's updated to torch2 now 2023-04-29 00:50:16 +07:00
mrq eddb8aaa9a indentation fix 2023-04-28 15:56:57 +07:00
mrq 99387920e1 backported caching of phonemizer backend from mrq/vall-e 2023-04-28 15:31:45 +07:00
mrq c5e9b407fa boolean oops 2023-04-27 14:40:22 +07:00
mrq 3978921e71 forgot to make the transcription tab visible with the bark backend (god the code is a mess now, I'll suck you off if you clean this up for me (not really)) 2023-04-26 04:55:10 +07:00
mrq b6440091fb Very, very, VERY, barebones integration with Bark (documentation soon) 2023-04-26 04:48:09 +07:00
mrq faa8da12d7 modified logic to determine valid voice folders, also allows subdirs within the folder (for example: ./voices/SH/james/ will be named SH/james) 2023-04-13 21:10:38 +07:00
mrq 02beb1dd8e should fix #203 2023-04-13 03:14:06 +07:00
mrq 8f3e9447ba disable diarize button 2023-04-12 20:03:54 +07:00
mrq d8b996911c a bunch of shit i had uncommitted over the past while pertaining to VALL-E 2023-04-12 20:02:46 +07:00
mrq b785192dfc Merge pull request 'Make convenient to use with Docker' (#191) from psr/ai-voice-cloning:docker into master (Reviewed-on: mrq/ai-voice-cloning#191) 2023-04-08 14:04:45 +07:00
psr 9afafc69c1 docker: add training script 2023-04-07 23:15:13 +07:00
psr c018bfca9c docker: add ffmpeg for whisper and general cleanup 2023-04-07 23:14:05 +07:00
psr d64cba667f docker support 2023-04-07 21:52:18 +07:00
mrq 0440eac2bc #185 2023-03-31 06:55:52 +07:00
mrq 9f64153a28 fixes #185 2023-03-31 06:03:56 +07:00
mrq 4744120be2 added VALL-E inference support (very rudimentary, gimped, but it will load a model trained on a config generated through the web UI) 2023-03-31 03:26:00 +07:00
mrq 9b01377667 only include auto in the list of models under setting, nothing else 2023-03-29 19:53:23 +07:00
mrq f66281f10c added mixing models (shamelessly inspired from voldy's web ui) 2023-03-29 19:29:13 +07:00
mrq c89c648b4a fixes #176 2023-03-26 11:05:50 +07:00
mrq 41d47c7c2a for real this time show those new vall-e metrics 2023-03-26 04:31:50 +07:00
mrq c4ca04cc92 added showing reported training accuracy and eval/validation metrics to graph 2023-03-26 04:08:45 +07:00
mrq 8c647c889d now there should be feature parity between trainers 2023-03-25 04:12:03 +07:00
mrq fd9b2e082c x_lim and y_lim for graph 2023-03-25 02:34:14 +07:00
mrq 9856db5900 actually make parsing VALL-E metrics work 2023-03-23 15:42:51 +07:00
mrq 69d84bb9e0 I forget 2023-03-23 04:53:31 +07:00
mrq 444bcdaf62 my sanitizer actually did work, it was just batch sizes leading to problems when transcribing 2023-03-23 04:41:56 +07:00
mrq a6daf289bc when the sanitizer thingy works in testing but it doesn't outside of testing, and you have to retranscribe for the fourth time today 2023-03-23 02:37:44 +07:00
mrq 86589fff91 why does this keep happening to me 2023-03-23 01:55:16 +07:00
mrq 0ea93a7f40 more cleanup, use 24KHz for preparing for VALL-E (encodec will resample to 24Khz anyways, makes audio a little nicer), some other things 2023-03-23 01:52:26 +07:00
mrq d2a9ab9e41 remove redundant phonemize for vall-e (oops), quantize all files and then phonemize all files for cope optimization, load alignment model once instead of for every transcription (speedup with whisperx) 2023-03-23 00:22:25 +07:00
mrq 19c0854e6a do not write current whisper.json if there's no changes 2023-03-22 22:24:07 +07:00
mrq 932eaccdf5 added whisper transcription 'sanitizing' (collapse very short transcriptions to the previous segment) (I really have to stop having several copies spanning several machines for AIVC, I keep reverting shit) 2023-03-22 22:10:01 +07:00
mrq 736cdc8926 disable diarization for whisperx as it's just a useless performance hit (I don't have anything that's multispeaker within the same audio file at the moment) 2023-03-22 20:38:58 +07:00
mrq aa5bdafb06 ugh 2023-03-22 20:26:28 +07:00
mrq 13605f980c now whisperx should output json that aligns with what's expected 2023-03-22 20:01:30 +07:00
mrq 8877960062 fixes for whisperx batching 2023-03-22 19:53:42 +07:00
mrq 4056a27bcb begrudgingly added back whisperx integration (VAD/Diarization testing, I really, really need accurate timestamps before dumping mondo amounts of time on training a dataset) 2023-03-22 19:24:53 +07:00
mrq b8c3c4cfe2 Fixed #167 2023-03-22 18:21:37 +07:00
mrq da96161aaa oops 2023-03-22 18:07:46 +07:00
mrq f822c87344 cleanups, realigning vall-e training 2023-03-22 17:47:23 +07:00
mrq 909325bb5a ugh 2023-03-21 22:18:57 +07:00
mrq 5a5fd9ca87 Added option to unsqueeze sample batches after sampling 2023-03-21 21:34:26 +07:00
mrq 9657c1d4ce oops 2023-03-21 20:31:01 +07:00
mrq 0c2a9168f8 DLAS is PIPified (but I'm still cloning it as a submodule to make updating it easier) 2023-03-21 15:46:53 +07:00
mrq 34ef0467b9 VALL-E config edits 2023-03-20 01:22:53 +07:00
mrq 2e33bf071a forgot to not require it to be relative 2023-03-19 22:05:33 +07:00