329 Commits

Author SHA1 Message Date
mrq
5003bc89d3 cleaned up brain worms with wrapping around gradio progress by instead just using tqdm directly (slight regressions with some messages not getting pushed) 2023-05-04 23:40:33 +00:00
mrq
09d849a78f quick hotfix if it actually is a problem in the repo itself 2023-05-04 23:01:47 +00:00
mrq
853c7fdccf forgot to uncomment the block to transcribe and slice when using transcribe all because I was piece-processing a huge batch of LibriTTS and somehow that leaked over to the repo 2023-05-03 21:31:37 +00:00
mrq
fd306d850d updated setup-directml.bat to not hard require torch version because it's updated to torch2 now 2023-04-29 00:50:16 +00:00
mrq
eddb8aaa9a indentation fix 2023-04-28 15:56:57 +00:00
mrq
99387920e1 backported caching of phonemizer backend from mrq/vall-e 2023-04-28 15:31:45 +00:00
mrq
c5e9b407fa boolean oops 2023-04-27 14:40:22 +00:00
mrq
3978921e71 forgot to make the transcription tab visible with the bark backend (god the code is a mess now, I'll suck you off if you clean this up for me (not really)) 2023-04-26 04:55:10 +00:00
mrq
b6440091fb Very, very, VERY, barebones integration with Bark (documentation soon) 2023-04-26 04:48:09 +00:00
mrq
faa8da12d7 modified logic to determine valid voice folders, also allows subdirs within the folder (for example: ./voices/SH/james/ will be named SH/james) 2023-04-13 21:10:38 +00:00
mrq
02beb1dd8e should fix #203 2023-04-13 03:14:06 +00:00
mrq
8f3e9447ba disable diarize button 2023-04-12 20:03:54 +00:00
mrq
d8b996911c a bunch of shit i had uncommitted over the past while pertaining to VALL-E 2023-04-12 20:02:46 +00:00
mrq
b785192dfc Merge pull request 'Make convenient to use with Docker' (#191) from psr/ai-voice-cloning:docker into master (Reviewed-on: #191) 2023-04-08 14:04:45 +00:00
psr
9afafc69c1 docker: add training script 2023-04-07 23:15:13 +00:00
psr
c018bfca9c docker: add ffmpeg for whisper and general cleanup 2023-04-07 23:14:05 +00:00
psr
d64cba667f docker support 2023-04-07 21:52:18 +00:00
mrq
0440eac2bc #185 2023-03-31 06:55:52 +00:00
mrq
9f64153a28 fixes #185 2023-03-31 06:03:56 +00:00
mrq
4744120be2 added VALL-E inference support (very rudimentary, gimped, but it will load a model trained on a config generated through the web UI) 2023-03-31 03:26:00 +00:00
mrq
9b01377667 only include auto in the list of models under settings, nothing else 2023-03-29 19:53:23 +00:00
mrq
f66281f10c added mixing models (shamelessly inspired from voldy's web ui) 2023-03-29 19:29:13 +00:00
mrq
c89c648b4a fixes #176 2023-03-26 11:05:50 +00:00
mrq
41d47c7c2a for real this time show those new vall-e metrics 2023-03-26 04:31:50 +00:00
mrq
c4ca04cc92 added showing reported training accuracy and eval/validation metrics to graph 2023-03-26 04:08:45 +00:00
mrq
8c647c889d now there should be feature parity between trainers 2023-03-25 04:12:03 +00:00
mrq
fd9b2e082c x_lim and y_lim for graph 2023-03-25 02:34:14 +00:00
mrq
9856db5900 actually make parsing VALL-E metrics work 2023-03-23 15:42:51 +00:00
mrq
69d84bb9e0 I forget 2023-03-23 04:53:31 +00:00
mrq
444bcdaf62 my sanitizer actually did work, it was just batch sizes leading to problems when transcribing 2023-03-23 04:41:56 +00:00
mrq
a6daf289bc when the sanitizer thingy works in testing but it doesn't outside of testing, and you have to retranscribe for the fourth time today 2023-03-23 02:37:44 +00:00
mrq
86589fff91 why does this keep happening to me 2023-03-23 01:55:16 +00:00
mrq
0ea93a7f40 more cleanup, use 24 kHz for preparing for VALL-E (encodec will resample to 24 kHz anyway, makes audio a little nicer), some other things 2023-03-23 01:52:26 +00:00
mrq
d2a9ab9e41 remove redundant phonemize for vall-e (oops), quantize all files and then phonemize all files for cope optimization, load alignment model once instead of for every transcription (speedup with whisperx) 2023-03-23 00:22:25 +00:00
mrq
19c0854e6a do not write current whisper.json if there are no changes 2023-03-22 22:24:07 +00:00
mrq
932eaccdf5 added whisper transcription 'sanitizing' (collapse very short transcriptions to the previous segment) (I really have to stop having several copies spanning several machines for AIVC, I keep reverting shit) 2023-03-22 22:10:01 +00:00
mrq
736cdc8926 disable diarization for whisperx as it's just a useless performance hit (I don't have anything that's multispeaker within the same audio file at the moment) 2023-03-22 20:38:58 +00:00
mrq
aa5bdafb06 ugh 2023-03-22 20:26:28 +00:00
mrq
13605f980c now whisperx should output json that aligns with what's expected 2023-03-22 20:01:30 +00:00
mrq
8877960062 fixes for whisperx batching 2023-03-22 19:53:42 +00:00
mrq
4056a27bcb begrudgingly added back whisperx integration (VAD/Diarization testing, I really, really need accurate timestamps before dumping mondo amounts of time on training a dataset) 2023-03-22 19:24:53 +00:00
mrq
b8c3c4cfe2 Fixed #167 2023-03-22 18:21:37 +00:00
mrq
da96161aaa oops 2023-03-22 18:07:46 +00:00
mrq
f822c87344 cleanups, realigning vall-e training 2023-03-22 17:47:23 +00:00
mrq
909325bb5a ugh 2023-03-21 22:18:57 +00:00
mrq
5a5fd9ca87 Added option to unsqueeze sample batches after sampling 2023-03-21 21:34:26 +00:00
mrq
9657c1d4ce oops 2023-03-21 20:31:01 +00:00
mrq
0c2a9168f8 DLAS is PIPified (but I'm still cloning it as a submodule to make updating it easier) 2023-03-21 15:46:53 +00:00
mrq
34ef0467b9 VALL-E config edits 2023-03-20 01:22:53 +00:00
mrq
2e33bf071a forgot to not require it to be relative 2023-03-19 22:05:33 +00:00