89d7642a0f  Merge pull request 'fixed setup scripts and Dockerfile to NOT use extra-index-url and instead use index-url (how this happened I don't know, since pytorch instructions use index-url), '''fixed''' phonemizing japanese for VALL-E with pykakasi' (#1) from mrq/ai-voice-cloning:master into master  (terminator, 2023-10-22 18:26:55 +0000)
2830d1fa96  fixed setup scripts and Dockerfile to NOT use extra-index-url and instead use index-url (how this happened I don't know, since pytorch instructions use index-url), '''fixed''' phonemizing japanese for VALL-E with pykakasi  (mrq, 2023-10-12 00:27:46 +0000)
a961141fe6  refac: remove trailing space and add custom theme  (Terminator, 2023-10-01 15:25:03 -0300)
17acfee5d0  fixed culling for validation based on audio duration not working  (mrq, 2023-09-21 22:33:11 +0000)
2fae5008fc  Merge pull request 'Freeze beartype==0.15.0' (#393) from Jarod/ai-voice-cloning:master into master  (mrq, 2023-09-19 02:25:04 +0000)
7dd8b740e8  freeze beartype==0.15.0, unfrozen comes from x-clip in dlas  (Jarod Mica, 2023-09-18 17:18:44 -0700)
5f80ee9b38  set use-deepspeed to false because it's not a dependency and installing it as a dependency under windows is a huge nightmare  (mrq, 2023-09-04 22:09:09 +0000)
b72f2216bf  added websocket server arguments to enable it (now disabled by default) and to specify the address/port to listen on  (ben_mkiv, 2023-08-26 17:38:58 +0200)
690947ad36  Do not double phonemize if using VALL-E backend (I wonder how many hours I've wasted from this oversight)  (mrq, 2023-08-26 00:02:17 +0000)
6f0f148782  websocket server: fix for model loading (just overriding args didn't do it after all...)  (ben_mkiv, 2023-08-26 01:40:35 +0200)
578a5bcadd  websocket server: fix for model loading (just overriding args didn't do it after all...)  (ben_mkiv, 2023-08-26 01:40:35 +0200)
b4dc103931  I don't know how I did not commit the 'sample from the voices to construct the input prompt for vall-e' change but this helps  (mrq, 2023-08-25 04:26:48 +0000)
a657623cbc  updated vall-e training template to use path-based speakers because it would just have a batch/epoch size of 1 otherwise; revert hardcoded 'spit processed dataset to this path' from my training rig to spit it out in a sane spot  (mrq, 2023-08-24 21:45:50 +0000)
533b73e083  fixed the overwrite regression for bark and vall-e backends too  (mrq, 2023-08-24 19:46:42 +0000)
f5fab33e9c  fixed defaults for vall-e backend  (mrq, 2023-08-24 19:44:52 +0000)
4aa240d48a  Merge pull request 'fix filename generation which didn't work and overwrote existing files' (#341) from ben_mkiv/ai-voice-cloning:master into master  (mrq, 2023-08-24 12:29:59 +0000)
00b173857d  fix filename generation which didn't work and overwrote existing files  (ben_mkiv, 2023-08-24 09:57:01 +0200)
dc46fdc7d0  fixed another issue from haphazardly copying my changes from my training machine  (mrq, 2023-08-23 22:09:22 +0000)
29290f574e  should fix issue that arises when trying to prepare the dataset without slicing segments  (mrq, 2023-08-23 21:49:22 +0000)
0a5483e57a  updated valle yaml template  (mrq, 2023-08-23 21:42:32 +0000)
e613299304  Merge pull request 'favor existing arguments from parameters (kwargs) over global (args)' (#336) from ben_mkiv/ai-voice-cloning:master into master  (mrq, 2023-08-23 21:05:36 +0000)
ce24ba41e2  Websocket server, override args parameters for model settings (squashed)  (ben_mkiv, 2023-08-22 23:09:42 +0200)
5f4215b3ef  Merge pull request 'websocket server: API change(!), better response format' (#334) from ben_mkiv/ai-voice-cloning:master into master  (mrq, 2023-08-22 20:35:42 +0000)
5d73d9e71c  small QoL change to the StringNone helper, to allow generated text to be "None", maybe someone wants to generate that, we never know...  (ben_mkiv, 2023-08-22 21:49:49 +0200)
9abcb0f193  websocket server: API change(!), better response format  (ben_mkiv, 2023-08-22 21:37:19 +0200)
fb1cfd059f  Merge pull request 'websocket server: small fix' (#333) from ben_mkiv/ai-voice-cloning:master into master  (mrq, 2023-08-22 19:26:37 +0000)
a902913780  websocket server: workaround for values and None type  (ben_mkiv, 2023-08-22 20:20:49 +0200)
2060b6f21c  fixed issue with sliced audio being the wrong sample rate  (mrq, 2023-08-22 14:22:39 +0000)
eeddd4cb6b  forgot the important reason I even started working on AIVC again  (mrq, 2023-08-21 03:42:12 +0000)
72a38ff2fc  made initialization faster if there's a lot of voice files (because glob fucking sucks), committing changes buried on my training rig  (mrq, 2023-08-21 03:31:49 +0000)
91a0c495ff  Merge pull request 'added simple websocket server which allows to start tts generation tasks, retrieving autoregressive models and voices list' (#328) from ben_mkiv/ai-voice-cloning:master into master  (mrq, 2023-08-16 14:01:44 +0000)
2626364c40  added simple websocket server which allows to start tts generation tasks, retrieving autoregressive models and voices list  (ben_mkiv, 2023-08-16 12:51:13 +0200)
ac645e0a20  no longer need to install bark under ./modules/  (mrq, 2023-07-11 16:20:28 +0000)
e2a6dc1c0a  under bark, properly use transcribed audio if the audio wasn't actually sliced (oops)  (mrq, 2023-07-11 14:53:32 +0000)
a325496661  Merge pull request 'Freeze pydantic package to 1.10.11' (#301) from Jarod/ai-voice-cloning:master into master  (mrq, 2023-07-09 15:06:31 +0000)
350d2d5a95  Freeze pydantic package to 1.10.11  (Jarod, 2023-07-09 02:36:23 +0000)
6c3f48efba  uses gitmylo/bark-voice-cloning-HuBERT-quantizer for creating custom voices (it slightly works better over the base method, but still not very good desu)  (mrq, 2023-07-03 02:46:10 +0000)
547e1d1277  updated bark support, it'll also query for vocos, it actually works (I don't know what specifically was the issue)  (mrq, 2023-07-03 01:22:02 +0000)
e227ab8e08  updated whisperX integration for use with the latest version (v3) (NOTE: you WILL need to also update whisperx if you pull this commit)  (mrq, 2023-06-09 02:41:29 +0000)
805d7d35e8  the power of a separate setup for testing  (mrq, 2023-05-22 17:36:28 +0000)
baa6b76b85  added gradio API for changing AR model  (mrq, 2023-05-21 23:20:39 +0000)
31da215c5f  added checkboxes to use the original method for calculating latents (ignores the voice chunk field)  (mrq, 2023-05-21 01:47:48 +0000)
9e3eca2261  freeze gradio because I forgot to do it last week when it broke  (mrq, 2023-05-18 14:45:49 +0000)
cbe21745df  I am very smart (need to validate)  (mrq, 2023-05-12 17:41:26 +0000)
74bd0f0cdc  revert local change that made its way upstream (showing graphs by it instead of epoch)  (mrq, 2023-05-11 03:30:54 +0000)
149aaca554  fixed the whisperx has no attribute named load_model whatever because I guess whisperx has as stable of an API as I do  (mrq, 2023-05-06 10:45:17 +0000)
5003bc89d3  cleaned up brain worms with wrapping around gradio progress by instead just using tqdm directly (slight regressions with some messages not getting pushed)  (mrq, 2023-05-04 23:40:33 +0000)
09d849a78f  quick hotfix if it actually is a problem in the repo itself  (mrq, 2023-05-04 23:01:47 +0000)
853c7fdccf  forgot to uncomment the block to transcribe and slice when using transcribe all because I was piece-processing a huge batch of LibriTTS and somehow that leaked over to the repo  (mrq, 2023-05-03 21:31:37 +0000)
fd306d850d  updated setup-directml.bat to not hard require torch version because it's updated to torch2 now  (mrq, 2023-04-29 00:50:16 +0000)
3978921e71  forgot to make the transcription tab visible with the bark backend (god the code is a mess now, I'll suck you off if you clean this up for me (not really))  (mrq, 2023-04-26 04:55:10 +0000)
faa8da12d7  modified logic to determine valid voice folders, also allows subdirs within the folder (for example: ./voices/SH/james/ will be named SH/james)  (mrq, 2023-04-13 21:10:38 +0000)
4744120be2  added VALL-E inference support (very rudimentary, gimped, but it will load a model trained on a config generated through the web UI)  (mrq, 2023-03-31 03:26:00 +0000)
9b01377667  only include auto in the list of models under setting, nothing else  (mrq, 2023-03-29 19:53:23 +0000)
f66281f10c  added mixing models (shamelessly inspired from voldy's web ui)  (mrq, 2023-03-29 19:29:13 +0000)
444bcdaf62  my sanitizer actually did work, it was just batch sizes leading to problems when transcribing  (mrq, 2023-03-23 04:41:56 +0000)
a6daf289bc  when the sanitizer thingy works in testing but it doesn't outside of testing, and you have to retranscribe for the fourth time today  (mrq, 2023-03-23 02:37:44 +0000)
86589fff91  why does this keep happening to me  (mrq, 2023-03-23 01:55:16 +0000)
0ea93a7f40  more cleanup, use 24 kHz for preparing for VALL-E (encodec will resample to 24 kHz anyways, makes audio a little nicer), some other things  (mrq, 2023-03-23 01:52:26 +0000)
d2a9ab9e41  remove redundant phonemize for vall-e (oops), quantize all files and then phonemize all files for cope optimization, load alignment model once instead of for every transcription (speedup with whisperx)  (mrq, 2023-03-23 00:22:25 +0000)
19c0854e6a  do not write current whisper.json if there's no changes  (mrq, 2023-03-22 22:24:07 +0000)
932eaccdf5  added whisper transcription 'sanitizing' (collapse very short transcriptions to the previous segment) (I really have to stop having several copies spanning several machines for AIVC, I keep reverting shit)  (mrq, 2023-03-22 22:10:01 +0000)
736cdc8926  disable diarization for whisperx as it's just a useless performance hit (I don't have anything that's multispeaker within the same audio file at the moment)  (mrq, 2023-03-22 20:38:58 +0000)
13605f980c  now whisperx should output json that aligns with what's expected  (mrq, 2023-03-22 20:01:30 +0000)
8877960062  fixes for whisperx batching  (mrq, 2023-03-22 19:53:42 +0000)
4056a27bcb  begrudgingly added back whisperx integration (VAD/Diarization testing, I really, really need accurate timestamps before dumping mondo amounts of time on training a dataset)  (mrq, 2023-03-22 19:24:53 +0000)