faa8da12d7 modified logic to determine valid voice folders, also allows subdirs within the folder (for example: ./voices/SH/james/ will be named SH/james) | mrq | 2023-04-13 21:10:38 +0000
4744120be2 added VALL-E inference support (very rudimentary, gimped, but it will load a model trained on a config generated through the web UI) | mrq | 2023-03-31 03:26:00 +0000
9b01377667 only include auto in the list of models under setting, nothing else | mrq | 2023-03-29 19:53:23 +0000
f66281f10c added mixing models (shamelessly inspired from voldy's web ui) | mrq | 2023-03-29 19:29:13 +0000
444bcdaf62 my sanitizer actually did work, it was just batch sizes leading to problems when transcribing | mrq | 2023-03-23 04:41:56 +0000
a6daf289bc when the sanitizer thingy works in testing but it doesn't outside of testing, and you have to retranscribe for the fourth time today | mrq | 2023-03-23 02:37:44 +0000
86589fff91 why does this keep happening to me | mrq | 2023-03-23 01:55:16 +0000
0ea93a7f40 more cleanup, use 24kHz for preparing for VALL-E (encodec will resample to 24kHz anyways, makes audio a little nicer), some other things | mrq | 2023-03-23 01:52:26 +0000
d2a9ab9e41 remove redundant phonemize for vall-e (oops), quantize all files and then phonemize all files for cope optimization, load alignment model once instead of for every transcription (speedup with whisperx) | mrq | 2023-03-23 00:22:25 +0000
19c0854e6a do not write current whisper.json if there's no changes | mrq | 2023-03-22 22:24:07 +0000
932eaccdf5 added whisper transcription 'sanitizing' (collapse very short transcriptions to the previous segment) (I really have to stop having several copies spanning several machines for AIVC, I keep reverting shit) | mrq | 2023-03-22 22:10:01 +0000
736cdc8926 disable diarization for whisperx as it's just a useless performance hit (I don't have anything that's multispeaker within the same audio file at the moment) | mrq | 2023-03-22 20:38:58 +0000
13605f980c now whisperx should output json that aligns with what's expected | mrq | 2023-03-22 20:01:30 +0000
8877960062 fixes for whisperx batching | mrq | 2023-03-22 19:53:42 +0000
4056a27bcb begrudgingly added back whisperx integration (VAD/Diarization testing, I really, really need accurate timestamps before dumping mondo amounts of time on training a dataset) | mrq | 2023-03-22 19:24:53 +0000
d752a22331 print a warning if automatically deduced batch size returns 1 | mrq | 2023-03-15 01:20:15 +0000
f6d34e1dd3 and maybe I should have actually tested with ./models/tokenizers/ made | mrq | 2023-03-15 01:09:20 +0000
5e4f6808ce I guess I didn't test on a blank-ish slate | mrq | 2023-03-15 00:54:27 +0000
363d0b09b1 added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training) | mrq | 2023-03-15 00:37:38 +0000
07b684c4e7 removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text | mrq | 2023-03-14 21:51:27 +0000
92a05d3c4c added PYTHONUTF8 to start/train bats | mrq | 2023-03-14 02:29:11 +0000
dadb1fca6b multichannel audio now reports correct duration (surprised it took this long for me to source multichannel audio) | mrq | 2023-03-13 21:24:51 +0000
32d968a8cd (disabled by default until I validate it works) added additional transcription text normalization (something else I'm experimenting with requires it) | mrq | 2023-03-13 19:07:23 +0000
66ac8ba766 added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation | mrq | 2023-03-13 18:51:53 +0000
ee1b048d07 when creating the train/validation datasets, use segments if the main audio's duration is too long, and slice to make the segments if they don't exist | mrq | 2023-03-13 04:26:00 +0000
050bcefd73 resample to 22.5K when creating training inputs (to avoid redundant downsampling when loaded for training, even though most of my inputs are already at 22.5K), generalized resampler function to cache and reuse them, do not unload whisper when done transcribing since it gets unloaded anyways for any other non-transcription task | mrq | 2023-03-13 01:20:55 +0000
7c9c0dc584 forgot to clean up debug prints | mrq | 2023-03-13 00:44:37 +0000
239c984850 move validating audio to creating the text files instead, consider audio longer than 11 seconds invalid, consider text lengths over 200 invalid | mrq | 2023-03-12 23:39:00 +0000
478ed46e3b fixed empty training list preventing the program from starting | tigi6346 | 2023-03-12 19:47:29 +0200
9238df0b03 fixed last generation settings not actually loading because of brain worms | mrq | 2023-03-12 15:49:50 +0000
9594a960b0 Disable loss ETA for now until I fix it | mrq | 2023-03-12 15:39:54 +0000
51f6c347fe Merge pull request 'updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested.' (#122) from zim33/ai-voice-cloning:save_more_user_config into master | mrq | 2023-03-12 15:38:34 +0000
be8b290a1a Merge branch 'master' into save_more_user_config | mrq | 2023-03-12 15:38:08 +0000
296129ba9c output fixes, I'm not sure why ETA wasn't working but it works in testing | mrq | 2023-03-12 15:17:07 +0000
098d7ad635 uh I don't remember, small things | mrq | 2023-03-12 14:47:48 +0000
233baa4e45 updated several default configurations to not cause null/empty errors. also default samples/iterations to 16-30 ultra fast which is typically suggested. | tigi6346 | 2023-03-12 16:08:02 +0200