Commit Graph

120 Commits

Author SHA1 Message Date
mrq
ac0a572679 arg to skip voice latents for grabbing voice lists (for preparing datasets) 2023-02-17 04:50:02 +00:00
mrq
9392a11cdd one more update 2023-02-16 23:18:02 +00:00
mrq
605ce2a706 fixing brain worms 2023-02-16 21:36:49 +00:00
mrq
efa43274bd oops 2023-02-16 13:23:07 +00:00
mrq
63bcadcbbe actually for real fixed incrementing filenames because I had a regex that only worked if candidates or lines>1; CUDA now takes priority over DML if you're a nut with both of them installed, because you can just specify an override anyway 2023-02-16 01:06:32 +00:00
mrq
7a4460ddf0 added setting "device-override", less naively decide the number to use for results, some other thing 2023-02-15 21:51:22 +00:00
mrq
f4d2d0d7f8 added option: force cpu for conditioning latents, for when you want low chunk counts but your GPU keeps OOMing because fuck fragmentation 2023-02-15 05:01:40 +00:00
mrq
2ee6068f98 did away with kludgy shit code, just have the user decide how many chunks to slice concat'd samples into (since it actually does improve voice replicability) 2023-02-15 04:39:31 +00:00
mrq
c12ada600b added reset generation settings to default button, revamped utilities tab to double as plain jane voice importer (and runs through voicefixer despite it not really doing anything if your voice samples are already of decent quality anyways), ditched load_wav_to_torch or whatever it was called because it literally exists as torchaudio.load, sample voice is now a combined waveform of all your samples and will always return even if using a latents file 2023-02-14 21:20:04 +00:00
mrq
15924bd3ec updates chunk size to the chunked tensor length, just in case 2023-02-14 17:13:34 +00:00
mrq
b4ca260de9 added flag to enable/disable voicefixer using CUDA because I'll OOM on my 2060, changed from naively subdividing evenly (2,4,8,16 pieces) to just incrementing by 1 (1,2,3,4) when trying to subdivide within the constraints of the max chunk size for computing voice latents 2023-02-14 16:47:34 +00:00
mrq
b16eb99538 history tab doesn't naively reuse the voice dir for results, experimental "divide total sound size until it fits under requested max chunk size" doesn't have a +1 to mess things up (need to re-evaluate how I want to calculate sizes of best fits eventually) 2023-02-14 16:23:04 +00:00
mrq
2427c98333 Implemented kv_cache "fix" (from 1f3c1b5f4a); guess I should find out why it's crashing DirectML backend 2023-02-13 13:48:31 +00:00
mrq
4ced0296a2 DirectML: fixed redaction/aligner by forcing it to stay on CPU 2023-02-12 20:52:04 +00:00
mrq
b85c9921d7 added button to recalculate voice latents, added experimental switch for computing voice latents 2023-02-12 18:11:40 +00:00
mrq
2210b49cb6 fixed regression with computing conditional latents outside of the CPU 2023-02-12 17:44:39 +00:00
mrq
a2d95fe208 fixed silently crashing from enabling kv_cache-ing if using the DirectML backend, throw an error when reading a generated audio file that does not have any embedded metadata in it, cleaned up the blocks of code that would DMA/transfer tensors/models between GPU and CPU 2023-02-12 14:46:21 +00:00
mrq
5f1c032312 fixed regression where the auto_conds do not move to the GPU and cause a problem during the CVVP compare pass 2023-02-11 20:34:12 +00:00
mrq
c5337a6b51 Added integration for "voicefixer", fixed issue where candidates>1 and lines>1 only outputs the last combined candidate, numbered step for each generation in progress, output time per generation step 2023-02-11 15:02:11 +00:00
mrq
8641cc9906 revamped result formatting, added "kludgy" stop button 2023-02-10 22:12:37 +00:00
mrq
7471bc209c Moved voices out of the tortoise folder because it kept being processed for setup.py 2023-02-10 20:11:56 +00:00
mrq
2bce24b9dd Cleanup 2023-02-10 19:55:33 +00:00
mrq
39b81318f2 Added new options: "Output Sample Rate", "Output Volume", and documentation 2023-02-10 03:02:09 +00:00
mrq
77b39e59ac oops 2023-02-09 22:17:57 +00:00
mrq
3621e16ef9 Added 'Only Load Models Locally' setting 2023-02-09 22:06:55 +00:00
mrq
d7443dfa06 Added option: listen path 2023-02-09 20:42:38 +00:00
mrq
38ee19cd57 I didn't have to suck off a wizard for DirectML support (courtesy of https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/7600 for leading the way) 2023-02-09 05:05:21 +00:00
mrq
a37546ad99 owari da... ("it's over...") 2023-02-09 01:53:25 +00:00
mrq
6255c98006 beginning to add DirectML support 2023-02-08 23:03:52 +00:00
mrq
6ebdde58f0 (finally) added the CVVP model weight slider, latents export more data too for weighing against CVVP 2023-02-07 20:55:56 +00:00
mrq
793515772a un-hardcoded input output sampling rates (changing them "works" but leads to wrong audio, naturally) 2023-02-07 18:34:29 +00:00
mrq
5f934c5feb (maybe) fixed an issue with using prompt redactions (emotions) on CPU causing a crash, because for some reason the wav2vec_alignment assumed CUDA was always available 2023-02-07 07:51:05 -06:00
mrq
d6b5d67f79 forgot to auto compute batch size again if set to 0 2023-02-06 23:14:17 -06:00
mrq
be6fab9dcb added setting to adjust autoregressive sample batch size 2023-02-06 22:31:06 +00:00
mrq
b441a84615 added flag (--cond-latent-max-chunk-size) that should restrict the maximum chunk size when chunking for calculating conditional latents, to avoid OOMing on VRAM 2023-02-06 05:10:07 +00:00
mrq
a1f3b6a4da fixed up the computing conditional latents 2023-02-06 03:44:34 +00:00
mrq
945136330c Forgot to rename the cached latents to the new filename 2023-02-05 23:51:52 +00:00
mrq
5bf21fdbe1 modified how conditional latents are computed (before, it just happened to only bother reading the first 102400/24000=4.26 seconds per audio input, now it will chunk it all to compute latents) 2023-02-05 23:25:41 +00:00
mrq
f66754b557 oops 2023-02-05 20:10:40 +00:00
mrq
1c582b5dc8 added button to refresh voice list, enabling KV caching for a bonerific speed increase (credit to https://github.com/152334H/tortoise-tts-fast/) 2023-02-05 17:59:13 +00:00
mrq
8831522de9 New tunable: pause size/breathing room (governs pause at the end of clips) 2023-02-05 14:45:51 +00:00
mrq
bf32efe503 Added multi-line parsing 2023-02-05 06:17:51 +00:00
mrq
84a9758ab9 Set transformer and model folder to local './models/' instead of for the user profile, because I'm sick of more bloat polluting my C:\ 2023-02-05 04:18:35 +00:00
mrq
ed33e34fcc Added choices to choose between diffusion samplers (p, ddim) 2023-02-05 01:28:31 +00:00
mrq
5c876b81f3 Added small optimization with caching latents, dropped Anaconda for just a py3.9 + pip + venv setup, added helper install scripts for such, cleaned up app.py, added flag '--low-vram' to disable minor optimizations 2023-02-04 01:50:57 +00:00
mrq
8f20afc18f Reverted slight improvement patch, as it's just enough to OOM on GPUs with low VRAM 2023-02-03 21:45:06 +00:00
mrq
e8d4a4f89c Added progress for transforming to audio, changed number inputs to sliders instead 2023-02-03 04:56:30 +00:00
mrq
ea751d7b6c forgot to copy the alleged slight performance improvement patch, added detailed progress information with passing gr.Progress, save a little more info with output 2023-02-03 04:20:01 +00:00
mrq
74f447e5d0 QoL fixes 2023-02-02 21:13:28 +00:00
James Betker
5dc3e269b3 Merge pull request #233 from kianmeng/fix-typos (Fix typos) 2023-01-17 18:24:24 -07:00