Commit Graph

262 Commits

Author SHA1 Message Date
64ae4bb563 Updated setup scripts to use CUDA 11.8 and torch 2.0.0 to fix RTX 4090 compatibility; added an API to the generate function so it can be called from other scripts 2023-02-11 19:46:26 +11:00
mrq 9bf1ea5b0a History tab (3/10 it works) 2023-02-11 01:45:25 +00:00
mrq 340a89f883 Numbering predicates on input_#.json files instead of "number of wavs" 2023-02-10 22:51:56 +00:00
mrq 8641cc9906 Revamped result formatting, added "kludgy" stop button 2023-02-10 22:12:37 +00:00
mrq 8f789d17b9 Slight notebook adjustment 2023-02-10 20:22:12 +00:00
mrq 7471bc209c Moved voices out of the tortoise folder because it kept being processed by setup.py 2023-02-10 20:11:56 +00:00
mrq 2bce24b9dd Cleanup 2023-02-10 19:55:33 +00:00
mrq 811539b20a Added the remaining input settings 2023-02-10 16:47:57 +00:00
mrq f5ed5499a0 Added a link to the Colab notebook 2023-02-10 16:26:13 +00:00
mrq 07c54ad361 Colab notebook (part 2) 2023-02-10 16:12:11 +00:00
mrq 939c89f16e Colab notebook (part 1) 2023-02-10 15:58:56 +00:00
mrq 39b81318f2 Added new options ("Output Sample Rate", "Output Volume") and documentation 2023-02-10 03:02:09 +00:00
mrq 77b39e59ac oops 2023-02-09 22:17:57 +00:00
mrq 3621e16ef9 Added "Only Load Models Locally" setting 2023-02-09 22:06:55 +00:00
mrq dccedc3f66 Added and documented 2023-02-09 21:07:51 +00:00
mrq 8c30cd1aa4 Oops 2023-02-09 20:49:22 +00:00
mrq d7443dfa06 Added option: listen path 2023-02-09 20:42:38 +00:00
mrq 38ee19cd57 I didn't have to suck off a wizard for DirectML support (courtesy of https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/7600 for leading the way) 2023-02-09 05:05:21 +00:00
mrq 716e227953 oops 2023-02-09 02:39:08 +00:00
mrq a37546ad99 owari da... ("it's over...") 2023-02-09 01:53:25 +00:00
mrq 6255c98006 Beginning to add DirectML support 2023-02-08 23:03:52 +00:00
mrq d9a9fa6a82 Added two flags/settings: embed output settings, slimmer computed voice latents 2023-02-08 14:14:28 +00:00
mrq f03b6b8d97 Disable telemetry/what-have-you if not requesting a public Gradio URL 2023-02-07 21:44:16 +00:00
mrq 479f30c808 Merge pull request 'Added convert.sh' (#8) from lightmare/tortoise-tts:convert_sh into main (reviewed on mrq/tortoise-tts#8) 2023-02-07 21:09:00 +00:00
lightmare 40f52fa8d1 Added convert.sh 2023-02-07 21:09:00 +00:00
mrq 6ebdde58f0 (finally) added the CVVP model weight slider; latents now export more data too, for weighing against CVVP 2023-02-07 20:55:56 +00:00
mrq 793515772a Un-hardcoded input/output sampling rates (changing them "works" but naturally leads to wrong audio) 2023-02-07 18:34:29 +00:00
mrq 5f934c5feb (maybe) fixed an issue where using prompt redactions (emotions) on CPU caused a crash, because for some reason wav2vec_alignment assumed CUDA was always available 2023-02-07 07:51:05 -06:00
mrq d6b5d67f79 Forgot to auto-compute batch size again if set to 0 2023-02-06 23:14:17 -06:00
mrq 66cc6e2791 Changed ROCm pip index URL from 5.2 to 5.1.1, because it's what worked for me desu 2023-02-06 22:52:40 -06:00
mrq 6515d3b6de Added shell scripts for Linux, wrapped sorted() for the voice list, I guess 2023-02-06 21:54:31 -06:00
mrq edd642c3d3 Fixed combining audio (somehow this broke, oops) 2023-02-07 00:26:22 +00:00
mrq be6fab9dcb Added setting to adjust autoregressive sample batch size 2023-02-06 22:31:06 +00:00
mrq 100b4d7e61 Added settings page, added checking for updates (disabled by default), some other things that I don't remember 2023-02-06 21:43:01 +00:00
mrq 240858487f Added encoding and ripping of the latents used to generate the voice 2023-02-06 16:32:09 +00:00
mrq 92cf9e1efe Added tab to read and copy settings from a voice clip (in the future, I'll see about embedding the latents used to generate the voice) 2023-02-06 16:00:44 +00:00
mrq 5affc777e0 Added another (somewhat adequate) example; added metadata storage to generated files (need to add a viewer later) 2023-02-06 14:17:41 +00:00
mrq b441a84615 Added flag (--cond-latent-max-chunk-size) that restricts the maximum chunk size when chunking for conditional latent calculation, to avoid OOMing on VRAM 2023-02-06 05:10:07 +00:00
mrq a1f3b6a4da Fixed up computing conditional latents 2023-02-06 03:44:34 +00:00
mrq 2cfd3bc213 Updated README (before I go mad trying to nitpick and edit it while getting distracted from an iToddler sperging) 2023-02-06 00:56:17 +00:00
mrq 945136330c Forgot to rename the cached latents to the new filename 2023-02-05 23:51:52 +00:00
mrq 5bf21fdbe1 Modified how conditional latents are computed (before, only the first 102400 samples, i.e. 102400/24000 ≈ 4.27 seconds, of each audio input were read; now all of it is chunked to compute latents) 2023-02-05 23:25:41 +00:00
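The chunking change in 5bf21fdbe1, together with the --cond-latent-max-chunk-size cap from b441a84615, can be sketched roughly as below. This is a minimal illustration, not the repository's actual code: `encode` stands in for the real conditioning encoder, and averaging the per-chunk latents is an assumption about how the chunks are merged.

```python
import numpy as np

# 102400 samples at 24 kHz is ~4.27 s; the old code read only this much.
DEFAULT_MAX_CHUNK_SIZE = 102400

def chunk_audio(samples: np.ndarray, max_chunk_size: int = DEFAULT_MAX_CHUNK_SIZE):
    """Split a mono waveform into chunks no longer than max_chunk_size,
    so each chunk's latent computation stays within VRAM limits."""
    return [samples[i:i + max_chunk_size]
            for i in range(0, len(samples), max_chunk_size)]

def compute_conditional_latents(samples: np.ndarray, encode):
    """Encode every chunk of the input (not just the first ~4.27 s) and
    merge the per-chunk latents; here they are simply averaged."""
    latents = [encode(chunk) for chunk in chunk_audio(samples)]
    return np.mean(latents, axis=0)
```

With a 250000-sample input, `chunk_audio` yields three chunks (102400, 102400, and 45200 samples), so the whole clip contributes to the latents instead of only the first chunk.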
mrq f66754b557 oops 2023-02-05 20:10:40 +00:00
mrq 1c582b5dc8 Added button to refresh the voice list, enabled KV caching for a bonerific speed increase (credit to https://github.com/152334H/tortoise-tts-fast/) 2023-02-05 17:59:13 +00:00
mrq 8831522de9 New tunable: pause size/breathing room (governs the pause at the end of clips) 2023-02-05 14:45:51 +00:00
mrq c7f85dbba2 Fix to keep the prompted emotion for every split line 2023-02-05 06:55:09 +00:00
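The fix in c7f85dbba2 addresses multi-line prompts that open with a bracketed emotion cue (TorToiSe's prompt-redaction syntax, e.g. "[I am really sad,]"): when the text is split into lines, the cue must be carried onto every line, not just the first. A hedged sketch of the idea — `apply_emotion_to_lines` is a hypothetical helper, not mrq's implementation:

```python
import re

def apply_emotion_to_lines(text: str) -> list[str]:
    """Extract a leading [bracketed] emotion cue, if any, and re-prefix it
    onto every non-empty split line so the cue survives line splitting."""
    match = re.match(r"^(\[[^\]]+\])\s*", text)
    prefix = match.group(1) + " " if match else ""
    body = text[match.end():] if match else text
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    return [prefix + line for line in lines]
```

For example, "[I am really sad,]\nfirst line\nsecond line" yields two lines, each prefixed with "[I am really sad,] ".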
mrq 79e0b85602 Updated .gitignore (which does not apply to me, because I have a bad habit of keeping a repo copy separate from a working copy) 2023-02-05 06:40:50 +00:00
mrq bc567d7263 Skip combining if not splitting; also avoids reading the audio files back to combine them, by keeping them in memory 2023-02-05 06:35:32 +00:00
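The optimization in bc567d7263 amounts to keeping each generated line's samples in memory and concatenating them directly, rather than writing each clip to disk and reading the files back to combine them. A minimal sketch under that assumption (`combine_clips` is illustrative, not the repository's function):

```python
import numpy as np

def combine_clips(clips: list[np.ndarray], split: bool) -> np.ndarray:
    """Combine per-line clips in memory; skip the work entirely when the
    text wasn't split (there is only one clip to return)."""
    if not split or len(clips) == 1:
        return clips[0]  # nothing to combine
    return np.concatenate(clips)
```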
mrq bf32efe503 Added multi-line parsing 2023-02-05 06:17:51 +00:00
mrq cd94cc8459 Fixed accidentally not passing user-provided samples/iteration values (oops); fixed error thrown when trying to write Unicode, because Python sucks 2023-02-05 05:51:57 +00:00