tortoise-tts

Author	SHA1	Message	Date
mrq	2e777e8a67	done away with kludgy shit code, just have the user decide how many chunks to slice concat'd samples to (since it actually does improve voice replicability)	2023-02-15 04:39:31 +00:00
mrq	0bc2c1f540	updates chunk size to the chunked tensor length, just in case	2023-02-14 17:13:34 +00:00
mrq	48275899e8	added flag to enable/disable voicefixer using CUDA because I'll OOM on my 2060, changed from naively subdividing evenly (2,4,8,16 pieces) to just incrementing by 1 (1,2,3,4) when trying to subdivide within constraints of the max chunk size for computing voice latents	2023-02-14 16:47:34 +00:00
mrq	b648186691	history tab doesn't naively reuse the voice dir instead for results, experimental "divide total sound size until it fits under requested max chunk size" doesn't have a +1 to mess things up (need to re-evaluate how I want to calculate sizes of best fits eventually)	2023-02-14 16:23:04 +00:00
mrq	5b5e32338c	DirectML: fixed redaction/aligner by forcing it to stay on CPU	2023-02-12 20:52:04 +00:00
mrq	4d01bbd429	added button to recalculate voice latents, added experimental switch for computing voice latents	2023-02-12 18:11:40 +00:00
mrq	88529fda43	fixed regression with computing conditional latents outside of the CPU	2023-02-12 17:44:39 +00:00
mrq	65f74692a0	fixed silently crashing from enabling kv_cache-ing if using the DirectML backend, throw an error when reading a generated audio file that does not have any embedded metadata in it, cleaned up the blocks of code that would DMA/transfer tensors/models between GPU and CPU	2023-02-12 14:46:21 +00:00
mrq	1b55730e67	fixed regression where the auto_conds do not move to the GPU and cause a problem during CVVP compare pass	2023-02-11 20:34:12 +00:00
mrq	a7330164ab	Added integration for "voicefixer", fixed issue where candidates>1 and lines>1 only outputs the last combined candidate, numbered step for each generation in progress, output time per generation step	2023-02-11 15:02:11 +00:00
mrq	4f903159ee	revamped result formatting, added "kludgy" stop button	2023-02-10 22:12:37 +00:00
mrq	efa556b793	Added new options: "Output Sample Rate", "Output Volume", and documentation	2023-02-10 03:02:09 +00:00
mrq	57af25c6c0	oops	2023-02-09 22:17:57 +00:00
mrq	504db0d1ac	Added 'Only Load Models Locally' setting	2023-02-09 22:06:55 +00:00
mrq	3f8302a680	I didn't have to suck off a wizard for DirectML support (courtesy of https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/7600 for leading the way)	2023-02-09 05:05:21 +00:00
mrq	b23d6b4b4c	owari da...	2023-02-09 01:53:25 +00:00
mrq	494f3c84a1	beginning to add DirectML support	2023-02-08 23:03:52 +00:00
mrq	e45e4431d1	(finally) added the CVVP model weight slider, latents export more data too for weighing against CVVP	2023-02-07 20:55:56 +00:00
mrq	f7274112c3	un-hardcoded input output sampling rates (changing them "works" but leads to wrong audio, naturally)	2023-02-07 18:34:29 +00:00
mrq	55058675d2	(maybe) fixed an issue with using prompt redactions (emotions) on CPU causing a crash, because for some reason the wav2vec_alignment assumed CUDA was always available	2023-02-07 07:51:05 -06:00
mrq	328deeddae	forgot to auto compute batch size again if set to 0	2023-02-06 23:14:17 -06:00
mrq	a3c077ba13	added setting to adjust autoregressive sample batch size	2023-02-06 22:31:06 +00:00
mrq	b8b15d827d	added flag (--cond-latent-max-chunk-size) that should restrict the maximum chunk size when chunking for calculating conditional latents, to avoid OOMing on VRAM	2023-02-06 05:10:07 +00:00
mrq	319e7ec0a6	fixed up the computing conditional latents	2023-02-06 03:44:34 +00:00
mrq	c2c9b1b683	modified how conditional latents are computed (before, it just happened to only bother reading the first 102400/24000=4.26 seconds per audio input, now it will chunk it all to compute latents)	2023-02-05 23:25:41 +00:00
mrq	4ea997106e	oops	2023-02-05 20:10:40 +00:00
mrq	daebc6c21c	added button to refresh voice list, enabling KV caching for a bonerific speed increase (credit to https://github.com/152334H/tortoise-tts-fast/)	2023-02-05 17:59:13 +00:00
mrq	7b767e1442	New tunable: pause size/breathing room (governs pause at the end of clips)	2023-02-05 14:45:51 +00:00
mrq	f38c479e9b	Added multi-line parsing	2023-02-05 06:17:51 +00:00
mrq	111c45b181	Set transformer and model folder to local './models/' instead of the user profile, because I'm sick of more bloat polluting my C:\	2023-02-05 04:18:35 +00:00
mrq	078dc0c6e2	Added choices to choose between diffusion samplers (p, ddim)	2023-02-05 01:28:31 +00:00
mrq	4274cce218	Added small optimization with caching latents, dropped Anaconda for just a py3.9 + pip + venv setup, added helper install scripts for such, cleaned up app.py, added flag '--low-vram' to disable minor optimizations	2023-02-04 01:50:57 +00:00
mrq	061aa65ac4	Reverted slight improvement patch, as it's just enough to OOM on GPUs with low VRAM	2023-02-03 21:45:06 +00:00
mrq	4f359bffa4	Added progress for transforming to audio, changed number inputs to sliders instead	2023-02-03 04:56:30 +00:00
mrq	ef237c70d0	forgot to copy the alleged slight performance improvement patch, added detailed progress information with passing gr.Progress, save a little more info with output	2023-02-03 04:20:01 +00:00
Johan Nordberg	dba14650cb	Typofix	2022-06-11 21:19:07 +09:00
Johan Nordberg	5c7a50820c	Allow running on CPU	2022-06-11 20:03:14 +09:00
Johan Nordberg	a641d8f29b	Add tortoise_cli.py	2022-05-28 05:25:23 +00:00
Johan Nordberg	f396dcc023	Skip CLVP if cvvp_amount is 1. Also fixes formatting bug in log message	2022-05-25 11:12:53 +00:00
Johan Nordberg	0ca4d8f291	Revive CVVP model	2022-05-25 10:22:50 +00:00
James Betker	a1c131bde9	Merge remote-tracking branch 'origin/main' # Conflicts: # tortoise/read.py	2022-05-19 10:34:54 -06:00
Johan Nordberg	00730d2786	Allow setting models path from environment variable	2022-05-19 21:02:09 +09:00
James Betker	8fdf516e62	Remove CVVP. After training a similar model for a different purpose, I realized that this model is faulty: the contrastive loss it uses only pays attention to high-frequency details which do not contribute meaningfully to output quality. I validated this by comparing a no-CVVP output with a baseline using tts-scores and found no differences.	2022-05-17 12:21:25 -06:00
James Betker	a1ae84c49d	Add a way to get deterministic behavior from tortoise and add debug states for reporting	2022-05-17 12:11:18 -06:00
James Betker	0570034eda	Automatically pick batch size based on available GPU memory	2022-05-13 10:30:02 -06:00
James Betker	b3b36c0041	update model paths (including clvp2!)	2022-05-12 20:18:11 -06:00
James Betker	44a4419348	CLVP2!	2022-05-12 13:23:03 -06:00
Mark Baushenko	cc38333249	Optimizing graphics card memory. During inference it does not store gradients, which take up most of the video memory	2022-05-11 16:35:11 +03:00
James Betker	e4e9523900	re-enable redaction	2022-05-06 09:36:42 -06:00
James Betker	9151650559	temporarily disable redaction	2022-05-06 09:06:20 -06:00

61 Commits