tortoise-tts

Author	SHA1	Message	Date
mrq	ac0a572679	arg to skip voice latents for grabbing voice lists (for preparing datasets)	2023-02-17 04:50:02 +00:00
mrq	efa43274bd	oops	2023-02-16 13:23:07 +00:00
mrq	63bcadcbbe	actually for real fixed incrementing filenames because i had a regex that actually only worked if candidates or lines>1, cuda now takes priority over dml if you're a nut with both of them installed because you can just specify an override anyways	2023-02-16 01:06:32 +00:00
mrq	7a4460ddf0	added setting "device-override", less naively decide the number to use for results, some other thing	2023-02-15 21:51:22 +00:00
mrq	c12ada600b	added reset generation settings to default button, revamped utilities tab to double as plain jane voice importer (and runs through voicefixer despite it not really doing anything if your voice samples are already of decent quality anyways), ditched load_wav_to_torch or whatever it was called because it literally exists as torchaudio.load, sample voice is now a combined waveform of all your samples and will always return even if using a latents file	2023-02-14 21:20:04 +00:00
mrq	c5337a6b51	Added integration for "voicefixer", fixed issue where candidates>1 and lines>1 only outputs the last combined candidate, numbered step for each generation in progress, output time per generation step	2023-02-11 15:02:11 +00:00
mrq	7471bc209c	Moved voices out of the tortoise folder because it kept being processed for setup.py	2023-02-10 20:11:56 +00:00
mrq	d7443dfa06	Added option: listen path	2023-02-09 20:42:38 +00:00
mrq	38ee19cd57	I didn't have to suck off a wizard for DirectML support (courtesy of https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/7600 for leading the way)	2023-02-09 05:05:21 +00:00
mrq	a37546ad99	owari da...	2023-02-09 01:53:25 +00:00
mrq	793515772a	un-hardcoded input output sampling rates (changing them "works" but leads to wrong audio, naturally)	2023-02-07 18:34:29 +00:00
mrq	5f934c5feb	(maybe) fixed an issue with using prompt redactions (emotions) on CPU causing a crash, because for some reason the wav2vec_alignment assumed CUDA was always available	2023-02-07 07:51:05 -06:00
mrq	945136330c	Forgot to rename the cached latents to the new filename	2023-02-05 23:51:52 +00:00
mrq	5bf21fdbe1	modified how conditional latents are computed (before, it just happened to only bother reading the first 102400/24000=4.26 seconds per audio input, now it will chunk it all to compute latents)	2023-02-05 23:25:41 +00:00
mrq	1c582b5dc8	added button to refresh voice list, enabling KV caching for a bonerific speed increase (credit to https://github.com/152334H/tortoise-tts-fast/)	2023-02-05 17:59:13 +00:00
mrq	ed33e34fcc	Added choices to choose between diffusion samplers (p, ddim)	2023-02-05 01:28:31 +00:00
mrq	5c876b81f3	Added small optimization with caching latents, dropped Anaconda for just a py3.9 + pip + venv setup, added helper install scripts for such, cleaned up app.py, added flag '--low-vram' to disable minor optimizations	2023-02-04 01:50:57 +00:00
mrq	e8d4a4f89c	Added progress for transforming to audio, changed number inputs to sliders instead	2023-02-03 04:56:30 +00:00
Johan Nordberg	b876a6b32c	Allow running on CPU	2022-06-11 20:03:14 +09:00
James Betker	68c1580f94	Merge pull request #74 from jnordberg/improved-cli Add CLI tool	2022-05-28 21:33:53 -06:00
Johan Nordberg	d8f98c07b4	Remove some assumptions about working directory This allows cli tool to run when not standing in repository dir	2022-05-29 01:10:19 +00:00
Johan Nordberg	9f6ae0f0b3	Add tortoise_cli.py	2022-05-28 05:25:23 +00:00
Johan Nordberg	561ae9a31e	Typofix	2022-05-28 01:29:34 +00:00
Johan Nordberg	6a71d90316	Improve splitting on text that has many quotes	2022-05-28 01:22:21 +00:00
Johan Nordberg	f199d6b85c	Add riding hood test Also fix a bug discovered by the test that would seek past the text end if it ended in a boundary	2022-05-27 23:08:53 +00:00
Johan Nordberg	b294f0217f	Improve sentence boundary detection	2022-05-27 05:58:09 +00:00
Josh Ziegler	5b0e50eaa6	avoid mutable default in aligner	2022-05-26 16:20:09 -04:00
Johan Nordberg	e34ffca8fb	Allow passing additional voice directories when loading voices	2022-05-19 21:02:11 +09:00
Danila Berezin	dc3d7b1667	Fix bug in load_voices in audio.py The read.py script did not work with pth latents, so I fix bug in audio.py. It seems that in the elif statement, instead of voice, voices should be clip, clips. And torch stack doesn't work with tuples, so I had to split this operation.	2022-05-17 18:34:54 +03:00
James Betker	11e80b0dae	Merge pull request #42 from jnordberg/main Improve sentence splitting	2022-05-14 08:52:46 -06:00
James Betker	50690e4465	Automatically pick batch size based on available GPU memory	2022-05-13 10:30:02 -06:00
James Betker	cb7adf16af	Remove samples_generator Was useful but the page is more detailed now.	2022-05-13 10:28:16 -06:00
Johan Nordberg	5197904660	Improve sentence splitting	2022-05-13 11:02:17 +00:00
James Betker	75b0e03ab3	Add error message	2022-05-12 20:15:40 -06:00
James Betker	317d55c252	re-enable redaction	2022-05-06 09:36:42 -06:00
James Betker	ffd0238a16	v2.2	2022-05-06 00:11:10 -06:00
James Betker	ddb19f6b0f	Enable redaction by default	2022-05-03 21:21:52 -06:00
James Betker	ee6f9b15ce	Use librosa for loading mp3s	2022-05-03 20:44:31 -06:00
James Betker	9acce239d3	fix paths	2022-05-02 20:56:28 -06:00
James Betker	f499d66493	misc fixes	2022-05-02 18:00:57 -06:00
James Betker	cdf44d7506	more fixes	2022-05-02 16:44:47 -06:00
James Betker	5a95c34a01	fix warning	2022-05-02 16:36:02 -06:00
James Betker	39ec1b0db5	Support totally random voices (and make fixes to previous changes)	2022-05-02 15:40:03 -06:00
James Betker	9007955d88	Add redaction support	2022-05-02 14:57:29 -06:00
James Betker	0ffc191408	Add support for extracting and feeding conditioning latents directly into the model - Adds a new script and API endpoints for doing this - Reworks autoregressive and diffusion models so that the conditioning is computed separately (which will actually provide a mild performance boost) - Updates README This is untested. Need to do the following manual tests (and someday write unit tests for this behemoth before it becomes a problem..) 1) Does get_conditioning_latents.py work? 2) Can I feed those latents back into the model by creating a new voice? 3) Can I still mix and match voices (both with conditioning latents and normal voices) with read.py?	2022-05-01 17:25:18 -06:00
James Betker	a8264f5cef	more cleanup	2022-05-01 16:28:39 -06:00
James Betker	f7c8decfdb	Move everything into the tortoise/ subdirectory For eventual packaging.	2022-05-01 16:24:24 -06:00

47 Commits