forked from mrq/tortoise-tts
Commit Graph

137 Commits

Author SHA1 Message Date
mrq
42cb1f3674 added args for tokenizer and diffusion model (so I don't have to add it later) 2023-03-15 00:30:28 +00:00
mrq
65a43deb9e why didn't I also have it use chunks for computing the AR conditional latents (instead of just the diffusion aspect) 2023-03-14 01:13:49 +00:00
mrq
97cd58e7eb maybe solved that odd VRAM spike when doing the clvp pass 2023-03-12 12:48:29 -05:00
mrq
fec0685405 revert muh clean code 2023-03-10 00:56:29 +00:00
mrq
0514f011ff how did I botch this, I don't think it affects anything since it never threw an error 2023-03-09 22:36:12 +00:00
mrq
00be48670b i am very smart 2023-03-09 02:06:44 +00:00
mrq
bbeee40ab3 forgot to convert to gigabytes 2023-03-09 00:51:13 +00:00
mrq
6410df569b expose VRAM easily 2023-03-09 00:38:31 +00:00
mrq
3dd5cad324 reverting additional auto-suggested batch sizes, per mrq/ai-voice-cloning#87 proving it is, in fact, not a good idea 2023-03-07 19:38:02 +00:00
mrq
cc36c0997c didn't get a chance to commit this this morning 2023-03-07 15:43:09 +00:00
mrq
fffea7fc03 unmarried the config.json to the bigvgan by downloading the right one 2023-03-07 13:37:45 +00:00
mrq
26133c2031 do not reload AR/vocoder if already loaded 2023-03-07 04:33:49 +00:00
mrq
e2db36af60 added loading vocoders on the fly 2023-03-07 02:44:09 +00:00
mrq
7b2aa51abc oops 2023-03-06 21:32:20 +00:00
mrq
7f98727ad5 added option to specify autoregressive model at tts generation time (for a spicy feature later) 2023-03-06 20:31:19 +00:00
mrq
6fcd8c604f moved bigvgan model to a huggingspace repo 2023-03-05 19:47:22 +00:00
mrq
06bdf72b89 load the model on CPU because torch doesn't like loading models directly to GPU (it just follows the default vocoder loading behavior) 2023-03-03 13:53:21 +00:00
mrq
2ba0e056cd attribution 2023-03-03 06:45:35 +00:00
mrq
aca32a71f7 added BigVGAN in place of default vocoder (credit to https://github.com/deviandice/tortoise-tts-BigVGAN) 2023-03-03 06:30:58 +00:00
mrq
a9de016230 added storing the loaded model's hash to the TTS object instead of relying on jerryrig injecting it (although I still have to for the weirdos who refuse to update the right way), added a parameter when loading voices to load a latent tagged with a model's hash so latents are per-model now 2023-03-02 00:44:42 +00:00
mrq
7b839a4263 applied the bitsandbytes wrapper to tortoise inference (not sure if it matters) 2023-02-28 01:42:10 +00:00
mrq
7cc0250a1a added more kill checks, since it only actually did it for the first iteration of a loop 2023-02-24 23:10:04 +00:00
mrq
de46cf7831 adding magically deleted files back (might have a hunch on what happened) 2023-02-24 19:30:04 +00:00
mrq
34b232927e Oops 2023-02-19 01:54:21 +00:00
mrq
d8c6739820 added constructor argument and function to load a user-specified autoregressive model 2023-02-18 14:08:45 +00:00
mrq
00cb19b6cf arg to skip voice latents for grabbing voice lists (for preparing datasets) 2023-02-17 04:50:02 +00:00
mrq
6ad3477bfd one more update 2023-02-16 23:18:02 +00:00
mrq
30298b9ca3 fixing brain worms 2023-02-16 21:36:49 +00:00
mrq
d159346572 oops 2023-02-16 13:23:07 +00:00
mrq
eca61af016 actually for real fixed incrementing filenames because i had a regex that actually only worked if candidates or lines>1, cuda now takes priority over dml if you're a nut with both of them installed because you can just specify an override anyways 2023-02-16 01:06:32 +00:00
mrq
ec80ca632b added setting "device-override", less naively decide the number to use for results, some other thing 2023-02-15 21:51:22 +00:00
mrq
ea1bc770aa added option: force cpu for conditioning latents, for when you want low chunk counts but your GPU keeps OOMing because fuck fragmentation 2023-02-15 05:01:40 +00:00
mrq
2e777e8a67 done away with kludgy shit code, just have the user decide how many chunks to slice concat'd samples to (since it actually does improve voice replicability) 2023-02-15 04:39:31 +00:00
mrq
314feaeea1 added reset generation settings to default button, revamped utilities tab to double as plain jane voice importer (and runs through voicefixer despite it not really doing anything if your voice samples are already of decent quality anyways), ditched load_wav_to_torch or whatever it was called because it literally exists as torchaudio.load, sample voice is now a combined waveform of all your samples and will always return even if using a latents file 2023-02-14 21:20:04 +00:00
mrq
0bc2c1f540 updates chunk size to the chunked tensor length, just in case 2023-02-14 17:13:34 +00:00
mrq
48275899e8 added flag to enable/disable voicefixer using CUDA because I'll OOM on my 2060, changed from naively subdividing evenly (2,4,8,16 pieces) to just incrementing by 1 (1,2,3,4) when trying to subdivide within constraints of the max chunk size for computing voice latents 2023-02-14 16:47:34 +00:00
mrq
b648186691 history tab doesn't naively reuse the voice dir instead for results, experimental "divide total sound size until it fits under requested max chunk size" doesn't have a +1 to mess things up (need to re-evaluate how I want to calculate sizes of best fits eventually) 2023-02-14 16:23:04 +00:00
mrq
8250a79b23 Implemented kv_cache "fix" (from 1f3c1b5f4a); guess I should find out why it's crashing DirectML backend 2023-02-13 13:48:31 +00:00
mrq
5b5e32338c DirectML: fixed redaction/aligner by forcing it to stay on CPU 2023-02-12 20:52:04 +00:00
mrq
4d01bbd429 added button to recalculate voice latents, added experimental switch for computing voice latents 2023-02-12 18:11:40 +00:00
mrq
88529fda43 fixed regression with computing conditional latents outside of the CPU 2023-02-12 17:44:39 +00:00
mrq
65f74692a0 fixed silently crashing from enabling kv_cache-ing if using the DirectML backend, throw an error when reading a generated audio file that does not have any embedded metadata in it, cleaned up the blocks of code that would DMA/transfer tensors/models between GPU and CPU 2023-02-12 14:46:21 +00:00
mrq
1b55730e67 fixed regression where the auto_conds do not move to the GPU and causes a problem during CVVP compare pass 2023-02-11 20:34:12 +00:00
mrq
a7330164ab Added integration for "voicefixer", fixed issue where candidates>1 and lines>1 only outputs the last combined candidate, numbered step for each generation in progress, output time per generation step 2023-02-11 15:02:11 +00:00
mrq
4f903159ee revamped result formatting, added "kludgy" stop button 2023-02-10 22:12:37 +00:00
mrq
52a9ed7858 Moved voices out of the tortoise folder because it kept being processed for setup.py 2023-02-10 20:11:56 +00:00
mrq
efa556b793 Added new options: "Output Sample Rate", "Output Volume", and documentation 2023-02-10 03:02:09 +00:00
mrq
57af25c6c0 oops 2023-02-09 22:17:57 +00:00
mrq
504db0d1ac Added 'Only Load Models Locally' setting 2023-02-09 22:06:55 +00:00
mrq
729be135ef Added option: listen path 2023-02-09 20:42:38 +00:00