|
7b2aa51abc
|
oops
|
2023-03-06 21:32:20 +00:00 |
|
|
7f98727ad5
|
added option to specify autoregressive model at tts generation time (for a spicy feature later)
|
2023-03-06 20:31:19 +00:00 |
|
|
6fcd8c604f
|
moved bigvgan model to a huggingspace repo
|
2023-03-05 19:47:22 +00:00 |
|
|
06bdf72b89
|
load the model on CPU because torch doesn't like loading models directly to GPU (it just follows the default vocoder loading behavior)
|
2023-03-03 13:53:21 +00:00 |
|
|
2ba0e056cd
|
attribution
|
2023-03-03 06:45:35 +00:00 |
|
|
aca32a71f7
|
added BigVGAN in place of default vocoder (credit to https://github.com/deviandice/tortoise-tts-BigVGAN)
|
2023-03-03 06:30:58 +00:00 |
|
|
a9de016230
|
added storing the loaded model's hash to the TTS object instead of relying on jerryrig injecting it (although I still have to for the weirdos who refuse to update the right way), added a parameter when loading voices to load a latent tagged with a model's hash so latents are per-model now
|
2023-03-02 00:44:42 +00:00 |
|
|
7b839a4263
|
applied the bitsandbytes wrapper to tortoise inference (not sure if it matters)
|
2023-02-28 01:42:10 +00:00 |
|
|
7cc0250a1a
|
added more kill checks, since it only actually did it for the first iteration of a loop
|
2023-02-24 23:10:04 +00:00 |
|
|
de46cf7831
|
adding magically deleted files back (might have a hunch on what happened)
|
2023-02-24 19:30:04 +00:00 |
|
|
34b232927e
|
Oops
|
2023-02-19 01:54:21 +00:00 |
|
|
d8c6739820
|
added constructor argument and function to load a user-specified autoregressive model
|
2023-02-18 14:08:45 +00:00 |
|
|
00cb19b6cf
|
arg to skip voice latents for grabbing voice lists (for preparing datasets)
|
2023-02-17 04:50:02 +00:00 |
|
|
6ad3477bfd
|
one more update
|
2023-02-16 23:18:02 +00:00 |
|
|
30298b9ca3
|
fixing brain worms
|
2023-02-16 21:36:49 +00:00 |
|
|
d159346572
|
oops
|
2023-02-16 13:23:07 +00:00 |
|
|
eca61af016
|
actually for real fixed incrementing filenames because i had a regex that actually only worked if candidates or lines>1, cuda now takes priority over dml if you're a nut with both of them installed because you can just specify an override anyways
|
2023-02-16 01:06:32 +00:00 |
|
|
ec80ca632b
|
added setting "device-override", less naively decide the number to use for results, some other thing
|
2023-02-15 21:51:22 +00:00 |
|
|
ea1bc770aa
|
added option: force cpu for conditioning latents, for when you want low chunk counts but your GPU keeps OOMing because fuck fragmentation
|
2023-02-15 05:01:40 +00:00 |
|
|
2e777e8a67
|
done away with kludgy shit code, just have the user decide how many chunks to slice concat'd samples to (since it actually does improve voice replicability)
|
2023-02-15 04:39:31 +00:00 |
|
|
314feaeea1
|
added reset generation settings to default button, revamped utilities tab to double as plain jane voice importer (and runs through voicefixer despite it not really doing anything if your voice samples are already of decent quality anyways), ditched load_wav_to_torch or whatever it was called because it literally exists as torchaudio.load, sample voice is now a combined waveform of all your samples and will always return even if using a latents file
|
2023-02-14 21:20:04 +00:00 |
|
|
0bc2c1f540
|
updates chunk size to the chunked tensor length, just in case
|
2023-02-14 17:13:34 +00:00 |
|
|
48275899e8
|
added flag to enable/disable voicefixer using CUDA because I'll OOM on my 2060, changed from naively subdividing evenly (2,4,8,16 pieces) to just incrementing by 1 (1,2,3,4) when trying to subdivide within constraints of the max chunk size for computing voice latents
|
2023-02-14 16:47:34 +00:00 |
|
|
b648186691
|
history tab doesn't naively reuse the voice dir for results, experimental "divide total sound size until it fits under the requested max chunk size" doesn't have a +1 to mess things up (need to re-evaluate how I want to calculate sizes of best fits eventually)
|
2023-02-14 16:23:04 +00:00 |
|
|
8250a79b23
|
Implemented kv_cache "fix" (from 1f3c1b5f4a ); guess I should find out why it's crashing DirectML backend
|
2023-02-13 13:48:31 +00:00 |
|
|
5b5e32338c
|
DirectML: fixed redaction/aligner by forcing it to stay on CPU
|
2023-02-12 20:52:04 +00:00 |
|
|
4d01bbd429
|
added button to recalculate voice latents, added experimental switch for computing voice latents
|
2023-02-12 18:11:40 +00:00 |
|
|
88529fda43
|
fixed regression with computing conditional latents outside of the CPU
|
2023-02-12 17:44:39 +00:00 |
|
|
65f74692a0
|
fixed silently crashing from enabling kv_cache-ing if using the DirectML backend, throw an error when reading a generated audio file that does not have any embedded metadata in it, cleaned up the blocks of code that would DMA/transfer tensors/models between GPU and CPU
|
2023-02-12 14:46:21 +00:00 |
|
|
1b55730e67
|
fixed regression where the auto_conds do not move to the GPU and cause a problem during CVVP compare pass
|
2023-02-11 20:34:12 +00:00 |
|
|
a7330164ab
|
Added integration for "voicefixer", fixed issue where candidates>1 and lines>1 only outputs the last combined candidate, numbered step for each generation in progress, output time per generation step
|
2023-02-11 15:02:11 +00:00 |
|
|
4f903159ee
|
revamped result formatting, added "kludgy" stop button
|
2023-02-10 22:12:37 +00:00 |
|
|
52a9ed7858
|
Moved voices out of the tortoise folder because it kept being processed for setup.py
|
2023-02-10 20:11:56 +00:00 |
|
|
efa556b793
|
Added new options: "Output Sample Rate", "Output Volume", and documentation
|
2023-02-10 03:02:09 +00:00 |
|
|
57af25c6c0
|
oops
|
2023-02-09 22:17:57 +00:00 |
|
|
504db0d1ac
|
Added 'Only Load Models Locally' setting
|
2023-02-09 22:06:55 +00:00 |
|
|
729be135ef
|
Added option: listen path
|
2023-02-09 20:42:38 +00:00 |
|
|
3f8302a680
|
I didn't have to suck off a wizard for DirectML support (courtesy of https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/7600 for leading the way)
|
2023-02-09 05:05:21 +00:00 |
|
|
b23d6b4b4c
|
owari da...
|
2023-02-09 01:53:25 +00:00 |
|
|
494f3c84a1
|
beginning to add DirectML support
|
2023-02-08 23:03:52 +00:00 |
|
|
e45e4431d1
|
(finally) added the CVVP model weight slider, latents export more data too for weighing against CVVP
|
2023-02-07 20:55:56 +00:00 |
|
|
f7274112c3
|
un-hardcoded input output sampling rates (changing them "works" but leads to wrong audio, naturally)
|
2023-02-07 18:34:29 +00:00 |
|
|
55058675d2
|
(maybe) fixed an issue with using prompt redactions (emotions) on CPU causing a crash, because for some reason the wav2vec_alignment assumed CUDA was always available
|
2023-02-07 07:51:05 -06:00 |
|
|
328deeddae
|
forgot to auto compute batch size again if set to 0
|
2023-02-06 23:14:17 -06:00 |
|
|
a3c077ba13
|
added setting to adjust autoregressive sample batch size
|
2023-02-06 22:31:06 +00:00 |
|
|
b8b15d827d
|
added flag (--cond-latent-max-chunk-size) that should restrict the maximum chunk size when chunking for calculating conditional latents, to avoid OOMing on VRAM
|
2023-02-06 05:10:07 +00:00 |
|
|
319e7ec0a6
|
fixed up the computing conditional latents
|
2023-02-06 03:44:34 +00:00 |
|
|
b23f583c4e
|
Forgot to rename the cached latents to the new filename
|
2023-02-05 23:51:52 +00:00 |
|
|
c2c9b1b683
|
modified how conditional latents are computed (before, it just happened to only bother reading the first 102400/24000≈4.27 seconds per audio input, now it will chunk it all to compute latents)
|
2023-02-05 23:25:41 +00:00 |
|
|
4ea997106e
|
oops
|
2023-02-05 20:10:40 +00:00 |
|