|
1f674a468f
|
added flag to disable preprocessing (because some IPA symbols will turn into ASCII; implicitly enabled when using the specific ipa.json tokenizer vocab)
|
2023-03-16 04:33:03 +00:00 |
|
|
42cb1f3674
|
added args for tokenizer and diffusion model (so I don't have to add it later)
|
2023-03-15 00:30:28 +00:00 |
|
|
65a43deb9e
|
why didn't I also have it use chunks for computing the AR conditional latents (instead of just the diffusion aspect)
|
2023-03-14 01:13:49 +00:00 |
|
|
97cd58e7eb
|
maybe solved that odd VRAM spike when doing the clvp pass
|
2023-03-12 12:48:29 -05:00 |
|
|
fec0685405
|
revert muh clean code
|
2023-03-10 00:56:29 +00:00 |
|
|
0514f011ff
|
how did I botch this, I don't think it affects anything since it never threw an error
|
2023-03-09 22:36:12 +00:00 |
|
|
00be48670b
|
i am very smart
|
2023-03-09 02:06:44 +00:00 |
|
|
bbeee40ab3
|
forgot to convert to gigabytes
|
2023-03-09 00:51:13 +00:00 |
|
|
6410df569b
|
expose VRAM easily
|
2023-03-09 00:38:31 +00:00 |
|
|
3dd5cad324
|
reverting additional auto-suggested batch sizes, per mrq/ai-voice-cloning#87 proving it is, in fact, not a good idea
|
2023-03-07 19:38:02 +00:00 |
|
|
cc36c0997c
|
didn't get a chance to commit this this morning
|
2023-03-07 15:43:09 +00:00 |
|
|
fffea7fc03
|
unmarried the config.json to the bigvgan by downloading the right one
|
2023-03-07 13:37:45 +00:00 |
|
|
26133c2031
|
do not reload AR/vocoder if already loaded
|
2023-03-07 04:33:49 +00:00 |
|
|
e2db36af60
|
added loading vocoders on the fly
|
2023-03-07 02:44:09 +00:00 |
|
|
7b2aa51abc
|
oops
|
2023-03-06 21:32:20 +00:00 |
|
|
7f98727ad5
|
added option to specify autoregressive model at tts generation time (for a spicy feature later)
|
2023-03-06 20:31:19 +00:00 |
|
|
6fcd8c604f
|
moved bigvgan model to a huggingspace repo
|
2023-03-05 19:47:22 +00:00 |
|
|
0f3261e071
|
you should have migrated by now, if anything breaks it's on (You)
|
2023-03-05 14:03:18 +00:00 |
|
|
06bdf72b89
|
load the model on CPU because torch doesn't like loading models directly to GPU (it just follows the default vocoder loading behavior)
|
2023-03-03 13:53:21 +00:00 |
|
|
2ba0e056cd
|
attribution
|
2023-03-03 06:45:35 +00:00 |
|
|
aca32a71f7
|
added BigVGAN in place of default vocoder (credit to https://github.com/deviandice/tortoise-tts-BigVGAN)
|
2023-03-03 06:30:58 +00:00 |
|
|
a9de016230
|
added storing the loaded model's hash to the TTS object instead of relying on jerryrig injecting it (although I still have to for the weirdos who refuse to update the right way), added a parameter when loading voices to load a latent tagged with a model's hash so latents are per-model now
|
2023-03-02 00:44:42 +00:00 |
|
|
7b839a4263
|
applied the bitsandbytes wrapper to tortoise inference (not sure if it matters)
|
2023-02-28 01:42:10 +00:00 |
|
|
7cc0250a1a
|
added more kill checks, since it only actually did it for the first iteration of a loop
|
2023-02-24 23:10:04 +00:00 |
|
|
de46cf7831
|
adding magically deleted files back (might have a hunch on what happened)
|
2023-02-24 19:30:04 +00:00 |
|
|
2c7c02eb5c
|
moved the old readme back, to align with how DLAS is set up, sorta
|
2023-02-19 17:37:36 +00:00 |
|
|
34b232927e
|
Oops
|
2023-02-19 01:54:21 +00:00 |
|
|
d8c6739820
|
added constructor argument and function to load a user-specified autoregressive model
|
2023-02-18 14:08:45 +00:00 |
|
|
00cb19b6cf
|
arg to skip voice latents for grabbing voice lists (for preparing datasets)
|
2023-02-17 04:50:02 +00:00 |
|
|
b255a77a05
|
updated notebooks to use the new "main" setup
|
2023-02-17 03:31:19 +00:00 |
|
|
150138860c
|
oops
|
2023-02-17 01:46:38 +00:00 |
|
|
6ad3477bfd
|
one more update
|
2023-02-16 23:18:02 +00:00 |
|
|
413703b572
|
fixed colab to use the new repo, reorder loading tortoise before the web UI for people who don't wait
|
2023-02-16 22:12:13 +00:00 |
|
|
30298b9ca3
|
fixing brain worms
|
2023-02-16 21:36:49 +00:00 |
|
|
d53edf540e
|
pip-ifying things
|
2023-02-16 19:48:06 +00:00 |
|
|
d159346572
|
oops
|
2023-02-16 13:23:07 +00:00 |
|
|
eca61af016
|
actually for real fixed incrementing filenames because i had a regex that actually only worked if candidates or lines>1, cuda now takes priority over dml if you're a nut with both of them installed because you can just specify an override anyways
|
2023-02-16 01:06:32 +00:00 |
|
|
ec80ca632b
|
added setting "device-override", less naively decide the number to use for results, some other thing
|
2023-02-15 21:51:22 +00:00 |
|
|
dcc5c140e6
|
fixes
|
2023-02-15 15:33:08 +00:00 |
|
|
729b292515
|
oops x2
|
2023-02-15 05:57:42 +00:00 |
|
|
5bf98de301
|
oops
|
2023-02-15 05:55:01 +00:00 |
|
|
3e8365fdec
|
voicefixed files do not overwrite, as my autism wants to hear the difference between them, incrementing file format fixed for real
|
2023-02-15 05:49:28 +00:00 |
|
|
ea1bc770aa
|
added option: force cpu for conditioning latents, for when you want low chunk counts but your GPU keeps OOMing because fuck fragmentation
|
2023-02-15 05:01:40 +00:00 |
|
|
b721e395b5
|
modified conversion scripts to not give a shit about bitrate and formats since torchaudio.load handles all of that anyways, and it all gets resampled anyways
|
2023-02-15 04:44:14 +00:00 |
|
|
2e777e8a67
|
done away with kludgy shit code, just have the user decide how many chunks to slice concat'd samples to (since it actually does improve voice replicability)
|
2023-02-15 04:39:31 +00:00 |
|
|
314feaeea1
|
added reset generation settings to default button, revamped utilities tab to double as plain jane voice importer (and runs through voicefixer despite it not really doing anything if your voice samples are already of decent quality anyways), ditched load_wav_to_torch or whatever it was called because it literally exists as torchaudio.load, sample voice is now a combined waveform of all your samples and will always return even if using a latents file
|
2023-02-14 21:20:04 +00:00 |
|
|
0bc2c1f540
|
updates chunk size to the chunked tensor length, just in case
|
2023-02-14 17:13:34 +00:00 |
|
|
48275899e8
|
added flag to enable/disable voicefixer using CUDA because I'll OOM on my 2060, changed from naively subdividing evenly (2,4,8,16 pieces) to just incrementing by 1 (1,2,3,4) when trying to subdivide within constraints of the max chunk size for computing voice latents
|
2023-02-14 16:47:34 +00:00 |
|
|
b648186691
|
history tab doesn't naively reuse the voice dir for results, experimental "divide total sound size until it fits under requested max chunk size" doesn't have a +1 to mess things up (need to re-evaluate how I want to calculate sizes of best fits eventually)
|
2023-02-14 16:23:04 +00:00 |
|
|
47f4b5bf81
|
voicefixer uses CUDA if exposed
|
2023-02-13 15:30:49 +00:00 |
|