mrq
|
815ae5d707
|
Merge pull request 'feat: support .flac voice files' (#43) from NtTestAlert/tortoise-tts:support_flac_voice into main
Reviewed-on: #43
|
2023-04-01 16:37:56 +00:00 |
|
|
2cd7b72688
|
feat: support .flac voice files
|
2023-04-01 15:08:31 +02:00 |
|
|
0bcdf81d04
|
option to decouple sample batch size from CLVP candidate selection size (currently just unsqueezes the batches)
|
2023-03-21 21:33:46 +00:00 |
|
|
d1ad634ea9
|
added japanese preprocessor for tokenizer
|
2023-03-17 20:03:02 +00:00 |
|
|
af78e3978a
|
deduce if preprocessing text by checking the JSON itself instead
|
2023-03-16 14:41:04 +00:00 |
|
|
e201746eeb
|
added diffusion_model and tokenizer_json as arguments for settings editing
|
2023-03-16 14:19:24 +00:00 |
|
|
1f674a468f
|
added flag to disable preprocessing (because some IPAs will turn into ASCII, implicitly enable for using the specific ipa.json tokenizer vocab)
|
2023-03-16 04:33:03 +00:00 |
|
|
42cb1f3674
|
added args for tokenizer and diffusion model (so I don't have to add it later)
|
2023-03-15 00:30:28 +00:00 |
|
|
65a43deb9e
|
why didn't I also have it use chunks for computing the AR conditional latents (instead of just the diffusion aspect)
|
2023-03-14 01:13:49 +00:00 |
|
|
97cd58e7eb
|
maybe solved that odd VRAM spike when doing the clvp pass
|
2023-03-12 12:48:29 -05:00 |
|
|
fec0685405
|
revert muh clean code
|
2023-03-10 00:56:29 +00:00 |
|
|
0514f011ff
|
how did I botch this, I don't think it affects anything since it never threw an error
|
2023-03-09 22:36:12 +00:00 |
|
|
00be48670b
|
i am very smart
|
2023-03-09 02:06:44 +00:00 |
|
|
bbeee40ab3
|
forgot to convert to gigabytes
|
2023-03-09 00:51:13 +00:00 |
|
|
6410df569b
|
expose VRAM easily
|
2023-03-09 00:38:31 +00:00 |
|
|
3dd5cad324
|
reverting additional auto-suggested batch sizes, per mrq/ai-voice-cloning#87 proving it is, in fact, not a good idea
|
2023-03-07 19:38:02 +00:00 |
|
|
cc36c0997c
|
didn't get a chance to commit this this morning
|
2023-03-07 15:43:09 +00:00 |
|
|
fffea7fc03
|
unmarried the config.json to the bigvgan by downloading the right one
|
2023-03-07 13:37:45 +00:00 |
|
|
26133c2031
|
do not reload AR/vocoder if already loaded
|
2023-03-07 04:33:49 +00:00 |
|
|
e2db36af60
|
added loading vocoders on the fly
|
2023-03-07 02:44:09 +00:00 |
|
|
7b2aa51abc
|
oops
|
2023-03-06 21:32:20 +00:00 |
|
|
7f98727ad5
|
added option to specify autoregressive model at tts generation time (for a spicy feature later)
|
2023-03-06 20:31:19 +00:00 |
|
|
6fcd8c604f
|
moved bigvgan model to a huggingspace repo
|
2023-03-05 19:47:22 +00:00 |
|
|
0f3261e071
|
you should have migrated by now, if anything breaks it's on (You)
|
2023-03-05 14:03:18 +00:00 |
|
|
06bdf72b89
|
load the model on CPU because torch doesn't like loading models directly to GPU (it just follows the default vocoder loading behavior)
|
2023-03-03 13:53:21 +00:00 |
|
|
2ba0e056cd
|
attribution
|
2023-03-03 06:45:35 +00:00 |
|
|
aca32a71f7
|
added BigVGAN in place of default vocoder (credit to https://github.com/deviandice/tortoise-tts-BigVGAN)
|
2023-03-03 06:30:58 +00:00 |
|
|
a9de016230
|
added storing the loaded model's hash to the TTS object instead of relying on jerryrig injecting it (although I still have to for the weirdos who refuse to update the right way), added a parameter when loading voices to load a latent tagged with a model's hash so latents are per-model now
|
2023-03-02 00:44:42 +00:00 |
|
|
7b839a4263
|
applied the bitsandbytes wrapper to tortoise inference (not sure if it matters)
|
2023-02-28 01:42:10 +00:00 |
|
|
7cc0250a1a
|
added more kill checks, since it only actually did it for the first iteration of a loop
|
2023-02-24 23:10:04 +00:00 |
|
|
de46cf7831
|
adding magically deleted files back (might have a hunch on what happened)
|
2023-02-24 19:30:04 +00:00 |
|
|
2c7c02eb5c
|
moved the old readme back, to align with how DLAS is set up, sorta
|
2023-02-19 17:37:36 +00:00 |
|
|
34b232927e
|
Oops
|
2023-02-19 01:54:21 +00:00 |
|
|
d8c6739820
|
added constructor argument and function to load a user-specified autoregressive model
|
2023-02-18 14:08:45 +00:00 |
|
|
00cb19b6cf
|
arg to skip voice latents for grabbing voice lists (for preparing datasets)
|
2023-02-17 04:50:02 +00:00 |
|
|
b255a77a05
|
updated notebooks to use the new "main" setup
|
2023-02-17 03:31:19 +00:00 |
|
|
150138860c
|
oops
|
2023-02-17 01:46:38 +00:00 |
|
|
6ad3477bfd
|
one more update
|
2023-02-16 23:18:02 +00:00 |
|
|
413703b572
|
fixed colab to use the new repo, reorder loading tortoise before the web UI for people who don't wait
|
2023-02-16 22:12:13 +00:00 |
|
|
30298b9ca3
|
fixing brain worms
|
2023-02-16 21:36:49 +00:00 |
|
|
d53edf540e
|
pip-ifying things
|
2023-02-16 19:48:06 +00:00 |
|
|
d159346572
|
oops
|
2023-02-16 13:23:07 +00:00 |
|
|
eca61af016
|
actually for real fixed incrementing filenames because i had a regex that actually only worked if candidates or lines>1, cuda now takes priority over dml if you're a nut with both of them installed because you can just specify an override anyways
|
2023-02-16 01:06:32 +00:00 |
|
|
ec80ca632b
|
added setting "device-override", less naively decide the number to use for results, some other thing
|
2023-02-15 21:51:22 +00:00 |
|
|
dcc5c140e6
|
fixes
|
2023-02-15 15:33:08 +00:00 |
|
|
729b292515
|
oops x2
|
2023-02-15 05:57:42 +00:00 |
|
|
5bf98de301
|
oops
|
2023-02-15 05:55:01 +00:00 |
|
|
3e8365fdec
|
voicefixed files do not overwrite, as my autism wants to hear the difference between them, incrementing file format fixed for real
|
2023-02-15 05:49:28 +00:00 |
|
|
ea1bc770aa
|
added option: force cpu for conditioning latents, for when you want low chunk counts but your GPU keeps OOMing because fuck fragmentation
|
2023-02-15 05:01:40 +00:00 |
|
|
b721e395b5
|
modified conversion scripts to not give a shit about bitrate and formats since torchaudio.load handles all of that anyways, and it all gets resampled anyways
|
2023-02-15 04:44:14 +00:00 |
|