Suggestions #16
Making it accessible for plebs without a good rig.
Thanks for the great work.
Thought I'd share a few more things after testing tortoise a bit more:
Have you considered adding the following?
top_p
diffusion_temperature
length_penalty
repetition_penalty
autoregressive_batch_size
cond_free_k
typical_sampling (for testing only)
Especially top_p seems to greatly influence the outcome.
I'll see about fiddling with replacing the provided colab notebook, since I feel like I've tackled most problems already and am running out of things to work on.
I've peeped at some of them, but most of the remaining settings deal with the diffusion/vocoder pass and that doesn't really seem like it needs tinkering much, as it's (mostly) just creating the sound file itself.
autoregressive_batch_size is exposed as Sample Batch Size in the Settings tab. Bigger batch size = faster throughput at the cost of more VRAM consumed. If anything though, I need to replace the function that gives a "suggested" batch size, such as taking character count into account.
cond_free_k is related to Condition-Free, which does have a (slightly) noticeable quality bump.
I'll definitely add in knobs and sliders for them all to test with.
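A minimal sketch of what a character-aware "suggested" batch size might look like. The function name suggest_batch_size, the one-sample-per-GB ceiling, and the 200-character step are all assumptions made up for illustration; none of them come from the repo or from measurements.

```python
def suggest_batch_size(free_vram_gb: float, char_count: int, max_batch: int = 16) -> int:
    """Hypothetical heuristic: start from a VRAM-based ceiling, shrink it for long prompts."""
    # Placeholder ceiling: one sample per GB of free VRAM (not a measured figure).
    vram_ceiling = max(1, int(free_vram_gb))
    # Placeholder penalty: halve the ceiling for each started block of
    # 200 characters beyond the first 200.
    penalty = 2 ** max(0, (char_count - 1) // 200)
    # Never suggest less than 1 or more than max_batch.
    return max(1, min(max_batch, vram_ceiling // penalty))


if __name__ == "__main__":
    for chars in (100, 450, 1000):
        print(chars, suggest_batch_size(8.0, chars))
```

The real constants would have to come from profiling actual VRAM use against prompt length; the point is only that the suggestion takes both inputs.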
Thanks, I've tested it locally, but it's not feasible to run on my machine. Tried getting gradio to run via colab, but failed at that. Before I found your repo, I built a colab UI to make tortoise more user-friendly, and yesterday I tested yours through that. To be honest, it'd be better to just get your gradio setup running, but I'll share it anyway: https://colab.research.google.com/drive/1WyLFXSDzre14Ig-gpomMS3Ugus5K9oZu
I've noticed that especially top_p influences my outputs quite a bit. Can't put my finger on it, but tinkering with it certainly helped to get some voices right. I discussed your project yesterday on discord and we've been testing out various settings to nail down how to use these. Some people are interested in training a new model, etc. Maybe it's of interest to you, for feedback, knowledge, sharing, getting voice assets, etc. https://discord. gg/bM44JGSeHT
Thank you very much!
Added a link to a colab notebook in commit f5ed5499a0. It uses Gradio's Colab integration to embed itself in the notebook itself, rather than require opening a public Gradio link.
Added the remaining input settings in commit 811539b20a.
Very nice!
Two things I noticed:
The embedded view is too small for the user interface and uploaded voices return an error when trying to generate.

Ah, there's a height argument I can pass through on launch. I'll push a fix later when I'm available.
Did you remember to click Refresh Voice List? Works fine for me when I do that.
Legend.
After getting the error on two different runtimes, I wasn't able to reproduce it now. Works!
I've been thinking about a few more quality of life suggestions, just in case you're interested and feel inspired by them:
Honestly, it's wonderful that you've put this together and decided to share it. I'm very grateful you're making this accessible to everyone! :)
A lot of those suggestions require heavy Gradio modification, or very dirty solutions.
Tried it. The solutions in mind are either spawning a child process to do the generation that can be killed to cancel it, or a lot of nasty patches to check for a kill request, which doesn't guarantee nice cleanup.
I suppose I could do some funky state variable checking with my tqdm override for reporting progress, but I'm not sure of the ramifications of dirtily terminating a generation procedure, as models and data do move between CPU and GPU.
Added in commit 8641cc9906.
No clean way in mind, as it'd require a way to procedurally add a variable number of gradio components on user request, or enforcing Candidates as an almost-constant and requiring the UI to refresh on change.
Added in commit 4b3b0ead1a.
Considered it while I was thinking of cleaning up how the seed is outputted, but I got sidetracked. Might get around to it and outputting stats when I run out of things to do.
Added in commit 9bf1ea5b0a.
I believe the original do_tts.py script did that. The current results structure is an artifact from when I initially was toying with TorToiSe, and hated how it was originally handled. Funny enough, the giant cluster of timestamped folders was getting annoying too.
Should be an easy thing to change to.
Added in commit 8641cc9906.
While I know it won't hurt to store these as well, I don't necessarily see any need for it. They're not generation parameters, and sample rate always has to be deduced anyways on playback. The sample rate slider is just to shortcut throwing it into an upsampler/interpolator.
Considered it; it wouldn't really be anything elegant. Even Voldy's Web UI never got a permanent one, as the few days it had a history function, it was extremely kludgy.
At best, it'd be something like a drop down to list voices, a submit button, and a text box to print out a text list of everything. Although, I suppose that's functional enough despite being rather "primitive".
Added in commit 9bf1ea5b0a.
You mean all the other experimental settings? Despite being easier to try and toy with them all if they were in the main tab, for normal users I'd rather try to shy them from touching those knobs until some better guarantees are known.
Added in commit 84316d8f80.
Technically you can by dropping the Concurrency Count to 1, and Gradio should store any events on queue. However, anything that relies on a gradio event, like presets updating, will stall until generation is complete.
Beautiful!
After some testing, my experience and a few errors I ran into:
Would it be possible to have the same kind of drop down for candidates like you did in the history tab?
I assume Gradio doesn't allow for more complex drop downs, like having the "experimental" settings collapse, revealing several sliders?
Errors:
When first executing "Running", an error is shown; upon retrying, it starts (only happens when using "run all" with the "Running" cell queued up, maybe a colab updating issue? Screenshot 1)
Sometimes on first starting/"copy settings", trying to generate results in an error (only seems to be fixed by loading a different voice and hitting refresh again, Screenshot 2)
"Reload TTS" results in crashing the running cell entirely
Stopping during generation works, but clicking generate again results in an error (loading voices back and forth fixes this, Screenshot 2 again)
The history tab is great! Sometimes it's not updating, not showing the voices that previously generated clips. When it worked, I noticed that CVVP is not updating correctly, always showing 0.00. I tried checking for diffusion_temperature too, but couldn't get it to show my voice again.
And I had two more thoughts on design:
That'd work.
Added in commit 4b3b0ead1a.
I suppose. I know Stable Diffusion has elements under dropdowns, but I'm sure that's a custom thing.
As a compromise I can just have a checkbox that sets the experimental settings visible when checked.
Added in commit 84316d8f80.
The colab runtime needs to restart to reset PIL, for whatever reason, because it gets updated during setup, hence the comment above the exit() saying it's a hotfix.
I'll see about replicating it, but in testing in the colab specifically with importing latents to a voice folder, I had no issues.
Because it's not an elegant way of reloading, it was only there for a specific niche case that I don't really need it for anyways.
If you do not have enough free VRAM to load a second copy before the first one gets destructed, you'll OOM. It's only there if you really need to change a setting without restarting the entire process (which you should, initialization is pretty fast).
Using del tts to delete the first instance and then initializing it threw an error in testing.
Should be fixed in commit 50073e635f.
As I said before, there is no elegant way of stopping a generation. It relies on throwing an exception during an iteration step and the state gets gummed up.
You'll still have some models on the wrong device. I suppose if I get an elegant way of reloading TorToiSe, then I could just have it reload after a kill request.
Added in commit 50073e635f.
Cannot replicate.
As documented in both the README and the error message that prints, CVVP is predicated on disabling Slimmer Computed Latents. Comparing candidates to the CVVP model relies on additional data from computing the latents that doesn't get saved, because it bloats the latents file, which bloats every sound output from the latents being embedded.
Shows up fine.
Already added.
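On stopping a generation by throwing an exception during an iteration step, as described above: a minimal sketch of the idea, assuming a tqdm-style wrapper around the loop. The names GenerationCancelled, cancel_requested, cancellable, and generate are hypothetical and not taken from the repo, and the stop request is simulated inside the loop rather than coming from a button.

```python
import threading


class GenerationCancelled(Exception):
    """Raised inside the generation loop when a stop was requested."""


# Shared flag; a UI callback would call cancel_requested.set().
cancel_requested = threading.Event()


def cancellable(iterable):
    """Stand-in for a tqdm-style wrapper: check the flag before each step."""
    for item in iterable:
        if cancel_requested.is_set():
            raise GenerationCancelled()
        yield item


def generate(steps, stop_after=None):
    """Hypothetical generation loop; returns the number of completed steps."""
    done = 0
    try:
        for _ in cancellable(range(steps)):
            done += 1
            if stop_after is not None and done == stop_after:
                cancel_requested.set()  # simulate pressing Stop mid-run
    except GenerationCancelled:
        pass  # real code would have to restore model/device state here
    finally:
        cancel_requested.clear()  # so the next run is not cancelled immediately
    return done


if __name__ == "__main__":
    print(generate(10, stop_after=3))
    print(generate(2))
```

The sketch only shows the control flow. It does nothing about the hard part mentioned above, which is that models may be left on the wrong device when the loop is interrupted.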
Title changed from "Suggestion: Set up a colab" to "Suggestions".
Thanks for clarifying again for me! If tooltips are possible in Gradio, that'd be a good one to add.
My observations after testing the latest updates:
Generating always shows [1/1]
So I got the history working again, but only after restarting the runtime and executing the "Running" cell again. Otherwise after first running, it doesn't load my previously loaded voices (tested multiple times, closing the session and restarting).
Is it possible to get rid of the "Select Candidate" button, by executing its function automatically after selecting a .wav from the dropdown?
Fixed in commit 5f1c032312.
Intentional.
No. It makes the UI continuously spaz out. If I was able to, I wouldn't need buttons for the History tab.
I see what it refers to now.
Got it!