Suggestions #16

Closed
opened 2023-02-10 12:44:43 +00:00 by inthemorningsir · 14 comments

Making it accessible for plebs without a good rig.

Thanks for the great work.

Thought I'd share a few more things after testing tortoise a bit more:

Have you considered adding the following?

top_p
diffusion_temperature
length_penalty
repetition_penalty
autoregressive_batch_size
cond_free_k

typical_sampling (for testing only)

top_p especially seems to greatly influence the outcome.

Owner

Set up a colab

I'll see about fiddling with replacing the provided colab notebook, since I feel like I've tackled most problems already and am running out of things to work on.

Have you considered adding the following?

I've peeped at some of them, but most of the remaining settings deal with the diffusion/vocoder pass and that doesn't really seem like it needs tinkering much, as it's (mostly) just creating the sound file itself.

autoregressive_batch_size is exposed as Sample Batch Size in the Settings tab. Bigger batch size = faster throughput at the cost of more VRAM consumed. If anything though, I need to replace the function that gives a "suggested" batch size, such as taking character count into account.

cond_free_k is related to Condition-Free, which does have a (slightly) noticeable quality bump.

I'll definitely add in knobs and sliders for them all to test with.

I'll see about fiddling with replacing the provided colab notebook, since I feel like I've tackled most problems already and am running out of things to work on.

Thanks, I've tested it locally, but it's not feasible to run on my machine. I tried getting Gradio to run via Colab, but failed at that. Before I found your repo, I built a Colab UI to make tortoise more user-friendly, and yesterday I tested yours through that. To be honest, it'd be better to just get your Gradio setup running, but I'll share it anyway: https://colab.research.google.com/drive/1WyLFXSDzre14Ig-gpomMS3Ugus5K9oZu

I've peeped at some of them, but most of the remaining settings deal with the diffusion/vocoder pass and that doesn't really seem like it needs tinkering much, as it's (mostly) just creating the sound file itself.

I've noticed that top_p especially influences my outputs quite a bit. I can't put my finger on why, but tinkering with it certainly helped to get some voices right. I discussed your project yesterday on Discord, and we've been testing out various settings to nail down how to use these. Some people are interested in training a new model, etc. Maybe it's of interest to you, for feedback, knowledge, sharing, getting voice assets, etc. https://discord.gg/bM44JGSeHT

I'll definitely add in knobs and sliders for them all to test with.

Thank you very much!

Owner

Added a link to a colab notebook in commit f5ed5499a0. It uses Gradio's Colab integration to embed itself in the notebook itself, rather than require opening a public Gradio link.

Added the remaining input settings in commit 811539b20a.

Very nice!

Two things I noticed:

The embedded view is too small for the user interface and uploaded voices return an error when trying to generate.
image

image

Owner

The embedded view is too small for the user interface

Ah, there's a height argument I can pass through on launch. I'll push a fix later when I'm available.

uploaded voices return an error when trying to generate

Did you remember to click Refresh Voice List? Works fine for me when I do that.

Ah, there's a height argument I can pass through on launch. I'll push a fix later when I'm available.

Legend.

Did you remember to click Refresh Voice List? Works fine for me when I do that.

After getting the error on two different runtimes, I wasn't able to reproduce it now. Works!

I've been thinking about a few more quality of life suggestions, just in case you're interested and feel inspired by them:

  • a cancel button to stop the current generation
  • display multiple candidates in the output section
  • display the render time underneath the output section
  • having the voice name included in the filename (/results/voice/voice_001.wav)
  • including output volume, sample rate in the metadata
  • a history tab with previously generated voices to compare quality, generation time, maybe a note/rating section
  • having the settings that 'direct' the voice all on one tab
  • adding a render queue

Honestly, it's wonderful that you've put this together and decided to share it. I'm very grateful you're making this accessible to everyone! :)

Owner

A lot of those suggestions require heavy Gradio modification, or very dirty solutions.

a cancel button to stop the current generation

Tried it. The solutions in mind are either spawning a child process to do the generation that can be killed to cancel it, or a lot of nasty patches to check for a kill request, which doesn't guarantee nice cleanup.

I suppose I could do some funky state variable checking with my tqdm override for reporting progress, but I'm not sure of the ramifications of dirtily terminating a generation procedure, as models and data do move between CPU and GPU.

Added in commit 8641cc9906.

display multiple candidates in the output section

No clean way in mind, as it'd require a way to procedurally add a variable number of gradio components on user request, or enforcing Candidates as an almost-constant and require the UI to refresh on change.

Added in commit 4b3b0ead1a.

display the render time underneath the output section

Considered it while I was thinking of cleaning up how the seed is outputted, but I got sidetracked. Might get around to it and outputting stats when I run out of things to do.

Added in commit 9bf1ea5b0a.

having the voice name included in the filename (/results/voice/voice_001.wav)

I believe the original do_tts.py script did that. The current results structure is an artifact from when I initially was toying with TorToiSe, and hated how it was originally handled. Funny enough, the giant cluster of timestamped folders were getting annoying too.

Should be an easy thing to change.

Added in commit 8641cc9906.
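
A minimal sketch of the naming scheme suggested above, assuming results are grouped under one folder per voice and numbered with a zero-padded counter. The `next_output_path` helper and the three-digit padding are hypothetical, for illustration only, and not necessarily what the commit does.

```python
from pathlib import Path
import re


def next_output_path(results_dir: Path, voice: str) -> Path:
    """Return results_dir/<voice>/<voice>_NNN.wav using the next free index."""
    voice_dir = results_dir / voice
    voice_dir.mkdir(parents=True, exist_ok=True)

    # Collect the numeric suffixes of files already named <voice>_NNN.wav.
    pattern = re.compile(rf"^{re.escape(voice)}_(\d+)\.wav$")
    used = [
        int(m.group(1))
        for p in voice_dir.iterdir()
        if (m := pattern.match(p.name))
    ]

    # Start at 1 when the folder is empty, otherwise continue the sequence.
    index = max(used, default=0) + 1
    return voice_dir / f"{voice}_{index:03d}.wav"
```

Called as `next_output_path(Path("results"), "voice")`, this yields `results/voice/voice_001.wav` on a fresh folder, which avoids the cluster of timestamped folders while keeping the voice name visible in each file.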

including output volume, sample rate in the metadata

While I know it won't hurt to store these as well, I don't necessarily see any need for it. They're not generation parameters, and sample rate always has to be deduced anyways on playback. The sample rate slider is just to shortcut throwing it into an upsampler/interpolator.

a history tab with previously generated voices to compare quality, generation time, maybe a note/rating section

Considered it; it wouldn't really be anything elegant. Even Voldy's Web UI never got a permanent one, as the few days it had a history function, it was extremely kludgy.

At best, it'd be something like a drop down to list voices, a submit button, and a text box to print out a text list of everything. Although, I suppose that's functional enough despite being rather "primitive".

Added in commit 9bf1ea5b0a.

having the settings that 'direct' the voice all on one tab

You mean all the other experimental settings? Despite being easier to try and toy with them all if they were in the main tab, for normal users I'd rather try to shy them from touching those knobs until some better guarantees are known.

Added in commit 84316d8f80.

adding a render queue

Technically you can by dropping the Concurrency Count to 1, and Gradio should store any events in the queue. However, anything that relies on a Gradio event, like presets updating, will stall until generation is complete.
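
To illustrate why everything stalls: with a concurrency of 1, events are handled strictly one at a time, so a quick event queued behind a long generation has to wait for it. The sketch below models that with a single-worker thread pool from the Python standard library. It is an analogy under that assumption, not Gradio's actual queue, and `generate` and `update_preset` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
import time

order = []  # records the order in which events finish


def generate(seconds: float) -> str:
    # Stand-in for a long generation job.
    time.sleep(seconds)
    order.append("generate")
    return "generate"


def update_preset() -> str:
    # Stand-in for a quick UI event such as a preset update.
    order.append("update_preset")
    return "update_preset"


# One worker means events run one at a time, in submission order.
with ThreadPoolExecutor(max_workers=1) as pool:
    slow = pool.submit(generate, 0.2)
    fast = pool.submit(update_preset)
    fast.result()  # only returns after the generation ahead of it is done
```

After this runs, `order` lists the generation first and the preset update second, even though the preset update itself takes no time.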

Added in commit 8641cc9906.
Added in commit 9bf1ea5b0a.
Added in commit 8641cc9906.

Beautiful!

After some testing, my experience and a few errors I ran into:

No clean way in mind, as it'd require a way to procedurally add a variable number of gradio components on user request, or enforcing Candidates as an almost-constant and require the UI to refresh on change.

Would it be possible to have the same kind of drop down for candidates like you did in the history tab?

You mean all the other experimental settings? Despite being easier to try and toy with them all if they were in the main tab, for normal users I'd rather try to shy them from touching those knobs until some better guarantees are known.

I assume Gradio doesn't allow for more complex drop downs, like having the "experimental" settings collapse, revealing several sliders?

Errors:

  1. When first executing "Running", an error is shown; upon retrying, it starts (only happens when using "run all", having the "Running" cell queued up, maybe a colab updating issue? Screenshot 1)

  2. Sometimes on first starting /"copy settings", trying to generate results in an error (only seems to be fixed by loading a different voice and hitting refresh again, Screenshot 2)

  3. "Reload TTS" results in crashing the running cell entirely

  4. Stopping during generation works, but upon clicking generate again results in an error (loading voices back and forth fixes this, screenshot 2 again)

  5. The history tab is great! Sometimes it's not updating, not showing the voices that previously generated clips. When it worked, I noticed that CVVP is not updating correctly, always showing 0.00. I tried checking for diffusion_temperature too, but couldn't get it to show my voice again.

And I had two more thoughts on design:

  1. "Apply Settings" would be clearer than "Copy Settings"
  2. Is it possible to split up the progress bar section, so it's clear what part the generation is currently at (see screenshot 3, and apologies, I imagine this is one of the things that would be nerve-wracking due to Gradio's limitations)?
Owner

Would it be possible to have the same kind of drop down for candidates like you did in the history tab?

That'd work.

Added in commit 4b3b0ead1a.

I assume Gradio doesn't allow for more complex drop downs, like having the "experimental" settings collapse, revealing several sliders?

I suppose. I know the Stable Diffusion web UI has elements under dropdowns, but I'm sure that's a custom thing.

As a compromise I can just have a checkbox that sets the experimental settings visible when checked.

Added in commit 84316d8f80.

When first executing "Running", an error is shown

The colab runtime needs to restart to reset PIL, for whatever reason, because it gets updated during setup, hence the comment above the exit() saying it's a hotfix.

Sometimes on first starting /"copy settings", trying to generate results in an error

I'll see about replicating it, but in testing in the colab specifically with importing latents to a voice folder, I had no issues.

"Reload TTS" results in crashing the running cell entirely

Because it's not an elegant way of reloading, it was only there for a specific niche case that I don't really need it for anyways.

If you do not have enough free VRAM to load a second copy before the first one gets destructed, you'll OOM. It's only there if you really need to change a setting without restarting the entire process (which you should, initialization is pretty fast).

Using del tts to delete the first instance and then initializing it threw an error in testing.

Should be fixed in commit 50073e635f.

Stopping during generation works, but upon clicking generate again results in an error

As I said before, there is no elegant way of stopping a generation. It relies on throwing an exception during an iteration step and the state gets gummed up. You'll still have some models on the wrong device. I suppose if I get an elegant way of reloading TorToiSe, then I could just have it reload after a kill request. Added in commit 50073e635f.
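
A minimal sketch of that mechanism, assuming a progress wrapper that checks a shared flag on every iteration step. `CancelledGeneration`, `progress` and `run_steps` are hypothetical names for illustration, not the actual tqdm override. The exception interrupts the loop partway through, so whatever the loop was in the middle of (such as moving models between devices) is left as-is, which is why a reload afterwards is needed.

```python
import threading


class CancelledGeneration(Exception):
    """Raised inside the loop when a stop was requested."""


stop_requested = threading.Event()


def progress(iterable):
    # Wraps any iterable and checks for a kill request before each step.
    for item in iterable:
        if stop_requested.is_set():
            raise CancelledGeneration("generation stopped by user")
        yield item


def run_steps(total: int, stop_at=None) -> int:
    """Run `total` dummy steps; optionally request a stop at step `stop_at`."""
    done = 0
    try:
        for step in progress(range(total)):
            if step == stop_at:
                stop_requested.set()  # simulates clicking the stop button
            done += 1
    except CancelledGeneration:
        pass  # cleanup (for example, reloading the model) would go here
    finally:
        stop_requested.clear()  # reset so the next run can start
    return done
```

Here the step that was already running when the stop was requested still finishes; the exception only fires when the next step is about to begin.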

Sometimes it's not updating, not showing the voices that previously generated clips

Cannot replicate.

When it worked, I noticed that CVVP is not updating correctly, always showing 0.00

As documented both in the README and in the error message that prints, CVVP is predicated on disabling Slimmer Computed Latents. Comparing candidates to the CVVP model relies on additional data from computing the latents that doesn't get saved, because it bloats the latents file, which in turn bloats every sound output, since the latents are embedded in it.

I tried checking for diffusion_temperature too

Shows up fine.

Is it possible to split up progression bar section

Already added.

mrq changed title from Suggestion: Set up a colab to Suggestions 2023-02-11 15:35:01 +00:00

CVVP is predicated on disabling Slimmer Computed Latents.

Thanks for clarifying again for me! If tooltips are possible in Gradio, that'd be a good one to add.

My observations after testing the latest updates:

  • Attempts to run voice fix, even when it's disabled, adding around 20 seconds to the total time until the output is displayed. It may even be running it, despite being disabled? At least, it reads:
VoiceFix starting
VoiceFix finished
  • Generating always shows [1/1]

  • So I got the history working again, but only after restarting the runtime and executing the "Running" cell again. Otherwise after first running, it doesn't load my previously loaded voices (tested multiple times, closing the session and restarting).

  • Is it possible to get rid of the "Select Candidate" button, by executing its function automatically after selecting a .wav from the dropdown?

Owner

Attempts to run voice fix, even when it's disabled, adding around 20 seconds to the total time until the output is displayed. It may even be running it, despite being disabled? At least, it reads:

Fixed in commit 5f1c032312.

Generating always shows [1/1]

Intentional.

Is it possible to get rid of the "Select Candidate" button, by executing its function automatically after selecting a .wav from the dropdown?

No. It makes the UI continuously spaz out. If I was able to, I wouldn't need buttons for the History tab.

Intentional.

I see what it refers to now.

No. It makes the UI continuously spaz out. If I was able to, I wouldn't need buttons for the History tab.

Got it!

mrq closed this issue 2023-02-14 21:25:32 +00:00
Reference: mrq/tortoise-tts#16