forked from mrq/tortoise-tts

Moved experimental settings to main tab, hidden under a check box

mrq 2023-02-11 17:21:08 +00:00
parent 126f1a0afe
commit 3a8ce5a110
2 changed files with 31 additions and 22 deletions


@@ -173,6 +173,18 @@ You'll be presented with a bunch of options in the default `Generate` tab, but d
`P` refers to the default, vanilla sampling method in `diffusion.py`.
To reiterate, this ***only*** is useful for the diffusion decoding path, after the autoregressive outputs are generated.
Below is an explanation of the experimental flags. Messing with these might impact performance, as they are exposed only for those who know what they are doing.
* `Half-Precision`: (attempts to) hint to PyTorch to auto-cast to float16 (half precision) for compute. Disabled by default, as it makes computations slower.
* `Conditional Free`: a quality-boosting improvement at the cost of some performance. Enabled by default, as I think the penalty is negligible in the end.
* `CVVP Weight`: governs how much the CVVP model should influence candidates. The original documentation mentions this is deprecated as it does not really influence things, but you're still free to play around with it.
Currently, setting this requires regenerating your voice latents, as I forgot to have it return some extra data that weighing against the CVVP model uses. Oops.
Setting this to 1 leads to bad behavior.
* `Top P`: P value used in nucleus sampling; lower values mean the decoder produces more "likely" (aka boring) outputs.
* `Diffusion Temperature`: the variance of the noise fed into the diffusion model; values at 0 are the "mean" prediction of the diffusion network and will sound bland and smeared.
* `Length Penalty`: a length penalty applied to the autoregressive decoder; higher settings cause the model to produce more terse outputs.
* `Repetition Penalty`: a penalty that prevents the autoregressive decoder from repeating itself during decoding. Can be used to reduce the incidence of long silences or "uhhhhhhs", etc.
* `Conditioning-Free K`: determines the balance between the conditioning-free signal and the conditioning-present signal.
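
To make two of the flags above more concrete, here is a minimal, standalone sketch of what `Top P` and `Conditioning-Free K` control. It is not the code TorToiSe runs: `top_p_filter` and `cond_free_mix` are hypothetical helper names, the probabilities are made-up toy values, and the mixing formula is an assumption taken from the upstream TorToiSe docstring that should be checked against the source.

```python
def top_p_filter(probs, top_p):
    """Nucleus (top-p) filtering: keep the smallest set of tokens whose
    cumulative probability reaches top_p, then renormalize."""
    # Rank token indices from most to least likely.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, prob in ranked:
        kept.append((index, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(prob for _, prob in kept)
    # Map each surviving token index to its renormalized probability.
    return {index: prob / total for index, prob in kept}


def cond_free_mix(cond_present, cond_absent, k):
    """Assumed mixing rule: present * (k + 1) - absent * k, element-wise.
    With k = 0 the conditioning-present signal is returned unchanged."""
    return [p * (k + 1) - a * k for p, a in zip(cond_present, cond_absent)]


toy_probs = [0.5, 0.3, 0.15, 0.05]        # made-up token probabilities
narrow = top_p_filter(toy_probs, 0.4)     # low Top P: fewer candidates survive
wider = top_p_filter(toy_probs, 0.9)      # high Top P: more candidates survive
mixed = cond_free_mix([1.0, 2.0], [0.5, 1.0], 2)
```

A lower `top_p` leaves fewer candidate tokens to sample from, which is why lower values produce more "likely" outputs.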
After you fill everything out, click `Run`, and wait for your output in the output window. The sampled voice is also returned, but if you're using multiple files, it'll return the first file, rather than a combined file.
All outputs are saved under `./result/[voice name]/[timestamp]/` as `result.wav`, and the settings in `input.txt`. There doesn't seem to be an inherent way to add a Download button in Gradio, so keep that folder in mind.
@@ -224,18 +236,6 @@ Below are settings that override the default launch arguments. Some of these req
* `Output Sample Rate`: the sample rate to save the generated audio as. It provides a slight bump in quality
* `Output Volume`: adjusts the volume through amplitude scaling
Below is an explanation of the experimental flags. Messing with these might impact performance, as they are exposed only for those who know what they are doing.
* `Half-Precision`: (attempts to) hint to PyTorch to auto-cast to float16 (half precision) for compute. Disabled by default, as it makes computations slower.
* `Conditional Free`: a quality-boosting improvement at the cost of some performance. Enabled by default, as I think the penalty is negligible in the end.
* `CVVP Weight`: governs how much the CVVP model should influence candidates. The original documentation mentions this is deprecated as it does not really influence things, but you're still free to play around with it.
Currently, setting this requires regenerating your voice latents, as I forgot to have it return some extra data that weighing against the CVVP model uses. Oops.
Setting this to 1 leads to bad behavior.
* `Top P`: P value used in nucleus sampling; lower values mean the decoder produces more "likely" (aka boring) outputs.
* `Diffusion Temperature`: the variance of the noise fed into the diffusion model; values at 0 are the "mean" prediction of the diffusion network and will sound bland and smeared.
* `Length Penalty`: a length penalty applied to the autoregressive decoder; higher settings cause the model to produce more terse outputs.
* `Repetition Penalty`: a penalty that prevents the autoregressive decoder from repeating itself during decoding. Can be used to reduce the incidence of long silences or "uhhhhhhs", etc.
* `Conditioning-Free K`: determines the balance between the conditioning-free signal and the conditioning-present signal.
## Example(s)
Below are some (rather outdated) outputs I deem substantial enough to share. As I continue delving into TorToiSe, I'll supply more examples and the values I use.


@@ -643,6 +643,24 @@ def setup_gradio():
        diffusion_iterations,
    ],
)
show_experimental_settings = gr.Checkbox(label="Show Experimental Settings")
with gr.Column(visible=False) as col:
    experimental_column = col
    experimental_checkboxes = gr.CheckboxGroup(["Half Precision", "Conditioning-Free"], value=["Conditioning-Free"], label="Experimental Flags")
    cvvp_weight = gr.Slider(value=0, minimum=0, maximum=1, label="CVVP Weight")
    top_p = gr.Slider(value=0.8, minimum=0, maximum=1, label="Top P")
    diffusion_temperature = gr.Slider(value=1.0, minimum=0, maximum=1, label="Diffusion Temperature")
    length_penalty = gr.Slider(value=1.0, minimum=0, maximum=8, label="Length Penalty")
    repetition_penalty = gr.Slider(value=2.0, minimum=0, maximum=8, label="Repetition Penalty")
    cond_free_k = gr.Slider(value=2.0, minimum=0, maximum=4, label="Conditioning-Free K")
show_experimental_settings.change(
    fn=lambda x: gr.update(visible=x),
    inputs=show_experimental_settings,
    outputs=experimental_column
)
with gr.Column():
    submit = gr.Button(value="Generate")
    stop = gr.Button(value="Stop")
@@ -783,16 +801,7 @@ def setup_gradio():
i.change(
    fn=export_exec_settings,
    inputs=exec_inputs
)
with gr.Column():
    experimental_checkboxes = gr.CheckboxGroup(["Half Precision", "Conditioning-Free"], value=["Conditioning-Free"], label="Experimental Flags")
    cvvp_weight = gr.Slider(value=0, minimum=0, maximum=1, label="CVVP Weight")
    top_p = gr.Slider(value=0.8, minimum=0, maximum=1, label="Top P")
    diffusion_temperature = gr.Slider(value=1.0, minimum=0, maximum=1, label="Diffusion Temperature")
    length_penalty = gr.Slider(value=1.0, minimum=0, maximum=8, label="Length Penalty")
    repetition_penalty = gr.Slider(value=2.0, minimum=0, maximum=8, label="Repetition Penalty")
    cond_free_k = gr.Slider(value=2.0, minimum=0, maximum=4, label="Conditioning-Free K")
input_settings = [
    text,