Implement BigVGAN Full Fat #78

Closed
opened 2023-03-06 23:43:00 +00:00 by deviandice · 3 comments

So I fucked up. Admittedly it was 3-4 AM (don't remember), and I was meant to download the full fat version of the vocoder but downloaded the small version instead. There is a fair difference in quality, as you can see from this graphic.

However, I realise a 450 MB vocoder might be pushing it, so it might be worth having a toggle.

![image](/attachments/8ea11028-c3fb-4109-bee4-8b0b91925fd7)
Owner

No dice. ![image](/attachments/5bba5ff6-65d3-482f-b6d6-9fe4561e3974)

It just whines a bunch when using bigvgan_24khz_100band.
Author

There's a separate config file for it. Here's the raw JSON.

Also, funny joke ;)

config.json
```
{
    "resblock": "1",
    "num_gpus": 0,
    "batch_size": 32,
    "learning_rate": 0.0001,
    "adam_b1": 0.8,
    "adam_b2": 0.99,
    "lr_decay": 0.999,
    "seed": 1234,

    "upsample_rates": [4,4,2,2,2,2],
    "upsample_kernel_sizes": [8,8,4,4,4,4],
    "upsample_initial_channel": 1536,
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],

    "activation": "snakebeta",
    "snake_logscale": true,

    "discriminator": "mrd",
    "resolutions": [[1024, 120, 600], [2048, 240, 1200], [512, 50, 240]],
    "mpd_reshapes": [2, 3, 5, 7, 11],
    "use_spectral_norm": false,
    "discriminator_channel_mult": 1,

    "segment_size": 8192,
    "num_mels": 100,
    "num_freq": 1025,
    "n_fft": 1024,
    "hop_size": 256,
    "win_size": 1024,

    "sampling_rate": 24000,

    "fmin": 0,
    "fmax": 12000,
    "fmax_for_loss": null,

    "num_workers": 4,

    "dist_config": {
        "dist_backend": "nccl",
        "dist_url": "tcp://localhost:54321",
        "world_size": 1
    }
}
```
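For anyone pairing a checkpoint with a config by hand, here is a minimal sketch of a sanity check on the generator fields. It assumes the usual HiFi-GAN-style constraint that the product of `upsample_rates` equals `hop_size`. The helper name `check_vocoder_config` is hypothetical and is not part of tortoise-tts or BigVGAN, and the embedded JSON is only a subset of the config posted in this thread.

```python
import json
import math

# Subset of the config posted in this thread; only the fields the checks use.
CONFIG_TEXT = """
{
    "upsample_rates": [4, 4, 2, 2, 2, 2],
    "upsample_kernel_sizes": [8, 8, 4, 4, 4, 4],
    "resblock_kernel_sizes": [3, 7, 11],
    "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
    "num_mels": 100,
    "hop_size": 256,
    "sampling_rate": 24000,
    "fmax": 12000
}
"""


def check_vocoder_config(h: dict) -> list:
    """Hypothetical helper: return a list of problems, empty if the checks pass."""
    problems = []

    # Each mel frame should expand to exactly hop_size audio samples.
    total_upsample = math.prod(h["upsample_rates"])
    if total_upsample != h["hop_size"]:
        problems.append(
            f"upsample product {total_upsample} != hop_size {h['hop_size']}"
        )

    # One kernel size per upsampling stage.
    if len(h["upsample_rates"]) != len(h["upsample_kernel_sizes"]):
        problems.append("upsample_rates and upsample_kernel_sizes differ in length")

    # One dilation list per resblock kernel size.
    if len(h["resblock_kernel_sizes"]) != len(h["resblock_dilation_sizes"]):
        problems.append("resblock kernel and dilation lists differ in length")

    # fmax cannot exceed the Nyquist frequency.
    if h["fmax"] is not None and h["fmax"] > h["sampling_rate"] / 2:
        problems.append("fmax is above sampling_rate / 2")

    return problems


h = json.loads(CONFIG_TEXT)
problems = check_vocoder_config(h)
print(problems if problems else "config looks consistent")
```

A check like this only confirms that the config is internally consistent. It does not prove the config matches a given checkpoint's weights.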
Owner

Right, I forgot they're married to JSON configs. Fixed in mrq/tortoise-tts commit fffea7fc038cc20945aa2208faf91c5434719b69.

mrq closed this issue 2023-03-07 13:42:12 +00:00
Reference: mrq/ai-voice-cloning#78