Does bark feature fine tuning? #324


Bluebomber182 · 2023-08-09T08:15:17Z

Bluebomber182 commented

2023-08-09 08:15:17 +00:00

If not, can you add this feature from this fork that features fine tuning?
https://github.com/serp-ai/bark-with-voice-clone


mrq commented

2023-08-10 18:24:15 +00:00

mmm...

* desu Bark's integration with the web UI isn't going to be any more tightly coupled than it loosely is now. I genuinely don't remember why I added it, outside of it being an easy way to test its output and draw conclusions from it (it didn't wow me enough to continue).
* the patches I lifted from that fork itself to allow for user-provided voices aren't working. I don't know why, and I've made three attempts to get them to work, and nothing will make it sound *decent*. It'd just be better to use the original fork that allows it at that point.
* integrating another training script is a bit too much; I've effectively neglected the training script for what was my VALL-E fork, since I never actually bothered using the web UI to do my test trainings with VALL-E. Any and all code that's going to be used to do Bark finetunes of the fine/coarse models is just going to be a pain.
* extending on that, the web UI's code is already a giant mess, and I'm still dreading taking the initiative to ~~clean it up~~ rewrite it. Stapling on training for another model is just going to exacerbate issues.

and above all:

* for an extreme lack of a better term, the """meta""" seems to be just using a normal TTS (base TorToiSe or base Bark) and running it through RVC for "great" quality output (I still need to actually see/hear it for myself). Aside from the rewrite, I imagine it would just be better to add RVC into the "stack".
  - and as an addendum, I feel that the RVC sphere of utilities is rather solved. From what pieces I've gleaned over time, people just prefer using RVC.

***If*** by some miracle I manage to get around to it, sure, I'll try. It's just, as things currently stand, Bark is already very low priority, much less finetuning it.


Bluebomber182 closed this issue

2023-08-25 01:24:10 +00:00
