web interface #397
Reference: mrq/ai-voice-cloning#397
This is a very nice module, and my experience with it has been excellent.
As far as I understand, the web interface can also add emotions. Could I have a diagram of how the emotions are added? Or which model is the main source of the emotion: is it CLIP, CLAP, or something else?
Most of the cloning takes place with CLVP and diffusion.
If you can guide me, it would help me understand the workflow more easily.
The "emotion" control simply adds [I am really {emotion}], to the start of the text prompt. If I remember right from the original neonbjb/tortoise-tts repo, it "leverages" the AR model's ability to derive emotion from the text prompt, and then redacts the text with wav2vec2 alignment at the end of the inference call.
I never found it useful enough in my testing, but it was a feature touted in the original, and when porting all the available features from do_tts.py into a web UI, it was also carried over for feature completeness.
Additionally, you can make use of the text redaction yourself by wrapping text in []. The bracketed text will try to influence the output, but it gets removed from the final clip.
Thank you. Really nice explanation.
Indeed a helpful reply. Not only that, the structure is great and the interface is nice.
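For readers following the thread, the text side of the emotion control described in the reply above can be sketched roughly as below. This is only an illustration under assumptions: the function names `add_emotion_prefix` and `spoken_text` are hypothetical and do not come from the repo, and the real redaction is performed on the generated audio via wav2vec2 alignment, not with a string substitution. The sketch only shows which part of the prompt is meant to steer the model and which part is meant to survive in the final clip.

```python
import re

# Matches any span wrapped in square brackets, e.g. "[I am really sad]".
BRACKETED = re.compile(r"\[[^\]]*\]")


def add_emotion_prefix(text: str, emotion: str) -> str:
    """Prepend the emotion cue in the form quoted in the thread (hypothetical helper)."""
    return f"[I am really {emotion}], {text}"


def spoken_text(prompt: str) -> str:
    """Return the text expected to remain audible after redaction.

    Assumption: this only mimics the effect on the text. The actual
    pipeline removes the bracketed portion from the audio instead.
    """
    without_brackets = BRACKETED.sub("", prompt)
    # Collapse whitespace, then drop spaces and commas orphaned at either end.
    return re.sub(r"\s+", " ", without_brackets).strip(" ,")


prompt = add_emotion_prefix("Please feed me.", "sad")
print(prompt)
print(spoken_text(prompt))
```

The same `spoken_text` helper also covers manually bracketed cues, since any `[...]` span is treated as steering text that should not appear in the output.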