How to provide Dynamic Prompt Setting Editing (switch between voices) #314

Open
opened 2023-07-25 23:27:13 +00:00 by xasima · 5 comments

Hi, how do I properly specify Prompt Setting Editing (switching between voices), as mentioned in the Generate section of the wiki?

python .\src\cli.py --text='{"voice": "random"} Is that really you, Mary? \n {"voice": "random"}The name is Maria.'
python .\src\cli.py --text="{\"voice\": \"random\"} Is that really you, Mary? \n {\"voice\": \"random\"}The name is Maria."
python .\src\cli.py --text="{'voice': 'random'} Is that really you, Mary? \n {'voice': 'random'}The name is Maria."

Neither the UI nor the CLI respects the JSON at the start of the line; both try to pronounce the word "voice" instead of switching to a predefined voice from the ./voices folder.

xasima changed title from How to provice Dynamic Prompt Setting Editing (switch between voices) to How to provide Dynamic Prompt Setting Editing (switch between voices) 2023-07-25 23:28:24 +00:00
Owner

> `Mary? \n {"voice":`

I think the issue is that you have a space after the \n.

I should be able to fix this by stripping whitespace before checking, if needed.
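A minimal sketch of the fix described above, assuming the check simply tests whether a line starts with `{`. The name `has_override` is hypothetical and not taken from the repository's code.

```python
def has_override(line: str) -> bool:
    """Return True if the line begins with a JSON-style settings block.

    Hypothetical helper: leading whitespace is stripped first, so a space
    after the line break no longer hides the opening brace.
    """
    return line.lstrip().startswith("{")


# The text from the original post, with a real line break and the extra space.
text = '{"voice": "random"} Is that really you, Mary? \n {"voice": "random"}The name is Maria.'
flags = [has_override(line) for line in text.split("\n")]
```

Without the `lstrip()` call, the second line would not be recognized, because it starts with a space rather than `{`.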

Author

Thanks for noticing. I have figured out the issue from the logs.

# Possible issue with overriding emotion / voices
I tried the UI first, so I set the emotion and voice at the UI level. These values were then silently saved into config/generate.json. When I ran the aforementioned CLI commands, I noticed that generate.json was updated, at least with the new "text" field, but the emotion and voice remained as previously saved. Surprisingly, the inline JSON with new voices doesn't override the voice and emotion pair in generate.json, which stayed fixed. Moreover, the override doesn't happen at execution time either, so the resulting WAV reads even the very first JSON aloud.

So I manually set the emotion and voice in config/generate.json to "", and it started working.

# Possible pollution of config/generate.json if the text is malformed
If something goes wrong during parsing, config/generate.json may end up in a bad state, so a subsequent corrected invocation still fails because of the previously malformed JSON. I'm not sure how to reproduce this, but I invoked the CLI with different escaping of " and ' in the inline JSON, and also tried adding prompts []. This led to prompt: null being set automatically, which breaks generate.json, so I later had to reset it to prompt: "" manually.
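The manual cleanup described in this comment could be scripted. This is a minimal sketch, assuming config/generate.json is a flat JSON object and that the relevant keys are named `voice`, `emotion`, and `prompt` as mentioned here; the helper name and the key list are assumptions, not taken from the repository.

```python
import json
from pathlib import Path


def reset_saved_settings(path, keys=("voice", "emotion", "prompt")):
    """Blank the given keys in a saved settings file and return the result.

    Hypothetical helper: each listed key that is present is set to "",
    which also replaces a stray null with an empty string.
    """
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    for key in keys:
        if key in config:
            config[key] = ""
    config_path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config
```

It would be called as `reset_saved_settings("config/generate.json")` before re-running the CLI. Keys that are not in the list, such as `text`, are left untouched.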

# Still unsure how to specify multiple JSON voices

This works, but it reads the second "random" word aloud. Please note that I left out the space between the second JSON and "The name". The result is attached.
python src/cli.py --text='{"voice": "random"} Is that really you, Mary? \n{"voice": "random"}The name is Maria!'
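One thing worth checking, as an assumption rather than a confirmed diagnosis: inside single quotes, a POSIX shell passes `\n` to the program as two literal characters (a backslash and an `n`), not as a line break, so the whole `--text` value may arrive as a single line. Below is a small sketch of normalizing such input before splitting it. `normalize_newlines` is a hypothetical helper, and whether cli.py already does something similar is not established in this thread.

```python
def normalize_newlines(text: str) -> str:
    """Turn literal backslash-n sequences into real line breaks."""
    return text.replace("\\n", "\n")


# A raw string mimics what the shell hands over: backslash + n, no line break.
raw = r'{"voice": "random"} Is that really you, Mary? \n{"voice": "random"}The name is Maria!'
lines = normalize_newlines(raw).split("\n")
```

After normalization the text splits into two lines, each starting with its own settings block.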

If I add the expected space after the second JSON, it fails to run with the following error:
` python src/cli.py --text='{"voice": "random"} Is that really you, Mary? \n{"voice": "random"} The name is Maria!'

[1/1] Generating line: {"voice": "random"} Is that really you, Mary? \n{"voice": "random"} The name is Maria!
Traceback (most recent call last):
File "/mnt/d/ai-voice-cloning/src/utils.py", line 1182, in generate_tortoise
override = json.loads(match[0])
File "/home/aperez/miniconda3/envs/tortoise2/lib/python3.9/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/home/aperez/miniconda3/envs/tortoise2/lib/python3.9/json/decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 21 (char 20)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/mnt/d/ai-voice-cloning/src/cli.py", line 66, in <module>
generate(**kwargs)
File "/mnt/d/ai-voice-cloning/src/utils.py", line 345, in generate
return generate_tortoise(**kwargs)
File "/mnt/d/ai-voice-cloning/src/utils.py", line 1185, in generate_tortoise
raise Exception("Prompt settings editing requested, but received invalid JSON")
Exception: Prompt settings editing requested, but received invalid JSON
`
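The `Extra data` error in the traceback is what `json.loads` raises when a valid JSON value is followed by more text. One way to avoid it is the standard library's `json.JSONDecoder.raw_decode`, which parses a leading JSON value and reports where it ended. The sketch below is hedged: `split_override` is a hypothetical helper, not the project's actual parser.

```python
import json

_decoder = json.JSONDecoder()


def split_override(line: str):
    """Split a line into (settings dict, remaining text).

    Hypothetical helper. Lines without a leading JSON object are returned
    unchanged with an empty settings dict.
    """
    stripped = line.lstrip()
    if not stripped.startswith("{"):
        return {}, line
    try:
        # raw_decode stops at the end of the first JSON value instead of
        # rejecting whatever follows it.
        settings, end = _decoder.raw_decode(stripped)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid settings JSON in line: {line!r}") from exc
    return settings, stripped[end:].lstrip()


settings, spoken = split_override('{"voice": "random"} The name is Maria!')
```

With this approach, a space between the closing brace and the spoken text makes no difference. A block written with single quotes, such as `{'voice': 'random'}`, is still rejected, because JSON requires double-quoted strings.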

Author

It seems to be CLI-specific only, since I have successfully managed to get the proper results via the UI.


Would love to know how far prompt engineering can go, and even what effect punctuation in the text has, such as ":" and other ways of expressing dialogue and context. There seems to be little out there. I went on a bit of a hunt for more info and found very little. This is one example, although I don't know how it relates to tortoise: https://arxiv.org/abs/2211.12171. I did gather some more links and papers and will post them when I can find them.


> Would love to know how far prompt engineering can go, and even what effect punctuation in the text has, such as ":" and other ways of expressing dialogue and context. There seems to be little out there. I went on a bit of a hunt for more info and found very little. This is one example, although I don't know how it relates to tortoise: https://arxiv.org/abs/2211.12171. I did gather some more links and papers and will post them when I can find them.

There are neat tricks. Sometimes a dataset/model, for whatever reason, can't pronounce a word correctly, or seems stuck on a particular "interpretation" of the prompt, especially with unusual or complex multi-syllable words and names.

For instance, if a prompt like "Giorgio Armani" wasn't being pronounced correctly, creating a line like this:

Giorgio Armani; Giorgio Armani. Giorgio Armani, Giorgio Armani:

or swapping the order of the punctuation, can nudge the prompt enough such that it miraculously pronounces it correctly.
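The punctuation-swapping trick can be automated to produce candidate lines to try. This is only an illustrative sketch: `punctuation_variants` is a hypothetical helper, and the set of marks is taken from the example line above.

```python
from itertools import permutations


def punctuation_variants(phrase, marks=(";", ".", ",", ":")):
    """Yield the phrase repeated once per mark, for every ordering of marks."""
    for order in permutations(marks):
        yield " ".join(f"{phrase}{mark}" for mark in order)


variants = list(punctuation_variants("Giorgio Armani"))
```

Each variant can then be generated in turn, ideally with the same seed, keeping whichever one is pronounced correctly.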

Another tidbit: if a sentence itself is problematic for some reason, "While we sat in the car park, the ravens arrived and ate the bread bits we left on the windscreen." can be modified, broken up, and glued back together afterwards:

While we sat in the car park: the ravens arrived;
and ate the bread bits,
we left on the windscreen.

Something like that... it's basically trial and error, and being willing to subtly rewrite the sentence. Also, having a dedicated seed is pretty much mandatory (from experience).

Reference: mrq/ai-voice-cloning#314