Question: How to continually run from CLI #261
Reference: mrq/ai-voice-cloning#261
Just a question.

I'm trying to run TTTS from a Python script. Right now I'm just invoking it like this:

```
!python /content/ai-voice-cloning/ai-voice-cloning/modules/tortoise-tts/scripts/tortoise_tts.py --seed 42 "The quick brown fox " -O /content/ai-voice-cloning/results
```

This works, but every time I run it, it has to go through all the bootup steps again:
```
Loaded tokenizer
Loading autoregressive model: /content/ai-voice-cloning/ai-voice-cloning/models/tortoise/autoregressive.pth
Loaded autoregressive model
Loaded diffusion model
Loading vocoder model: None
Loading vocoder model: bigvgan_24khz_100band.pth
Removing weight norm...
Loaded vocoder model
```
Is there a way to keep my current 'session' open and generate a few lines in a row? I think that's how the webui works; it only has to reload when you change the settings/model. I should probably also ask: is this actually utilizing this fork, or is it just using the base TTTS?
See notes from #242 regarding generation from CLI.
Okay, I got the Gradio API method working. However, after it finishes generating, I keep hitting an assertion:
```
8 frames
/usr/local/lib/python3.10/dist-packages/gradio_client/serializing.py in _deserialize_single(self, x, save_dir, root_url, hf_token)
    292         else:
    293             data = x.get("data")
--> 294             assert data is not None, f"The 'data' field is missing in {x}"
    295             file_name = utils.decode_base64_to_file(data, dir=save_dir).name
    296         else:

AssertionError: The 'data' field is missing in {'visible': False, 'value': None, 'type': 'update'}
```
It actually generates the audio file, but I think it's messing up somewhere when it tries to return, probably in output or source?
According to the API it returns
Any ideas?
That's a new one for me. Does it happen if you revert to a previous commit?
I can give it a try this weekend, but honestly I just decided to catch the exception and move on; since it's still generating the files, it's good enough lol
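Since the file is already written to disk before the deserializer trips, one way to "catch the exception and move on" cleanly is to tolerate the `AssertionError` and then locate the newest output yourself. A minimal sketch; the server URL, results directory, and the `gradio_client` wiring under the `__main__` guard are assumptions about the setup:

```python
from pathlib import Path


def newest_wav(results_dir):
    """Return the most recently modified .wav under results_dir, or None."""
    wavs = sorted(Path(results_dir).glob("**/*.wav"),
                  key=lambda p: p.stat().st_mtime)
    return wavs[-1] if wavs else None


def generate_ignoring_assert(call, results_dir):
    """Run a generation callable, swallowing the known AssertionError,
    then return the newest .wav it left behind."""
    try:
        call()
    except AssertionError:
        pass  # the audio is already on disk; only the return plumbing failed
    return newest_wav(results_dir)


if __name__ == "__main__":
    # Example wiring (assumed address and path; pass the full argument
    # list your /generate endpoint expects).
    from gradio_client import Client
    client = Client("http://127.0.0.1:7860")
    print(generate_ignoring_assert(
        lambda: client.predict("Hello World", api_name="/generate"),
        "ai-voice-cloning/results"))
```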
I'm also wondering if this is possible. Whenever I use the cli.py script, it still goes through the whole Loading/Loaded boot sequence. Is there a way to keep the loading "in session"?
```python
import subprocess
import time

import requests

# Launch the webui in the background and give it time to boot.
cmd = "cd ai-voice-cloning && start.bat"
process = subprocess.Popen(cmd, shell=True)
time.sleep(30)

voice_name = "name"

response = requests.post("http://127.0.0.1:7860/run/generate", json={
    "data": [
        "Hello World",  # 'Input Prompt'
        "\n",           # 'Line Delimiter'
        "Happy",        # 'Emotion'
        "",             # 'Custom Emotion'
        voice_name,     # 'Voice' dropdown component
        # 'Microphone Source' Audio component: filename plus base64 WAV data
        {"name": "audio.wav", "data": "data:audio/wav;base64,UklGRiQAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQAAAAA="},
        0,       # 'Voice Chunks'
        1,       # 'Candidates'
        0,       # 'Seed'
        256,     # 'Samples'
        400,     # 'Iterations'
        0.8,     # 'Temperature'
        "DDIM",  # 'Diffusion Samplers'
        2,       # 'Pause Size'
        0,       # 'CVVP Weight'
        0.55,    # 'Top P'
        1,       # 'Diffusion Temperature'
        1,       # 'Length Penalty'
        2,       # 'Repetition Penalty'
        2,       # 'Conditioning-Free K'
        ["Half Precision"],  # 'Experimental Flags'
        False,   # 'Use Original Latents Method (AR)'
        False,   # 'Use Original Latents Method (Diffusion)'
    ]
}).json()

data = response["data"]
```
I am able to generate audio with the above code, but it comes out as a garbled mess. I think the problem is that the 'Microphone Source' entry

`{"name": "audio.wav", "data": "data:audio/wav;base64,UklGRiQAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQAAAAA="}`

is not pointing at the correct sample audio clips, but I couldn't find any examples of how to get the data.
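For what it's worth, that placeholder base64 string appears to decode to an empty WAV, so the server receives no real reference audio from it. If you want to feed an actual clip through the 'Microphone Source' slot, you can inline your own file. A sketch using only the standard library (the file path is a placeholder):

```python
import base64
from pathlib import Path


def audio_payload(wav_path):
    """Build the {'name', 'data'} dict a Gradio Audio input expects,
    inlining the file's bytes as a base64 data URI."""
    path = Path(wav_path)
    b64 = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"name": path.name, "data": "data:audio/wav;base64," + b64}
```

Then swap the placeholder dict in the payload for `audio_payload("clips/reference.wav")`. (If you only want the voice selected by the 'Voice' dropdown, passing `None` for that slot might also work, though I haven't confirmed that.)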
What I do is use the trained voice data with the original tortoise-tts, as it's quite a bit faster on Linux than the version bundled in this project. Then I hacked do_tts.py to read through a txt file and generate each row, without having to reload the training data each time.
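The hacked loop can be sketched roughly like this: load the model and voice once, then iterate over the lines of a script file. The tortoise imports and calls follow the upstream tortoise-tts API (as in its do_tts.py) and may differ in this fork; the voice name and file paths are placeholders:

```python
from pathlib import Path


def script_lines(path):
    """Yield non-empty, stripped lines from a text file, one per generation."""
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:
            yield line


if __name__ == "__main__":
    # Heavy imports deferred; names follow upstream tortoise-tts.
    import torchaudio
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_voice

    tts = TextToSpeech()                      # load the models exactly once
    samples, latents = load_voice("myvoice")  # "myvoice" is a placeholder
    for i, text in enumerate(script_lines("script.txt")):
        gen = tts.tts_with_preset(text, voice_samples=samples,
                                  conditioning_latents=latents, preset="fast")
        torchaudio.save(f"out_{i:03d}.wav", gen.squeeze(0).cpu(), 24000)
```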