Compare commits


34 Commits

Author SHA1 Message Date
mrq
5f80ee9b38 set use-deepspeed to false because it's not a dependency and installing it as a dependency under windows is a huge nightmare 2023-09-04 22:09:09 +00:00
ken11o2
29c270d1cc master (#369)
Add DeepSpeed feature for tortoise

Reviewed-on: mrq/ai-voice-cloning#369
Co-authored-by: ken11o2 <ken11o2@noreply.localhost>
Co-committed-by: ken11o2 <ken11o2@noreply.localhost>
2023-09-04 22:04:00 +00:00
mrq
7fc8f4c45a slight fixes 2023-09-03 12:34:55 +00:00
mrq
7110b878b7 Merge pull request 'Websocket fixes / additions' (#350) from ben_mkiv/ai-voice-cloning:master into master
Reviewed-on: mrq/ai-voice-cloning#350
2023-08-30 18:39:32 +00:00
13b65d8775 Merge branch 'master' of https://git.ecker.tech/ben_mkiv/ai-voice-cloning 2023-08-26 17:43:35 +02:00
b72f2216bf added websocket server arguments to enabled it (now disabled by default) and to specify the address/port to listen on 2023-08-26 17:38:58 +02:00
mrq
690947ad36 Do not double phonemize if using VALL-E backend (I wonder how many hours I've wasted from this oversight) 2023-08-26 00:02:17 +00:00
6f0f148782 websocket server: fix for model loading (just overriding args didn't do it after all...) 2023-08-26 01:41:29 +02:00
578a5bcadd websocket server: fix for model loading (just overriding args didn't do it after all...) 2023-08-26 01:40:35 +02:00
mrq
b4dc103931 I don't know how I did not commit the 'sample from the voices to construct the input prompt for vall-e' change but this helps 2023-08-25 04:26:48 +00:00
mrq
a657623cbc updated vall-e training template to use path-based speakers because it would just have a batch/epoch size of 1 otherwise; revert hardcoded 'spit processed dataset to this path' from my training rig to spit it out in a sane spot 2023-08-24 21:45:50 +00:00
mrq
533b73e083 fixed the overwrite regression for bark and vall-e backends too 2023-08-24 19:46:42 +00:00
mrq
f5fab33e9c fixed defaults for vall-e backend 2023-08-24 19:44:52 +00:00
mrq
4aa240d48a Merge pull request 'fix filename generation which didn't work and overwrote existing files' (#341) from ben_mkiv/ai-voice-cloning:master into master
Reviewed-on: mrq/ai-voice-cloning#341
2023-08-24 12:29:59 +00:00
00b173857d fix filename generation which didn't work and overwrote existing files 2023-08-24 09:57:01 +02:00
mrq
dc46fdc7d0 fixed another issue from haphazardly copying my changes from my training machine 2023-08-23 22:09:22 +00:00
mrq
29290f574e should fix issue that arises when trying to prepare the dataset without slicing segments 2023-08-23 21:49:22 +00:00
mrq
0a5483e57a updated valle yaml template 2023-08-23 21:42:32 +00:00
mrq
e613299304 Merge pull request 'favor existing arguments from parameters (kwargs) over global (args)' (#336) from ben_mkiv/ai-voice-cloning:master into master
Reviewed-on: mrq/ai-voice-cloning#336
2023-08-23 21:05:36 +00:00
ce24ba41e2 Websocket server, override args parameters for model settings (squashed)
Revert "favor existing arguments from parameters (kwargs) over global (args)"

This reverts commit 89102347a956ebcfe9a83ae7d1aa1336f1c53483.

args are now updated in the websocket server
2023-08-23 19:40:39 +02:00
mrq
5f4215b3ef Merge pull request 'websocket server: API change(!), better response format' (#334) from ben_mkiv/ai-voice-cloning:master into master
Reviewed-on: mrq/ai-voice-cloning#334
2023-08-22 20:35:42 +00:00
5d73d9e71c small QoL change to the StringNone helper, to allow generated text to be "None", maybe someone wants to generate that, we never know... 2023-08-22 21:49:49 +02:00
9abcb0f193 websocket server: API change(!), better response format 2023-08-22 21:37:19 +02:00
mrq
fb1cfd059f Merge pull request 'websocket server: small fix' (#333) from ben_mkiv/ai-voice-cloning:master into master
Reviewed-on: mrq/ai-voice-cloning#333
2023-08-22 19:26:37 +00:00
1ec3344999 Merge branch 'master' of https://git.ecker.tech/ben_mkiv/ai-voice-cloning 2023-08-22 21:00:06 +02:00
a902913780 websocket server: workaround for values and None type 2023-08-22 20:20:49 +02:00
mrq
2060b6f21c fixed issue with sliced audio being the wrong sample rate 2023-08-22 14:22:39 +00:00
mrq
eeddd4cb6b forgot the important reason I even started working on AIVC again 2023-08-21 03:42:12 +00:00
mrq
72a38ff2fc made initialization faster if there's a lot of voice files (because glob fucking sucks), commiting changes buried on my training rig 2023-08-21 03:31:49 +00:00
mrq
91a0c495ff Merge pull request 'added simple websocket server which allows to start tts generation tasks, retrieving autoregressive models and voices list' (#328) from ben_mkiv/ai-voice-cloning:master into master
Reviewed-on: mrq/ai-voice-cloning#328
2023-08-16 14:01:44 +00:00
2626364c40 added simple websocket server which allows to start tts generation tasks, retrieving autoregressive models and voices list 2023-08-16 12:51:13 +02:00
mrq
ac645e0a20 no longer need to install bark under ./modules/ 2023-07-11 16:20:28 +00:00
mrq
e2a6dc1c0a under bark, properly use transcribed audio if the audio wasn't actually sliced (oops) 2023-07-11 14:53:32 +00:00
mrq
a325496661 Merge pull request 'Freeze pydantic package to 1.10.11' (#301) from Jarod/ai-voice-cloning:master into master
Reviewed-on: mrq/ai-voice-cloning#301
2023-07-09 15:06:31 +00:00
8 changed files with 5139 additions and 4806 deletions


@@ -1,8 +1,8 @@
 # AI Voice Cloning

-This [repo](https://git.ecker.tech/mrq/ai-voice-cloning)/[rentry](https://rentry.org/AI-Voice-Cloning/) aims to serve as both a foolproof guide for setting up AI voice cloning tools for legitimate, local use on Windows/Linux, as well as a stepping stone for anons that genuinely want to play around with [TorToiSe](https://github.com/neonbjb/tortoise-tts).
-Similar to my own findings for Stable Diffusion image generation, this rentry may appear a little disheveled as I note my new findings with TorToiSe. Please keep this in mind if the guide seems to shift a bit or sound confusing.
+> **Note** This project has been in dire need of being rewritten from the ground up for some time. Apologies for any crust from my rather spaghetti code.
+This [repo](https://git.ecker.tech/mrq/ai-voice-cloning)/[rentry](https://rentry.org/AI-Voice-Cloning/) aims to serve as both a foolproof guide for setting up AI voice cloning tools for legitimate, local use on Windows/Linux, as well as a stepping stone for anons that genuinely want to play around with [TorToiSe](https://github.com/neonbjb/tortoise-tts).

 >\>Ugh... why bother when I can just abuse 11.AI?


@@ -1,13 +1,106 @@
-data_dirs: [./training/${voice}/valle/]
-spkr_name_getter: "lambda p: p.parts[-3]" # "lambda p: p.parts[-1].split('-')[0]"
-max_phones: 72
-models: '${models}'
-batch_size: ${batch_size}
-gradient_accumulation_steps: ${gradient_accumulation_size}
-eval_batch_size: ${batch_size}
-max_iter: ${iterations}
-save_ckpt_every: ${save_rate}
-eval_every: ${validation_rate}
+dataset:
+  training: [
+    "./training/${voice}/valle/",
+  ]
+  noise: [
+    "./training/valle/data/Other/noise/",
+  ]
+  speaker_name_getter: "lambda p: p.parts[-3]" # "lambda p: f'{p.parts[-3]}_{p.parts[-2]}'"
+  use_hdf5: False
+  hdf5_name: data.h5
+  hdf5_flag: r
+  validate: True
+  workers: 4
+  cache: False
+  phones_range: [4, 64]
+  duration_range: [1.0, 8.0]
+  random_utterance: 1.0
+  max_prompts: 3
+  prompt_duration: 3.0
+  sample_type: path
+  tasks_list: ["tts"] # ["tts", "ns", "sr", "tse", "cse", "nse", "tts"]
+
+models:
+  _max_levels: 8
+  _models:
+  - name: "ar"
+    size: "full"
+    resp_levels: 1
+    prom_levels: 2
+    tasks: 8
+    arch_type: "retnet"
+  - name: "nar"
+    size: "full"
+    resp_levels: 3
+    prom_levels: 4
+    tasks: 8
+    arch_type: "retnet"
+
+hyperparameters:
+  batch_size: ${batch_size}
+  gradient_accumulation_steps: ${gradient_accumulation_size}
+  gradient_clipping: 100
+  optimizer: AdamW
+  learning_rate: 1.0e-4
+  scheduler_type: ""
+
+evaluation:
+  batch_size: ${batch_size}
+  frequency: ${validation_rate}
+  size: 16
+  steps: 300
+  ar_temperature: 0.95
+  nar_temperature: 0.25
+
+trainer:
+  iterations: ${iterations}
+  save_tag: step
+  save_on_oom: True
+  save_on_quit: True
+  export_on_save: True
+  export_on_quit: True
+  save_frequency: ${save_rate}
+  keep_last_checkpoints: 4
+  aggressive_optimizations: False
+  load_state_dict: True
+  #strict_loading: False
+  #load_tag: "9500"
+  #load_states: False
+  #restart_step_count: True
+  gc_mode: None # "global_step"
+  weight_dtype: bfloat16
+  backend: deepspeed
+  deepspeed:
+    zero_optimization_level: 2
+    use_compression_training: True
+
+inference:
+  use_vocos: True
+  normalize: False
+  weight_dtype: float32
+
+bitsandbytes:
+  enabled: False
+  injects: True
+  linear: True
+  embedding: True
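The `${...}` tokens in the template above are placeholders that ai-voice-cloning fills in with values from the web UI when it writes out a training config. A minimal sketch of that substitution using Python's `string.Template` (the actual helper lives in the suppressed `utils.py` diff and may differ; the voice name here is illustrative):

```python
from string import Template

# Two lines lifted from the template above; ${voice} and ${batch_size}
# are the placeholders to be filled in.
template = Template(
    'dataset:\n'
    '  training: [ "./training/${voice}/valle/" ]\n'
    'hyperparameters:\n'
    '  batch_size: ${batch_size}\n'
)

# Values that would come from the web UI's training settings.
yaml_text = template.safe_substitute(voice="patrick_bateman", batch_size=8)
print(yaml_text)
```

`safe_substitute` leaves any unresolved `${...}` token in place instead of raising, which is convenient when only some of the placeholders are known at write time.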

@@ -1 +1 @@
-Subproject commit 5ff00bf3bfa97e2c8e9f166b920273f83ac9d8f0
+Subproject commit b10c58436d6871c26485d30b203e6cfdd4167602


@@ -7,4 +7,5 @@ music-tag
 voicefixer
 psutil
 phonemizer
-pydantic==1.10.11
+pydantic==1.10.11
+websockets


@@ -0,0 +1,84 @@
import asyncio
import json

from threading import Thread
from websockets.server import serve

from utils import generate, get_autoregressive_models, get_voice_list, args, update_autoregressive_model, update_diffusion_model, update_tokenizer


# this is a not so nice workaround to set values to None if their string value is "None"
def replaceNoneStringWithNone(message):
    ignore_fields = ['text']  # list of fields which CAN have "None" as a literal string value
    for member in message:
        if message[member] == 'None' and member not in ignore_fields:
            message[member] = None
    return message


async def _handle_generate(websocket, message):
    # update args parameters which control the model settings
    if message.get('autoregressive_model'):
        update_autoregressive_model(message['autoregressive_model'])

    if message.get('diffusion_model'):
        update_diffusion_model(message['diffusion_model'])

    if message.get('tokenizer_json'):
        update_tokenizer(message['tokenizer_json'])

    if message.get('sample_batch_size'):
        global args
        args.sample_batch_size = message['sample_batch_size']

    message['result'] = generate(**message)
    await websocket.send(json.dumps(replaceNoneStringWithNone(message)))


async def _handle_get_autoregressive_models(websocket, message):
    message['result'] = get_autoregressive_models()
    await websocket.send(json.dumps(replaceNoneStringWithNone(message)))


async def _handle_get_voice_list(websocket, message):
    message['result'] = get_voice_list()
    await websocket.send(json.dumps(replaceNoneStringWithNone(message)))


async def _handle_message(websocket, message):
    message = replaceNoneStringWithNone(message)

    if message.get('action') and message['action'] == 'generate':
        await _handle_generate(websocket, message)
    elif message.get('action') and message['action'] == 'get_voices':
        await _handle_get_voice_list(websocket, message)
    elif message.get('action') and message['action'] == 'get_autoregressive_models':
        await _handle_get_autoregressive_models(websocket, message)
    else:
        # message is a dict, so format it rather than concatenating it to a str
        print(f"websocket: unhandled message: {message}")


async def _handle_connection(websocket, path):
    print("websocket: client connected")

    async for message in websocket:
        try:
            await _handle_message(websocket, json.loads(message))
        except ValueError:
            print("websocket: malformed json received")


async def _run(host: str, port: int):
    print(f"websocket: server started on ws://{host}:{port}")

    async with serve(_handle_connection, host, port, ping_interval=None):
        await asyncio.Future()  # run forever


def _run_server(listen_address: str, port: int):
    asyncio.run(_run(host=listen_address, port=port))


def start_websocket_server(listen_address: str, port: int):
    Thread(target=_run_server, args=[listen_address, port], daemon=True).start()
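As a sketch of the request shape the handlers above expect: a JSON object with an `action` field, plus any keyword arguments forwarded to `generate()`. The helper below mirrors the server's `replaceNoneStringWithNone` workaround (field names are taken from the diff; the specific request values are only an illustration):

```python
import json

def replace_none_string_with_none(message, ignore_fields=('text',)):
    # mirrors the server-side workaround: the literal string "None"
    # becomes a real None, except for fields that may legitimately
    # contain that text (currently just 'text')
    for key in message:
        if message[key] == 'None' and key not in ignore_fields:
            message[key] = None
    return message

# a 'generate' request as it would arrive over the websocket
raw = json.dumps({
    'action': 'generate',
    'text': 'None',   # 'text' is in ignore_fields, so the string survives
    'voice': 'None',  # everything else is converted to a real None
})

decoded = replace_none_string_with_none(json.loads(raw))
assert decoded['voice'] is None      # real None
assert decoded['text'] == 'None'     # still the four-character string
```

This is why the "small QoL change to the StringNone helper" commit (5d73d9e71c) exempts `text`: someone may genuinely want to synthesize the word "None".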


@@ -11,6 +11,9 @@ os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'python'
 from utils import *
 from webui import *

+from api.websocket_server import start_websocket_server
+
 if __name__ == "__main__":
     args = setup_args()
@@ -23,6 +26,9 @@ if __name__ == "__main__":
     if not args.defer_tts_load:
         tts = load_tts()

+    if args.websocket_enabled:
+        start_websocket_server(args.websocket_listen_address, args.websocket_listen_port)
+
     webui.block_thread()
 elif __name__ == "main":
     from fastapi import FastAPI
@@ -37,4 +43,5 @@ elif __name__ == "main":
     app = gr.mount_gradio_app(app, webui, path=args.listen_path)

     if not args.defer_tts_load:
-        tts = load_tts()
+        tts = load_tts()
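main.py reads three new attributes off `args`: `websocket_enabled`, `websocket_listen_address`, and `websocket_listen_port`. Those are defined in `setup_args()` inside the suppressed utils.py diff; per commit b72f2216bf the server is disabled by default. An argparse sketch of the shape (the flag spellings and the port value here are assumptions, not the repo's actual defaults):

```python
import argparse

parser = argparse.ArgumentParser()
# opt-in: the websocket server stays off unless explicitly enabled
parser.add_argument('--websocket-enabled', action='store_true', default=False)
parser.add_argument('--websocket-listen-address', default='127.0.0.1')
parser.add_argument('--websocket-listen-port', type=int, default=8600)  # hypothetical port
args = parser.parse_args([])

# argparse turns the dashes into underscores, matching main.py's attribute access
assert args.websocket_enabled is False
assert args.websocket_listen_address == '127.0.0.1'
```

With `action='store_true'`, merely passing `--websocket-enabled` flips the flag on, which matches how main.py gates `start_websocket_server(...)` behind a boolean.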

File diff suppressed because it is too large

File diff suppressed because it is too large