• https://git.ecker.tech/ aims to provide a place to share my efforts while maintaining true ownership of my code, as I do not trust GitHub.

    XMR: 4B9TQdkAkBFYrbj5ztvTx89e5LpucPeTSPzemCihdDi9EBnx7btn8RDNZTBz2zihWsjMnDkzn5As1LU6gLv3KQy8BLsZ8SG

  • Joined on 2022-10-10
mrq commented on issue mrq/ai-voice-cloning#152 2023-09-02 18:21:40 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

Playing around with encodec encoding + vocos decoding. As good as vocos is, it still introduces some minor audio artifacts for higher-pitched voices. This puts an upper bound on the quality of the…

mrq pushed to master at mrq/vall-e 2023-09-02 17:22:34 +00:00
57db3ccfa8 shuffled VALL-E continuous as a task tts-c instead, logic fixes for it
mrq pushed to master at mrq/vall-e 2023-09-02 02:32:42 +00:00
2f06166ddd cleanups
mrq pushed to master at mrq/vall-e 2023-09-02 01:57:19 +00:00
e40c0d34a0 somewhat got recurrent forward working (it's as accurate as chunkwise forward: it's not accurate at all), added option to use AMP instead of blanket setting the weights' dtype
mrq pushed to master at mrq/vall-e 2023-09-01 22:18:31 +00:00
2bc2d08b09 (need to verify) added modifying model size and config bool to align with VALL-E continuous' methodology
mrq commented on issue mrq/ai-voice-cloning#363 2023-09-01 19:03:17 +00:00
Any tips for getting the fastest inference physically possible?

I cloned the fast fork and edited autoregressive.py to use DeepSpeed. I saw some pretty nice speedups, sometimes as much as 7-10 seconds (compared to your fork). But on…

mrq commented on issue mrq/ai-voice-cloning#152 2023-09-01 18:42:20 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

I had gotten that from here, but yeah, I think it's just plain incorrect and probably closer to the number you gave.

Seems like someone ran someone else's article (and not the paper itself)…

mrq commented on issue mrq/ai-voice-cloning#152 2023-09-01 18:03:20 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

It looks like the original VALL-E model used ~140B parameters.

Where'd you get that number from? The papers (VALL-E, VALL-E X, SpeechX) don't mention a parameter count anywhere.

[NaturalSpe…
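For a sense of scale, a back-of-the-envelope count for a plain transformer stack only needs the layer count and width: roughly 4·d² weights for the attention projections plus 2·4·d² for the feed-forward block, per layer. The 12-layer, 1024-wide configuration below is an assumption chosen for illustration, not a figure quoted from the papers; with it the estimate lands around 151M, roughly three orders of magnitude below 140B.

```python
def approx_decoder_params(n_layers: int, d_model: int, ffn_mult: int = 4) -> int:
    """Rough weight count for a stack of standard transformer blocks.

    Counts only the large matrices: the Q/K/V/output projections (4 * d^2)
    and the two feed-forward projections (2 * ffn_mult * d^2). Biases, norms
    and embeddings are ignored, so the true total is somewhat higher.
    """
    attn = 4 * d_model * d_model
    ffn = 2 * ffn_mult * d_model * d_model
    return n_layers * (attn + ffn)


# Hypothetical 12-layer, 1024-wide model (dimensions assumed for illustration).
estimate = approx_decoder_params(n_layers=12, d_model=1024)
print(f"{estimate:,} weights (~{estimate / 1e6:.0f}M)")
```

Swapping in a real config's layer count and width gives a quick sanity check on any quoted parameter count.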

mrq commented on issue mrq/ai-voice-cloning#152 2023-09-01 01:36:47 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

I think I've gotten a good handle on any electrical issues through painful trial and error and isolation over the past few days. Turns out there's quite the rabbit hole that I just so…

mrq commented on issue mrq/ai-voice-cloning#362 2023-08-31 20:39:25 +00:00
WhisperX installation
    .\venv\Scripts\activate.bat
    pip3 install git+https://github.com/m-bain/whisperX

is the basic way to do it, but you pretty much need to cross your fingers and hope that all the…
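Once pip finishes, a quick check that the package actually landed in the venv's interpreter can save some head-scratching. A minimal sketch, assuming the package's import name is whisperx; it only checks that the module can be found, and does not import it, so it will not catch broken dependencies such as a mismatched torch build.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Run with the venv's Python; the first line confirms which interpreter is in use.
    print("interpreter:", sys.executable)
    print("whisperx found:", module_available("whisperx"))
```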

mrq commented on issue mrq/ai-voice-cloning#361 2023-08-31 20:37:33 +00:00
American imposter

I would increase the temperature, as 0.2 is a bit low for TorToiSe. I imagine that's the cause, since I remember the base model will erase any non-American accent.
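The reasoning can be sketched generically: sampling temperature divides the logits before the softmax, so a low value such as 0.2 piles nearly all the probability onto the model's top choices, which for the base model plausibly means its default accent. This is a minimal illustration with made-up logits, not TorToiSe's actual sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into sampling probabilities at the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: index 0 stands in for the model's most likely ("default") choice.
logits = [2.0, 1.5, 1.0]
for t in (0.2, 0.8, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Raising the temperature flattens the distribution, giving less likely tokens (and, by the reasoning above, a non-default accent) a real chance of being sampled.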

mrq pushed to master at mrq/vall-e 2023-08-30 23:21:53 +00:00
5c8694db8e nasty bandaid if there's no validation dataset specified during training (for example, during finetunes)
mrq commented on issue mrq/ai-voice-cloning#152 2023-08-30 21:54:45 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

I'm just posting to let you know that vast.ai is a real gem for GPU cloud, often 3x cheaper than RunPod for 3090/4090/A40. The trick is to activate "Unverified Machines".

Ah, I see, I didn't…

mrq commented on pull request mrq/ai-voice-cloning#350 2023-08-30 18:40:18 +00:00
Websocket fixes / additions

Shit, I could have sworn I merged this a few days ago, after seeing it a few hours after it was submitted. Sorry.

mrq pushed to master at mrq/ai-voice-cloning 2023-08-30 18:39:36 +00:00
7110b878b7 Merge pull request 'Websocket fixes / additions' (#350) from ben_mkiv/ai-voice-cloning:master into master
b72f2216bf added websocket server arguments to enable it (now disabled by default) and to specify the address/port to listen on
6f0f148782 websocket server: fix for model loading (just overriding args didn't do it after all...)
578a5bcadd websocket server: fix for model loading (just overriding args didn't do it after all...)
mrq merged pull request mrq/ai-voice-cloning#350 2023-08-30 18:39:35 +00:00
Websocket fixes / additions
mrq commented on issue mrq/ai-voice-cloning#152 2023-08-30 18:38:14 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

mmm... I think it's foolish to continue running training on the existing weights.

  • even before, with the rental 4090s/3090s, the metrics never improved; they just wavered between the ranges…
mrq commented on issue mrq/ai-voice-cloning#359 2023-08-30 17:51:36 +00:00
Are YouTube rips entirely unusable for finetuning?

My gut says:

  • the finetune is being trained too fast, as your initial LR is too high / your LR is not decaying fast enough.
  • the finetune is also not being trained long enough. 2300 steps /…
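To illustrate what "decaying fast enough" means, here is a generic cosine schedule that brings the LR from its starting value down to a floor by the end of the run. It is a sketch, not necessarily the scheduler ai-voice-cloning uses; the base LR of 1e-5 is an assumed value, and 2300 steps is reused from the comment above.

```python
import math


def lr_at_step(step: int, base_lr: float, total_steps: int, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr to min_lr over total_steps, then held at min_lr."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Assumed base LR; 2300 steps taken from the finetune discussed above.
for step in (0, 575, 1150, 1725, 2300):
    print(step, f"{lr_at_step(step, base_lr=1e-5, total_steps=2300):.2e}")
```

Lowering base_lr addresses "initial LR is too high"; a shorter total_steps (or a step-based decay with earlier milestones) addresses "not decaying fast enough".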
mrq commented on issue mrq/vall-e#7 2023-08-30 17:45:19 +00:00
Training error: ValueError: num_samples should be a positive integer value, but got num_samples=0

In the training YAML, copy what's in dataset.training into dataset.validation. I could have sworn I had it fall back and do this itself for the validation dataset/dataloader, but…
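The error text is the message PyTorch's RandomSampler raises when it is handed an empty dataset, which is what an empty validation list produces. The manual fix amounts to the fallback below, sketched over an already-parsed config dict. The function name and paths are hypothetical, and this is not the code in mrq/vall-e.

```python
import copy


def with_validation_fallback(dataset_cfg: dict) -> dict:
    """Return a copy of the dataset config whose validation list is never empty.

    If "validation" is missing or empty, reuse the "training" paths, which is
    the same as manually copying dataset.training into dataset.validation.
    """
    cfg = copy.deepcopy(dataset_cfg)  # leave the caller's config untouched
    if not cfg.get("validation"):
        cfg["validation"] = list(cfg.get("training", []))
    return cfg


# Hypothetical parsed YAML where only training data was specified.
cfg = {"training": ["./training/my-voice/"], "validation": []}
print(with_validation_fallback(cfg))
```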

mrq commented on issue mrq/ai-voice-cloning#358 2023-08-30 17:44:03 +00:00
The content of the generated sound is not correct

The issue I've run into with naively using Japanese is that the default tokenizer normalizes Japanese text incorrectly (it converts kana/kanji the wrong way). I honestly…