doctorjuice
  • Joined on 2023-07-15
doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-10-06 18:44:01 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

You were right, at around loss 3.0 I am getting human-like sounds (this is just on 30 hours of audio...). I was able to add some lines to emit the metrics separately. It looks like the ar loss is…

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-10-05 23:16:36 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

Another thing that would be fairly useful for the ar+nar class: Right now, you can only see the combined loss and accuracy. One thing that may be useful to adjust over time is the p_ar_level.…

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-10-04 19:10:58 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

Cool, that's useful for the purposes of debugging anyway. I do see in some of your earlier posts how sometimes quality versus loss/acc can be inconsistent.

Another question, I'm using the…

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-10-04 13:16:04 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

Although it's kind of hard to say precisely when these milestones occurred. I'll have to assume an average sample would be 64 text tokens + 75 * 6 audio tokens = 514 tokens per sample,…
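
For reference, a minimal sketch of that per-sample estimate. The figures 64 and 75 * 6 come straight from the comment above; the function names and the sample count in the usage line are hypothetical and only there for illustration, since the truncated comment does not say what the estimate was applied to.

```python
# Per-sample token estimate, using the figures quoted in the comment.
TEXT_TOKENS_PER_SAMPLE = 64       # assumed average text tokens per sample
AUDIO_TOKENS_PER_SAMPLE = 75 * 6  # assumed average audio tokens per sample


def tokens_per_sample(text_tokens=TEXT_TOKENS_PER_SAMPLE,
                      audio_tokens=AUDIO_TOKENS_PER_SAMPLE):
    """Return the estimated total token count for one average sample."""
    return text_tokens + audio_tokens


def tokens_seen(samples_seen):
    """Hypothetical helper: rough tokens processed after `samples_seen` samples."""
    return samples_seen * tokens_per_sample()


if __name__ == "__main__":
    print(tokens_per_sample())  # 64 + 450
    print(tokens_seen(1000))    # 1000 is an illustrative sample count only
```

Multiplying by the number of samples seen at a given checkpoint gives a rough token count for that milestone, which is only as good as the assumed averages.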

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-10-03 21:58:37 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

So, I'm trying to overfit on just 3 speakers to ensure I have things set up correctly. I'd like to query exactly the same data from the training set to ensure everything is going fine.

Right…

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-09-02 17:02:37 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

Another question: how are you plotting your loss curves etc? Was going to write some code for it, but looks like you were producing them somehow. Maybe I missed them in the repo.

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-09-02 16:51:13 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

Playing around with encodec encoding + vocos decoding. As good as vocos is, it still gives some minor audio artifacts for higher pitch voices. This puts an upper bound on the quality of the model,…

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-09-01 18:19:37 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

It looks like the original vall-e model used ~140B parameters.

Where'd you get that number from? The papers (VALL-E, VALL-E X, SpeechX) don't mention a parameter count anywhere.

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-09-01 16:54:23 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

It looks like the original vall-e model used ~140B parameters. That can't fit into a 4070, can it? So are you using a smaller model size? Does size: "full" correspond to the original paper model…

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-08-24 19:29:28 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

Thanks, I'll look into that.

And what about model size? How do you control that currently? I didn't see any params for it in config.yaml.

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-08-24 18:24:37 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

I'm looking to make use of multiple GPUs, but for all scripts used in the repo, it looks like my PyTorch DataParallel settings, etc. are being overridden with whatever's being set by deepspeed. Struggling…

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-08-23 17:22:42 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

Streaming is very valuable but yeah it is surprisingly tough for most things.

Looks like you’re moving forward with RetNet, right? Why is that when the “vanilla” (no recurrent steps)…

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-08-22 19:07:29 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

For sure, having an already prepared dataset is very helpful. I had tried the script provided for dataset preparation that you had in the readme, but there were errors unpickling the audios that I…

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-08-22 13:08:25 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

@mrq Appreciate the response, and I totally get it. Thanks for letting me know, and good luck with all the work you’re doing here.

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-08-21 21:29:53 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

Hey @mrq, I sent you an email to mrq@ecker.tech reaching out about some things. Let me know if you’ve seen it and are able to respond there, thanks!

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-08-09 20:05:28 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

Does it seem the RetNet approach is better / more data efficient, or better to use the original vall-e implementation?

Also, I am using the phonemizer, but it keeps coming up with None values…

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-08-06 16:21:21 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

This repo's web UI handles it fine with the Train > Prepare Dataset tab (or whatever I ended up calling it again). It'll handle the entire stack from transcribing with Whisper (or preferably,…

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-08-04 20:48:27 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

Trying to get proper transcriptions right now for this repo.

I just made use of the openai-whisper package with the "tiny" model. Do you think that's sufficient? I see you're…

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-08-03 20:45:10 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

Trying to get proper transcriptions right now for this repo.

I just made use of the openai-whisper package with the "tiny" model. Do you think that's sufficient? I see you're using whisperX…

doctorjuice commented on issue mrq/ai-voice-cloning#152 2023-07-25 14:47:20 +00:00
VALL-E Integration (and In Response To TorToiSe: a Quick Retrospective)

Just wanted to say, I love what you're doing and your detailed updates. I wish I could do something similar, but I have my day job which gets in the way. How are you able to juggle this with work…