Great job! #72


st33lmouse · 2023-03-06T10:42:53Z

st33lmouse commented

2023-03-06 10:42:53 +00:00

This GUI and the tools here are what make tortoise usable. Any AI project that doesn't have something like this is missing out on a lot of usability. Fantastic work!!!

I'll second the motion on some kind of batching system. The nature of audio work is that you get a long list of sentences to be spoken by a character and you just want the machine to munch on them for a while.

Maybe point the AI at a text list of separated sentences, then have it output to a folder with file names like Kennedy_01A, Kennedy_01B, Kennedy_01C for the first sentence and 3 candidates, then Kennedy_02A for the second sentence, first candidate, and so on. Then you can check them out later at your leisure.

The other item on my wishlist is voice generation from thin air. Don't know how you would go about it, but it would be nice!

I see a few bugs here and there; I'll see if I can do my bit and report some.


mrq commented

2023-03-06 18:47:42 +00:00

> I'll second the motion on some kind of batching system. The nature of audio work is that you get a long list of sentences to be spoken by a character and you just want the machine to munch on them for a while.

Unless I'm misinterpreting, that should already be covered with the `Line Delimiter` option where it'll split up by lines.
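For illustration only, splitting on a line delimiter boils down to something like the sketch below. This is not the project's actual code: `split_lines` and its `delimiter` parameter are hypothetical names, and dropping blank lines is an assumption.

```python
def split_lines(text: str, delimiter: str = "\n") -> list[str]:
    """Split the input on the delimiter and drop empty or whitespace-only pieces."""
    return [piece.strip() for piece in text.split(delimiter) if piece.strip()]


# Hypothetical script: one sentence per line, with a stray blank line.
script = "First sentence.\n\nSecond sentence.\n"
lines = split_lines(script)
```

Each entry in `lines` would then be generated as its own clip.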

> Maybe point the AI at a text list of separated sentences, then have it output to a folder with file names like Kennedy_01A, Kennedy_01B, Kennedy_01C for the first sentence and 3 candidates, then Kennedy_02A for the second sentence, first candidate, and so on. Then you can check them out later at your leisure.

It'll be formatted as `./results/{voice}/{voice}_{index}_{line}_{candidate}.wav`, and combined at the end into `./results/{voice}/{voice}_{index}_{candidate}_combined.wav` (where index is the generation number, `_{line}` is ignored if only 1 line, `_{candidate}` is ignored if one candidate).
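A rough sketch of that naming rule, to make the optional parts concrete. The function `result_path` and its arguments are hypothetical, not taken from the codebase, and it assumes the numbers are written without zero padding.

```python
from typing import Optional


def result_path(voice: str, index: int, line: Optional[int] = None,
                candidate: Optional[int] = None, combined: bool = False) -> str:
    """Build an output path following the pattern described above.

    Pass line=None when there is only one line and candidate=None when
    there is only one candidate; those parts are then left out.
    """
    parts = [voice, str(index)]
    if line is not None and not combined:
        parts.append(str(line))        # per-line clips only
    if candidate is not None:
        parts.append(str(candidate))
    if combined:
        parts.append("combined")       # the stitched-together result
    return f"./results/{voice}/{'_'.join(parts)}.wav"


per_line = result_path("kennedy", 3, line=0, candidate=1)
stitched = result_path("kennedy", 3, candidate=1, combined=True)
single = result_path("kennedy", 3)
```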

Although to be honest, maybe lettering would be better for candidates, as the appended numbering can get quite bothersome.
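If lettering were added, it could look something like this sketch. Nothing here exists in the project: `candidate_letter` and `lettered_name` are hypothetical helpers, and the two-digit line number just mirrors the `Kennedy_01A` example from the request.

```python
import string


def candidate_letter(candidate: int) -> str:
    """Map 0 -> A, 1 -> B, ..., 25 -> Z, 26 -> AA (spreadsheet-style)."""
    if candidate < 0:
        raise ValueError("candidate must be non-negative")
    letters = ""
    n = candidate
    while True:
        n, rem = divmod(n, 26)
        letters = string.ascii_uppercase[rem] + letters
        if n == 0:
            break
        n -= 1  # shift so the next digit also starts at A
    return letters


def lettered_name(voice: str, line: int, candidate: int) -> str:
    """Combine voice, zero-padded line number, and candidate letter."""
    return f"{voice}_{line:02d}{candidate_letter(candidate)}"


names = [lettered_name("Kennedy", 1, c) for c in range(3)]
```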

> The other item on my wishlist is voice generation from thin air. Don't know how you would go about it, but it would be nice!

The closest would be the `random` voice option, where it'll generate a new voice without any specific input data. From a cursory glance at the code, I guess it samples from a model of existing latents to generate a new one.


mrq closed this issue

2023-03-07 02:49:46 +00:00
