diff --git a/README.md b/README.md
index 24ea1c6..6232648 100644
--- a/README.md
+++ b/README.md
@@ -59,6 +59,7 @@ For LoRAs, replace the above `fp32.pth` with `lora.pth`.
 - [x] Re-enable DDIM sampler
 - [ ] Extend the original inference routine with additional features:
   - [ ] non-float32 / mixed precision for the entire stack
+    - Parts of the stack will whine about mismatched dtypes...
   - [x] BitsAndBytes support
     - Provided Linears technically aren't used because GPT2 uses Conv1D instead...
   - [x] LoRAs
@@ -75,12 +76,16 @@ For LoRAs, replace the above `fp32.pth` with `lora.pth`.
 - [ ] Saner way of loading finetuned models / LoRAs
 - [ ] Some vector embedding store to find the "best" utterance to pick
 - [ ] Documentation
+  - This also includes a correct explanation of the entire stack (rather than the poor one I left in ai-voice-cloning).
 
 ## Why?
 
-To correct the mess I've made with forking TorToiSe TTS originally with a bunch of slopcode, and the nightmare that ai-voice-cloning turned out.
-
-Additional features can be applied to the program through a framework of my own that I'm very familiar with.
+To:
+* atone for the mess I made by originally forking TorToiSe TTS with a bunch of slopcode, and for the nightmare that ai-voice-cloning turned out to be.
+* unify the trainer and the inferencer.
+* implement additional features with ease, as I'm very familiar with my own framework.
+* disabuse myself of the notion that it will get better than TorToiSe TTS:
+  - while it's faster than VALL-E, the quality leaves a lot to be desired (although this is simply due to the overall architecture).
 
 ## License
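A note on the mixed-precision item above ("parts of the stack will whine about mismatched dtypes"): the usual failure mode is a float16 tensor reaching a module that was left in float32. The sketch below is only an illustration under assumed names (`acoustic_model`, `vocoder`, and their call signatures are hypothetical, not this repo's API): autocast wraps the stage that tolerates half precision, and the boundary tensor is cast back before it hits a float32-only module.

```python
import torch


@torch.inference_mode()
def generate(acoustic_model, vocoder, text_tokens, device="cuda"):
    """Hypothetical sketch of mixed-precision inference: the acoustic model
    runs under autocast, while the vocoder stays in float32."""
    acoustic_model.to(device)
    vocoder.to(device).float()  # keep the dtype-sensitive module in fp32

    # Ops inside this block run in float16 where autocast deems it safe.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        latents = acoustic_model(text_tokens.to(device))

    # Cast at the boundary; feeding float16 latents into a float32 module
    # is exactly the "mismatched dtypes" complaint the TODO refers to.
    return vocoder(latents.float())
```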
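On the BitsAndBytes note ("Provided Linears technically aren't used because GPT2 uses Conv1D instead"): Hugging Face's GPT-2 implements its projections with `transformers.pytorch_utils.Conv1D`, so a quantizer that only swaps out `nn.Linear` never touches those layers. One way to bridge that gap, shown below purely as an assumption rather than this repo's actual approach, is to convert each `Conv1D` into an equivalent `nn.Linear` first, after which a Linear-based 8-bit swap could apply.

```python
import torch
import torch.nn as nn
from transformers.pytorch_utils import Conv1D


def conv1d_to_linear(conv: Conv1D) -> nn.Linear:
    """Conv1D stores its weight as (in_features, out_features) and computes
    x @ W + b, so the equivalent nn.Linear just takes the transposed weight."""
    in_features, out_features = conv.weight.shape
    linear = nn.Linear(in_features, out_features, bias=conv.bias is not None)
    with torch.no_grad():
        linear.weight.copy_(conv.weight.t())
        if conv.bias is not None:
            linear.bias.copy_(conv.bias)
    return linear


def replace_conv1d(module: nn.Module) -> nn.Module:
    """Recursively swap every Conv1D for an equivalent Linear, so that a
    Linear-targeting quantizer (e.g. bitsandbytes) can actually see them."""
    for name, child in module.named_children():
        if isinstance(child, Conv1D):
            setattr(module, name, conv1d_to_linear(child))
        else:
            replace_conv1d(child)
    return module
```

A conversion like this would run on the GPT-2 backbone before handing the model to the quantizer; the exact attribute path to that backbone depends on the model class and is not assumed here.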
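The vector-embedding-store item is still an open TODO, and the diff does not prescribe how it should work. As a rough sketch of the idea only (all names below are hypothetical), generated candidates could be ranked by cosine similarity between their embeddings and a reference-voice embedding, with the highest-scoring one picked as the "best" utterance.

```python
import torch
import torch.nn.functional as F


def pick_best_utterance(candidate_embeddings: torch.Tensor,
                        reference_embedding: torch.Tensor) -> int:
    """Return the index of the candidate whose (hypothetical) speaker/style
    embedding is closest to the reference, by cosine similarity.

    candidate_embeddings: (N, D), one row per generated utterance
    reference_embedding:  (D,), embedding of the target voice
    """
    scores = F.cosine_similarity(candidate_embeddings,
                                 reference_embedding.unsqueeze(0), dim=-1)
    return int(scores.argmax().item())
```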