readme tweaks

mrq 2024-06-28 21:02:40 -05:00
parent 80d6494973
commit e0a93a6400


@@ -59,6 +59,7 @@ For LoRAs, replace the above `fp32.pth` with `lora.pth`.
- [x] Re-enable DDIM sampler
- [ ] Extend the original inference routine with additional features:
  - [ ] non-float32 / mixed precision for the entire stack
    - Parts of the stack will whine about mismatched dtypes (see the `autocast` sketch after this list)...
  - [x] BitsAndBytes support
    - The provided `Linear`s technically aren't used, because GPT-2 uses `Conv1D` instead (see the `Conv1D` sketch after this list)...
  - [x] LoRAs (a minimal sketch of the technique follows this list)
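On the mixed-precision item above, a minimal sketch of where the dtype whining comes from, using `torch.autocast` over a stand-in module (the module here is illustrative, not one of the actual tortoise components):

```python
import torch
import torch.nn as nn

# Illustrative stand-in for part of the stack; not an actual tortoise module.
model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512)).cuda()
x = torch.randn(1, 512, device="cuda")

# autocast keeps the master weights in float32 but runs eligible ops in the
# lower dtype; tensors created *outside* the region stay float32, which is
# exactly where "mismatched dtype" complaints come from when a module does
# manual arithmetic against them.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)

print(y.dtype)  # torch.float16: produced under autocast
```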
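On the BitsAndBytes note, the reason the provided `Linear`s go unused is that HF's GPT-2 builds its projections from `transformers.pytorch_utils.Conv1D`, which stores its weight as the transpose of `nn.Linear`'s layout. A hedged sketch of the mismatch and a manual conversion, assuming the usual `Linear8bitLt` replacement pattern (untested against this repo's module tree):

```python
import torch
import bitsandbytes as bnb
from transformers.pytorch_utils import Conv1D

# GPT-2's MLP up-projection as HF defines it: 768 -> 3072, with the weight
# stored as (in_features, out_features), the transpose of nn.Linear's layout.
conv = Conv1D(nf=3072, nx=768)
print(conv.weight.shape)  # torch.Size([768, 3072])

# To reuse the weights in an 8-bit Linear they must be transposed first;
# this is why the shipped Linear replacements aren't a drop-in here.
linear8 = bnb.nn.Linear8bitLt(768, 3072, bias=True, has_fp16_weights=False)
linear8.weight.data = conv.weight.data.t().contiguous()
linear8.bias.data = conv.bias.data
linear8 = linear8.cuda()  # Linear8bitLt quantizes its weights on .cuda()
```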
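And for the LoRA item, the technique in miniature: a frozen base layer plus a trainable low-rank delta. This is the general recipe only, not this repo's actual implementation behind `lora.pth`:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the low-rank adapters train
        # A projects down to the rank, B projects back up; B starts at zero
        # so the wrapped layer initially behaves exactly like the base layer.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # frozen path + scaled low-rank update (B @ A) applied to x
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(1, 768))
```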
@@ -75,12 +76,16 @@ For LoRAs, replace the above `fp32.pth` with `lora.pth`.
- [ ] Saner way of loading finetuned models / LoRAs
- [ ] Some vector embedding store to find the "best" utterance to pick (see the sketch after this list)
- [ ] Documentation
  - this also includes a correct explanation of the entire stack (rather than the poor one I left in ai-voice-cloning)
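The vector-embedding-store idea above reduces to nearest-neighbor selection over per-utterance embeddings. A minimal sketch; how the embeddings are produced (speaker encoder, conditioning latents, etc.) is left open, and every name below is made up:

```python
import torch
import torch.nn.functional as F

# Hypothetical store: one embedding per candidate utterance.
store = {
    "take_01.wav": torch.randn(256),
    "take_02.wav": torch.randn(256),
    "take_03.wav": torch.randn(256),
}

def best_utterance(reference: torch.Tensor) -> str:
    """Pick the stored utterance with the highest cosine similarity."""
    names = list(store.keys())
    embeds = F.normalize(torch.stack([store[n] for n in names]), dim=-1)
    query = F.normalize(reference, dim=-1)
    scores = embeds @ query  # cosine similarity per utterance
    return names[int(scores.argmax())]

print(best_utterance(torch.randn(256)))
```
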
## Why?
To correct the mess I made by originally forking TorToiSe TTS into a bunch of slopcode, and the nightmare that ai-voice-cloning turned out to be.
Additional features can be applied to the program through a framework of my own that I'm very familiar with.
To:
* atone for the mess I made by originally forking TorToiSe TTS into a bunch of slopcode, and for the nightmare that ai-voice-cloning turned out to be.
* unify the trainer and the inferencer.
* implement additional features with ease, as I'm very familiar with my framework.
* disillusion myself of the notion that it won't get better than TorToiSe TTS:
  - while it's faster than VALL-E, the quality leaves a lot to be desired (although this is simply due to the overall architecture).

## License