vall_e.cpp
This is an implementation that makes use of llama.cpp and encodec.cpp. At the moment it's very barebones as I try and wrestle with llama.cpp's API without needing to modify its code.
Build
Probably something like:

g++ vall_e.cpp -I/path/to/llama.cpp/include/ -L/path/to/llama.cpp/ -lggml -lggml-base -lllama -o ./vall_e

(Note that -L takes the directory containing libllama.so, not the library file itself.)
To-Do
- converted model to GGUF
  - convert it without modifying any of the existing code
- basic framework
  - load the quantized model
  - orchestrate the required embeddings
  - juggle the output head / classifier properly
- phonemize text
- tokenize phonemes
- load audio from disk
- encode audio
- sum embeddings for the `prom` and prior `resp`s
- `AR` sampling
- `NAR-len` demasking sampling
- `NAR` sampling
- decode audio to disk
- a functional CLI