vall_e.cpp
This is an implementation that makes use of llama.cpp and encodec.cpp.
At the moment it's very barebones as I try to wrestle with llama.cpp's API without needing to modify its code.
Build
Populate ./include/ with the llama.cpp and encodec.cpp headers.
Populate ./libs/ with the compiled libraries of llama.cpp and encodec.cpp.
encodec.cpp requires updating ggml to the latest version and doing a quick hack to make it work on the CPU backend.
llama.cpp currently requires no hacks, but it would be very nice to hack in a way to retrieve a model's tok_embd.
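If tok_embd can be reached, embedding a token becomes a plain row lookup. The sketch below assumes the tensor has already been copied out as unquantized F32 data in ggml's layout for this tensor (ne[0] = n_embd is the contiguous dimension, ne[1] = n_vocab), so each token's vector is n_embd consecutive floats. The helper name `embed_token` is hypothetical, and a quantized tok_embd would need dequantizing first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy the embedding vector for `token` out of a flat,
// unquantized F32 copy of tok_embd. Assumes ne[0] = n_embd (contiguous) and
// ne[1] = n_vocab, so each token's vector is one contiguous row.
std::vector<float> embed_token(const std::vector<float>& tok_embd,
                               std::size_t n_embd, std::size_t token) {
    assert(n_embd > 0 && tok_embd.size() % n_embd == 0);
    const std::size_t n_vocab = tok_embd.size() / n_embd;
    assert(token < n_vocab);
    const auto first = tok_embd.begin() + static_cast<std::ptrdiff_t>(token * n_embd);
    return std::vector<float>(first, first + static_cast<std::ptrdiff_t>(n_embd));
}
```

For example, with a 3-token table and n_embd = 2, `embed_token(table, 2, 1)` returns elements 2 and 3 of the flat buffer.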
Run make.
To-Do
- converted model to GGUF
  - convert it without modifying any of the existing code, as the tokenizer requires some care
- basic framework
  - load the quantized model
  - orchestrate the required embeddings
  - juggle the output head / classifier properly
- phonemize text
  - with the help of espeak-ng
- tokenize phonemes
  - the tokenizer is being a huge thorn on actual sequences
- load audio from disk
- encode audio
- sum embeddings for the prom and prior resps
- AR sampling
- NAR-len demasking sampling
- NAR sampling
- decode audio to disk
- a functional CLI
- actually make it work
  - it seems naively stitching the model together isn't good enough, since the output is wrong; it most likely needs training with a glued-together classifier
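For the step that sums embeddings for the prom and prior resps, each audio frame carries one EnCodec code per RVQ level, and the per-level embeddings are added into a single input vector. This is a minimal sketch under stated assumptions: one flat F32 embedding table per level (n_embd floats per code), codes laid out as `codes[t][level]`, and a caller-chosen number of levels to sum. `EmbeddingTable` and `sum_level_embeddings` are hypothetical names, and which levels the model actually sums for each input is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: one flat F32 table per RVQ level, n_embd floats per code.
struct EmbeddingTable {
    std::vector<float> data;
    std::size_t n_embd;
};

// Sum the embeddings of the first `n_levels` codes at every frame.
// codes[t][l] is the code for frame t at RVQ level l.
// Returns a row-major [n_frames][n_embd] buffer.
std::vector<float> sum_level_embeddings(
        const std::vector<std::vector<std::int32_t>>& codes,
        const std::vector<EmbeddingTable>& tables,
        std::size_t n_levels) {
    assert(!tables.empty() && n_levels <= tables.size());
    const std::size_t n_embd = tables[0].n_embd;
    std::vector<float> out(codes.size() * n_embd, 0.0f);
    for (std::size_t t = 0; t < codes.size(); ++t) {
        assert(codes[t].size() >= n_levels);
        float* dst = out.data() + t * n_embd;
        for (std::size_t l = 0; l < n_levels; ++l) {
            const EmbeddingTable& table = tables[l];
            assert(table.n_embd == n_embd);
            assert(codes[t][l] >= 0);
            const std::size_t code = static_cast<std::size_t>(codes[t][l]);
            assert((code + 1) * n_embd <= table.data.size());
            const float* src = table.data.data() + code * n_embd;
            // Accumulate this level's embedding into the frame's input vector.
            for (std::size_t i = 0; i < n_embd; ++i) dst[i] += src[i];
        }
    }
    return out;
}
```

Passing a smaller `n_levels` lets the same helper cover inputs where only a prefix of the levels is known, such as prior resps when predicting a later level.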
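The AR and NAR sampling steps both come down to picking tokens from logits. This sketch uses generic temperature sampling (greedy argmax when the temperature is 0), which is an assumption and not necessarily the sampler vall_e.cpp ends up using. `ar_generate` is a hypothetical loop in which the `forward` callable stands in for a llama.cpp decode step that returns the next position's logits; NAR sampling could instead call `sample_token` on every position of a level in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Pick a token from one row of logits. temperature <= 0 means greedy (argmax);
// otherwise sample from softmax(logits / temperature).
int sample_token(const std::vector<float>& logits, float temperature, std::mt19937& rng) {
    assert(!logits.empty());
    const auto max_it = std::max_element(logits.begin(), logits.end());
    if (temperature <= 0.0f) {
        return static_cast<int>(max_it - logits.begin());
    }
    // Subtract the max logit before exponentiating for numerical stability;
    // discrete_distribution normalizes the weights itself.
    std::vector<double> weights(logits.size());
    for (std::size_t i = 0; i < logits.size(); ++i) {
        weights[i] = std::exp((logits[i] - *max_it) / temperature);
    }
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return dist(rng);
}

// Hypothetical AR loop: `forward(prev)` returns logits for the next position
// given the tokens generated so far. Stops at `stop_token` or `max_steps`.
template <typename Forward>
std::vector<int> ar_generate(Forward forward, int stop_token, std::size_t max_steps,
                             float temperature, std::mt19937& rng) {
    std::vector<int> out;
    for (std::size_t step = 0; step < max_steps; ++step) {
        const int token = sample_token(forward(out), temperature, rng);
        if (token == stop_token) break;
        out.push_back(token);
    }
    return out;
}
```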
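For NAR-len demasking sampling, one common approach is MaskGIT-style iterative decoding: start fully masked, predict every masked position each step, then re-mask the least confident predictions according to a schedule that reaches zero on the last step. The sketch below shows only the schedule and the re-masking choice; the cosine schedule and both function names are assumptions borrowed from MaskGIT, not confirmed details of this project.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Cosine schedule: how many of n positions stay masked after step `step`
// (0-based) out of `n_steps`. The count reaches 0 on the final step.
std::size_t masked_after_step(std::size_t n, std::size_t step, std::size_t n_steps) {
    assert(n_steps > 0 && step < n_steps);
    const double pi = std::acos(-1.0);
    const double ratio = std::cos(pi / 2.0 * static_cast<double>(step + 1) /
                                  static_cast<double>(n_steps));
    return static_cast<std::size_t>(std::floor(static_cast<double>(n) * ratio));
}

// Flag the `k` lowest-confidence positions for re-masking. Positions fixed in
// earlier steps can be given +infinity confidence so they are never re-masked.
std::vector<bool> remask_lowest(const std::vector<float>& confidence, std::size_t k) {
    assert(k <= confidence.size());
    std::vector<std::size_t> order(confidence.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return confidence[a] < confidence[b];
    });
    std::vector<bool> mask(confidence.size(), false);
    for (std::size_t i = 0; i < k; ++i) mask[order[i]] = true;
    return mask;
}
```

For 10 positions over 2 steps, 7 stay masked after the first step and none after the second.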
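Decoding audio to disk ends with writing the decoded waveform out. A minimal sketch, assuming the decoder yields mono float samples in [-1, 1]: it builds a standard 44-byte-header, 16-bit PCM WAV file in memory. `encode_wav_pcm16` and `put_le` are hypothetical names, and the sample rate is left to the caller since it depends on the EnCodec model in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>
#include <vector>

// Append `value` as `n_bytes` little-endian bytes.
static void put_le(std::vector<std::uint8_t>& out, std::uint32_t value, int n_bytes) {
    for (int i = 0; i < n_bytes; ++i) {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file in memory.
std::vector<std::uint8_t> encode_wav_pcm16(const std::vector<float>& samples,
                                           std::uint32_t sample_rate) {
    assert(samples.size() <= (0xFFFFFFFFu - 36u) / 2u);
    const std::uint32_t data_bytes = static_cast<std::uint32_t>(samples.size() * 2);
    std::vector<std::uint8_t> out;
    out.reserve(44 + data_bytes);
    const std::string riff = "RIFF", wave = "WAVE", fmt = "fmt ", data = "data";
    out.insert(out.end(), riff.begin(), riff.end());
    put_le(out, 36 + data_bytes, 4);   // RIFF chunk size
    out.insert(out.end(), wave.begin(), wave.end());
    out.insert(out.end(), fmt.begin(), fmt.end());
    put_le(out, 16, 4);                // fmt chunk size
    put_le(out, 1, 2);                 // audio format: PCM
    put_le(out, 1, 2);                 // channels: mono
    put_le(out, sample_rate, 4);
    put_le(out, sample_rate * 2, 4);   // byte rate = rate * channels * 2 bytes
    put_le(out, 2, 2);                 // block align
    put_le(out, 16, 2);                // bits per sample
    out.insert(out.end(), data.begin(), data.end());
    put_le(out, data_bytes, 4);
    for (float s : samples) {
        // Clamp, scale to int16, and store as two's-complement little-endian.
        const float clamped = std::clamp(s, -1.0f, 1.0f);
        const auto v = static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
        put_le(out, static_cast<std::uint16_t>(v), 2);
    }
    return out;
}
```

The resulting buffer can be written as-is with a binary std::ofstream.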