
# vall_e.cpp

This is a C++ implementation of VALL-E that makes use of llama.cpp and encodec.cpp.

At the moment it's very barebones, as I wrestle with llama.cpp's API without needing to modify its code.

## Build

Probably something like:

```sh
g++ vall_e.cpp -I/path/to/llama.cpp/include/ -L/path/to/llama.cpp/ -lggml -lggml-base -lllama -o ./vall_e
```

Note that `-L` takes the directory containing `libllama.so`, not the shared object itself.

## To-Do

- converted model to GGUF
  - convert it without modifying any of the existing code
- basic framework
  - load the quantized model
  - orchestrate the required embeddings
  - juggle the output head / classifier properly
- phonemize text
- tokenize phonemes
- load audio from disk
- encode audio
- sum embeddings for the prom and prior resps
- AR sampling
- NAR-len demasking sampling
- NAR sampling
- decode audio to disk
- a functional CLI
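On the "sum embeddings" item above: VALL-E's input for the prom and prior resps is formed by summing one embedding vector per RVQ codebook level into a single vector per frame. A minimal self-contained sketch of that summation (names here are illustrative, not llama.cpp API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One embedding vector per codebook level for a given audio frame.
using Embedding = std::vector<float>;

// Sum the per-level embeddings of a frame into a single transformer input.
// All levels must share the same embedding dimension.
Embedding sum_codebook_embeddings(const std::vector<Embedding>& per_level) {
    assert(!per_level.empty());
    Embedding out(per_level[0].size(), 0.0f);
    for (const Embedding& e : per_level) {
        assert(e.size() == out.size());
        for (size_t i = 0; i < e.size(); ++i)
            out[i] += e[i];
    }
    return out;
}
```

In the real pipeline each level's token would first be looked up in that level's embedding table; only the elementwise sum is shown here.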
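The "NAR-len demasking sampling" item refers to MaskGIT-style iterative decoding: every position starts masked, each step the model scores all masked positions, and the most confident tokens are committed following an unmask schedule. The README doesn't show the project's actual sampler, so this is a self-contained sketch of the general loop with a stand-in scoring function; all names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// score_fn(i) stands in for a model forward pass: it returns the predicted
// token and a confidence for position i. Here it is a plain callback.
std::vector<int> demask(
    size_t seq_len, int steps,
    const std::function<std::pair<int, float>(size_t)>& score_fn) {
    const float kPi = 3.14159265f;
    std::vector<int> tokens(seq_len, -1);   // -1 marks a still-masked slot
    std::vector<bool> fixed(seq_len, false);
    size_t committed = 0;
    for (int s = 1; s <= steps && committed < seq_len; ++s) {
        // Cosine schedule: fraction of positions unmasked by step s.
        float t = float(s) / float(steps);
        size_t target = (s == steps)
            ? seq_len
            : size_t(std::ceil((1.0f - std::cos(t * kPi / 2.0f)) * seq_len));
        // Score every still-masked position; record tentative tokens.
        std::vector<std::pair<float, size_t>> cand; // (confidence, index)
        for (size_t i = 0; i < seq_len; ++i) {
            if (fixed[i]) continue;
            auto [tok, conf] = score_fn(i);
            tokens[i] = tok; // tentative; re-predicted next step if not fixed
            cand.push_back({conf, i});
        }
        // Commit the most confident predictions up to the schedule target.
        std::sort(cand.rbegin(), cand.rend());
        for (size_t k = 0; k < cand.size() && committed < target; ++k) {
            fixed[cand[k].second] = true;
            ++committed;
        }
    }
    return tokens;
}
```

In the real model the confidence would come from the softmax over the NAR head's logits, and sampling temperature would decay across steps; the scheduling and commit logic are the part sketched here.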