# vall_e.cpp
This is an implementation that makes use of [llama.cpp](https://github.com/ggerganov/llama.cpp/) and [encodec.cpp](https://github.com/PABannier/encodec.cpp).
At the moment it's ***very*** barebones as I try and wrestle with `llama.cpp`'s API without needing to modify its code.
## Build
Populate `./include/` with the `llama.cpp` and `encodec.cpp` headers.
Populate `./libs/` with the compiled libraries of `llama.cpp` and `encodec.cpp`.
* `encodec.cpp` requires updating `ggml` to the latest version and a quick hack to make it work on the CPU backend.
* `llama.cpp` currently requires no hacks, but it would be *very* nice to hack in a way to retrieve a model's `tok_embd`.

Run `make`.
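
As a quick sanity check that `./include/` and `./libs/` are wired up correctly, a minimal program along these lines should compile and link against both libraries. `llama_backend_init()` and `llama_print_system_info()` are part of `llama.cpp`'s public API; the build command and library names are assumptions that depend on how the libraries were compiled.

```cpp
// smoke_test.cpp -- verifies the vendored headers and libraries resolve.
// Hypothetical build line (adjust lib names to what you actually built):
//   g++ smoke_test.cpp -I./include -L./libs -lllama -lencodec -lggml -o smoke_test
#include <cstdio>

#include "llama.h"
#include "encodec.h" // included only to confirm the header is found

int main() {
    llama_backend_init();                      // initialize llama.cpp's backend
    printf("%s\n", llama_print_system_info()); // print detected CPU/GPU features
    llama_backend_free();
    return 0;
}
```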
## To-Do
* [x] converted model to GGUF
* [ ] convert it without modifying any of the existing code
* [x] basic framework
* [x] load the quantized model
* [x] orchestrate the required embeddings
* [x] juggle the output head / classifier properly
* [ ] phonemize text
* [ ] tokenize phonemes
* [x] load audio from disk
* [x] encode audio
* [x] sum embeddings for the `prom` and prior `resp`s (see the summation sketch after this list)
* [x] `AR` sampling (a rough sampling-loop sketch follows this list)
* [ ] `NAR-len` demasking sampling
* [ ] `NAR` sampling
* [ ] decode audio to disk (see the `encodec.cpp` decode sketch after this list)
* [ ] a functional CLI
* [ ] actually make it work
* it seems naively stitching the model together isn't good enough since the output is wrong
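
For the `prom`/`resp` embedding summation item above, the idea is that each audio frame carries one code per RVQ level, each level has its own embedding table, and the per-level embeddings are summed into a single input vector per frame. A minimal sketch, assuming the per-level tables have already been pulled out of the GGUF (the names `embds`, `frame_codes`, and `n_embd` are illustrative):

```cpp
#include <cstdint>
#include <vector>

// One embedding table per RVQ level: embds[level][token * n_embd + i] (assumed layout).
// Sums the per-level embeddings of one audio frame into a single input vector.
std::vector<float> sum_embeddings(
    const std::vector<std::vector<float>>& embds, // per-level embedding tables
    const std::vector<int32_t>& frame_codes,      // one code per RVQ level for this frame
    int n_embd                                    // model embedding width
) {
    std::vector<float> out(n_embd, 0.0f);
    for (size_t level = 0; level < frame_codes.size(); ++level) {
        const float* row = embds[level].data() + (size_t)frame_codes[level] * n_embd;
        for (int i = 0; i < n_embd; ++i) {
            out[i] += row[i];
        }
    }
    return out;
}
```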
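For the `AR` sampling step, `llama.cpp`'s sampler-chain API can drive a plain temperature-sampling loop over the decoded logits. A rough sketch, assuming the prompt has already been fed through `llama_decode()` so the context holds its state; the `stop_token` id and the seed are illustrative, and the exact `llama_batch_get_one()` signature depends on the `llama.cpp` version vendored:

```cpp
#include <vector>
#include "llama.h"

// Autoregressive sampling loop; assumes `ctx` already holds the decoded prompt state.
std::vector<llama_token> ar_sample(llama_context* ctx, llama_token first,
                                   llama_token stop_token, int max_tokens) {
    llama_sampler_chain_params sparams = llama_sampler_chain_default_params();
    llama_sampler* smpl = llama_sampler_chain_init(sparams);
    llama_sampler_chain_add(smpl, llama_sampler_init_temp(1.0f)); // temperature
    llama_sampler_chain_add(smpl, llama_sampler_init_dist(42));   // seeded sampling

    std::vector<llama_token> out;
    llama_token cur = first;
    for (int i = 0; i < max_tokens; ++i) {
        llama_batch batch = llama_batch_get_one(&cur, 1); // feed last token back in
        if (llama_decode(ctx, batch) != 0) break;
        cur = llama_sampler_sample(smpl, ctx, -1);        // sample from the last logits
        if (cur == stop_token) break;
        out.push_back(cur);
    }

    llama_sampler_free(smpl);
    return out;
}
```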
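For the remaining audio encode/decode items, `encodec.cpp` exposes compress/decompress entry points. The sketch below shows the decode direction (codes to PCM) from memory of its public interface, so the exact names and signatures should be checked against the header actually vendored in `./include/`; writing the WAV file is left out.

```cpp
#include <cstdint>
#include <vector>
#include "encodec.h"

// Decodes EnCodec codes back to PCM samples; function names are per
// encodec.cpp's public interface as of late 2024 -- verify against the header.
std::vector<float> decode_codes(const char* model_path,
                                const std::vector<int32_t>& codes, int n_threads) {
    struct encodec_context* ectx =
        encodec_load_model(model_path, 0 /* offset */, 0 /* n_gpu_layers */);
    if (!ectx) return {};

    encodec_set_target_bandwidth(ectx, 12); // kbps; must match how the codes were produced

    std::vector<float> audio;
    if (encodec_decompress_audio(ectx, codes.data(), codes.size(), n_threads)) {
        const float* data = encodec_get_audio(ectx);
        audio.assign(data, data + encodec_get_audio_size(ectx));
    }

    encodec_free(ectx);
    return audio; // raw 24 kHz mono PCM, still needs writing to disk
}
```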