An unofficial PyTorch implementation of VALL-E
Go to file
2024-11-21 15:07:46 -06:00
data added more harvard sentences to load from a text file 2024-11-21 13:18:11 -06:00
docs added mixed modality AR+NAR-len to generate a short prefix through the AR, then inference with said prefix through the NAR-len (need to experiment with it more to ensure that the masked off tokens are the only tokens getting updated) 2024-11-20 14:22:12 -06:00
scripts support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder) 2024-09-18 21:34:43 -05:00
vall_e m 2024-11-21 15:07:46 -06:00
.gitignore I cannot believe it's not actually called Wand DB (added wandb logging support since I think it would have been a much better way to look at my metrics) 2024-11-20 16:10:47 -06:00
LICENSE Rewrite init 2023-08-02 21:53:35 +00:00
README.md agony 2024-11-05 22:30:49 -06:00
setup.py dependency updates (gradio 5.x now works on my machine) 2024-11-20 12:33:01 -06:00
vall-e.png Rewrite init 2023-08-02 21:53:35 +00:00

VALL'E

An unofficial PyTorch implementation of VALL-E, utilizing the EnCodec encoder/decoder.

A demo is available on HuggingFace here.

Requirements

Besides a working PyTorch environment, the only hard requirement is espeak-ng for phonemizing text:

  • Linux users can consult their package managers on installing espeak/espeak-ng.
  • Windows users are required to install espeak-ng.
    • additionally, you may be required to set the PHONEMIZER_ESPEAK_LIBRARY environment variable to specify the path to libespeak-ng.dll.
  • In the future, an internal homebrew to replace this would be fantastic.

Install

Simply run pip install git+https://git.ecker.tech/mrq/vall-e or pip install git+https://github.com/e-c-k-e-r/vall-e.

I've tested this repo under Python versions 3.10.9, 3.11.3, and 3.12.3.

Pre-Trained Model

My pre-trained weights can be acquired from here.

A script to setup a proper environment and download the weights can be invoked with ./scripts/setup.sh. This will automatically create a venv, and download the ar+nar-llama-8 weights and config file to the right place.

When inferencing, either through the web UI or CLI, if no model is passed, the default model will download automatically instead, and should automatically update.

Documentation

The provided documentation under ./docs/ should provide thorough coverage over most, if not all, of this project.

Markdown files should correspond directly to their respective file or folder under ./vall_e/.