An unofficial PyTorch implementation of VALL-E

VALL'E

An unofficial PyTorch implementation of VALL-E, utilizing the EnCodec encoder/decoder.

A demo is available on HuggingFace here.

Requirements

Besides a working PyTorch environment, the only hard requirement is espeak-ng for phonemizing text:

- Linux users can consult their package managers on installing espeak/espeak-ng.
- Windows users are required to install espeak-ng.
  - additionally, you may be required to set the PHONEMIZER_ESPEAK_LIBRARY environment variable to specify the path to libespeak-ng.dll.
In the future, an in-house phonemizer to replace this dependency would be fantastic.
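
As a rough illustration of the Windows note above, the following sketch sets PHONEMIZER_ESPEAK_LIBRARY from Python and reports whether an espeak binary is visible on PATH. The helper name `configure_espeak` and the example DLL path are assumptions for illustration, not part of this repo; the variable likely needs to be set before the phonemizer is first used.

```python
import os
import shutil


def configure_espeak(dll_path=None, environ=None):
    """Optionally set PHONEMIZER_ESPEAK_LIBRARY and report what was found."""
    env = os.environ if environ is None else environ
    if dll_path is not None:
        # The variable name comes from the requirement above; the path is yours to supply.
        env["PHONEMIZER_ESPEAK_LIBRARY"] = str(dll_path)
    # Look for either binary name on PATH; the result is None when neither is found.
    binary = shutil.which("espeak-ng") or shutil.which("espeak")
    return {"binary": binary, "library": env.get("PHONEMIZER_ESPEAK_LIBRARY")}


if __name__ == "__main__":
    # Hypothetical install location; adjust to wherever libespeak-ng.dll actually lives.
    print(configure_espeak(r"C:\Program Files\eSpeak NG\libespeak-ng.dll"))
```

If `binary` comes back as `None` on Linux, espeak-ng is probably not installed or not on PATH.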

Install

Simply run pip install git+https://git.ecker.tech/mrq/vall-e or pip install git+https://github.com/e-c-k-e-r/vall-e.

I've tested this repo under Python versions 3.10.9, 3.11.3, and 3.12.3.
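
For scripted setups, the install step can be wrapped in Python. This is only a sketch: the helper names (`is_tested_python`, `pip_install_command`, `install`) are made up for illustration, and the version check merely compares against the minor versions listed above rather than enforcing anything.

```python
import subprocess
import sys

# The two install sources listed above; either should work.
REPO_URLS = (
    "git+https://git.ecker.tech/mrq/vall-e",
    "git+https://github.com/e-c-k-e-r/vall-e",
)
# Minor versions corresponding to the tested releases (3.10.9, 3.11.3, 3.12.3).
TESTED_MINORS = {(3, 10), (3, 11), (3, 12)}


def is_tested_python(version_info=sys.version_info):
    """True when the interpreter's major.minor matches a tested version."""
    return tuple(version_info[:2]) in TESTED_MINORS


def pip_install_command(url=REPO_URLS[0]):
    """Build the pip invocation for the current interpreter without running it."""
    return [sys.executable, "-m", "pip", "install", url]


def install(url=REPO_URLS[0]):
    """Warn on untested interpreters, then run pip (requires git and network access)."""
    if not is_tested_python():
        print("Warning: this Python version is not among those tested.")
    subprocess.run(pip_install_command(url), check=True)
```

Calling `install()` runs the same command as the one-liner above, using whichever interpreter is executing the script.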

Pre-Trained Model

My pre-trained weights can be acquired from here.

A script to set up a proper environment and download the weights can be invoked with ./scripts/setup.sh. This will automatically create a venv and download the ar+nar-llama-8 weights and config file to the right place.

When running inference through either the web UI or the CLI, if no model is passed, the default model is downloaded automatically and should also update automatically.

Documentation

The provided documentation under ./docs/ should thoroughly cover most, if not all, of this project.

Markdown files should correspond directly to their respective file or folder under ./vall_e/.
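
To make that correspondence concrete, here is a small sketch that maps a docs page to the module file or folder it would describe. The helper `source_candidates` and the example filename are hypothetical; the only assumption taken from above is that names under ./docs/ mirror names under ./vall_e/.

```python
from pathlib import Path


def source_candidates(doc_path, docs_root="docs", package_root="vall_e"):
    """Return the module file and the folder a docs page might describe."""
    # Strip the docs root and the .md suffix, keeping any subfolders.
    relative = Path(doc_path).relative_to(docs_root).with_suffix("")
    base = Path(package_root) / relative
    # A page may describe a single module (name.py) or a package folder (name/).
    return [base.with_suffix(".py"), base]


if __name__ == "__main__":
    # "docs/example.md" is a placeholder name, not a claim about the repo's contents.
    for candidate in source_candidates("docs/example.md"):
        print(candidate)
```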