An unofficial PyTorch implementation of VALL-E

audio-lm pytorch text-to-speech tts vall-e valle

mrq 7f4206a879 fixing an error I caught while fixing tortoise_tts, possibly actually load a LoRA if not passing a yaml/model		2025-07-24 20:56:09 -05:00
data	fixed dac	2025-03-12 23:17:27 -05:00
docs	things i forgot to do last week now that some mental faculties were restored	2025-05-30 22:56:07 -05:00
scripts	documentation update while I wait for more audio (between 4 and 8 seconds per utterance) quantize for nvidia/audio-codec-44khz (I was foolish to think I can get something servicable with just 4 seconds max for an utterance)	2025-02-15 17:42:06 -06:00
vall_e	fixing an error I caught while fixing tortoise_tts, possibly actually load a LoRA if not passing a yaml/model	2025-07-24 20:56:09 -05:00
vall_e.cpp	diagnosed both hf/llama.cpp versions to probably just being a faulty export method (to-do: migrate vall_e.models.base to vall_e.export --hf)	2025-04-05 22:05:39 -05:00
.gitignore		2024-12-26 21:42:17 -06:00
LICENSE
README.md	cleaup	2024-12-24 23:14:32 -06:00
setup.py	a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow)	2025-02-28 17:56:50 -06:00
test.wav	fixes fixes fixes (a quarter of my recently processed audio returned zero'd tensors......)	2025-02-22 09:07:33 -06:00
vall-e.png

README.md

VALL'E

An unofficial PyTorch implementation of VALL-E (last updated: 2024.12.11), utilizing the EnCodec encoder/decoder.

A demo is available on HuggingFace here.

Requirements

Besides a working PyTorch environment, the only hard requirement is espeak-ng for phonemizing text:

- Linux users can consult their package managers on installing espeak/espeak-ng.
- Windows users are required to install espeak-ng.
  - additionally, you may be required to set the PHONEMIZER_ESPEAK_LIBRARY environment variable to specify the path to libespeak-ng.dll.
In the future, an internal homebrew to replace this would be fantastic.
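
For the Windows case above, the environment variable can also be set from Python before the phonemizer is first used. The following is a minimal sketch, not part of this repo: the helper name `configure_espeak_library` and the example DLL location are assumptions for illustration, and only the variable name `PHONEMIZER_ESPEAK_LIBRARY` comes from the requirements above.

```python
import os
from pathlib import Path


def configure_espeak_library(dll_path, env=os.environ):
    """Point the phonemizer at an espeak-ng shared library, if the file exists.

    Hypothetical helper: returns True when the variable was set, False otherwise.
    """
    path = Path(dll_path)
    if not path.is_file():
        # Leave the environment untouched rather than store a bad path.
        return False
    env["PHONEMIZER_ESPEAK_LIBRARY"] = str(path)
    return True


if __name__ == "__main__":
    # Assumed install location; adjust to wherever espeak-ng actually lives.
    candidate = r"C:\Program Files\eSpeak NG\libespeak-ng.dll"
    if configure_espeak_library(candidate):
        print("PHONEMIZER_ESPEAK_LIBRARY set to", candidate)
    else:
        print("libespeak-ng.dll not found at", candidate)
```

Setting the variable early in the process, before anything tries to phonemize text, is the conservative choice, since it avoids depending on when the library path is looked up.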

Install

Simply run pip install git+https://git.ecker.tech/mrq/vall-e or pip install git+https://github.com/e-c-k-e-r/vall-e.

This repo is tested under Python versions 3.10.9, 3.11.3, and 3.12.3.
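
As a rough pre-flight check, the interpreter version can be compared against the tested releases before installing. This is a sketch under stated assumptions: the helper names are hypothetical, and matching on major.minor only (rather than the exact patch versions listed above) is a simplification, not something the project prescribes.

```python
import sys

# (major, minor) pairs derived from the versions reported as tested above.
TESTED_MINORS = {(3, 10), (3, 11), (3, 12)}

# Either URL from the install instructions can be used as the source.
REPO_URL = "https://git.ecker.tech/mrq/vall-e"


def is_tested_python(version_info=sys.version_info):
    """Return True if the interpreter matches a tested major.minor release."""
    return (version_info[0], version_info[1]) in TESTED_MINORS


def pip_install_command(repo_url=REPO_URL, python=sys.executable):
    """Build the pip invocation for the given interpreter (does not run it)."""
    return [python, "-m", "pip", "install", f"git+{repo_url}"]


if __name__ == "__main__":
    if not is_tested_python():
        print("Warning: this Python version is not one listed as tested.")
    # Print the command instead of executing it, so nothing is installed implicitly.
    print(" ".join(pip_install_command()))
```

Invoking pip through `sys.executable -m pip` targets the interpreter that is actually running the script, which matters when several Python versions are installed side by side.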

Additional Implementations

An "HF"-ified version of the model is available as ecker/vall-e@hf, but it requires some additional effort to use (see the __main__ of ./vall_e/models/base.py for details).

Additionally, vall_e.cpp is available. Consult its README for more details.

Pre-Trained Model

Pre-trained weights can be acquired from:

- here, or automatically when either inferencing or running the web UI.
- ./scripts/setup.sh, a script that sets up a proper environment and downloads the weights. It also automatically creates a venv.
- when inferencing, through either the web UI or the CLI: if no model is passed, the default model is downloaded automatically and should also update automatically.

Documentation

The provided documentation under ./docs/ should thoroughly cover most, if not all, of this project.

Markdown files should correspond directly to their respective file or folder under ./vall_e/.
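
The correspondence described above can be sketched as a small path mapping. This is only an illustration: the function name `source_candidates` is hypothetical, the example file name is illustrative, and the rule that a doc maps to either a `.py` module or a folder of the same name is an assumption read from the sentence above, not a documented guarantee.

```python
from pathlib import PurePosixPath


def source_candidates(doc_path, docs_root="docs", package_root="vall_e"):
    """Return the module and folder paths a Markdown doc plausibly describes."""
    # Strip the docs root and the .md suffix to get the bare relative name.
    rel = PurePosixPath(doc_path).relative_to(docs_root).with_suffix("")
    base = PurePosixPath(package_root) / rel
    # A doc may describe either a single module or a whole sub-package.
    return [base.with_suffix(".py"), base]


if __name__ == "__main__":
    # Illustrative file name; substitute any Markdown file under ./docs/.
    for candidate in source_candidates("./docs/train.md"):
        print(candidate)
```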