
VALL'E

An unofficial PyTorch implementation of VALL-E (last updated: 2024.12.11), utilizing the EnCodec encoder/decoder.

A demo is available on HuggingFace here.

Requirements

Besides a working PyTorch environment, the only hard requirement is espeak-ng for phonemizing text:

  • Linux users can consult their package managers on installing espeak/espeak-ng.
  • Windows users are required to install espeak-ng.
    • additionally, you may be required to set the PHONEMIZER_ESPEAK_LIBRARY environment variable to specify the path to libespeak-ng.dll.
  • In the future, a homebrewed internal phonemizer to replace this dependency would be fantastic.
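
For Windows users, the environment variable mentioned above can also be set from Python before any phonemizing happens. The following is a minimal sketch, not part of this repo: the helper name configure_espeak and the example DLL path are assumptions, so substitute wherever espeak-ng was actually installed.

```python
import os
from pathlib import Path


def configure_espeak(library_path: str) -> bool:
    """Point PHONEMIZER_ESPEAK_LIBRARY at an espeak-ng library, if it exists.

    Hypothetical helper for illustration; returns False (and leaves the
    environment untouched) when the file is not found.
    """
    path = Path(library_path)
    if not path.is_file():
        return False
    os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = str(path)
    return True


# Example usage (the path below is an assumed install location, adjust it):
# configure_espeak(r"C:\Program Files\eSpeak NG\libespeak-ng.dll")
```

Setting the variable in the shell or system settings works just as well; the helper only adds a check that the file exists before pointing the phonemizer at it.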

Install

Simply run pip install git+https://git.ecker.tech/mrq/vall-e or pip install git+https://github.com/e-c-k-e-r/vall-e.

This repo is tested under Python versions 3.10.9, 3.11.3, and 3.12.3.

Additional Implementations

An "HF"-ified version of the model is available as ecker/vall-e@hf, but it requires some additional effort to use (see the __main__ of ./vall_e/models/base.py for details).

Additionally, vall_e.cpp is available. Consult its README for more details.

Pre-Trained Model

Pre-trained weights can be acquired in any of the following ways:

  • from here, as a manual download.
  • by running ./scripts/setup.sh, a script that sets up a proper environment (automatically creating a venv) and downloads the weights.
  • automatically when inferencing, through either the web UI or the CLI: if no model is passed, the default model is downloaded, and it should also update automatically.

Documentation

The documentation provided under ./docs/ should thoroughly cover most, if not all, of this project.

Markdown files should correspond directly to their respective file or folder under ./vall_e/.
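
As an illustration of that correspondence, here is a minimal sketch of looking up the markdown file for a given source path. The exact naming scheme is an assumption (that ./vall_e/<name>.py or ./vall_e/<name>/ maps to ./docs/<name>.md), and doc_for is a hypothetical helper, not something this repo provides.

```python
from pathlib import Path


def doc_for(source: Path, repo_root: Path) -> Path:
    """Guess which markdown file documents a file or folder under ./vall_e/.

    Assumed convention: vall_e/<name>.py or vall_e/<name>/ -> docs/<name>.md.
    Raises ValueError if `source` is not inside <repo_root>/vall_e.
    """
    relative = source.relative_to(repo_root / "vall_e")
    return repo_root / "docs" / relative.with_suffix(".md")


# Example usage:
# doc_for(Path("vall-e/vall_e/models"), Path("vall-e"))
```

The returned path is only a guess; check that it exists before relying on it.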