
VALL'E

An unofficial PyTorch implementation of VALL-E (last updated: 2024.12.11), utilizing the EnCodec encoder/decoder.

A demo is available on HuggingFace here.

Requirements

Besides a working PyTorch environment, the only hard requirement is espeak-ng for phonemizing text:

  • Linux users can consult their package manager for installing espeak/espeak-ng.
  • Windows users are required to install espeak-ng.
    • additionally, you may need to set the PHONEMIZER_ESPEAK_LIBRARY environment variable to point to libespeak-ng.dll (see the snippet after this list).
  • In the future, an in-house replacement for this dependency would be fantastic.
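
As a minimal sketch for the Windows case (PowerShell), this points phonemizer at the library; the path below assumes the default eSpeak NG install location, so adjust it to wherever libespeak-ng.dll actually lives on your system:

    # Assumed default install path for eSpeak NG; change if installed elsewhere.
    $env:PHONEMIZER_ESPEAK_LIBRARY = "C:\Program Files\eSpeak NG\libespeak-ng.dll"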

Install

Simply run pip install git+https://git.ecker.tech/mrq/vall-e or pip install git+https://github.com/e-c-k-e-r/vall-e.
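
For example, a minimal sketch of installing into a fresh virtual environment, using one of the package URLs above:

    # create and activate a fresh virtual environment (optional, but keeps things tidy)
    python -m venv venv
    source ./venv/bin/activate
    # install from either mirror listed above
    pip install git+https://git.ecker.tech/mrq/vall-e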

This repo is tested under Python versions 3.10.9, 3.11.3, and 3.12.3.

Additional Implementations

An "HF"-ified version of the model is available as ecker/vall-e@hf, but it does require some additional efforts (see the __main__ of ./vall_e/models/base.py for details).

Additionally, vall_e.cpp is available. Consult its README for more details.

Pre-Trained Model

Pre-trained weights can be acquired:

  • from here, for a manual download.
  • by running ./scripts/setup.sh, a script that sets up a proper environment and downloads the weights; it will also automatically create a venv (see the sketch after this list).
  • automatically at inference time, through either the web UI or the CLI: if no model is passed, the default model is downloaded automatically and should keep itself updated.
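
For the ./scripts/setup.sh route, a minimal sketch might look like the following; the clone URL is assumed to be the same repository address used by the pip commands above:

    # clone from either mirror, then run the provided setup script
    git clone https://git.ecker.tech/mrq/vall-e
    cd vall-e
    ./scripts/setup.sh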

Documentation

The provided documentation under ./docs/ should provide thorough coverage of most, if not all, of this project.

Markdown files should correspond directly to their respective file or folder under ./vall_e/.