
VALL'E

An unofficial PyTorch implementation of VALL-E (last updated: 2024.12.11), utilizing the EnCodec encoder/decoder.

A demo is available on HuggingFace here.

Requirements

Besides a working PyTorch environment, the only hard requirement is espeak-ng for phonemizing text:

  • Linux users can consult their package manager for installing espeak/espeak-ng.
  • Windows users are required to install espeak-ng.
    • additionally, you may need to set the PHONEMIZER_ESPEAK_LIBRARY environment variable to point to libespeak-ng.dll (see the snippet after this list).
  • In the future, an in-house replacement for this dependency would be fantastic.
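
As a minimal sketch for the Windows case (PowerShell), this points phonemizer at the library; the path below assumes the default eSpeak NG install location, so adjust it to wherever libespeak-ng.dll actually lives on your system:

    # Assumed default install path for eSpeak NG; change if installed elsewhere.
    $env:PHONEMIZER_ESPEAK_LIBRARY = "C:\Program Files\eSpeak NG\libespeak-ng.dll"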

Install

Simply run pip install git+https://git.ecker.tech/mrq/vall-e or pip install git+https://github.com/e-c-k-e-r/vall-e.
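
For example, a minimal sketch of installing into a fresh virtual environment, using one of the package URLs above:

    # create and activate a fresh virtual environment (optional, but keeps things tidy)
    python -m venv venv
    source ./venv/bin/activate
    # install from either mirror listed above
    pip install git+https://git.ecker.tech/mrq/vall-e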

This repo is tested under Python versions 3.10.9, 3.11.3, and 3.12.3.

Additional Implementations

An "HF"-ified version of the model is available as ecker/vall-e@hf, but it does require some additional efforts (see the __main__ of ./vall_e/models/base.py for details).

Additionally, vall_e.cpp is available. Consult its README for more details.

Pre-Trained Model

Pre-trained weights can be acquired:

  • from here, for a manual download.
  • by running ./scripts/setup.sh, a script that sets up a proper environment and downloads the weights; it will also automatically create a venv (see the sketch after this list).
  • automatically at inference time, through either the web UI or the CLI: if no model is passed, the default model is downloaded automatically and should keep itself updated.
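
For the ./scripts/setup.sh route, a minimal sketch might look like the following; the clone URL is assumed to be the same repository address used by the pip commands above:

    # clone from either mirror, then run the provided setup script
    git clone https://git.ecker.tech/mrq/vall-e
    cd vall-e
    ./scripts/setup.sh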

Documentation

The provided documentation under ./docs/ should provide thorough coverage of most, if not all, of this project.

Markdown files should correspond directly to their respective file or folder under ./vall_e/.