mrq/vall-e

An unofficial PyTorch implementation of VALL-E

audio-lm pytorch text-to-speech tts vall-e valle

Go to file

mrq 23d402bf01 added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......)		2024-12-05 23:05:52 -06:00
data	added more harvard sentences to load from a text file	2024-11-21 13:18:11 -06:00
docs	rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting)	2024-12-04 20:31:44 -06:00
scripts	support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder)	2024-09-18 21:34:43 -05:00
vall_e	added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......)	2024-12-05 23:05:52 -06:00
.gitignore	rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting)	2024-12-04 20:31:44 -06:00
LICENSE	Rewrite init	2023-08-02 21:53:35 +00:00
README.md	rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting)	2024-12-04 20:31:44 -06:00
setup.py	forgot to add NTLK as a dependency, promoted sageattn as a default dependency since it works fine enough and seems agnostic	2024-12-04 20:33:25 -06:00
vall-e.png	Rewrite init	2023-08-02 21:53:35 +00:00

README.md

VALL'E

An unofficial PyTorch implementation of VALL-E, utilizing the EnCodec encoder/decoder.

A demo is available on HuggingFace here.

Requirements

Besides a working PyTorch environment, the only hard requirement is espeak-ng for phonemizing text:

Linux users can consult their package managers on installing espeak/espeak-ng.
Windows users are required to install espeak-ng.
- additionally, you may be required to set the PHONEMIZER_ESPEAK_LIBRARY environment variable to specify the path to libespeak-ng.dll.
In the future, an internal homebrew to replace this would be fantastic.

Install

Simply run pip install git+https://git.ecker.tech/mrq/vall-e or pip install git+https://github.com/e-c-k-e-r/vall-e.

This repo is tested under Python versions 3.10.9, 3.11.3, and 3.12.3.

Pre-Trained Model

Pre-trained weights can be acquired from

here or automatically when either inferencing or running the web UI.
./scripts/setup.sh, a script to setup a proper environment and download the weights. This will also automatically create a venv.
when inferencing, either through the web UI or CLI, if no model is passed, the default model will download automatically instead, and should automatically update.

Documentation

The provided documentation under ./docs/ should provide thorough coverage over most, if not all, of this project.

Markdown files should correspond directly to their respective file or folder under ./vall_e/.