An unofficial PyTorch implementation of VALL-E
Go to file
2024-12-07 22:34:25 -06:00
data doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O) 2024-12-07 22:34:25 -06:00
docs doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O) 2024-12-07 22:34:25 -06:00
scripts support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder) 2024-09-18 21:34:43 -05:00
vall_e doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O) 2024-12-07 22:34:25 -06:00
.gitignore rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting) 2024-12-04 20:31:44 -06:00
LICENSE Rewrite init 2023-08-02 21:53:35 +00:00
README.md rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting) 2024-12-04 20:31:44 -06:00
setup.py doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O) 2024-12-07 22:34:25 -06:00
vall-e.png Rewrite init 2023-08-02 21:53:35 +00:00

VALL'E

An unofficial PyTorch implementation of VALL-E, utilizing the EnCodec encoder/decoder.

A demo is available on HuggingFace here.

Requirements

Besides a working PyTorch environment, the only hard requirement is espeak-ng for phonemizing text:

  • Linux users can consult their package managers on installing espeak/espeak-ng.
  • Windows users are required to install espeak-ng.
    • additionally, you may be required to set the PHONEMIZER_ESPEAK_LIBRARY environment variable to specify the path to libespeak-ng.dll.
  • In the future, an internal homebrew to replace this would be fantastic.

Install

Simply run pip install git+https://git.ecker.tech/mrq/vall-e or pip install git+https://github.com/e-c-k-e-r/vall-e.

This repo is tested under Python versions 3.10.9, 3.11.3, and 3.12.3.

Pre-Trained Model

Pre-trained weights can be acquired from

  • here or automatically when either inferencing or running the web UI.
  • ./scripts/setup.sh, a script to setup a proper environment and download the weights. This will also automatically create a venv.
  • when inferencing, either through the web UI or CLI, if no model is passed, the default model will download automatically instead, and should automatically update.

Documentation

The provided documentation under ./docs/ should provide thorough coverage over most, if not all, of this project.

Markdown files should correspond directly to their respective file or folder under ./vall_e/.