updated readme to reflect changes

This commit is contained in:
mrq 2024-10-25 22:17:05 -05:00
parent ccf71dc1b6
commit 023c3af331


@ -25,7 +25,8 @@ I've tested this repo under Python versions `3.10.9`, `3.11.3`, and `3.12.3`.
My pre-trained weights can be acquired from [here](https://huggingface.co/ecker/vall-e).
A script to setup a proper environment and download the weights can be invoked with `./scripts/setup.sh`. This will automatically create a `venv`, and download the `ar+nar-llama-8` weights and config file to the right place.
* In the future, the model should be automatically downloaded.
When inferencing, whether through the web UI or the CLI, if no model is passed, the default model will be downloaded automatically and kept up to date.
## Train
@ -207,7 +208,7 @@ You can also export to `safetensors` with `--format=sft`, and `fp32.sft` will be
## Synthesis
To synthesize speech: `python -m vall_e <text> <ref_path> <out_path> --yaml=<yaml_path>` (or `--model=<model_path>`)
Some additional flags you can pass are:
* `--language`: specifies the language used to phonemize the text, and helps guide inferencing when the model was trained on that language.
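Putting the command and its flags together, a hypothetical invocation might look like the following (the text, paths, and language code here are placeholders, not values from this repo):

```shell
# Hypothetical example: synthesize speech from text using a reference clip.
# All paths and the language code are illustrative placeholders.
python -m vall_e "Hello world." ./reference.wav ./output.wav \
  --yaml=./models/config.yaml \
  --language="en"
```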
@ -252,6 +253,7 @@ Currently, the model only transcribes back into the IPA phonemes it was trained
A Gradio-based web UI is accessible by running `python3 -m vall_e.webui`. You can, optionally, pass:
* `--yaml=./path/to/your/config.yaml`: will load the targeted YAML
* `--model=./path/to/your/model.sft`: will load the targeted model weights
* `--listen 0.0.0.0:7860`: sets the web UI to listen on all interfaces at port 7860. Replace the IP and port as desired.
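Combining the flags above, a sketch of launching the web UI with explicit weights and a listen address might look like this (the model path is a placeholder, not a file shipped with the repo):

```shell
# Hypothetical example: launch the Gradio web UI with explicit weights,
# listening on all interfaces at port 7860. The model path is illustrative.
python3 -m vall_e.webui \
  --model=./models/ar+nar-llama-8.sft \
  --listen 0.0.0.0:7860
```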
### Emergent Behavior