fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml

mrq 2023-09-23 19:59:00 -05:00
parent 9384900ce6
commit 4abd6564d1
4 changed files with 60 additions and 62 deletions


@@ -4,7 +4,11 @@
 # VALL'E
-An unofficial PyTorch implementation of [VALL-E](https://valle-demo.github.io/), based on the [EnCodec](https://github.com/facebookresearch/encodec) tokenizer.
+An unofficial PyTorch implementation of [VALL-E](https://valle-demo.github.io/), utilizing the [EnCodec](https://github.com/facebookresearch/encodec) encoder/decoder.
+[Main Repo](https://git.ecker.tech/mrq/vall-e) | [GitHub Mirror](https://github.com/e-c-k-e-r/vall-e/) | [HuggingFace Space](https://huggingface.co/spaces/ecker/vall-e)
+> **Note** This README is still quite a disorganized mess.
 ## Requirements
@@ -32,11 +36,7 @@ A HuggingFace space hosting the code and models can be found [here](https://hugg
 ### Local
-To quickly try it out, you can choose between the following modes:
-* AR only: `python -m vall_e.models.ar yaml="./data/config.yaml"`
-* NAR only: `python -m vall_e.models.nar yaml="./data/config.yaml"`
-* AR+NAR: `python -m vall_e.models.base yaml="./data/config.yaml"`
+To quickly try it out, you can run `python -m vall_e.models.ar_nar yaml="./data/config.yaml"`
 Each model file has a barebones trainer and inference routine.
@@ -77,6 +77,7 @@ A "libre" dataset can be found [here](https://huggingface.co/ecker/vall-e/blob/m
 If you're interested in creating an HDF5 copy of your dataset, simply invoke: `python -m vall_e.data --action='hdf5' yaml='./data/config.yaml'`
 5. Train the AR and NAR models using the following scripts: `python -m vall_e.train yaml=./data/config.yaml`
+* If distributing your training (for example, multi-GPU), use `deepspeed --module vall_e.train yaml="./data/config.yaml"`
 You may quit your training any time by just entering `quit` in your CLI. The latest checkpoint will be automatically saved.
@@ -98,15 +99,11 @@ You can specify what X and Y labels you want to plot against by passing `--xs to
 ### Notices
-#### Modifying `prom_levels`, `resp_levels`, Or `tasks` For A Model
-If you're wanting to increase the `prom_levels` for a given model, or increase the `tasks` levels a model accepts, you will need to export your weights and set `train.load_state_dict` to `True` in your configuration YAML.
 #### Training Under Windows
-As training under `deepspeed` is not supported, under your `config.yaml`, simply change `trainer.backend` to `local` to use the local training backend.
+As training under `deepspeed` and Windows is not supported, under your `config.yaml`, simply change `trainer.backend` to `local` to use the local training backend.
-Keep in mind that creature comforts like distributed training cannot be verified as working at the moment.
+Keep in mind that creature comforts like distributed training or `float16` training cannot be verified as working at the moment.
 #### Training on Low-VRAM Cards
@@ -116,13 +113,11 @@ VRAM use is also predicated on your dataset; a mix of large and small utterances
 Additionally, under Windows, I managed to finetune the AR on my 2060 (6GiB VRAM) with a batch size of 8 (although, with the card as a secondary GPU).
-If you need to, you are free to train only one model at a time. Just remove the definition for one model in your `config.yaml`'s `models._model` list.
 ## Export
-Both trained models *can* be exported, but is only required if loading them on systems without DeepSpeed for inferencing (Windows systems). To export the models, run: `python -m vall_e.export yaml=./data/config.yaml`.
+To export the models, run: `python -m vall_e.export yaml=./data/config.yaml`.
-This will export the latest checkpoints, for example, under `./data/ckpt/ar-retnet-2/fp32.pth` and `./data/ckpt/nar-retnet-2/fp32.pth`, to be loaded on any system with PyTorch.
+This will export the latest checkpoints, for example, under `./data/ckpt/ar-retnet-2/fp32.pth` and `./data/ckpt/nar-retnet-2/fp32.pth`, to be loaded on any system with PyTorch, and will include additional metadata, such as the symmap used, and training stats.
 ## Synthesis
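As a quick, hypothetical illustration of what the expanded export now carries (the `module`/`symmap`/`stats` keys are inferred from the README wording above and from the `load_engines()` hunk further down, not confirmed):

```python
# Hypothetical sketch: inspect an exported checkpoint's extra metadata.
# Key names are assumptions based on this commit's README and loader changes.
import torch

ckpt = torch.load("./data/ckpt/ar-retnet-2/fp32.pth", map_location="cpu")

print(list(ckpt.keys()))            # expected to include "module", "symmap", "stats"
print(ckpt.get("stats"))            # training counters exported alongside the weights
weights = ckpt.get("module", ckpt)  # fall back to treating it as a bare state dict
```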
@@ -149,17 +144,17 @@ And some experimental sampling flags you can use too (your mileage will ***defin
 ## To-Do
-* reduce load time for creating / preparing dataloaders (hint: remove use of `Path.glob` and `Path.rglob`).
 * train and release a ***good*** model.
-* extend to multiple languages (VALL-E X) and ~~extend to~~ train SpeechX features.
-  + This can easily be done with adding in additional embeddings + tokens, rather than cramming into the input prompt embedding.
+* clean up the README, and document, document, document onto the wiki.
+* extend to multiple languages ([VALL-E X](https://arxiv.org/abs/2303.03926)) and additional tasks ([SpeechX](https://arxiv.org/abs/2308.06873)).
-## Notice
-- [EnCodec](https://github.com/facebookresearch/encodec) is licensed under CC-BY-NC 4.0. If you use the code to generate audio quantization or perform decoding, it is important to adhere to the terms of their license.
+## Notices and Citations
 Unless otherwise credited/noted, this repository is [licensed](LICENSE) under AGPLv3.
-## Citations
+- [EnCodec](https://github.com/facebookresearch/encodec) is licensed under CC-BY-NC 4.0. If you use the code to generate audio quantization or perform decoding, it is important to adhere to the terms of their license.
+- This implementation was originally based on [enhuiz/vall-e](https://github.com/enhuiz/vall-e), but has been heavily, heavily modified over time.
 ```bibtex
 @article{wang2023neural,


@@ -1,60 +1,57 @@
 dataset:
-  training: [
-    # "./training/valle/data/LibriTTS/994/",
-  ]
-  validation: [
-    # "./training/valle/data/Validation/1188/",
-  ]
-  noise: [
-    # "./training/valle/data/Other/noise/",
-  ]
+  training: []
+  validation: []
+  noise: []
   speaker_name_getter: "lambda p: f'{p.parts[-3]}_{p.parts[-2]}'"
   use_hdf5: True
+  use_metadata: True
   hdf5_flag: r
   validate: True
-  workers: 4
+  workers: 2
   cache: True
-  phones_range: [4, 512]
+  phones_range: [4, 256]
   duration_range: [1.0, 16.0]
+  min_utterances: 32
   random_utterance: 1.0
-  max_prompts: 3
-  prompt_duration: 3.0
+  max_prompts: 6
+  prompt_duration: 6.0
   sample_type: speaker
-  tasks_list: ["tts"] # ["tts", "ns", "sr", "tse", "cse", "nse"]
+  tasks_list: [ "tts" ] # , [ "tts", "tts-c", "ns", "sr", "tse", "cse", "nse", "tts"]
 models:
+  _prom_levels: 8
   _max_levels: 8
   _models:
-    - name: "ar"
-      size: "full"
-      resp_levels: 1
-      prom_levels: 2
-      tasks: 8
-      arch_type: "retnet"
-    - name: "nar"
-      size: "full"
-      resp_levels: 1
-      prom_levels: 2
+    - name: "ar+nar"
+      size: "double"
+      resp_levels: 8
+      prom_levels: 8
       tasks: 8
       arch_type: "retnet"
+      training: True
+      version: 2
 hyperparameters:
-  batch_size: 16
-  gradient_accumulation_steps: 4
+  batch_size: 8
+  gradient_accumulation_steps: 16
   gradient_clipping: 100
+  # prodigyopt is nicer, but requires even more VRAM
+  #optimizer: Prodigy
+  #learning_rate: 1.0 # e-4
   optimizer: AdamW
   learning_rate: 1.0e-4
+  torch_optimizer: True
   scheduler_type: ""
   #scheduler_type: OneCycle
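One practical consequence of the hyperparameter change above: the per-step batch shrinks but accumulation grows, so the effective batch size per optimizer step actually doubles. A quick check:

```python
# Effective batch size = batch_size * gradient_accumulation_steps.
old_effective = 16 * 4   # previous example config -> 64
new_effective = 8 * 16   # updated example config  -> 128
print(old_effective, new_effective)
```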
@@ -77,12 +74,13 @@ hyperparameters:
 evaluation:
   batch_size: 16
-  frequency: 500
+  frequency: 250
   size: 16
-  steps: 300
+  steps: 450
   ar_temperature: 0.95
   nar_temperature: 0.25
+  load_disabled_engines: True
 trainer:
   iterations: 1_000_000
@@ -90,11 +88,13 @@ trainer:
   save_tag: step
   save_on_oom: True
   save_on_quit: True
-  save_frequency: 1000
+  save_frequency: 100
+  export_on_save: True
   keep_last_checkpoints: 4
   aggressive_optimizations: False
+  load_disabled_engines: False
   #load_state_dict: True
   #strict_loading: False
@@ -105,18 +105,21 @@ trainer:
   gc_mode: None # "global_step"
   weight_dtype: bfloat16
+  amp: False
   backend: deepspeed
   deepspeed:
     zero_optimization_level: 0
     use_compression_training: True
+    activation_checkpointing: True
 inference:
   use_vocos: True
-  normalize: False # do NOT change this unless you know exactly what you are doing.
+  normalize: False
 bitsandbytes:
   enabled: False
-  injects: True
-  linear: True
-  embedding: True
+  injects: False
+  linear: False
+  embedding: False
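A minimal sketch of reading the updated example config back with plain PyYAML (the repo presumably has its own config loader; the path and standalone parse here are only assumptions for illustration):

```python
# Load the example YAML directly and spot-check a few of the fields changed above.
import yaml

with open("./data/config.yaml") as f:
    cfg = yaml.safe_load(f)

model = cfg["models"]["_models"][0]
print(model["name"], model["resp_levels"], model["prom_levels"])        # ar+nar 8 8
print(cfg["trainer"]["save_frequency"], cfg["evaluation"]["frequency"])  # 100 250
```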


@@ -40,8 +40,8 @@ class Engine(DeepSpeedEngine):
         kwargs['config_class'] = DeepSpeedConfig(kwargs['config'])
         stats = {
-            "global_steps": 0,
-            "micro_steps": 0,
+            "global_step": 0,
+            "micro_step": 0,
             "global_samples": 0,
             "tokens_processed": 0,
         }
@@ -54,8 +54,8 @@ class Engine(DeepSpeedEngine):
         super().__init__(None, *args, **kwargs)
         self._frozen_params = set()
-        self.global_steps = stats["global_steps"]
-        self.micro_steps = stats["micro_steps"]
+        self.global_steps = stats["global_step"]
+        self.micro_steps = stats["micro_step"]
         self.global_samples = stats["global_samples"]
         self.tokens_processed = stats["tokens_processed"]
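The rename keeps the engine's freshly-initialized defaults consistent with the singular key names that exported checkpoints appear to use, while the attribute names stay plural. A small sketch of that mapping (the helper name and the assumption about the export format are hypothetical):

```python
# Hypothetical helper: map singular checkpoint keys onto the engine's plural attributes.
from types import SimpleNamespace

def restore_counters(engine, stats: dict) -> None:
    engine.global_steps = stats.get("global_step", 0)
    engine.micro_steps = stats.get("micro_step", 0)
    engine.global_samples = stats.get("global_samples", 0)
    engine.tokens_processed = stats.get("tokens_processed", 0)

engine = SimpleNamespace()
restore_counters(engine, {"global_step": 100, "micro_step": 400, "global_samples": 3200, "tokens_processed": 0})
print(engine.global_steps)  # 100
```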


@@ -83,7 +83,7 @@ def load_engines():
         # state dict is not just the module, extract the extra trainer details
         if "stats" in state:
-            additionals = state["stats"]
+            stats = state["stats"]
         if "module" in state:
             state = state["module"]
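This one-word change is the headline fix: the stats pulled out of an exported state dict were previously bound to `additionals`, a name the rest of the loader presumably never read, so training counters restarted from zero. A hedged sketch of the load flow this hunk implies (the function name and surrounding structure are assumptions, not the repo's actual code):

```python
# Hedged sketch: split an exported checkpoint into weights and trainer stats,
# mirroring the "stats"/"module" keys shown in the hunk above.
import torch

def split_checkpoint(path: str):
    state = torch.load(path, map_location="cpu")
    stats = None
    if "stats" in state:      # extra trainer details ride alongside the weights
        stats = state["stats"]
    if "module" in state:     # unwrap down to the bare module state dict
        state = state["module"]
    return state, stats

weights, stats = split_checkpoint("./data/ckpt/ar-retnet-2/fp32.pth")
```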