diff --git a/README.md b/README.md
index 44d9493..d0bc303 100755
--- a/README.md
+++ b/README.md
@@ -12,21 +12,22 @@ An unofficial PyTorch implementation of [VALL-E](https://valle-demo.github.io/),
 
 ## Requirements
 
-* [`espeak-ng`](https://github.com/espeak-ng/espeak-ng/):
-  - For phonemizing text, this repo requires `espeak`/`espeak-ng` installed.
-  - Linux users can consult their package managers on installing `espeak`/`espeak-ng`.
-  - Windows users are required to install [`espeak-ng`](https://github.com/espeak-ng/espeak-ng/releases/tag/1.51#Assets).
-    + additionally, you may be required to set the `PHONEMIZER_ESPEAK_LIBRARY` environment variable to specify the path to `libespeak-ng.dll`.
+Besides a working PyTorch environment, the only hard requirement is [`espeak-ng`](https://github.com/espeak-ng/espeak-ng/):
+- For phonemizing text, this repo requires `espeak`/`espeak-ng` installed.
+- Linux users can consult their package managers on installing `espeak`/`espeak-ng`.
+- Windows users are required to install [`espeak-ng`](https://github.com/espeak-ng/espeak-ng/releases/tag/1.51#Assets).
+  + additionally, you may be required to set the `PHONEMIZER_ESPEAK_LIBRARY` environment variable to specify the path to `libespeak-ng.dll`.
+- In the future, an internal homebrew to replace this *would* be fantastic.
 
 ## Install
 
 Simply run `pip install git+https://git.ecker.tech/mrq/vall-e` or `pip install git+https://github.com/e-c-k-e-r/vall-e`.
 
-I've tested this repo under Python versions `3.10.9` and `3.11.3`.
+I've tested this repo under Python versions `3.10.9`, `3.11.3`, and `3.12.3`.
 
 ## Try Me
 
-To quickly try it out, you can run `python -m vall_e.models.ar_nar yaml="./data/config.yaml"`.
+To quickly try it out, you can run `python -m vall_e.models.ar_nar --yaml="./data/config.yaml"`.
 
 A small trainer will overfit a provided utterance to ensure a model configuration works.
 
@@ -85,19 +86,19 @@ Two dataset formats are supported:
 * the standard way:
   - for Encodec/Vocos audio backends, data is stored under `./training/data/{group}/{speaker}/{id}.enc` as a NumPy file.
   - for Descript-Audio-Codec audio backend, data is stored under `./training/data/{group}/{speaker}/{id}.dac` as a NumPy file.
-  - it is *highly* recommended to generate metadata to speed up dataset pre-load with `python3 -m vall_e.data yaml="./training/config.yaml" --action=metadata`
+  - it is *highly* recommended to generate metadata to speed up dataset pre-load with `python3 -m vall_e.data --yaml="./training/config.yaml" --action=metadata`
 * using an HDF5 dataset:
-  - you can convert from the standard way with the following command: `python3 -m vall_e.data yaml="./training/config.yaml"` (metadata for dataset pre-load is generated alongside HDF5 creation)
+  - you can convert from the standard way with the following command: `python3 -m vall_e.data --yaml="./training/config.yaml"` (metadata for dataset pre-load is generated alongside HDF5 creation)
   - this will shove everything into a single HDF5 file and store some metadata alongside (for now, the symbol map generated, and text/audio lengths)
   - be sure to also define `use_hdf5` in your config YAML.
 
 ### Training
 
-For single GPUs, simply running `python3 -m vall_e.train yaml="./training/config.yaml`.
+For single GPUs, simply run `python3 -m vall_e.train --yaml="./training/config.yaml"`.
 
 For multiple GPUs, or exotic distributed training:
-* with `deepspeed` backends, simply running `deepspeed --module vall_e.train yaml="./training/config.yaml"` should handle the gory details.
-* with `local` backends, simply run `torchrun --nnodes=1 --nproc-per-node={NUMOFGPUS} -m vall_e.train yaml="./training/config.yaml"`
+* with `deepspeed` backends, simply running `deepspeed --module vall_e.train --yaml="./training/config.yaml"` should handle the gory details.
+* with `local` backends, simply run `torchrun --nnodes=1 --nproc-per-node={NUMOFGPUS} -m vall_e.train --yaml="./training/config.yaml"`
 
 You can enter `save` to save the state at any time, or `quit` to save and quit training.
 
@@ -105,7 +106,7 @@ The `lr` will also let you adjust the learning rate on the fly. For example: `lr
 
 ### Plotting Metrics
 
-Included is a helper script to parse the training metrics. Simply invoke it with, for example: `python3 -m vall_e.plot yaml="./training/config.yaml"`
+Included is a helper script to parse the training metrics. Simply invoke it with, for example: `python3 -m vall_e.plot --yaml="./training/config.yaml"`
 
 You can specify what X and Y labels you want to plot against by passing `--xs tokens_processed --ys loss stats.acc`
 
@@ -127,7 +128,7 @@ Unfortunately, efforts to train a *good* foundational model seems entirely predi
 * a poorly mapped phoneme mapping: I naively crafted my own phoneme mapping, where a HuggingFace tokenizer might supply a better token mapping.
   + This seems remedied with settling for using a HuggingFace tokenizer to handle everything.
 * having a unified AR and NAR model might sound too convenient, but each task may lobotomize the other, due to the nature of things.
-  + This *might* be remedied with better sequence formatting.
+  + This *might* be remedied with better sequence formatting, or separate embeddings for the AR/NAR.
 
 #### Backend Architectures
 
@@ -169,13 +170,13 @@ The wide support for various backends is solely while I try and figure out which
 
 ## Export
 
-To export the models, run: `python -m vall_e.export yaml=./training/config.yaml`.
+To export the models, run: `python -m vall_e.export --yaml=./training/config.yaml`.
 
 This will export the latest checkpoints, for example, under `./training/ckpt/ar+nar-retnet-8/fp32.pth`, to be loaded on any system with PyTorch, and will include additional metadata, such as the symmap used, and training stats.
 
 ## Synthesis
 
-To synthesize speech, invoke either (if exported the models): `python -m vall_e --model-ckpt ./training/ckpt/ar+nar-retnet-8/fp32.pth` or `python -m vall_e yaml=`
+To synthesize speech: `python -m vall_e --yaml=`
 
 Some additional flags you can pass are:
 * `--language`: specifies the language for phonemizing the text, and helps guide inferencing when the model is trained against that language.
@@ -204,17 +205,22 @@ And some experimental sampling flags you can use too (your mileage will ***defin
 
 ## To-Do
 
 * train and release a ***good*** model.
+* explore alternative setups, like a NAR-only model
+  - this would require an audio length predictor, but could help with a lot of things (I believe Meta's Voicebox does this?)
+* explore better sampling techniques
+  - dynamic temperature shows promise despite it being a very early iteration
+  - mirostat seems to show promise too despite being a half-baked implementation
+  - penalty incurred from sampling is a bit steep at times...
+  - the NAR might need to be greedy sampled only
 * clean up the README, and document, document, document onto the wiki.
 * extend to ~~multiple languages ([VALL-E X](https://arxiv.org/abs/2303.03926)) and~~ addditional tasks ([SpeechX](https://arxiv.org/abs/2308.06873)).
   - training additional tasks needs the SpeechX implementation to be reworked.
   - this requires a good foundational model before extending it to transfer tasks onto.
 * improve throughput (despite peaking at 120it/s):
-  - properly utilize RetNet's recurrent forward / chunkwise forward passes (does not seem to want to work no matter how the model is trained).
   - utilize an approach similar to [FasterDecoding/Medusa](https://github.com/FasterDecoding/Medusa/) with additional heads for decoding N+1, N+2, N+3 AR tokens
     + this requires a properly trained AR, however.
-* work around issues with extending context past what's trained (despite RetNet's retention allegedly being able to defeat this):
-  - "sliding" AR input, such as have the context a fixed length.
-    + the model may need to be trained for this with a fancy positional embedding injected OR already trained with a sliding context window in mind. Naively sliding the context window while making use of the RetNet implementation's positional embedding doesn't seem fruitful.
+* audio streaming
+  - this *technically* can work without any additional architecture changes, just clever tricks with sampling-then-decoding-to-audio.
 
 ## Notices and Citations
 
@@ -222,7 +228,7 @@ Unless otherwise credited/noted in this README or within the designated Python f
 
 - [EnCodec](https://github.com/facebookresearch/encodec) is licensed under CC-BY-NC 4.0. If you use the code to generate audio quantization or perform decoding, it is important to adhere to the terms of their license.
 
-- This implementation was originally based on [enhuiz/vall-e](https://github.com/enhuiz/vall-e), but has been heavily, heavily modified over time.
+- This implementation was originally based on [enhuiz/vall-e](https://github.com/enhuiz/vall-e), but has been heavily, heavily modified over time. Without it I would not have had a good basis to muck around and learn.
 
 ```bibtex
 @article{wang2023neural,
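Every command in the README hunks above now takes an explicit `--yaml` flag in place of the old OmegaConf-style `yaml="..."` key=value argument; the `vall_e/config.py` hunks further down implement that flag, including a shim that rewrites legacy `yaml=...` arguments. Below is a minimal, self-contained sketch of that flag handling for orientation only — the function name `parse_yaml_flag` and the example path are illustrative, not part of the patch:

```python
# Condensed, illustrative sketch of the new --yaml flag handling; names and
# paths here are placeholders rather than the repo's actual API.
import argparse
import os
import sys
from pathlib import Path

def parse_yaml_flag(argv=None):
    argv = list(sys.argv[1:] if argv is None else argv)
    # legacy shim: rewrite bare `yaml=...` arguments into `--yaml=...`
    argv = [f"--{arg}" if arg.startswith("yaml=") else arg for arg in argv]
    parser = argparse.ArgumentParser(allow_abbrev=False)
    # falls back to the VALLE_YAML environment variable when the flag is absent
    parser.add_argument("--yaml", type=Path, default=os.environ.get("VALLE_YAML", None))
    args, _unknown = parser.parse_known_args(argv)
    return args.yaml

if __name__ == "__main__":
    print(parse_yaml_flag(["--yaml=./training/config.yaml"]))  # -> training/config.yaml
```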
diff --git a/setup.py b/setup.py
index 98c9d50..d2fdc79 100755
--- a/setup.py
+++ b/setup.py
@@ -8,7 +8,6 @@ def shell(*args):
     out = subprocess.check_output(args)
     return out.decode("ascii").strip()
 
-
 def write_version(version_core, pre_release=True):
     if pre_release:
         time = shell("git", "log", "-1", "--format=%cd", "--date=iso")
@@ -23,7 +22,6 @@ def write_version(version_core, pre_release=True):
 
     return version
 
-
 with open("README.md", "r") as f:
     long_description = f.read()
 
@@ -37,31 +35,60 @@ setup(
     long_description=long_description,
     long_description_content_type="text/markdown",
     packages=find_packages(),
-    install_requires=(["deepspeed>=0.7.7"] if not sys.platform.startswith("win") else []) +[
+    install_requires=(
+        # training backends
+        ["deepspeed>=0.7.7"] if not sys.platform.startswith("win") else [])
+        + [
+        # logging niceties
         "coloredlogs>=15.0.1",
+        "humanize>=4.4.0",
+        "matplotlib>=3.6.0",
+        "pandas>=1.5.0",
+
+        # boiler plate niceties
         "diskcache>=5.4.0",
         "einops>=0.6.0",
-        "encodec>=0.1.1",
-        "phonemizer>=2.1.0",
-        "matplotlib>=3.6.0",
-        "numpy",
-        "omegaconf==2.0.6",
-        "tqdm>=4.64.1",
-        "humanize>=4.4.0",
+        "tqdm",
+
+        # HF bloat
+        "tokenizers>4.37.0",
         "transformers>4.37.0",
-        "pandas>=1.5.0",
+
+        # training bloat
+        "auraloss[all]", # [all] is needed for MelSTFTLoss
+        "h5py",
+        "prodigyopt @ git+https://github.com/konstmish/prodigy",
+
+        # practically the reason to use python
+        "numpy",
         "torch>=1.13.0",
         "torchaudio>=0.13.0",
         "torchmetrics",
-        "auraloss[all]",
+
+        # core foundations
+        "phonemizer>=2.1.0",
+        "encodec>=0.1.1",
         "vocos",
-        "h5py",
-        "torchscale @ git+https://git.ecker.tech/mrq/torchscale",
-        "prodigyopt @ git+https://github.com/konstmish/prodigy",
-        "descript-audio-codec",
+
+        # gradio web UI
+        "gradio"
+    ],
+    extras_require = {
+        "all": [
+            # retnet backend (even though two internal copies exist)
+            "torchscale @ git+https://git.ecker.tech/mrq/torchscale",
+            # bitnet
+            "bitnet",
+            # mamba
+            "causal-conv1d",
+            "mamba-ssm",
+
+            # attention helpers
+            "xformers",
+            # "flash-attn" --no-build-isolation # commented out right now because I want to query this for Volta freaks like me who can't use it
+        ]
+    },
     url="https://git.ecker.tech/mrq/vall-e",
 )
diff --git a/vall_e/config.py b/vall_e/config.py
index ba8889d..d115912 100755
--- a/vall_e/config.py
+++ b/vall_e/config.py
@@ -6,6 +6,8 @@ import os
 import subprocess
 import sys
 import time
+import argparse
+import yaml
 
 import torch
 
@@ -14,15 +16,13 @@ from dataclasses import asdict, dataclass, field
 from functools import cached_property
 from pathlib import Path
 
-from omegaconf import OmegaConf
-
 from .utils.distributed import world_size
 
 # Yuck
 from transformers import PreTrainedTokenizerFast
 
 @dataclass()
-class _Config:
+class BaseConfig:
     cfg_path: str | None = None
 
     @property
@@ -81,39 +81,29 @@ class _Config:
         with open(path, "w") as f:
             f.write(self.dumps())
 
-    @staticmethod
-    def _is_cfg_argv(s):
-        return "=" in s and "--" not in s
-
     @classmethod
     def from_yaml( cls, yaml_path ):
-        return cls.from_cli( [f'yaml="{yaml_path}"'] )
+        return cls.from_cli( [f'--yaml="{yaml_path}"'] )
 
     @classmethod
     def from_cli(cls, args=sys.argv):
-        cli_cfg = OmegaConf.from_cli([s for s in args if cls._is_cfg_argv(s)])
+        # legacy support for yaml=`` format
+        for i, arg in enumerate(args):
+            if arg.startswith("yaml"):
+                args[i] = f'--{arg}'
 
-        # Replace argv to ensure there are no omegaconf options, for compatibility with argparse.
-        sys.argv = [s for s in sys.argv if not cls._is_cfg_argv(s)]
+        parser = argparse.ArgumentParser(allow_abbrev=False)
+        parser.add_argument("--yaml", type=Path, default=os.environ.get('VALLE_YAML', None)) # os environ so it can be specified in a HuggingFace Space too
+        args, unknown = parser.parse_known_args(args=args)
 
-        if cli_cfg.get("help"):
-            print(f"Configurable hyperparameters with their default values:")
-            print(json.dumps(asdict(cls()), indent=2, default=str))
-            exit()
+        state = {}
+        if args.yaml:
+            cfg_path = args.yaml
+
+            state = yaml.safe_load(open(cfg_path, "r", encoding="utf-8"))
+            state.setdefault("cfg_path", cfg_path)
 
-        if "yaml" in cli_cfg:
-            yaml_cfg = OmegaConf.load(cli_cfg.yaml)
-            yaml_path = Path(cli_cfg.yaml).absolute()
-            cfg_path = Path(*yaml_path.relative_to(Path.cwd()).parts[:-1])
-            cfg_path = cfg_path.with_suffix("")
-            cfg_path = f'./{cfg_path}'
-
-            yaml_cfg.setdefault("cfg_path", cfg_path)
-            cli_cfg.pop("yaml")
-        else:
-            yaml_cfg = {}
-        merged = OmegaConf.merge(yaml_cfg, cli_cfg)
-        return cls(**dict(merged))
+        return cls(**state)
 
     def __repr__(self):
         return str(self)
 
@@ -621,7 +611,7 @@ class Optimizations:
     fp8: bool = False # use fp8
 
 @dataclass()
-class Config(_Config):
+class Config(BaseConfig):
     device: str = "cuda"
     mode: str = "training" # "inferencing"
     experimental: bool = False # So I can stop commenting out things when committing
@@ -668,6 +658,7 @@ class Config(_Config):
         return diskcache.Cache(self.cache_dir).memoize
         return lambda: lambda x: x
 
+    # I don't remember why this is needed
     def load_yaml( self, config_path ):
         tmp = Config.from_yaml( config_path )
         self.__dict__.update(tmp.__dict__)
@@ -759,6 +750,10 @@ class Config(_Config):
         if self.trainer.activation_checkpointing is not None:
             self.trainer.gradient_checkpointing = self.trainer.activation_checkpointing
 
+        # load our HDF5 file if requested here
+        if self.dataset.use_hdf5:
+            self.load_hdf5()
+
 # Preserves the old behavior
 class NaiveTokenizer:
     def get_vocab( self ):
@@ -787,15 +782,12 @@ class NaiveTokenizer:
 
 cfg = Config.from_cli()
 
-# OmegaConf might not coerce the dicts into the @dataclass decorated classes, so we (try to) coerce them ourselves
+# some safety for remapping deprecated formats and re-coercing uninitialized properties into actual types
 try:
     cfg.format()
-    if cfg.dataset.use_hdf5:
-        cfg.load_hdf5()
 except Exception as e:
-    cfg.dataset.use_hdf5 = False
-    print("Error while parsing config YAML:", e)
-    pass
+    print("Error while parsing config YAML:")
+    raise e # throw an error because I'm tired of silent errors messing things up for me
 
 try:
     from transformers import PreTrainedTokenizerFast
diff --git a/vall_e/inference.py b/vall_e/inference.py
index 07d93d9..af41deb 100755
--- a/vall_e/inference.py
+++ b/vall_e/inference.py
@@ -32,7 +32,8 @@ class TTS():
         try:
             cfg.format()
         except Exception as e:
-            pass
+            print("Error while parsing config YAML:")
+            raise e # throw an error because I'm tired of silent errors messing things up for me
 
         if amp is None:
             amp = cfg.inference.amp
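Because `cfg = Config.from_cli()` runs at import time and the new `--yaml` flag defaults to the `VALLE_YAML` environment variable, downstream code (for example a HuggingFace Space) can point the module at a config without touching `sys.argv`. A hedged usage sketch follows, assuming `./training/config.yaml` exists and only contains keys that map onto `Config` fields — the path is illustrative:

```python
# Usage sketch only: the config path is illustrative and must exist, and the
# import has to happen *after* the environment variable is set, because
# vall_e.config parses the config at import time via Config.from_cli().
import os
os.environ["VALLE_YAML"] = "./training/config.yaml"

from vall_e.config import cfg

print(cfg.cfg_path)  # the YAML path the config was loaded from
print(cfg.device)    # e.g. "cuda", from the YAML or the dataclass default
```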