- Adds a script which preprocesses quantized mels given a DVAE
- Adds a dataset which can consume preprocessed qmels
- Reworks GPT TTS to consume the outputs of that dataset (removes logic to add padding and start/end tokens)
- Adds inference to gpt_tts
- Allow image_folder_dataset to normalize inbound images
- ExtensibleTrainer can denormalize images on the output path
- Support .webp - an output from LSUN
- Support logistic GAN divergence loss
- Support stylegan2 TF weight extraction for discriminator
- New injector that produces latent noise (with separated paths)
- Modify FID evaluator to be operable with rosinality-style GANs
- Turns out my custom convolution was RIDDLED with backwards bugs, which is
why the existing implementation wasn't working so well.
- Implements the switch logic from both Mixture of Experts and Switch Transformers
for testing purposes.
- The pixpro latent now rescales the latent space instead of using a "coordinate vector", which
**might** have performance implications.
- The latent against which the pixel loss is computed can now be a small, randomly sampled patch
out of the entire latent, allowing further memory/computational discounts. Since the loss
computation does not have a receptive field, this should not alter the loss.
- The instance projection size can now be separate from the pixel projection size.
- PixContrast removed entirely.
- ResUnet with full resolution added.
This is a concept from "Lifelong Learning GAN", although I'm skeptical of it's novelty -
basically you scale and shift the weights for the generator and discriminator of a pretrained
GAN to "shift" into new modalities, e.g. faces->birds or whatever. There are some interesting
applications of this that I would like to try out.