
# Tentative Title For A ResNet-Based Image Classifier

This is a simple ResNet-based image classifier for """specific images""", built on a training framework similar to the one I use to train [VALL-E](https://git.ecker.tech/mrq/vall-e/).

## Premise

This was cobbled together in a night, partly to test how well my training framework fares when not married to my VALL-E implementation, and partly to solve a problem I recently faced. Since I've been balls deep in learning the ins and outs of making VALL-E work, why not do the exact opposite (a tiny image classification model with fixed-length inputs and outputs) to test the framework and my knowledge? Thus, this """ambiguous""" project was born.

This is by no means state of the art, as it just leverages an existing ResNet arch provided by `torchvision`.

## Training

1. Throw the images you want to train under `./data/images/`.

2. Modify `./data/config.yaml` accordingly.

3. Install using `pip3 install -e ./image_classifier/`.

4. Train using `python3 -m image_classifier.train yaml='./data/config.yaml'`.

5. Wait.

## Inferencing

Simply invoke the inferencer with the following command: `python3 -m image_classifier "./data/path-to-your-image.png" yaml="./data/config.yaml" --temp=1.0`
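The README doesn't spell out what `--temp` does. As a rough sketch, assuming it is an ordinary sampling temperature that divides the logits before a softmax, the effect looks something like the following. The helper name `softmax_with_temperature` and the logits are made up for illustration and are not taken from this repo.

```python
import math

def softmax_with_temperature(logits, temp=1.0):
    """Hypothetical helper: divide the logits by temp, then apply a softmax."""
    if temp <= 0:
        raise ValueError("temp must be positive")
    scaled = [x / temp for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three classes
probs_default = softmax_with_temperature(logits, temp=1.0)
probs_sharp = softmax_with_temperature(logits, temp=0.5)  # more confident
probs_flat = softmax_with_temperature(logits, temp=2.0)   # more uniform
```

Under that assumption, a temperature below 1.0 sharpens the distribution toward the top class, and one above 1.0 flattens it.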

## Known Issues

* Setting `dataset.workers` higher than 0 will cause issues when using the local engine backend. Use DeepSpeed.
* The evaluation / validation routine doesn't quite work.
* Using `float16` with the local engine backend will cause instability in the losses. Use DeepSpeed.

## Strawmen

>\> UGH... Why *another* training framework!!! Just subjugate [DLAS](https://git.ecker.tech/mrq/DL-Art-School) even more!!!

I want my own code to own. The original VALL-E implementation had a rather nice and clean setup that *mostly* just made sense. DLAS was a nightmare to comb through, given the gorillion models it supports.

>\> OK. But how do I use it for `[thing that isn't the specific usecase only I know/care about]`

Simply provide your own symmapping under `./image_classifier/data.py`, and be sure to set the delimiter (where exactly is an exercise left to the reader).

This is for a ***very specific*** use case, so I don't really care right now to make it a *little* more generalized, despite most of the bits and bobs for it to generalize being there.
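For a rough idea of what a symmapping and a delimiter could look like, here is a minimal sketch. It assumes labels are plain strings that get split into symbols, with each symbol assigned an integer ID. The function `build_symmap` and the example labels are hypothetical and do not reflect what `data.py` actually contains.

```python
def build_symmap(labels, delimiter=""):
    """Hypothetical: map every symbol seen in the labels to an integer ID."""
    symbols = set()
    for label in labels:
        # An empty delimiter means "one symbol per character".
        parts = list(label) if delimiter == "" else label.split(delimiter)
        symbols.update(parts)
    # Sort so the IDs are stable from run to run.
    return {symbol: index for index, symbol in enumerate(sorted(symbols))}

char_symmap = build_symmap(["cat", "act"])
word_symmap = build_symmap(["red_car", "blue_car"], delimiter="_")
```

With an empty delimiter every character is its own symbol, and with `"_"` each underscore-separated chunk is one.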

>\> ur `[a slur]` for using a ResNet... why not use `[CRNN / some other meme arch]`??

I don't care; I'd rather keep the copypasting from other people's code to a minimum. Lazily adapting the phoneme tokenizer from my VALL-E implementation into something practically fixed-length by introducing start/stop tokens should be grounds for me to use a CRNN, or anything recurrent at the very least. But again, I don't care: it just works for my use case at the moment.
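As a sketch of the "practically fixed length" trick, assuming the usual approach of wrapping a label in start/stop tokens and padding the rest: the token names `<s>`, `</s>`, and `<pad>`, the symmap, and `encode_fixed_length` are all made up for illustration, not lifted from this repo.

```python
def encode_fixed_length(symbols, symmap, max_len):
    """Hypothetical: wrap in <s>/</s>, then pad with <pad> up to max_len IDs."""
    ids = [symmap["<s>"]] + [symmap[s] for s in symbols] + [symmap["</s>"]]
    if len(ids) > max_len:
        raise ValueError("label does not fit in max_len")
    return ids + [symmap["<pad>"]] * (max_len - len(ids))

# Made-up symbol map; the real one lives wherever data.py defines it.
symmap = {"<pad>": 0, "<s>": 1, "</s>": 2, "a": 3, "b": 4, "c": 5}
encoded = encode_fixed_length("cab", symmap, max_len=8)
```

Every label then comes out as the same number of IDs, which is what lets a plain classifier head predict it without anything recurrent.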

>\> UGH!!! What are you talking about """specific images"""???

[ひみつ](https://files.catbox.moe/csuh49.webm)

>\> NOOOO!!!! WHY AREN'T YOU USING `[cuck license]`???

:)