A ResNet-based image classifier for """specific""" images


mrq ba2ca9c24d added small listen server to allow inferencing (todo: allow reading from base64)		2023-08-05 16:50:53 +00:00
data	Forgot the yaml	2023-08-05 03:48:33 +00:00
image_classifier	added small listen server to allow inferencing (todo: allow reading from base64)	2023-08-05 16:50:53 +00:00
scripts	culled artifacts left over from the valle trainer	2023-08-05 04:03:59 +00:00
.gitignore	An amazing commit :)	2023-08-05 03:40:14 +00:00
LICENSE	An amazing commit :)	2023-08-05 03:40:14 +00:00
README.md	culled artifacts left over from the valle trainer	2023-08-05 04:03:59 +00:00
setup.py	ops don't say it	2023-08-05 03:48:06 +00:00


Tentative Title For A ResNet-Based Image Classifier

This is a simple ResNet-based image classifier for """specific images""", using a training framework similar to the one I use to train VALL-E.

Premise

This was cobbled together in a night, partly to test how well my training framework fares when not married to my VALL-E implementation, and partly to solve a problem I have recently faced. Since I've been balls deep in learning the ins and outs of making VALL-E work, why not do the exact opposite (a tiny image classification model of fixed lengths) to test the framework and my knowledge? Thus, this """ambiguous""" project was born.

This is by no means state of the art, as it just leverages an existing ResNet arch provided by torchvision.

Training

Throw the images you want to train under ./data/images/.
Modify the ./data/config.yaml accordingly.
Install using pip3 install -e ./image_classifier/.
Train using python3 -m image_classifier.train yaml='./data/config.yaml'.
Wait.
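
The steps above assume a particular layout under ./data/. A minimal pre-flight sketch that checks for it before kicking off a run might look like the following. The function name missing_training_inputs is made up for illustration and is not part of this repo; only the two paths come from the steps above.

```python
from pathlib import Path


def missing_training_inputs(root="."):
    """Return a list of human-readable problems; an empty list means ready to train.

    Hypothetical helper: it only checks the two paths named in the README steps.
    """
    root = Path(root)
    images = root / "data" / "images"
    config = root / "data" / "config.yaml"

    problems = []
    # The image directory has to exist and contain at least one entry.
    if not images.is_dir() or not any(images.iterdir()):
        problems.append(f"no images found under {images}")
    # The config file has to exist; its contents are not validated here.
    if not config.is_file():
        problems.append(f"missing config file {config}")
    return problems


if __name__ == "__main__":
    for problem in missing_training_inputs("."):
        print(problem)
```

If it prints nothing, the install and train commands from the steps above should at least find their inputs.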

Inferencing

Simply invoke the inferencer with the following command: python3 -m image_classifier "./data/path-to-your-image.png" yaml="./data/config.yaml" --temp=1.0
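
The README does not say what --temp does internally. Assuming it is an ordinary sampling temperature applied to the classifier's logits (an assumption, not something confirmed by the code), the effect can be sketched in plain Python. The function softmax_with_temperature is hypothetical and not part of image_classifier.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, dividing by the temperature first.

    Assumption: this mirrors the usual meaning of a "temperature" flag;
    the actual inferencer may use --temp differently.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [1.0, 2.0, 0.5]
    for temp in (0.5, 1.0, 2.0):
        print(temp, softmax_with_temperature(logits, temp))
```

Under that assumption, a temperature of 1.0 leaves the distribution untouched, values below 1.0 sharpen it toward the top class, and values above 1.0 flatten it.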

Known Issues

Setting dataset.workers higher than 0 will cause issues when using the local engine backend. Use DeepSpeed.
The evaluation / validation routine doesn't quite work.
Using float16 with the local engine backend will cause instability in the losses. Use DeepSpeed.

Strawmen

> UGH... Why another training framework!!! Just subjugate DLAS even more!!!

I want my own code to own. The original VALL-E implementation had a rather nice and clean setup that mostly just made sense. DLAS was a nightmare to comb through for the gorillion amounts of models it attests.

> OK. But how do I use it for [thing that isn't the specific usecase only I know/care about]

Simply provide your own symmapping under ./image_classifier/data.py, and be sure to set the delimiter (where exactly is an exercise left to the reader).

This is for a very specific use-case, so I don't really care right now to make it more generalized, despite most of the bits and bobs needed to generalize being there.
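
For a rough idea of what a symbol mapping plus delimiter could look like, here is a self-contained sketch. Every name in it (START, STOP, split_label, build_symmap, encode) is hypothetical and not taken from ./image_classifier/data.py. It assumes labels are strings, that an empty delimiter means one symbol per character, and that start/stop tokens wrap each encoded label.

```python
# Hypothetical sketch: none of these names come from image_classifier/data.py.
START, STOP = "<s>", "</s>"


def split_label(label, delimiter=""):
    """Split a label into symbols; an empty delimiter means one symbol per character."""
    return list(label) if delimiter == "" else label.split(delimiter)


def build_symmap(labels, delimiter=""):
    """Map every symbol seen in the labels to an integer id, special tokens first."""
    symbols = sorted({sym for label in labels for sym in split_label(label, delimiter)})
    return {sym: idx for idx, sym in enumerate([START, STOP] + symbols)}


def encode(label, symmap, delimiter=""):
    """Wrap the label's symbol ids in the start and stop token ids."""
    ids = [symmap[sym] for sym in split_label(label, delimiter)]
    return [symmap[START]] + ids + [symmap[STOP]]


if __name__ == "__main__":
    symmap = build_symmap(["cat", "act"])
    print(symmap)
    print(encode("cat", symmap))
```

Swapping the delimiter (say, to a space) changes what counts as one symbol, which is why the mapping and the delimiter have to be changed together.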

> ur [a slur] for using a ResNet... why not use [CRNN / some other meme arch]??

I don't care, I'd rather keep the copypasting from other people's code to a minimum. Lazily adapting my phoneme tokenizer from my VALL-E implementation into something practically fixed length by introducing start/stop tokens should be grounds for me to use a CRNN, or anything recurrent at the very least, but again, I don't care, it just works for my use case at the moment.

> UGH!!! What are you talking about """specific images"""???

ひみつ (it's a secret)

> NOOOO!!!! WHY AREN'T YOU USING [cuck license]???