A ResNet-based image classifier for """specific""" images
Tentative Title For A ResNet-Based Image Classifier

This is a simple ResNet-based image classifier for """specific images""", using a training framework similar to the one I use to train VALL-E.

Premise

This was cobbled together in a night, partly to test how well my training framework fares when not married to my VALL-E implementation, and partly to solve a minor problem I recently faced. Since I've been balls deep in learning the ins and outs of making VALL-E work, why not do the exact opposite (a tiny image-classification model over fixed-length inputs) to test both the framework and my knowledge? Thus, this """ambiguous""" project was born.

This is by no means state of the art; it just leverages an existing ResNet architecture provided by torchvision.

Training

  1. Throw the images you want to train under ./data/images/.

  2. Modify the ./data/config.yaml accordingly.

  3. Install using pip3 install -e ./image_classifier/.

  4. Train using python3 -m image_classifier.train yaml='./data/config.yaml'.

  5. Wait.
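The exact schema of ./data/config.yaml depends on the trainer; as a purely hypothetical sketch of the kind of fields involved (every key name here is an assumption — consult the shipped ./data/config.yaml for the real schema):

```yaml
# Hypothetical sketch only; key names are assumptions, not the real schema.
dataset:
  path: ./data/images/
  workers: 0        # see Known Issues: keep at 0 with the local engine backend
trainer:
  backend: deepspeed
```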

Inferencing

Simply invoke the inferencer with the following command: python3 -m image_classifier --path="./data/path-to-your-image.png" yaml="./data/config.yaml" --temp=1.0
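The --temp flag presumably scales the logits before softmax in the usual way (this is the standard meaning of a temperature knob, not something confirmed by this repo's code). A minimal sketch in plain Python:

```python
import math

def softmax_with_temperature(logits, temp=1.0):
    """Divide logits by temp, then softmax. temp=1.0 leaves the logits
    unchanged; lower temperatures sharpen the distribution."""
    scaled = [l / temp for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax_with_temperature([2.0, 1.0, 0.1], temp=1.0)
predicted = max(range(len(probs)), key=probs.__getitem__)  # index of the top label
```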

Continuous Usage

If you're looking to continuously classify images, use python3 -m image_classifier --listen --port=7860 yaml="./data/config.yaml" --temp=1.0 instead to spin up a light web server using simple_http_server. Send a GET request to http://127.0.0.1:7860/?b64={base64-encoded image string} and a JSON response will be returned with the classified label.
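A hypothetical client for the GET endpoint described above (the helper name is mine; it assumes the server is already running on 127.0.0.1:7860):

```python
import base64
import urllib.parse
import urllib.request

def classify(image_path, host="http://127.0.0.1:7860"):
    """Base64-encode an image file and query the classifier's GET endpoint."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    # urlencode handles the '+', '/', '=' characters base64 can produce
    url = host + "/?" + urllib.parse.urlencode({"b64": b64})
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")  # JSON body with the classified label
```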

Known Issues

  • Setting dataset.workers higher than 0 will cause issues when using the local engine backend. Use DeepSpeed.
  • Using float16 with the local engine backend will cause instability in the losses. Use DeepSpeed.
  • The web server doesn't emit Content-Type: application/json, nor does it accept JSON POSTs at the moment.

Strawmen

> UGH... Why another training framework!!! Just subjugate DLAS even more!!!

I want code of my own to own. The original VALL-E implementation had a rather nice and clean setup that mostly just made sense. DLAS was a nightmare to comb through for the gorillion models it houses.

> OK. But how do I use it for [thing that isn't the specific use case only I know/care about]?

Simply provide your own symmapping under ./image_classifier/data.py, and be sure to set the delimiter (where exactly is an exercise left to the reader).

Because this is for a very specific use case, I don't really care right now to make this a little more generalized, despite most of the bits and bobs needed for it to generalize being there.

> ur [a slur] for using a ResNet... why not use [CRNN / some other meme arch]??

I don't care; I'd rather keep the copypasting from other people's code to a minimum. Lazily adapting my phoneme tokenizer from my VALL-E implementation into something practically fixed-length by introducing start/stop tokens should be grounds for using a CRNN, or at least something recurrent, but again, I don't care: it just works for my use case at the moment.

> UGH!!! What are you talking about """specific images"""???

ひみつ (it's a secret)

> NOOOO!!!! WHY AREN'T YOU USING [cuck license]???

:)