Tentative Title For A ResNet-Based Image Classifier
This is a simple ResNet-based image classifier for """specific images""", using a training framework similar to the one I use to train VALL-E.
Premise
This was cobbled together in a night, partly to test how well my training framework fares when not married to my VALL-E implementation, and partly to solve a problem I have recently faced. Since I've been balls deep in learning the ins and outs of making VALL-E work, why not do the exact opposite (a tiny, image classification model of fixed lengths) to test the framework and my knowledge? Thus, this """ambiguous""" project is born.
This is by no means state of the art, as it just leverages an existing ResNet arch provided by torchvision.
Training
- Throw the images you want to train under `./data/images/`.
- Modify the `./data/config.yaml` accordingly.
- Install using `pip3 install -e ./image_classifier/`.
- Train using `python3 -m image_classifier.train yaml='./data/config.yaml'`.
- Wait.
Inferencing
Simply invoke the inferencer with the following command: `python3 -m image_classifier "./data/path-to-your-image.png" yaml="./data/config.yaml" --temp=1.0`
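For a rough idea of what a temperature like `--temp=1.0` usually does, here is a minimal sketch. It assumes the flag is an ordinary softmax temperature applied to the classifier's raw scores, which this README does not confirm. `softmax_with_temperature` and the example scores are made up for illustration and are not part of `image_classifier`.

```python
import math


def softmax_with_temperature(logits, temp=1.0):
    """Turn raw class scores into probabilities, scaled by a temperature.

    Hypothetical helper: assumes --temp is a plain softmax temperature.
    """
    if temp <= 0:
        raise ValueError("temp must be positive")
    scaled = [x / temp for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three classes
for temp in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

Under that assumption, values below 1.0 sharpen the distribution toward the top class, and values above 1.0 flatten it.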
Known Issues
- Setting `dataset.workers` higher than 0 will cause issues when using the local engine backend. Use DeepSpeed.
- The evaluation / validation routine doesn't quite work.
- Using `float16` with the local engine backend will cause instability in the losses. Use DeepSpeed.
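As general background on why `float16` can be touchy, the sketch below rounds a few numbers through IEEE 754 half precision using only the standard library. It illustrates the format's limits and is not a diagnosis of the local engine backend. My guess, and it is only a guess, is that the instability comes from the lack of loss scaling, which DeepSpeed's fp16 mode provides. `to_float16` is a hypothetical helper.

```python
import struct


def to_float16(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Tiny values (think small gradients) underflow to zero.
print(to_float16(1e-8))

# Precision is coarse: near 2048 the gap between representable values is 2,
# so 2049.0 cannot be stored exactly.
print(to_float16(2049.0))

# Values well past the largest finite half (65504) cannot be packed at all.
try:
    to_float16(70000.0)
except OverflowError as exc:
    print("overflow:", exc)
```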
Strawmen
> UGH... Why another training framework!!! Just subjugate DLAS even more!!!
I want my own code to own. The original VALL-E implementation had a rather nice and clean setup that mostly just made sense. DLAS was a nightmare to comb through for the gorillion amounts of models it attests.
> OK. But how do I use it for [thing that isn't the specific usecase only I know/care about]
Simply provide your own symmapping under `./image_classifier/data.py`, and be sure to set the delimiter (where exactly is an exercise left to the reader).
This is for a very specific use-case, so I don't really care right now to make it more generalized, despite most of the bits and bobs for it to generalize being there.
> ur [a slur] for using a ResNet... why not use [CRNN / some other meme arch]??
I don't care, I'd rather keep the copypasting from other people's code to a minimum. Lazily adapting my phoneme tokenizer from my VALL-E implementation into something practically fixed length by introducing start/stop tokens should be grounds for me to use a CRNN, or anything recurrent at the very least, but again, I don't care, it just works for my use case at the moment.
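To make the start/stop token idea a bit more concrete, here is a minimal sketch of a symbol map that turns a short text label into a fixed-length token sequence. Everything in it is an assumption for illustration: the symbol set, the `<pad>` token, `max_len`, and the `encode` / `decode` names are hypothetical, and the real symmap in `./image_classifier/data.py` may look nothing like this.

```python
# Hypothetical symbol map; the real one lives in ./image_classifier/data.py.
SYMBOLS = "abcdefghijklmnopqrstuvwxyz0123456789"
PAD, START, STOP = "<pad>", "<s>", "</s>"
symmap = {sym: i for i, sym in enumerate([PAD, START, STOP, *SYMBOLS])}
inverse = {i: sym for sym, i in symmap.items()}


def encode(label, max_len=8):
    """Map a label to exactly max_len + 2 token IDs (start, symbols, stop, padding)."""
    if len(label) > max_len:
        raise ValueError(f"label longer than {max_len} symbols: {label!r}")
    tokens = [START, *label, STOP]
    # Padding is an assumption here; it is what keeps every target the same length.
    tokens += [PAD] * (max_len + 2 - len(tokens))
    return [symmap[t] for t in tokens]


def decode(ids):
    """Invert encode(): read symbols until the stop token, skipping start and padding."""
    out = []
    for i in ids:
        sym = inverse[i]
        if sym == STOP:
            break
        if sym not in (PAD, START):
            out.append(sym)
    return "".join(out)


ids = encode("ab3")
print(ids)
print(decode(ids))
```

With every target padded to the same length, a plain ResNet with a fixed-size output can predict one symbol per position, which is why nothing recurrent is strictly needed here.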
> UGH!!! What are you talking about """specific images"""???
> NOOOO!!!! WHY AREN'T YOU USING [cuck license]???
:)