# Tentative Title For A ResNet-Based Image Classifier

This is a simple ResNet-based image classifier for """specific images""", built on a training framework similar to the one I use to train [VALL-E](https://git.ecker.tech/mrq/vall-e/).

## Premise

This was cobbled together in a night, partly to test how well my training framework fares when not married to my VALL-E implementation, and partly to solve a problem I recently faced. Since I've been balls deep in learning the ins and outs of making VALL-E work, why not do the exact opposite (a tiny, fixed-length image classification model) to test the framework and my knowledge? Thus, this """ambiguous""" project was born.

This is by no means state of the art, as it just leverages an existing ResNet architecture provided by `torchvision`.

## Training

1. Throw the images you want to train on under `./data/images/`.

2. Modify `./data/config.yaml` accordingly.

3. Install using `pip3 install -e ./image_classifier/`.

4. Train using `python3 -m image_classifier.train yaml='./data/config.yaml'`.

5. Wait.

## Inferencing

Simply invoke the inferencer with the following command: `python3 -m image_classifier "./data/path-to-your-image.png" yaml="./data/config.yaml" --temp=1.0`
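The README doesn't spell out what `--temp` does. Assuming it is the usual softmax sampling temperature, the idea can be sketched as below. The names `softmax_with_temperature` and `sample_class` are hypothetical and are not functions from this repo.

```python
import math
import random


def softmax_with_temperature(logits, temp=1.0):
    """Turn raw scores into probabilities; temp < 1 sharpens, temp > 1 flattens."""
    if temp <= 0:
        raise ValueError("temp must be positive")
    scaled = [x / temp for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_class(logits, temp=1.0, rng=random):
    """Pick a class index at random, weighted by the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temp)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # made-up scores for three classes
    for temp in (0.5, 1.0, 2.0):
        print(temp, [round(p, 3) for p in softmax_with_temperature(logits, temp)])
```

Under that assumption, a low temperature makes the output close to deterministic (the top class nearly always wins), while a high temperature spreads the picks across classes.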

## Known Issues

* Setting `dataset.workers` higher than 0 will cause issues when using the local engine backend. Use DeepSpeed.
* The evaluation / validation routine doesn't quite work.
* Using `float16` with the local engine backend will cause instability in the losses. Use DeepSpeed.

## Strawmen

>\> UGH... Why *another* training framework!!! Just subjugate [DLAS](https://git.ecker.tech/mrq/DL-Art-School) even more!!!

I want my own code to own. The original VALL-E implementation had a rather nice and clean setup that *mostly* just made sense. DLAS was a nightmare to comb through, given the gorillion models it supports.

>\> OK. But how do I use it for `[thing that isn't the specific usecase only I know/care about]`

Simply provide your own symmapping under `./image_classifier/data.py`, and be sure to set the delimiter (where exactly is an exercise left to the reader).

This is for a ***very specific*** use case, so I don't really care right now to make it a *little* more generalized, despite most of the bits and bobs needed to generalize it already being there.
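As a rough idea of what a symmapping plus delimiter involves, here is a minimal sketch that builds a symbol-to-id map from label strings. Everything in it is an assumption for illustration: `split_label`, `build_symmap`, the reserved tokens, and the id layout are hypothetical and not taken from `./image_classifier/data.py`.

```python
def split_label(label, delimiter=""):
    """Split a label into symbols; an empty delimiter means one symbol per character."""
    if delimiter == "":
        return list(label)
    return [part for part in label.split(delimiter) if part]


def build_symmap(labels, delimiter="", reserved=("<pad>", "<s>", "</s>")):
    """Assign an integer id to every symbol, with reserved tokens taking the lowest ids."""
    symbols = set()
    for label in labels:
        symbols.update(split_label(label, delimiter))
    ordered = list(reserved) + sorted(symbols)
    return {symbol: index for index, symbol in enumerate(ordered)}


if __name__ == "__main__":
    # Character-level labels, then space-delimited labels.
    print(build_symmap(["ab", "bc"]))
    print(build_symmap(["x y", "y z"], delimiter=" "))
```

Sorting the symbols keeps the ids stable between runs, which matters if a trained checkpoint is reused with the same dataset.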

>\> ur `[a slur]` for using a ResNet... why not use `[CRNN / some other meme arch]`??

I don't care; I'd rather keep the copypasting from other people's code to a minimum. Lazily adapting the phoneme tokenizer from my VALL-E implementation into something practically fixed-length by introducing start/stop tokens should be grounds for me to use a CRNN, or anything recurrent at the very least, but again, I don't care. It just works for my use case at the moment.
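The start/stop token trick can be sketched roughly like this. It is only an illustration under stated assumptions: the `PAD`/`START`/`STOP` ids, the tiny `SYMMAP`, and the `encode_fixed`/`decode_fixed` helpers are hypothetical, not the actual tokenizer.

```python
PAD, START, STOP = 0, 1, 2  # hypothetical ids for the special tokens
SYMMAP = {"<pad>": PAD, "<s>": START, "</s>": STOP, "a": 3, "b": 4, "c": 5}
ID_TO_SYMBOL = {index: symbol for symbol, index in SYMMAP.items()}


def encode_fixed(symbols, length):
    """Wrap the symbols in start/stop tokens, then pad up to a fixed length."""
    ids = [START] + [SYMMAP[s] for s in symbols] + [STOP]
    if len(ids) > length:
        raise ValueError(f"label needs {len(ids)} slots but only {length} are available")
    return ids + [PAD] * (length - len(ids))


def decode_fixed(ids):
    """Read symbols back, skipping start tokens and halting at the first stop or pad."""
    symbols = []
    for index in ids:
        if index == START:
            continue
        if index in (STOP, PAD):
            break
        symbols.append(ID_TO_SYMBOL[index])
    return symbols


if __name__ == "__main__":
    encoded = encode_fixed("abc", 8)
    print(encoded)
    print(decode_fixed(encoded))
```

With every label padded to the same length, a plain classifier can predict one symbol per slot, which is why nothing recurrent is strictly needed for this to work.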

>\> UGH!!! What are you talking about """specific images"""???

[ひみつ ("secret")](https://files.catbox.moe/csuh49.webm)

>\> NOOOO!!!! WHY AREN'T YOU USING `[cuck license]`???

:)