# E621 Scripts
Included are the utilities for ~~scraping~~ acquiring the source content you want to train on.

If you're targeting another booru, the same principles apply, but you'll need to adjust the URL and how you process your booru's JSON output. Doing so is left as an exercise to the reader.
Lastly, the two scripts may not be at feature parity, as I'm a sepples programmer, not a Python dev. The initial `preprocess.py` was graciously written by an anon, and I've cobbled together the `fetch.py` one myself. The node.js version will definitely have more features, as I'm better at node.js.
## Dependencies
The Python scripts have no additional dependencies, while the node.js scripts require running `npm install node-fetch@2` (v2.x because I'm old and still use `require` for my includes).
## Fetch
This script is responsible for ~~scraping~~ downloading from e621 all requested files for your target subject/style.

To run, simply invoke the script with `python fetch.py [search query]`. For example: `python fetch.py "kemono -dog"` downloads all non-dog posts tagged as kemono.
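For reference, e621's `/posts.json` endpoint wraps results in a `posts` array, with each post's download URL and MD5 sitting under its `file` key. A minimal sketch of pulling download targets out of that payload (the `extract_downloads` helper is hypothetical, for illustration, not something inside `fetch.py`):

```python
def extract_downloads(payload):
    """Collect (md5, url) pairs from an e621 /posts.json response.

    Assumes the current e621 JSON shape: {"posts": [{"file": {...}}]}.
    Posts hidden from anonymous users carry a null file URL, so skip those.
    """
    downloads = []
    for post in payload.get("posts", []):
        file_info = post.get("file") or {}
        if file_info.get("url"):
            downloads.append((file_info["md5"], file_info["url"]))
    return downloads
```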
In the script are some tunables, but the defaults are sane enough not to require any additional configuration.
If you're using another booru, extending the script to support it is easy, as the script was written to allow for additional booru definitions. Just reference the provided e621 one if you need a starting point.
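As a sketch of what such a booru definition might look like (the dictionary layout and the `build_posts_url` helper here are my assumptions for illustration, not the script's actual structure):

```python
from urllib.parse import urlencode

# Hypothetical booru registry; only the e621 entry reflects a real API.
BOORUS = {
    "e621": {
        "posts_url": "https://e621.net/posts.json",
        "query_param": "tags",
        "limit": 320,  # e621's documented per-request maximum
    },
}

def build_posts_url(booru, query, page=1):
    """Build the JSON listing URL for one page of search results."""
    b = BOORUS[booru]
    params = {b["query_param"]: query, "page": page, "limit": b["limit"]}
    return b["posts_url"] + "?" + urlencode(params)
```

Adding another booru would then just mean adding another entry to the registry with its own endpoint and parameter names.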
The Python script is nearly at feature parity with the node.js script, albeit missing the concurrency option. Please understand, not a Python dev.
## Pre-Process
The bread and butter of this repo is the preprocess script, responsible for associating your images from e621 with tags to train against during Textual Inversion.
The output from the fetch script seamlessly integrates with the inputs for the preprocess script. The `cache.json` file should also have all the necessary tags to further accelerate this script.
For the Python version, simply place your source material into the `./in/` folder, invoke the script with `python3 preprocess.py`, then get your processed files from `./out/`. For the node.js version, do the same thing, but with `node preprocess.js`.
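If you're curious what the tag association boils down to, here's a rough sketch (the `cache.json` layout as a plain MD5-to-tag-list mapping is my assumption; the real cache may store more per entry, so check it before relying on this):

```python
import json
from pathlib import Path

def write_tag_files(cache_path, out_dir):
    """Write one comma-separated tag file per image, keyed by MD5 filename.

    Assumes cache.json maps "<md5>" -> ["tag1", "tag2", ...]; a trainer
    can then pick up <md5>.txt as the caption for <md5>.<ext>.
    """
    cache = json.loads(Path(cache_path).read_text())
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for md5, tags in cache.items():
        (out / f"{md5}.txt").write_text(", ".join(tags))
```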
This script should also support files already pre-processed through the web UI, as long as they were processed with their original filenames (the MD5-hash booru filenames). Pre-processing in the web UI after running this script might prove tricky, as I've had files named something like `00001-0anthro[...]` and had to use a clever rename command to break them apart.
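For anyone fighting the same renames, a sketch of splitting a web-UI-style name back apart (the `<index>-<repeat><tags>` naming assumption is based on my own output; the `split_webui_name` helper is hypothetical, so verify the pattern against your files first):

```python
import re

def split_webui_name(stem):
    """Split a web UI output stem like '00001-0anthro' into its parts.

    Assumes the pattern <zero-padded index>-<repeat digit><flattened tags>;
    returns (index, repeat, tags), or None if the stem doesn't match.
    """
    m = re.match(r"^(\d+)-(\d)(.+)$", stem)
    return m.groups() if m else None
```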