# E621 Scripts
Included are the utilities for ~~scraping~~ acquiring the source content you want to train on.

If you're targeting another booru, the same principles apply, but you'll need to adjust the URL and how you process your booru's JSON output. Doing so is left as an exercise to the reader.
Lastly, the two scripts may not be at feature parity, as I'm a sepples programmer, not a Python dev. The initial `preprocess.py` was graciously written by an anon, and I've cobbled together the `fetch.py` one myself. The node.js version will definitely have more features, as I'm better at node.js.
## Dependencies
The Python scripts have no additional dependencies, while the node.js scripts require running `npm install node-fetch@2` (v2.x because I'm old and still use `require` for my includes).
## Fetch
This script is responsible for ~~scraping~~ downloading from e621 all requested files for your target subject/style.

To run, simply invoke the script with `python fetch.py [search query]`. For example: `python fetch.py "kemono -dog"` downloads all non-dog posts tagged as kemono.
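For reference, e621's `/posts.json` endpoint wraps results in a `posts` array, with each post's download URL and MD5 sitting under its `file` key. A minimal sketch of pulling download targets out of that payload (the `extract_downloads` helper is hypothetical, for illustration, not something inside `fetch.py`):

```python
def extract_downloads(payload):
    """Collect (md5, url) pairs from an e621 /posts.json response.

    Assumes the current e621 JSON shape: {"posts": [{"file": {...}}]}.
    Posts hidden from anonymous users carry a null file URL, so skip those.
    """
    downloads = []
    for post in payload.get("posts", []):
        file_info = post.get("file") or {}
        if file_info.get("url"):
            downloads.append((file_info["md5"], file_info["url"]))
    return downloads
```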
In the script are some tunables, but the defaults are sane enough not to require any additional configuration.
If you're using another booru, extending the script to support it is easy, as the script was written to allow for additional booru definitions. Just reference the provided e621 one if you need a starting point.
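As a sketch of what such a booru definition might look like (the dictionary layout and the `build_posts_url` helper here are my assumptions for illustration, not the script's actual structure):

```python
from urllib.parse import urlencode

# Hypothetical booru registry; only the e621 entry reflects a real API.
BOORUS = {
    "e621": {
        "posts_url": "https://e621.net/posts.json",
        "query_param": "tags",
        "limit": 320,  # e621's documented per-request maximum
    },
}

def build_posts_url(booru, query, page=1):
    """Build the JSON listing URL for one page of search results."""
    b = BOORUS[booru]
    params = {b["query_param"]: query, "page": page, "limit": b["limit"]}
    return b["posts_url"] + "?" + urlencode(params)
```

Adding another booru would then just mean adding another entry to the registry with its own endpoint and parameter names.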
The Python script is nearly at feature parity with the node.js script, albeit missing the concurrency option. Please understand, not a Python dev.
## Pre-Process
The bread and butter of this repo is the preprocess script, responsible for associating your images from e621 with tags to train against during Textual Inversion.
The output from the fetch script seamlessly integrates with the inputs for the preprocess script. The `cache.json` file should also have all the necessary tags to further accelerate this script.
For the Python version, simply place your source material into the `./in/` folder, invoke the script with `python3 preprocess.py`, then get your processed files from `./out/`. For the node.js version, do the same thing, but with `node preprocess.js`.
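If you're curious what the tag association boils down to, here's a rough sketch (the `cache.json` layout as a plain MD5-to-tag-list mapping is my assumption; the real cache may store more per entry, so check it before relying on this):

```python
import json
from pathlib import Path

def write_tag_files(cache_path, out_dir):
    """Write one comma-separated tag file per image, keyed by MD5 filename.

    Assumes cache.json maps "<md5>" -> ["tag1", "tag2", ...]; a trainer
    can then pick up <md5>.txt as the caption for <md5>.<ext>.
    """
    cache = json.loads(Path(cache_path).read_text())
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for md5, tags in cache.items():
        (out / f"{md5}.txt").write_text(", ".join(tags))
```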
This script should also support files already pre-processed through the web UI, as long as they were processed with their original filenames (the MD5-hash booru filenames). Pre-processing in the web UI after running this script might prove tricky, as I've had files named something like `00001-0anthro[...]` and had to use a clever rename command to break them apart.
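For anyone fighting the same renames, a sketch of splitting a web-UI-style name back apart (the `<index>-<repeat><tags>` naming assumption is based on my own output; the `split_webui_name` helper is hypothetical, so verify the pattern against your files first):

```python
import re

def split_webui_name(stem):
    """Split a web UI output stem like '00001-0anthro' into its parts.

    Assumes the pattern <zero-padded index>-<repeat digit><flattened tags>;
    returns (index, repeat, tags), or None if the stem doesn't match.
    """
    m = re.match(r"^(\d+)-(\d)(.+)$", stem)
    return m.groups() if m else None
```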