# E621 Scripts

Included are the utilities for acquiring the source content you'll train on.

If you're targeting another booru, the same principles apply, but you'll need to adjust the request URLs and the processing of your booru's JSON output. Doing so is left as an exercise to the reader.
## Dependencies

While I strive to keep dependencies minimal, only the pre-processing script is available in Python; the e621 downloading script is available only in node.js, as I'm not that strong of a Python dev. It's reasonable to assume everyone has Python, since it's a hard dependency for using voldy's web UI.

The Python scripts have no additional dependencies, while the node.js scripts require running `npm install node-fetch@2` (v2.x because I'm old and still using `require` for my includes).
## Fetch

**!TODO!** Rewrite in Python; currently only available in node.js.

This script is responsible for downloading from e621 all requested files for your target subject/style.
To run, simply invoke the script with `node fetch.js [search query]`. For example, `node fetch.js "kemono -dog"` downloads all non-dog posts tagged as kemono.
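For anyone attempting the Python rewrite, here is a minimal sketch of the kind of request the script builds, assuming e621's public `posts.json` search endpoint. The helper name and defaults are illustrative, not fetch.js's actual internals:

```python
# Sketch of building an e621 search request (illustrative only; the real
# tune-ables live in fetch.js). The posts.json endpoint takes a tag query,
# a page size, and a page number.
from urllib.parse import urlencode

def build_search_url(query, limit=320, page=1):
    """Build a posts.json URL for a tag query like 'kemono -dog'."""
    params = urlencode({"tags": query, "limit": limit, "page": page})
    return f"https://e621.net/posts.json?{params}"

# First page of non-dog posts tagged kemono:
url = build_search_url("kemono -dog")
```

Minus-prefixed tags are negations handled server-side, so the `-dog` exclusion needs no special client logic.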
The script has some tune-ables, but the defaults are sane enough not to require any additional configuration.
If you're using another booru, extending the script to support it is easy, as the script was built to allow additional booru definitions. Just reference the provided e621 definition if you need a starting point.
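As an illustration, a booru definition might boil down to a search endpoint plus accessors that pull the file URL and MD5 out of one post record. This shape is hypothetical, not the exact structure fetch.js uses:

```python
# Hypothetical booru definition table (the real fetch.js structure may
# differ). Adding a booru means supplying its search endpoint and how to
# extract the download URL and md5 from a post in its JSON output.
BOORUS = {
    "e621": {
        "search_url": "https://e621.net/posts.json",
        "file_url": lambda post: post["file"]["url"],
        "md5": lambda post: post["file"]["md5"],
    },
}
```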
## Pre-Process

The bread and butter of this repo is the preprocess script, responsible for associating your e621 images with the tags to train against during Textual Inversion.

The output from the fetch script integrates seamlessly with the input of the preprocess script. The `cache.json` file should also hold all the necessary tags, further accelerating this script.
For the Python version, simply place your source material into the `./in/` folder, invoke the script with `python3 preprocess.py`, then collect your processed files from `./out/`. For the node.js version, do the same thing, but with `node preprocess.js`.
This script should also support files already pre-processed through the web UI, as long as they were processed with their original filenames (the MD5-hash booru filenames). Pre-processing in the web UI *after* running this script might prove tricky: I've had files named something like `00001-0anthro[...]` and had to use a clever rename command to break the name apart.
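For reference, a hedged sketch of one way to break such names apart, assuming the web UI prefixes `NNNNN-N` to the name (as in the `00001-0anthro` example above). My actual rename command may have differed, and if the remainder is tags rather than the original MD5 name, full recovery may not be possible:

```python
# Strip a web-UI-style numeric prefix like "00001-0" from a filename.
# Assumption: names look like NNNNN-N<rest>; anything else is untouched.
import re

def strip_webui_prefix(name):
    return re.sub(r"^\d{5}-\d+", "", name)
```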