quick adjustment for LoRAs (because I finally got off my fat ass to look into them since my interest in AI-generated content re-ignited with locally running PyTorch things)

master
mrq 2023-02-08 18:59:39 +07:00
parent 3eb5ab52fa
commit 0d5bbfa465
4 changed files with 50 additions and 17 deletions

@@ -1,18 +1,15 @@
# Textual Inversion/Hypernetwork Guide w/ E621 Content
An up-to-date repo with all the necessary files can be found [here](https://git.ecker.tech/mrq/stable-diffusion-utils): ([mirror](https://git.coom.tech/mrq/stable-diffusion-utils))
**!**WARNING**!** **!**CAUTION**!** ***DO NOT POST THE MIRROR REPO'S URL ON 4CHAN*** **!**CAUTION**!** **!**WARNING**!**
`coom.tech` is an automatic 30-day ban if posted. I am not responsible if you share that URL. Share the [rentry](https://rentry.org/sd-e621-textual-inversion/) instead, as this is effectively a copy of the README.
An up-to-date repo with all the necessary files can be found [here](https://git.ecker.tech/mrq/stable-diffusion-utils).
This guide has been stitched together from different trains of thought as I learn the ins and outs of effectively training concepts. Please keep this in mind if the guide seems to shift a bit, sound confusing, or feel like it's covering unnecessary topics. I intend to do a clean rewrite to make things more to-the-point.
Also, as new features get added, they have to find room among the details for Textual Inversion, so bear with me if something seems rather forcefully included. As examples:
* hypernetworks released a week or two after training textual inversion in the web UI was added
* the CLIP Aesthetic feature also released, and, while it requires little set-up, has a hard time finding a home in this guide
* there have been a ton of promising new features I haven't bothered with yet; I'm experimenting with things like LoRAs, albeit slowly
Unlike any guide for getting Voldy's Web UI up, a good majority of this guide is focused on getting the right content, and feeding it the right content, rather than running commands.
Unlike any guide for getting Voldy's Web UI up, a good majority of this guide is focused on getting the right content, and feeding it the right content, rather than running a batch script.
## Assumptions
@@ -23,6 +20,12 @@ This guide assumes the following basics:
You can also extend this to any other booru-oriented model (I doubt anyone reading this cares; the normalfags seem well content in their own circles), but you'll have to modify the fetch and pre-processing scripts according to the site the images were pulled from. The general concepts still apply.
### LoRAs
Since I've been away working on other things for a few months, I've missed out on LoRAs. They seem very promising from what's crossed my path, and training them seems [very straightforward](https://rentry.org/LazyTrainingGuide). The core scripts seem easily adaptable to produce something you can feed straight into a LoRA training script; all that's needed is to set a boolean to `true` if you're interested in outputting for that instead.
When I get something cobbled together with meaningful results, I'll (finally) update this guide to include using them, and perhaps have them replace TI/hypernetworks/what-have-you instead.
## Glossary
Below is a list of terms, clarified. I notice I'll use some terms interchangeably with other concepts. These do not necessarily cover everything generally related to Stable Diffusion, but moreso Textual Inversion and terms I'll use that need disambiguation:
@@ -33,9 +36,10 @@ Below is a list of terms clarified. I notice I'll use some terms interchangably
* `source content/material`: the images you're using to train against; pulled from e621 (or another booru)
* `embedding`: the trained "model" of the subject or style in question. "Model" would be wrong to call the trained output, as Textual Inversion isn't true training
- `aesthetic image embedding/clip aesthetic`: a collection of images of an aesthetic you're trying to capture to use for CLIP Aesthetic features
* `hypernetwork`: a different way to train custom content against a model; almost all of the same principles here apply for hypernetworks
* `loss rate`: a calculated value determining how close the actual output is to the expected output. Typically, a value between `0.1` and `0.15` seems to be a good sign
* `epoch`: a term derived from typical neural network training; normally it refers to a full training cycle over your source material (total iterations / training set size), but the web UI doesn't actually do anything substantial with it.
* `hypernetwork`: a different way to train custom content against a model; almost all of the same principles here apply for hypernetworks
* `LoRA`: Low-Rank Adaptation: an extremely promising, fancy way of expanding upon a model.
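To make the epoch arithmetic in the glossary concrete, here's a quick sketch; the step and dataset counts are made-up numbers:

```python
# Hypothetical numbers: the web UI reports total steps, not epochs
total_steps = 15000   # iterations trained so far
dataset_size = 50     # images in your source material

# one epoch = one full pass over the dataset
epochs = total_steps / dataset_size
print(epochs)  # 300.0 full passes over the source material
```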
## Preface
@@ -125,7 +129,7 @@ Lastly, for Textual Inversion, your results will vary greatly depending on the c
### Fetch Script
If you want to accelerate your ~~scraping~~ content acquisition, consult the fetch script under [`./src/`](https://git.coom.tech/mrq/stable-diffusion-utils/src/branch/master/src/). It's a """simple but powerful""" script that can ~~scrape~~ download from e621 given a search query.
If you want to accelerate your ~~scraping~~ content acquisition, consult the fetch script under [`./src/`](https://git.ecker.tech/mrq/stable-diffusion-utils/src/branch/master/src/). It's a """simple but powerful""" script that can ~~scrape~~ download from e621 given a search query.
All you need to do is invoke the script with `python3 ./src/fetch.py "search query"`. For example: `python3 ./src/fetch.py "zangoose -female score:>0"`.
@@ -143,7 +147,7 @@ Use the automatic pre-processing script in the web UI to flip and split your sou
You are not required to actually run this, as this script is just a shortcut to manually renaming files and curating the tags, but it cuts out the bulk of the work.
Included in the repo under [`./src/`](https://git.coom.tech/mrq/stable-diffusion-utils/src/branch/master/src/) is a script for tagging images from e621 in the filename for later use in the web UI.
Included in the repo under [`./src/`](https://git.ecker.tech/mrq/stable-diffusion-utils/src/branch/master/src/) is a script for tagging images from e621 in the filename for later use in the web UI.
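As a rough sketch of what that renaming amounts to (the md5-named file and the tag list here are made up; the real script pulls and filters tags from e621 itself):

```python
import re

# made-up md5-named file (as downloaded from e621) and tag list
file = "6e6b7d3c2c9e4f8f9a0b1c2d3e4f5a6b.jpg"
tags = ["zangoose", "solo", "standing"]

# the script swaps the md5 portion of the filename for the joined tags,
# so the web UI can read the tags straight out of the filename
md5 = re.match(r"^([a-f0-9]{32})", file).group(1)
renamed = file.replace(md5, ", ".join(tags))
print(renamed)  # zangoose, solo, standing.jpg
```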
You can also have multiple variations of the same images, as it's useful if you're splitting an image into multiple parts. For example, the following is valid:
```
@@ -173,7 +177,7 @@ The generalized procedure is as followed:
### Pre-Requisites
There are few safety checks or error messages, so triple-check that you have:
* downloaded/cloned [this repo](https://git.coom.tech/mrq/stable-diffusion-utils)
* downloaded/cloned [this repo](https://git.ecker.tech/mrq/stable-diffusion-utils)
* opened a command prompt/terminal where you downloaded/cloned this repo
* filled the `./images/downloaded/` folder with the images you want to use
- if you're manually supplying your images, make sure they retain the original filenames from e621
@@ -186,6 +190,8 @@ Consult the script if you want to adjust it's behavior. I tried my best to expla
If you're looking to train a hypernetwork, I suggest having the script include tags for species and characters (the script has two `tagsOverride`s; the second one is commented out, so just copy what's in its `[]` into the first one).
If you're looking to set up files for LoRA training, set `lora` to `true` in the script's config: the filename length limit is lifted, each image is renamed to its index, and the full tag list is written to a matching `.txt` caption file instead.
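A minimal sketch of what the LoRA output mode does differently, going by the script changes in this commit (the filename and tags are made up):

```python
import re

file = "6e6b7d3c2c9e4f8f9a0b1c2d3e4f5a6b.jpg"  # made-up md5-named image
tags = "zangoose, solo, standing"               # made-up tag list
index = 0                                       # position in the file list

# lora mode: the image keeps only its index...
md5 = re.match(r"^([a-f0-9]{32})", file).group(1)
renamed = file.replace(md5, str(index))
print(renamed)   # 0.jpg

# ...and the tags land in a matching caption file instead of the filename
caption = f"{index}.txt"
print(caption)   # 0.txt  (its contents would be the tag string)
```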
### Caveats
There are some "bugs" with the script, be it limitations with interfacing with the web UI, or oversights in processing tags:

@@ -17,5 +17,6 @@
"removeParentheses": true,
"onlyIncludeModelArtists": true,
"reverseTags": false,
"tagDelimiter": ","
"tagDelimiter": ",",
"lora": false
}

@@ -54,9 +54,11 @@ let config = {
reverseTags: false, // inverts sorting, prioritizing tags with little representation in the model
tagDelimiter: ",", // what separates each tag in the filename, web UI will accept comma separated filenames
tagDelimiter: ", ", // what separates each tag in the filename, web UI will accept comma separated filenames
invalidCharacters: "\\/:*?\"<>|", // characters that can't go in a filename
lora: false, // set to true to enable outputting for LoRA training
}
// import source
@@ -97,6 +99,13 @@ args.shift();
if ( args[0] ) config.input = args[0];
if ( args[1] ) config.output = args[1];
if ( config.lora ) {
config.filenameLimit = 0;
if ( config.tagDelimiter.length == 1 ) {
config.tagDelimiter += " ";
}
}
for ( let k in {"input":null, "output":null} ) {
try {
if ( !FS.lstatSync(config[k]).isDirectory() ) {
@@ -197,7 +206,7 @@ let parse = async () => {
let jointmp = "";
let filtered = [];
for ( let i in tags ) {
if ( (jointmp + config.tagDelimiter + tags[i]).length > config.filenameLimit ) break;
if ( config.filenameLimit && (jointmp + config.tagDelimiter + tags[i]).length > config.filenameLimit ) break;
jointmp += config.tagDelimiter + tags[i];
if ( config.removeParentheses )
tags[i] = tags[i].replace(/\(.+?\)$/, "").trim()
@@ -205,8 +214,12 @@ let parse = async () => {
}
let joined = filtered.join(config.tagDelimiter)
// NOOOOOO YOU'RE SUPPOSE TO DO IT ASYNCHRONOUSLY SO IT'S NOT BLOCKING
FS.copyFileSync(`${config.input}/${file}`, `${config.output}/${file.replace(md5, joined).trim()}`)
if ( config.lora ) {
FS.copyFileSync(`${config.input}/${file}`, `${config.output}/${file.replace(md5, i).trim()}`)
} else {
FS.copyFileSync(`${config.input}/${file}`, `${config.output}/${file.replace(md5, joined).trim()}`)
}
FS.writeFileSync(`${config.output}/${i}.txt`, joined)
if ( rateLimit && config.rateLimit ) await new Promise( (resolve) => {
setTimeout(resolve, config.rateLimit)

@@ -62,6 +62,8 @@ config = {
'tagDelimiter': ",", # what separates each tag in the filename, web UI will accept comma separated filenames
'invalidCharacters': "\\/:*?\"<>|", # characters that can't go in a filename
'lora': True, # set to true to enable outputting for LoRA training
}
if os.path.exists(config['source']):
@@ -93,12 +95,18 @@ try:
except:
pass
if config['lora']:
config['filenameLimit'] = 0
if len(config['tagDelimiter']) == 1:
config['tagDelimiter'] = config['tagDelimiter'] + " "
def parse():
global config, cache
files = []
for file in os.listdir(config['input']):
files.append(file)
for i in range(len(files)):
index = i
file = files[i]
# try filenames like "83737b5e961b594c26e8feaed301e7a5 (1).jpg" (duplicated copies from a file manager)
md5 = re.match(r"^([a-f0-9]{32})", file)
@@ -185,7 +193,7 @@ def parse():
jointmp = ""
filtered = []
for i in tags:
if len(jointmp + config['tagDelimiter'] + i) > config['filenameLimit']:
if config['filenameLimit'] > 0 and len(jointmp + config['tagDelimiter'] + i) > config['filenameLimit']:
break
jointmp += config['tagDelimiter'] + i
if config['removeParentheses']:
@@ -193,7 +201,12 @@ def parse():
filtered.append(i)
joined = config['tagDelimiter'].join(filtered)
shutil.copy(os.path.join(config['input'], file), os.path.join(config['output'], file.replace(md5, joined).strip()))
if config['lora']:
shutil.copy(os.path.join(config['input'], file), os.path.join(config['output'], file.replace(md5, f'{index}').strip()))
with open(os.path.join(config['output'], f"{index}.txt"), 'wb') as f:
f.write(joined.encode('utf-8'))
else:
shutil.copy(os.path.join(config['input'], file), os.path.join(config['output'], file.replace(md5, joined).strip()))
if rateLimit and config['rateLimit']:
time.sleep(config['rateLimit'] / 1000.0)