quick adjustment for LoRAs (because I finally got off my fat ass to look into them since my interest in AI-generated content re-ignited with locally running PyTorch things)

master
mrq 2023-02-08 18:59:39 +07:00
parent 3eb5ab52fa
commit 0d5bbfa465
4 changed files with 50 additions and 17 deletions

@@ -1,18 +1,15 @@
# Textual Inversion/Hypernetwork Guide w/ E621 Content
An up-to-date repo with all the necessary files can be found [here](https://git.ecker.tech/mrq/stable-diffusion-utils): ([mirror](https://git.coom.tech/mrq/stable-diffusion-utils))
**!**WARNING**!** **!**CAUTION**!** ***DO NOT POST THE MIRROR REPO'S URL ON 4CHAN*** **!**CAUTION**!** **!**WARNING**!**
`coom.tech` is an automatic 30-day ban if posted. I am not responsible if you share that URL. Share the [rentry](https://rentry.org/sd-e621-textual-inversion/) instead, as this is effectively a copy of the README.
An up-to-date repo with all the necessary files can be found [here](https://git.ecker.tech/mrq/stable-diffusion-utils).
This guide has been stitched together from different trains of thought as I learn the ins and outs of effectively training concepts. Please keep this in mind if the guide seems to shift a bit, sound confusing, or feel like it's covering unnecessary topics. I intend to do a clean rewrite to make things more to-the-point.
Also, as new features get added, they have to find room among the details for Textual Inversion, so bear with me if something seems rather forcefully included. As examples:
* hypernetworks released a week or two after training textual inversion in the web UI was added
* the CLIP Aesthetic feature also released, and, while it requires little set-up, has a hard time finding a home in this guide
* there have been a ton of promising new features I haven't bothered with yet; I'm experimenting with things like LoRAs, albeit slowly
Unlike any guide for getting Voldy's Web UI up, a good majority of this guide is focused on getting the right content, and feeding it the right content, rather than running commands.
Unlike any guide for getting Voldy's Web UI up, a good majority of this guide is focused on getting the right content, and feeding it the right content, rather than running a batch script.
## Assumptions
@@ -23,6 +20,12 @@ This guide assumes the following basics:
You can also extend this to any other booru-oriented model (I doubt anyone reading this cares; the normalfags seem well content in their own circles), but you'll have to modify the fetch and pre-processing scripts according to the site the images were pulled from. The general concepts still apply.
### LoRAs
Since I've been away working on other things for a few months, I've missed out on LoRAs. They seem very promising from what's crossed my path, and training them seems [very straightforward](https://rentry.org/LazyTrainingGuide). The core scripts seem easily adaptable to produce something you can feed straight into a LoRA training script; all that's needed is to set a boolean to `true` if you're interested in outputting for that instead.
When I get something cobbled together with meaningful results, I'll (finally) update this guide to include using them, and perhaps have them replace TI/hypernetworks/what-have-you instead.
## Glossary
Below is a list of terms, clarified. I notice I'll use some terms interchangeably with other concepts. These do not necessarily cover everything generally related to Stable Diffusion, but moreso Textual Inversion and terms I'll use that need disambiguation:
@@ -33,9 +36,10 @@ Below is a list of terms clarified. I notice I'll use some terms interchangably
* `source content/material`: the images you're using to train against; pulled from e621 (or another booru)
* `embedding`: the trained "model" of the subject or style in question. "Model" would be wrong to call the trained output, as Textual Inversion isn't true training
- `aesthetic image embedding/clip aesthetic`: a collection of images of an aesthetic you're trying to capture to use for CLIP Aesthetic features
* `hypernetwork`: a different way to train custom content against a model; almost all of the same principles here apply for hypernetworks
* `loss rate`: a calculated value determining how close the actual output is to the expected output. Typically, a value between `0.1` and `0.15` seems to be a good sign
* `epoch`: a term derived from typical neural network training; normally it refers to a full training cycle over your source material (total iterations / training set size), but the web UI doesn't actually do anything substantial with it.
* `hypernetwork`: a different way to train custom content against a model; almost all of the same principles here apply for hypernetworks
* `LoRA`: Low-Rank Adaptation: an extremely promising, fancy way of expanding upon a model.
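To make the epoch arithmetic in the glossary concrete, here's a quick sketch; the step and dataset counts are made-up numbers:

```python
# Hypothetical numbers: the web UI reports total steps, not epochs
total_steps = 15000   # iterations trained so far
dataset_size = 50     # images in your source material

# one epoch = one full pass over the dataset
epochs = total_steps / dataset_size
print(epochs)  # 300.0 full passes over the source material
```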
## Preface
@@ -125,7 +129,7 @@ Lastly, for Textual Inversion, your results will vary greatly depending on the c
### Fetch Script
If you want to accelerate your ~~scraping~~ content acquisition, consult the fetch script under [`./src/`](https://git.coom.tech/mrq/stable-diffusion-utils/src/branch/master/src/). It's a """simple but powerful""" script that can ~~scrape~~ download from e621 given a search query.
If you want to accelerate your ~~scraping~~ content acquisition, consult the fetch script under [`./src/`](https://git.ecker.tech/mrq/stable-diffusion-utils/src/branch/master/src/). It's a """simple but powerful""" script that can ~~scrape~~ download from e621 given a search query.
All you need to do is invoke the script with `python3 ./src/fetch.py "search query"`. For example: `python3 ./src/fetch.py "zangoose -female score:>0"`.
@@ -143,7 +147,7 @@ Use the automatic pre-processing script in the web UI to flip and split your sou
You are not required to actually run this, as this script is just a shortcut to manually renaming files and curating the tags, but it cuts out the bulk of the work.
Included in the repo under [`./src/`](https://git.coom.tech/mrq/stable-diffusion-utils/src/branch/master/src/) is a script for tagging images from e621 in the filename for later use in the web UI.
Included in the repo under [`./src/`](https://git.ecker.tech/mrq/stable-diffusion-utils/src/branch/master/src/) is a script for tagging images from e621 in the filename for later use in the web UI.
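As a rough sketch of what that renaming amounts to (the md5-named file and the tag list here are made up; the real script pulls and filters tags from e621 itself):

```python
import re

# made-up md5-named file (as downloaded from e621) and tag list
file = "6e6b7d3c2c9e4f8f9a0b1c2d3e4f5a6b.jpg"
tags = ["zangoose", "solo", "standing"]

# the script swaps the md5 portion of the filename for the joined tags,
# so the web UI can read the tags straight out of the filename
md5 = re.match(r"^([a-f0-9]{32})", file).group(1)
renamed = file.replace(md5, ", ".join(tags))
print(renamed)  # zangoose, solo, standing.jpg
```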
You can also have multiple variations of the same images, as it's useful if you're splitting an image into multiple parts. For example, the following is valid:
```
@@ -173,7 +177,7 @@ The generalized procedure is as followed:
### Pre-Requisites
There are few safety checks or error messages, so triple-check that you have:
* downloaded/cloned [this repo](https://git.coom.tech/mrq/stable-diffusion-utils)
* downloaded/cloned [this repo](https://git.ecker.tech/mrq/stable-diffusion-utils)
* opened a command prompt/terminal where you downloaded/cloned this repo
* filled the `./images/downloaded/` folder with the images you want to use
- if you're manually supplying your images, make sure they retain the original filenames from e621
@@ -186,6 +190,8 @@ Consult the script if you want to adjust it's behavior. I tried my best to expla
If you're looking to train a hypernetwork, I suggest having the script include tags for species and characters (the script has two `tagsOverride`s; the second one is commented out, so just copy what's in its `[]` into the first one).
If you're looking to set up files for LoRA training, set `lora` to `true` in the script's config: the filename length limit is lifted, each image is renamed to its index, and the full tag list is written to a matching `.txt` caption file instead.
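A minimal sketch of what the LoRA output mode does differently, going by the script changes in this commit (the filename and tags are made up):

```python
import re

file = "6e6b7d3c2c9e4f8f9a0b1c2d3e4f5a6b.jpg"  # made-up md5-named image
tags = "zangoose, solo, standing"               # made-up tag list
index = 0                                       # position in the file list

# lora mode: the image keeps only its index...
md5 = re.match(r"^([a-f0-9]{32})", file).group(1)
renamed = file.replace(md5, str(index))
print(renamed)   # 0.jpg

# ...and the tags land in a matching caption file instead of the filename
caption = f"{index}.txt"
print(caption)   # 0.txt  (its contents would be the tag string)
```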
### Caveats
There are some "bugs" with the script, be it limitations with interfacing with the web UI, or oversights in processing tags:

@@ -17,5 +17,6 @@
"removeParentheses": true,
"onlyIncludeModelArtists": true,
"reverseTags": false,
"tagDelimiter": ","
"tagDelimiter": ",",
"lora": false
}

@@ -54,9 +54,11 @@ let config = {
reverseTags: false, // inverts sorting, prioritizing tags with little representation in the model
tagDelimiter: ",", // what separates each tag in the filename, web UI will accept comma separated filenames
tagDelimiter: ", ", // what separates each tag in the filename, web UI will accept comma separated filenames
invalidCharacters: "\\/:*?\"<>|", // characters that can't go in a filename
lora: false, // set to true to enable outputting for LoRA training
}
// import source
@@ -97,6 +99,13 @@ args.shift();
if ( args[0] ) config.input = args[0];
if ( args[1] ) config.output = args[1];
if ( config.lora ) {
config.filenameLimit = 0;
if ( config.tagDelimiter.length == 1 ) {
config.tagDelimiter += " ";
}
}
for ( let k in {"input":null, "output":null} ) {
try {
if ( !FS.lstatSync(config[k]).isDirectory() ) {
@@ -197,7 +206,7 @@ let parse = async () => {
let jointmp = "";
let filtered = [];
for ( let i in tags ) {
if ( (jointmp + config.tagDelimiter + tags[i]).length > config.filenameLimit ) break;
if ( config.filenameLimit && (jointmp + config.tagDelimiter + tags[i]).length > config.filenameLimit ) break;
jointmp += config.tagDelimiter + tags[i];
if ( config.removeParentheses )
tags[i] = tags[i].replace(/\(.+?\)$/, "").trim()
@@ -205,8 +214,12 @@ let parse = async () => {
}
let joined = filtered.join(config.tagDelimiter)
// NOOOOOO YOU'RE SUPPOSE TO DO IT ASYNCHRONOUSLY SO IT'S NOT BLOCKING
FS.copyFileSync(`${config.input}/${file}`, `${config.output}/${file.replace(md5, joined).trim()}`)
if ( config.lora ) {
FS.copyFileSync(`${config.input}/${file}`, `${config.output}/${file.replace(md5, i).trim()}`)
} else {
FS.copyFileSync(`${config.input}/${file}`, `${config.output}/${file.replace(md5, joined).trim()}`)
}
FS.writeFileSync(`${config.output}/${i}.txt`, joined)
if ( rateLimit && config.rateLimit ) await new Promise( (resolve) => {
setTimeout(resolve, config.rateLimit)

@@ -62,6 +62,8 @@ config = {
'tagDelimiter': ",", # what separates each tag in the filename, web UI will accept comma separated filenames
'invalidCharacters': "\\/:*?\"<>|", # characters that can't go in a filename
'lora': True, # set to true to enable outputting for LoRA training
}
if os.path.exists(config['source']):
@@ -93,12 +95,18 @@ try:
except:
pass
if config['lora']:
config['filenameLimit'] = 0
if len(config['tagDelimiter']) == 1:
config['tagDelimiter'] = config['tagDelimiter'] + " "
def parse():
global config, cache
files = []
for file in os.listdir(config['input']):
files.append(file)
for i in range(len(files)):
index = i
file = files[i]
# try filenames like "83737b5e961b594c26e8feaed301e7a5 (1).jpg" (duplicated copies from a file manager)
md5 = re.match(r"^([a-f0-9]{32})", file)
@@ -185,7 +193,7 @@ def parse():
jointmp = ""
filtered = []
for i in tags:
if len(jointmp + config['tagDelimiter'] + i) > config['filenameLimit']:
if config['filenameLimit'] > 0 and len(jointmp + config['tagDelimiter'] + i) > config['filenameLimit']:
break
jointmp += config['tagDelimiter'] + i
if config['removeParentheses']:
@@ -193,7 +201,12 @@ def parse():
filtered.append(i)
joined = config['tagDelimiter'].join(filtered)
shutil.copy(os.path.join(config['input'], file), os.path.join(config['output'], file.replace(md5, joined).strip()))
if config['lora']:
shutil.copy(os.path.join(config['input'], file), os.path.join(config['output'], file.replace(md5, f'{index}').strip()))
with open(os.path.join(config['output'], f"{index}.txt"), 'wb') as f:
f.write(joined.encode('utf-8'))
else:
shutil.copy(os.path.join(config['input'], file), os.path.join(config['output'], file.replace(md5, joined).strip()))
if rateLimit and config['rateLimit']:
time.sleep(config['rateLimit'] / 1000.0)