added MORE notes to hypernetwork training
commit a4bbda9425 · parent c8475149f9 · README.md

@@ -8,7 +8,7 @@ An up-to-date repo with all the necessary files can be found [here](https://git.

This guide has been stitched together over the past few days from different trains of thought as I learn the ins and outs of effectively training concepts. Please keep this in mind if the guide seems to shift a bit, sound confusing, or feel like it's covering unnecessary topics. I intend to do a clean rewrite to make things more to-the-point.

Unlike any guide for just getting Voldy's Web UI up, a good majority of this guide is focused on getting the right content and feeding the UI the right content, rather than on running commands.
## Assumptions
@@ -42,12 +42,25 @@ This guide also aims to try and document the best way to go about training a hyp
### Are Hypernetworks Right For Me?
Hypernetworks are a different flavor of extending models. Where Textual Inversion trains an embedding that best captures the concept you want during generation, a hypernetwork re-tunes the outer layers of the model to better fit what you want. However, hypernetworks aren't a magic bullet to replace Textual Inversion. I propose a short answer and a long answer:
#### Short Answer
For most of you, these simple questions should guide you on what to use.
* Are you looking to re-tune keywords to better reflect what you want? Use a hypernetwork.
* Are you looking to train an artist? A hypernetwork *might* work better, but an embedding definitely gives results in my experience.
* Anything else, like a character, concept, etc.? Use an embedding.

Hypernetworks *can* be used for anything, but, unless you specifically are looking to re-tune keywords, just use an embedding.
#### Long Answer
If you're not satisfied with such a short answer, I present some pros and cons between the two below:
* Embedding (Textual Inversion):
- Pros:
+ trained embeddings are very small; the file can even be embedded in the output images that use them
+ excels at concepts you can represent in a prompt
+ easy to use: just put the keyword you named it with in the prompt, and leave it out when you don't want it
+ *can* be used with little training if your concept is pretty simple
+ *can* be used in other models that it wasn't necessarily trained on

@@ -80,7 +93,7 @@ Hypernetworks are a different flavor of extending models. Where Textual Inversio
+ very xenophobic to other models, as the weights greatly depend on the rest of the model
+ doesn't seem to do any better than embeddings at representing hard-to-describe concepts

If you're still unsure, just stick with Textual Embeds for now. Despite the *apparent* upsides in training performance compared to an embedding, I can't bring myself to suggest hypernetworks until better learning rates are found.
## Acquiring Source Material
@@ -155,6 +168,8 @@ There's little safety checks or error messages, so triple check you have:

Consult the script if you want to adjust its behavior. I tried my best to explain what each setting does and to make them easy to edit.

If you're looking to train a hypernetwork, I suggest having the script include tags for species and characters (the script has two `tagsOverride` entries; the second one is commented out, so just copy what's inside its `[]` into the first one).
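
For reference, here's roughly what that change looks like in the Python version of the script, as a sketch pieced together from the config keys shown further down (your copy may have other settings around these):

```python
config = {
    # ... other settings ...

    # default: no tag categories are overridden
    # 'tagsOverride': [],

    # for hypernetwork training, copy the commented-out list into the active entry
    # so character, species, and copyright tags get pinned to the front of the sort order
    'tagsOverride': ["character", "species", "copyright"],

    'tagsOverrideStart': 1000000,  # starting score your overridden tags are given, for sorting purposes
}
```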
### Caveats
There are some "bugs" with the script, be it limitations in interfacing with the web UI or oversights in processing tags:

@@ -175,6 +190,7 @@ or for the pedantic:

```
uploaded on e621, [filewords], [name]
```
or for any other non-yiffy model, just remove the `uploaded on e621` part. It's a cope and a spook to believe it actually has any bearing on training quality, but the yiffy model apparently was trained using a format like `uploaded on e621, by {artist}, {tags sorted alphabetically}`.

I've had decent results training subjects with just the first format. I imagine the second, more pedantic one can help too, but it places your training token at the very end. That's a bit *more* correct, as I can rarely have my trained token early in the prompt without it compromising other elements.
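
As a concrete illustration (the artist tags and embedding name here are made up): assuming `[filewords]` expands to the tags parsed from an image's filename and `[name]` to your embedding's name, the pedantic template above would produce a training prompt like:

```
uploaded on e621, by someartist, anthro, wolf, male, my-character
```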
@@ -209,11 +225,11 @@ I'm not quite clear on the differences by including the `by`, but the yiffy mode
## Preparing for Training
Now that everything is set up, it's time to start training. For systems with "enough" VRAM (I don't have a number on what counts as adequate), you're free to run the web UI with `--no-half --precision full`. You'll take a very slight performance hit, but quality improves just enough for me to notice. The Xformers feature seems to get disabled during training, but appears to make preview generations faster, so don't worry about getting xformers configured.
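
If you launch through the stock `webui-user` script, that just means adding the flags to your command line arguments, for example (a sketch; on Windows, `webui-user.bat` uses `set` instead of `export`):

```
# webui-user.sh
export COMMANDLINE_ARGS="--no-half --precision full"
```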
Make sure you're using the correct model you want to train against, as training uses the currently selected model.

**!**NOTE**!**: If you're using a `Filename regex`, make sure to go into the Settings tab, find the `Training` section, and under `Filename join string` set it to `, `, as this will keep your training prompts comma separated. This doesn't make *too* big of a difference, but it's another step for correctness. It's not strictly relevant if you left the `Filename regex` blank, but it doesn't hurt to set it anyway.

Run the Web UI, and click the `Training` sub-tab.

@@ -225,7 +241,7 @@ Create your embedding to train on by providing the following under the `Create e

* a name
- can be changed later; it's just the filename, and the keyword you use to access your embedding in prompts
* the initialization text
- can be left as `*`
- it's only relevant at the very beginning of training
- for embeds with zero training, the embed behaves effectively the same as its initialization text. For example, you can create embeds that act as shortcut keywords for other keywords. (The original documentation used this to """diversify""" doctors with a shortcut keyword)
* vectors per token

@@ -268,7 +284,25 @@ I ***highly*** suggest waiting for more developments around training hypernetwor

The very core concepts are the same for training one, with the main difference being that the learning rate is very, very sensitive and needs to be reduced as more steps are run. I've seen my hypernetworks quickly dip into incoherent noise, and I've seen some slowly turn into some schizo's dream where the backgrounds and edges are noisy.

The official documentation lazily suggests a learning rate of either `0.000005` or `0.0000005`, but I find both inadequate and very prone to frying your hypernetwork. For the mean time, I'll suggest `0.000005:1000,0.0000025:10000,0.00000075:20000,0.0000005:30000,0.00000025:-1` as the learning rate schedule, which breaks down as:
* from 0 steps to 1000 steps, use a learning rate of `0.000005`
* from 1001 steps to 10000 steps, use a learning rate of `0.0000025`
* from 10001 steps to 20000 steps, use a learning rate of `0.00000075`
* from 20001 steps to 30000 steps, use a learning rate of `0.0000005`
* from 30001 steps on, use a learning rate of `0.00000025`.

These values definitely need to be better tuned, as I'm still not sure if they can be bumped up higher without incurring any penalties.
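
To spell out how that `rate:step` schedule format reads, here's a small sketch of my own (not the web UI's actual parser) that returns the rate active at a given step:

```python
def rate_at_step(schedule: str, step: int) -> float:
    """Return the learning rate active at `step` for a "rate:step,rate:step,..." schedule.

    A final step of -1 means "from here until the end of training".
    """
    rate = None
    for pair in schedule.split(","):
        rate_str, until_str = pair.split(":")
        rate, until = float(rate_str), int(until_str)
        if until == -1 or step <= until:
            return rate
    return rate  # past the last boundary: keep using the final rate

schedule = "0.000005:1000,0.0000025:10000,0.00000075:20000,0.0000005:30000,0.00000025:-1"
print(rate_at_step(schedule, 500))    # 5e-06
print(rate_at_step(schedule, 15000))  # 7.5e-07
```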
If you don't mind babysitting the training throughout the learning process, you can:
* start with `0.000005`
* have it save a copy every 500 or so steps
- use a lower number (saves more frequently) if your GPU is on the weaker side / you can afford the disk space
- use a higher number (saves less frequently) if your GPU is on the stronger side / you can't afford the disk space
* when the quality starts to drop, revert to the best copy
* reduce the learning rate
* repeat

It would be fantastic if the web UI would automatically do this based on some heuristic from the loss value or the epoch, but sadly things aren't that peachy.

The same principle can be applied to running Textual Inversion on an embedding, but with higher rates of course.
#### Caveats
@@ -276,7 +310,7 @@ Please, please, ***please*** be aware that training a hypernetwork also uses any

You're free to use an embedding in your hypernetwork training, but here are some caveats I've noticed:
* any image generation without your embedding will get terrible output
* using a hypernetwork + embedding of the same concept definitely provides some boost to your output, but appears to significantly reduce variety in your prompts, although I only briefly tested this on a (seemingly) well trained hypernetwork
* if you wish to share your hypernetwork and you did in fact train it with an embedding, it's important to include that very same embedding
* like embeddings, hypernetworks are still bound to the model you trained against. Unlike an embedding, though, using one on a different model will absolutely not work.

@@ -353,29 +387,4 @@ Textual Inversion embeddings serve as mini-"models" to extend a current one. Whe

Contrarily, hypernetworks are another way of extending the model with a mini-"model". They apply to the last outer layers as a whole, allowing you to effectively re-tune the model: they modify what comes out of the prompt and goes into the image, amplifying or reshaping its effect. This is evident through:
* with one enabled, a verbose prompt will produce more detail in what you prompted
* in the context of NovelAI, you're still somewhat required to prompt what you want, but the associated hypernetwork will strongly bring about what you want.
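
To make "re-tuning the outer layers" a bit more concrete: as I understand the web UI's implementation, a hypernetwork is a set of small extra networks that transform the text-conditioning features feeding each cross-attention layer's keys and values. A rough sketch of the idea, simplified and illustrative rather than the actual code:

```python
import torch
import torch.nn as nn

class HypernetworkModule(nn.Module):
    """A tiny residual MLP that nudges the context vectors feeding cross-attention."""

    def __init__(self, dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # residual form: with near-zero weights, the base model is left essentially untouched
        return context + self.net(context)

# one pair of modules per context size: one for the keys, one for the values
hn_k, hn_v = HypernetworkModule(), HypernetworkModule()

context = torch.randn(1, 77, 768)           # e.g. the prompt's text embeddings
k_in, v_in = hn_k(context), hn_v(context)   # what the attention k/v projections now see
```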
### Hiccups With Assessing Training A Hypernetwork
I don't have a concrete way of getting consistent training results with Hypernetworks at the moment. Most of the headache seems to be from:
* working around a very sensitive learning rate, and finding the sweet spot between "too high, it'll fry" and "too low, it's so slow"
* figuring out what exactly is the best way to try and train it, and the best *thing* to train it on, such as:
- should I train it with tags like I do for Textual Inversion (the character + descriptor tags), or use more generalized tags (like all the various species, or very generic tags like `anthro male`)?
- should I train it the same as my best embedding of a character, to try and draw comparisons between the two?
- should I train it on a character/art style I had a rough time getting accurate results from, to see if it's better suited for it?
+ given the preview training output at ~52k iterations w/ 184 images, I found it to not have any advantages over a regular Textual Inversion embedding
- should I train it on a broader concept, like a series of characters or a specific tag (fetish), to go ahead and recommend it quicker for anyone interested in it, then train to draw conclusions of the above after?
+ given the preview training output at ~175k iterations w/ 9322 images, I found it to be *getting there* in looking like the eight or so characters I'm group batching for a "series of characters", but this doesn't really seem to be the way to go.
+ as for training it on a specific tag (fetish), I'd have to figure out which one I'd want to train it on, as I don't necessarily have any specific fetishes (at least, any that would be substantial to train against)
* it takes a long, long time to get to ~150k iterations, the sweet spot I found Textual Inversions to sit at. I feel it's better to just take the extra half hour to keep training it rather than waste it fiddling with the output.

There doesn't seem to be a good resource for less narrow concepts like the above.
A rentry I found for hypernetwork training in the /g/ thread is low quality.
The other resources seem to be "lol go to the training discord".
The discussion on it on the Web UI github is pretty much just:
* *"I want to do face transfers onto Tom Cruise / a woman / some other thing"*
* *"habibi i want this art style please sir help"*
* dead end discussion about learning rates
* hopeless conjecture about how quick it is to get decent results, which fails to actually apply to anything e621-related

I doubt anyone else can really give pointers in the right direction, so I have to bang my head against the wall to figure out the best path, as I feel that if it works even for me, it'll work for (You).

@@ -35,9 +35,9 @@ let config = {

// if you're cautious (paranoid), include species you want, but I found I don't really even need to include species
// you can also include character names / series names if you're using this for hypernetworks
// you can also use this to boost a tag already defined to max priority
tagsOverride: [],
// tagsOverride: ["character", "species", "copyright"], // useful for hypernetwork training
tagsOverrideCategories: true, // override categories
tagsOverrideStart: 1000000, // starting score that your overridden tags will start from, for sorting purposes
// tags to always include in the list
@@ -52,7 +52,7 @@ let config = {
reverseTags: false, // inverts sorting, prioritizing tags with little representation in the model
tagDelimiter: ",", // what separates each tag in the filename, web UI will accept comma separated filenames
}
let csv = FS.readFileSync(config.tags)

@@ -41,7 +41,7 @@ config = {
# treat these tags as already being included in the
# if you're cautious (paranoid), include species you want, but I found I don't really even need to include species
# you can also include character names / series names if you're using this for hypernetworks
'tagsOverride': [],
# 'tagsOverride': ["character", "species", "copyright"], # useful for hypernetwork training
'tagsOverrideStart': 1000000, # starting score that your overridden tags will start from, for sorting purposes
@@ -57,7 +57,7 @@ config = {
'reverseTags': False, # inverts sorting, prioritizing tags with little representation in the model
'tagDelimiter': ",", # what separates each tag in the filename, web UI will accept comma separated filenames
}
with open(config['tags'], 'rb') as f: