
/qa/ - Questions and Answers

Questions and Answers about QA


File:00353-2700800976-girl, fac….png (365.08 KB,512x512)

No.96625

Anyone else been messing around with the stable diffusion algorithm or anything in a similar vein?
It's a bit hard to make it do exactly what you want but if you're extremely descriptive in the prompt or just use a couple words it gives some pretty good results. It seems to struggle a lot with appendages but faces come out surprisingly well most of the time.

Aside from having a 3070, I just followed this guide I found on /g/ https://rentry.org/voldy to get things set up and it was pretty painless.


File:[MoyaiSubs] Mewkledreamy -….jpg (217.55 KB,1920x1080)

Ah, yeah, I've been reading up on it. I downloaded some 7GB danbooru thing for it. I wouldn't trust /g/ with an Etch-a-Sketch so I won't follow a guide from there, but I've saved some links from other places:

I'll get to trying this eventually, but so far I've just been procrastinating a bunch since I need to install and run python and do other stuff I don't understand. My VRAM is also only 6GB and I'm not sure if that's enough.


File:00147-149750438-megumin, k….png (254.67 KB,512x512)

I'm not too concerned with the theory at the moment and more just wanted to know what practically has to be done to get it running. That guide more or less amounts to downloading some git repo, a model (this is the sketchiest part, but you already did it), and Python 3.10.6. Then you run a bat file and it works. From what I can tell the web-ui allocates 4GB of VRAM by default and you'll have to pass arguments to make it use more or less. It should run with an Nvidia card that has 6GB.
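For reference, in the AUTOMATIC1111 webui those arguments go in webui-user.bat. A sketch (assuming that webui; --medvram and --lowvram trade generation speed for lower VRAM use):

```bat
@echo off
set PYTHON=
set GIT=
set VENV_DIR=
REM --medvram is for roughly 4-6GB cards; swap in --lowvram for smaller ones (slower)
set COMMANDLINE_ARGS=--medvram
call webui.bat
```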

That Krita plugin looks interesting, will check it out later.


File:62b580de9b13374d1a11e690fe….png (816.09 KB,1075x1500)

The face looks very Korean. I wonder if it's because the archetype is very common, so the AI is probably trained on a lot of it.


File:grid-0052.png (1.69 MB,1024x1024)

here are the other faces from that batch
the model i'm using is supposedly trained on a set of images from danbooru, not sure why it'd look korean specifically other than chance


I have exactly 0 (zero) interest in AI art. I have not saved a single file from one of them to this day even. I wouldn't call it being a hater, but they really are just fundamentally unappealing to me.


I don't get what all the fuss is about either. If you've seen one image, you've seen them all. They all have this weird quality to them. Maybe it's that there's absolutely nothing meaningful about them. Doesn't help that most of these images look like bad crops.


File:spaghetti.png (742.16 KB,576x704)

I'm in favor of it as long as the results resemble 2D ideals.


why is she stuffing her boobs with spaghetti....


Ever wondered why girls smell so good? This is why.


File:patchy2.png (2.86 MB,2200x1536)

img2img is neat


dat' polydactyly
wow, so even AI has trouble drawing hands


File:grid-0102.png (5.12 MB,2048x2048)

surprised by how well this batch turned out
some of these could pass for a mediocre artist's work


File:download (16).png (557.43 KB,512x768)

what a big girl


>so even AI has trouble drawing hands
Yeah, it must be related to how the algorithm copies things: it gets confused and can't do hands. With faces the parts have general locations and you can meld shapes a bit, but with hands it's trying to copy a bunch of different positions and angles into one and it breaks. Anime faces might be one of the best things for it since they don't even make sense to begin with in regards to angles.


File:1663263321-Beautiful waifu….jpg (39.5 KB,512x512)

I just steal other people's prompts and add waifu.
Also if anyone else is on AMD on Windows, I followed this guide and it works https://rentry.org/ayymd-stable-diffustion-v1_4-guide.
Also Also if anyone can help me figure out how to change output resolution, that would be swell.


Yeah, I've been somewhat surprised by the quality of the more 3DCG drawings I've seen from it, but when it comes to the more anime style the AI falls short. There are probably more subtleties that it can't pick up in batch because of differences in artist styles, which causes these amateur-level drawings.


File:00080-1017444043-full body….png (502.04 KB,512x768)

I've been trying to create my Pathfinder character with it. I think this is the closest I've gotten, but it's still not there yet. I feel like I'm close, though...


Alright, I'm diving in. Might take a while to get stuff set up and figure out what I'm doing, however.


File:a.png (357.08 KB,512x512)

Making some progress...


File:b.png (368.9 KB,512x512)


File:index.png (Spoiler Image,4.25 MB,2048x1536)

Ehh, so many of these are horrifying so I'm going to put them behind a spoiler. I think I'm going to try that thing tomorrow where you can selectively "refresh" parts of the image


File:aaaaaa.png (343.09 KB,512x512)

oh no, this wasn't what I wanted at all!


File:hehe.png (372.77 KB,512x512)

I need to download the base model, this danbooru one isn't working the best for, well, non-"anime" stuff


Has science gone too far?


From the AI I've used myself, these aren't so bad.


Is that an anthropomorphic "furry" Koruri?


File:waterfox_ZRSVcPMhoC copy.png (550.51 KB,953x1039)

Okay, what the heck. There's this "textual inversion" thing which is a chuuni way of saying "custom trained models" and there's a few hundred shared ones for you to look at and download.
But, uhh...
Okay, the first one is an interesting find. Second one makes me think "okay maybe this isn't a coincidence" and third is "okay someone on a spinoff is involved with this".
There's like 500 of these total, mostly generic pop culture stuff, but these three REALLY stick out.


File:00410-493637731-bird sitti….png (850.78 KB,768x1024)

Perhaps unsurprisingly, someone released a furry model trained on e621 and it's able to do penises and sexual poses that the other databases can't. I think I'll make a thread on /secret/ for posting my experiments with it because porn tends to derail things.
Also, uhh... be very wary of trying stuff on the default model. It's trained on images of real people and I think there's going to be some legal challenges in the future.

Anyway, give me some prompts and/or images and I can mess with them if you don't want to configure this thing yourself. I have 3 models- a hybrid default/danbooru one, a pure danbooru one, and the aforementioned furry one. But, I'm really bad at it and still need to learn how stuff works. I tried to turn furry patchy into a bird but now she's a human.


File:00485-622766441-cats.png (618.9 KB,896x512)

this is pretty good, and it'd only get better if I had the patience to run it more times


File:grid-0087.jpg (1.07 MB,3584x2048)

Tsugu and Hagi didn't survive most of the attempts


File:935116e8b74134e41664779a19….png (Spoiler Image,278.45 KB,640x640)

AI generated Raymoo titties (NSFW)



the rendering and shape is good, but it's still making mistakes. Just that it's focusing on something simple so the mistakes are better disguised


did you use the prompts from stable diffusion to make that?


I didn't make it. I got it from the stable diffusion thread on 4/h/. I've been lurking it for a few days because it's a lot slower than the /g/ one and seems to have more technical discussion.

I just wanted to share it because I thought it was a pretty good generation.


There's a new model called Hentai Diffusion that was trained on Waifu (ugh) Diffusion and 150k danbooru/r34 images. I guess it'd be better at nudity?

You might need a huggingface account to download it. I have one because I was going to upload a set to train or whatever, but then I saw that there doesn't seem to be any way to use their GPUs without making it public, and they have rules against nudity, and I also wouldn't want to upload an artist's work for others to exploit for real instead of making stupid things on kissu.
Wish I had more VRAM. Oh well.


File:01040-248630214-girl's las….png (454.55 KB,512x512)

yeah. it's spooky.


File:01442-641596059-aria, aman….png (584.73 KB,512x512)

it's a lot harder to rationalize 'soul' and a human touch when you like results that are entirely mechanically hallucinated. maybe this means that art is more useful to understand an author than anything else.


I've seen AI that write code and this reminds me of some of the shortcomings people had with it.

While they were trained on a large database, it would often be the case that the AI was effectively copying programmers from Stack Overflow and feeding that raw input into people's software.

I feel like it's almost the same case here. It took chunks from every artist it saw, essentially creating a collage with little creative problem-solving of its own... and when it does try, it's simply a confused error rather than inference.

I was much more impressed by reimu's breasts


it's almost as if machines cannot think


File:a6b328da6c4d0e2e087ea99aa2….png (305.21 KB,512x640)

I've been messing with SD since last week using https://github.com/AUTOMATIC1111/stable-diffusion-webui and the danbooru model https://thisanimedoesnotexist.ai/downloads/wd-v1-2-full-ema.ckpt
I've been doing only txt2img cause apparently I don't have enough GPU RAM for img2img (laptop GTX 1660Ti).
A couple of images turned out to be cute, most are pretty bad, or maybe my prompts are bad, who knows.
I've been thinking of setting up my local server to produce anime images 24/7 with some script that autogenerates prompts, not sure how its GTX 970 would handle it though.
Not messed with lewds too much for now, going to download it and try.


File:20221004_130831.jpg (45.5 KB,448x640)


File:00190-2426397090-[blue_eye….png (347.18 KB,512x512)

nee nee, look at this military qt!


really hoping that's a cute boy and not a g*rl


Yes, science has gone too far.


You will live long enough to see robotic anime meidos for domestic use and you will be happy.


File:00006-863090008-anime youn….png (490.87 KB,704x512)

It's kind of interesting to see a real artist use it. I'm assuming he did the img2img thing which uses an image as a guide since it's got his style's wide face and ludicrously erotic body proportions.
This is a good example of how generic it looks when compared to the real thing, which you can't really get around since generic is exactly how it's supposed to function. In theory people can (and certainly will try to) directly copy individual artists, but so far it's pretty bad at that.


File:00203-285684822-[blue_eyes….png (243 KB,512x512)

another cute (female)


When you really break it down, "AI" art is more or less the same thing as procedural level generation in games. The computer is provided with a set of rules, and then randomly generates something that follows those rules.

That's also why I can't see it outright replacing artists like a lot of people are afraid (or if they're psychopaths, hopeful) it will. You can generate all of the level design for your game procedurally, and a lot of games do (minecraft, for example). But "level designer" still exists as a profession for a reason.


File:grid-0024.png (2.92 MB,2048x2560)

the many faces of /qa/-tan


kinda neat


File:grid-0077.png (5.09 MB,2304x2048)

Well, the thread is mostly to post stupid AI things, but only a couple people are doing it and it's annoying to run for me personally because it interferes with videos or 3D programs I have open most of the time.
Also I was mostly just testing how well it is at doing penises and the answer is that the furry one is passable, even on humans, but I won't derail the thread with porn


File:00171-2295644611-bird gba-….png (1.99 MB,1024x1024)

The "GBA Pokemon" embedding thing really isn't working for me. None of them are. I think you're supposed to use the exact model for them, but I'm not going to download a bunch of those since they're like 3-8gb each.


File:dumb arguing.gif (1.33 MB,1280x720)

I completely agree with this cute anon.


File:[mottoj] Tsukuyomi Moon Ph….jpg (109.55 KB,1024x576)

The serious discussion in this thread is being moved to a separate thread that is soon to be made. Brace for impact


File:grid-0000.png (1.82 MB,1920x1024)

Here is Laura with an AI-generated background, made after I masked the original out. It was a blank white background before


NovelAI's model has been leaked. hehehe. Meaning you can do it offline without paying them.
It's 52gb with multiple models, and I doubt I'll be impressed but I'm torrenting it anyway.


Can you post the link?



Thanks, adding it to the hoard.


But someone is replying about 'python pickles' and I have no idea what that entails. I guess he's telling people that it could contain a virus or something or otherwise have code in it? There's this link but I have no idea what it means: https://rentry.org/safeunpickle
Does anyone here know python and can tell what the thing above does? They made it sound like it's something to use to check for malicious stuff or maybe I interpreted it wrong.
But, people on 4chan are already using this so I think it's safe


File:1632939770819.png (495.82 KB,1024x1024)

Pickle is a data serialization library: https://docs.python.org/3/library/pickle.html
Serialization means turning in-memory data like objects into a format that can be stored on disk or sent over a network. JSON is another common serialization format.
I don't use pickle much, but unlike JSON, which is plaintext, pickle is a binary format, and yes, when you deserialize it, it's possible for arbitrary code hidden in the data to be executed.
After a quick glance it looks like that code overrides some of the functions described in https://docs.python.org/3/library/pickle.html#pickle.Unpickler
The overridden "def find_class(self, module, name)" seems to implement some kind of whitelist so that only certain kinds of data (ones considered safe, I guess) can be loaded.
I can't guarantee that code actually protects against possible code execution, though; if I were you I would download it if you care, but wait some time before running it to see what happens.
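A minimal sketch of that whitelist idea (my own example, not the safeunpickle code itself): subclass pickle.Unpickler and refuse to resolve any global reference that isn't explicitly allowed.

```python
import io
import pickle

# Hypothetical whitelist: only a few known-safe types may be resolved.
ALLOWED = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("builtins", "str"),
    ("builtins", "int"),
    ("collections", "OrderedDict"),
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Refuse anything not whitelisted, so a payload that tries to
        # resolve e.g. os.system fails loudly instead of executing.
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(f"blocked: {module}.{name}")
        return super().find_class(module, name)

def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain data (dicts, lists, numbers) never triggers find_class, so it loads fine, while a pickle that references os.system gets rejected. Note this only blocks global lookups; it's a mitigation, not a guarantee.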


All the AI talk is hurting my no-knowledge-on-AI brain. Apparently there's going to be a part 2 to the leak; can't keep up with /g/ but am happy to download/seed it though.


anon created a guide, probably 100% the real deal now


File:00197-3287249658-((([Remil….png (391.67 KB,768x768)

To go with the story of me playing games with Remilia in that Character AI thread.
This is Remi's gamer pose


File:test.png (320.86 KB,384x640)

I want to kissu her!


File:00002-4009721508-huge_brea….png (Spoiler Image,235.25 KB,512x512)

I also made a loli with pink hair... I gotta get the GPU stuff set up though, this took me 10 minutes and is obviously far from the bleeding edge of this stuff.


File:00027-1587389607-large_bre….png (208.6 KB,512x512)

Got another nice one.


File:20221008_181546.jpg (61.52 KB,512x768)



Interesting how this works.


my wife chino is ballin'


I swear to you guys I was arguing on another corner of the internet that I'm not interested in AI because it couldn't create art in the style of a particular artist, and the artist I was referring to was literally Zankuro in specific, yet here we are. I was crushingly naive. I wonder how far off we are from it making lewd gifs in Zankuro's chubby loli style...


File:00967-1113753283-1girl, ((….png (490.36 KB,576x576)

Utawarerumono riding a banana


File:00980-2759234159-1girl, ((….png (481.14 KB,576x576)

Unsurprisingly it fails to capture Kuon's beauty, although I don't know how to do the tagging with this for Kuon_(Utawarerumono) so I took a guess from what I think I remember seeing.
This one came pretty close to getting her face I think. But, I need to do a thing where I train it.
This is something I/we need to read up on that apparently is a big deal: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/2284


File:grid-0134.png (2.23 MB,1344x1152)

Wow, embeddings are strong. There's an embedding someone made on 4/h/ for abmono who is quite a well-regarded Miku artist.
This is a combination of abmono and Wakasagihime. Even though I didn't name Miku, abmono images will impart Miku's outfit. Pretty crazy.


Do you mean abmayo?


File:01207-107125510-(masterpie….png (351.08 KB,448x576)

errr yeah, stupid brain


it works because mayonnaise is a もの, so it's all cool


File:1665622868689087.webm (Spoiler Image,2.87 MB,512x640)

People are making animations with it somehow using a script. (webm contains nudity)

Copied /h/ post:

you can already make a video
this was one an anon made
afaik the keyframes were something like
Time (s) | Desnoise | Zoom (/s) | X Shift (pix/s) | Y shift (pix/s) | Positive Prompts | Negative Prompts | Seed
0 | 0.6 | 0.9 | 0 | 0 | sleeping in bed, under sheet | | -1
2 | 0.6 | 0.9 | 0 | 0 | shower, washing self, naked, from above | |-1
4 | 0.6 | 0.9 | 0 | 0 | eating breakfast, dressing gown| |-1
6 | 0.6 | 0.9 | 0 | 0 | Sitting on a bus, uniform, from below | |-1
8 | 0.6 | 0.9 | 0 | 0 | on stage, bikini, tattoos, singing, full theatre, bright lights, microphone | |-1
10 | 0.6 | 0.9 | 0 | 0 | drinking at a bar, cocktails, black dress, cleavage, earrings, drunk, flirty | |-1
12 | 0.6 | 0.9 | 0 | 0 | bed, (doggystyle sex:1.3), pubic hair, 1girl, 1boy | |-1
14 | 0.6 | 0.9 | 0 | 0 | passed out in bed, under sheet | |-1

I couldn't open the webm in waterfox or firefox, but it worked with brave and mpc


File:01461-2527489783-masterpie….png (304.43 KB,512x512)

This doesn't look like an abmayo Kuon at all, but I like it. Very yukkuri.


File:grid-0163.jpg (526.52 KB,2048x2048)

Remember Eruruu? This is what she looks like now (apparently)


The miku virus infects all...


File:grid-0249.png (2.92 MB,1536x1536)

Ehh.... in theory we could make a Koruri embedding thing, but I think that'd be disrespectful so even if I had the VRAM I wouldn't do it.


File:grid-0258.png (3.18 MB,1536x1536)

Sometime in the bleak future, a cybernetic Tenshi eats a corndog
(some of the results of these tags are pretty disturbing, while some like the lower left here are pretty damn cool even if they aren't really showing what I wanted)
I can really see how you could use this for ideas as an artist or modeler or anyone else in a creative field


File:02436-1162997752-masterpie….png (438.45 KB,512x512)

Good way to find embeddings on 4chan: https://find.4chan.org/?q=.pt (Warning: NSFW thumbnail images are likely)
(They're pt files)
Just grabbed a ZUN one that I'll try later


gonna go learn how this all actually works


File:102281918_p0.png (782.64 KB,1000x1080)

There's a fad that started with an AI generation thing with a glowing penis that has artists imitating it. Kind of meta.

From one of my favorite random creative artists (and maker of that one furry Patchy)


File:102251504_p0.jpg (317.4 KB,617x617)

Heh, some of these are pretty creative. Also where's the original image that inspired it?



that's just a bioluminescent mushroom dude


File:Fget6dGaAAA7FX_.jpg (356.97 KB,2100x2160)

The sperm of the sea


File:no this is NOT Laura, I sw….png (Spoiler Image,717.24 KB,576x768)

I wonder if people are just really bad at thinking up concepts for computer-generated porn or if their tastes are really so plain and boring. From what I've seen on imageboards, and I don't mean to sound too conceited, I'm clearly in the upper echelons at throwing together these amalgamations of theft and I don't even have a great GPU to create the custom models. It seems so easy to me so I don't understand why everything is so ugly and generic when I see what other people are doing.
It makes me think this "AI revolution" is crippled from the beginning because it still requires human input and people have no motivation or direction. It's like when you show a bunch of complex castles and cities built in Terraria or Minecraft, but then you see how most people play and it's simple square prison blocks made from dirt.
Well, I think the crisis is averted because people are still dull-witted and boring even with such a thing at their fingertips.
Quite an addicting thing, however, and I need to pry myself away to work on actual creative pursuits (and I'm saving images and concepts to use for inspiration so that part is actually true)


The anatomy is quite weird and off-putting, but I guess most AI images are like that.
I never tried it, maybe I will.


File:dancer.png (3.44 MB,1024x1536)

Part of the problem is that niche topics are inherently hard to generate, since there isn't enough training data for them to turn out good. I am also somewhat put off attempting anything overly complicated (especially stuff with multiple characters), because the more complex the image, the more opportunities there are for bad anatomy/etc. to show up, and without any easy way to selectively fix those elements, I'd usually prefer to generate something basic done well than something more interesting that has mangled hands or whatever. What I do really wish though is that people experimented more with the style-altering options - many of those tags are well-enough populated to work great, and there is no added difficulty in using them (in fact, ones like greyscale even make it simpler), but they can go a long way in avoiding the generic AI-art look.


Yeah it's not perfect, but the thing with porn, especially niche stuff that doesn't otherwise exist, is that the brain overlooks it due to the excitement and stimulation over the rest. It's like your choice is a handful of doodles from some guy from 2008 or this thing creating new amalgamations of fetish fuel with errors. Most people have no reason to use this for porn, really, since it's easily inferior to something created by hand. But if that stuff made by hand doesn't exist? Yeah...


File:ZZX 0229.jpeg (Spoiler Image,390.19 KB,2892x4400)

It's not that niche, though the musculature of this image looks familiar (though done badly by the AI). I wonder if genres with a smaller selection for the AI to draw from, like futa, will end up creating images that lean more heavily on one individual's unique style than others would. It's not just the abs; this pen*s looks very similar as well, in fact it looks like the AI has taken it and recoloured it, and that's part of why the image looks weird: the pen*s was taken from an image where the body is positioned differently.


File:00318-262339623-(masterpie….png (Spoiler Image,627.37 KB,576x768)

Oh, I'm quite aware dickgirls haven't been niche for like 15 years. The fact that it's ubiquitous is also why any simple image doesn't work; it's no longer manna from the heavens by virtue of existing. Find me some quality newhalf mermaid art with a human penis instead of some weird "realistic" furry dolphin version. Also, give her a nice soft belly, a mature face, a warm smile and an apron. Also it's Takane from THE iDOLM@STER, a girl that shares the face of the first 2D girl I had a crush on (since Luna is too old/obscure to have training data). Here's one I just generated, although it has some pretty noticeable errors.

People have fantasies more elaborate than "a girl with breasts of any size, preferably alive" and it's not any different in my situation just because a penis is involved.


File:xy_grid-0150-3577515976-(m….png (2.58 MB,2304x950)

I'm going to start dumping info and stuff in this thread, although I think most visual experiments will be posted on /megu/ since I'm mostly into this stuff for niche ero.
Someone asked how you could make transformation stuff, and this is how. Although, I had to ask on 4chan because I had the syntax wrong.
The syntax is [A:B:#].
A and B are the two things you want to morph over image generation.
# is the influence one has over the other, as a percentage (.1 is 10%). In my image example the left-most image is 10% angel and 90% demon girl.
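As a hypothetical helper (the names are mine, this isn't webui code), the nine prompt values for a 10%-90% morph row could be generated like this:

```python
# Build the [A:B:#] prompt variants for a 10%..90% morph grid,
# as described above (.1 is 10% influence, .9 is 90%).
def morph_prompts(a: str, b: str, steps: int = 9) -> list[str]:
    return [f"[{a}:{b}:{round(0.1 * i, 1)}]" for i in range(1, steps + 1)]
```

Feeding these values to the X/Y plot script is what produces a row like the grid image.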


File:firefox_IkHhhePva5.png (41.5 KB,762x647)

To make an image set like this you want to go down into Script and use X/Y Plot, then select Prompt S/R.
In this example I have it start with 10% angel and 90% demon and then end with 90% angel and 10% demon.
The X/Y script is a massive help in finding the ideal settings, so people use it a LOT.


File:00297-3520136844-masterpie….png (368.83 KB,640x384)

>pt files
What exactly are these? I think I heard that these are "hypernetworks" or something and that you can use them to fine tune a model, or to bias it into giving different results or something. I can't really seem to find any though? Not that I've looked very hard, I'll admit, but it seems people are far more interested in specific models than hypernetworks. Likewise, what's the deal with merged models and pruning?


File:embeds.zip (1.18 MB)

.pt files show up in a few places, but when people are talking about it and it's not troubleshooting it's about hypernetworks. Back when I made that post embeddings were the cool thing (and they also use .pt), but now it's hypernetworks. They're basically fine tuning things for a certain concept, but it's almost exclusively specific artists or characters. IE this was using the embedding that mimics abmayo >>98094
Embeddings are called by name in the prompt, whereas hypernetworks are loaded in the Settings. Embeddings are 20-80KB whereas hypernetworks are 85+MB. I personally liked embeddings a lot more not only because of the file size but because you could combine them. I guess hypernetworks are better and that's why everyone uses them?
Here's my embeds folder. Some of them were just uploaded without labels and I never figured out what they did, like the 3 named "ex_penis".
Extract the folder in the main WebUI folder so it's like:
and then you should be able to use them.
The badprompt ones are actually something newer. You put it in the negative prompts with 80% strength, i.e. I use
(bad_prompt2:0.8), lowres, bad anatomy, etc


Does this only work for two tags? Or can you batch together multiple into the percentage.


Probably, but I haven't checked. I guess it'd just be A:B:C:# for 3 and so on


Interesting sort of addendum I found for doing this sort of thing:
>you can [x:y:z] / [:y:z] / [x::z] to make the ai draw x then y at step z (or percentage of steps if you put a decimal), which works great for stuff like [tentacle:mechanical hose:0.2] to make the ai draw tubes everywhere, or you can do x|y... to make the ai alternate between drawing x and y every other step; you can put any number of things here e.g. x|y|z|a, but obviously the more you use this the more steps you need, in general
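The two rules quoted there can be sketched as tiny hypothetical helpers (my own names; the webui applies this internally during sampling):

```python
# [x:y:z]: draw x until step z, then y; a value below 1 is treated as
# a fraction of the total step count.
def active_prompt(x: str, y: str, switch: float, step: int, total: int) -> str:
    boundary = switch * total if switch < 1 else switch
    return x if step < boundary else y

# "x|y|...": alternate between the parts on every step.
def alternating(parts: list[str], step: int) -> str:
    return parts[step % len(parts)]
```

So with 20 steps, [tentacle:mechanical hose:0.2] means steps 0-3 draw tentacles and steps 4-19 draw hoses, which is why more alternated parts need more total steps.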


That's exactly the post I saw that made me want to try it. I heard people mentioning this functionality weeks ago but completely forgot. It seems rare that anyone uses it, but it could be really great


When I try making one of these I get a
>RuntimeError: Prompt S/R did not find angel wings:demon girl:0.1 in prompt or negative prompt.
Does this mean I need to put the tags into the prompt somewhere? Or attach an X to them?


The first thing listed there has to be in the prompt for the rest to replace it. You should be able to hover over it for a tooltip.
masterpiece, picnic, turtle, eating banana

in the script you'd put
banana, burger, corndog
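A sketch of what Prompt S/R amounts to (a hypothetical re-implementation, not the actual script): the first item in the list must occur in the prompt, and each listed value produces one grid cell by replacing it.

```python
# Prompt S/R ("search and replace"): the first value is the search
# target; every value (including the first) yields one prompt variant.
def prompt_sr(prompt: str, values: list[str]) -> list[str]:
    target = values[0]
    if target not in prompt:
        # Mirrors the error described above when the target is missing.
        raise ValueError(f"Prompt S/R did not find {target} in prompt")
    return [prompt.replace(target, v) for v in values]
```

With the example above, "banana, burger, corndog" turns "masterpiece, picnic, turtle, eating banana" into three prompts, one per grid column.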


Gotta say, reading the documentation for all this stuff regarding stable diffusion has really impressed me with how much work and development has gone into making the open version as great as possible, beating out even its premium competitors.

I guess this is the true power of computer dorks trying to get the perfect porn.


File:1670036798856.jpg (233.99 KB,800x1257)

So I saw this one website making the rounds that's 2D-ifying or whatever images of real people or characters, and I have to wonder how you'd do the same with an image of your own in Stable Diffusion. Like say you wanted to draw a certain character, from an image, in the style of Asanagi, maybe wearing some different clothing. How would you do that?


File:box on beach.png (104.7 KB,512x512)

You use img2img, which can itself be guided with a text prompt like txt2img so it's really more like img+txt 2 img.
As an example here is an image I drew


File:2022-12-03-18-31-45-393406….jpg (1.02 MB,2432x3143)

... and here are some variations created with the Stable Diffusion 1.5 model using the following prompt that matches the image contents:
"open cardboard box on beach, sunny day, waves crashing on shore, frothy sea, deep blue sea, photograph, daytime"


File:grid-0228-2284799521-maste….png (4.68 MB,2048x2048)

It's likely a very generic prompt that has a denoise of like .5 or something to keep the general shapes but still alter it enough to be noticeable. I saw someone point out that they look like Genshin characters, so it's probably using something trained on its images.
I have a Genshin hypernetwork for that so let's see the result when I throw some stuff in: (pic related)

I don't want to spend a bunch of time trying to replicate it, but you get the picture. It probably uses a few traditional artists tags since people have done lots of examples of those, including myself


File:2022-12-03-18-38-23-227899….jpg (1.52 MB,2048x2048)

... and if I do the same prompt and same settings as in >>100505 but without the input image, this is what I get.
The cartoony nature of my image is at odds with the Stable Diffusion model's realistic photograph style. Getting anything done with this sort of thing is probably best when it's iterative, mixing both txt2img and img2img.


File:firefox_Kc9SXtwoPZ.png (56.77 KB,506x754)

I have learned some things to make things a bit easier or cooler, although you might already know them. On the right-most part of the Settings tab:
This "show image creation process every N sampling steps" at the top of the image here is apparently what lets you see, the uhh... image creation process. I had no idea this was here since I was expecting it to be more prominent.
At the bottom of my image you'll see a text box. Replace it with this text:
sd_model_checkpoint, sd_hypernetwork, sd_hypernetwork_strength, sd_vae
And it will show those at the top of the main window so you don't need to go into the Settings tab every time you want to mess with the hypernetwork or vae. Pretty cool!


File:explorer_Kisf6EEziK.png (789.06 KB,1098x828)

Surely you knew this was coming.
I am going to begin the process of bringing beloved Kuon to this so that she can be generated as easily as a 2hu! I'm debating whether to keep it centered on her and use a variety of artists, or to go all out and restrict it to official art and use hundreds of images to try and get the Aquaplus (Amaduya) style. I'm leaning towards the latter, although again I am filled with guilt before even attempting it.
Well, at least he's already successful and famous in some circles and doing games and doujin stuff and is well appreciated so it's not like I'd be robbing him of work? Bleh. The ethical quandaries of this stuff...


Oh, after testing with this it does seem to greatly increase the time it takes to generate stuff, so maybe only use the 'image preview while generating' thing if you're unsure where to stop when working on settings, and then set it back to zero when you're actually producing a bunch of stuff.


File:02543-986531908-(masterpie….png (347.39 KB,512x512)

A lot of knowledge about this stuff requires scouring and searching or surreptitious posts, so I'll try to share some more info.
This time I'm going to talk about two Extensions that I use a lot.

The easiest way to get new Extensions is to go to the Extensions tab of the WebUI and then go to Available and hit the "Load from" button with its default URL. From there you can install stuff, which will then show up on the Installed tab. For a lot of this stuff you need to restart the UI from settings if not restart the .bat file itself.
The ones I use and can give detail on:

Dynamic Prompts: https://github.com/adieyal/sd-dynamic-prompts

This is used to randomize creations on each image generation. You can use it with new words in the prompt, but I've never done that. Instead, I mainly use this to call random effects from wildcard text files. You create a text file with one possibility per line, put the text file in /extensions/dynamic-prompt/wildcards/text.txt, and then call it from the prompt by its name with two underscores on each side. For instance you can make haircolor.txt and put this in it:

green hair,
blue hair,
red hair,

and then put __haircolor__ in your prompt and it will randomly pick one of those each time an image is generated. This means you can make a batch of 10 images and come back to different results. This is really, really good if you're just messing around to see what works. It can also call other txt files from inside. I'll share my wildcard text files soon. It also has a "Magic Prompt" system that I've never used, but it could be cool? Beats me. Someone else do it.
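The mechanics are simple enough to sketch in a few lines of Python. This is a toy re-creation for illustration, not the extension's actual code; the dict stands in for the .txt files in the wildcards folder:

```python
import random
import re

# Stand-in for the wildcards folder: in the real extension each key
# here would be a text file with one option per line.
WILDCARDS = {
    "haircolor": ["green hair", "blue hair", "red hair"],
}

def expand(prompt: str) -> str:
    """Replace every __name__ token with a random line from that wildcard."""
    def pick(match: re.Match) -> str:
        return random.choice(WILDCARDS[match.group(1)])
    return re.sub(r"__(\w+)__", pick, prompt)

print(expand("1girl, __haircolor__, smile"))
```

Each generation in a batch re-rolls the token, which is why a batch of 10 comes back with 10 different hair colors.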

TagComplete https://github.com/DominikDoom/a1111-sd-webui-tagcomplete

It autocompletes booru tags based on danbooru, which is what NAI and the 'Anything' model were trained on. Really, really nice, but the pop-up can also be annoying at times. Unless you have tags memorized this can help a lot. Speaking of, you should make yourself accustomed to danbooru's tags:


File:wildcards.zip (27.04 KB)

Here are my wildcard text files. Some of them I downloaded and modified, other stuff was as-is. You can get a pretty good idea of the stuff you can do with this.


File:firefox_ouJxTVYHXq.png (577.01 KB,1184x789)

This deepdanbooru thing that scans images for tags is really impressive. It's not perfect, but good lord, we could only dream of such things a few years ago, right?


this sort of thing has been possible for a few years, but without danbooru datasets used for art training it wouldn't be easy


It's already a few years old, isn't it? Here's the original Reddit thread about it, and it's been discussed on 4chan in the past as well. I recall there originally being some talk about the possibility of it actually being used for tagging, but it's not good enough to replace manual tagging anytime soon and is otherwise little more than a novelty.


Oh. Wow, it's 4 years old?
Well, anyway, it's really cool how it's used here for immediate benefit. You can use it to assist in image tagging for training, but also as a building block to generate new images.


File:firefox_r6kosW4vBR.png (44.18 KB,958x555)

So, the training setup I put together from what I read. Much of the information is from the discussion here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/2670
Also, thanks to people on the /h/ board of 4chan as those guys are great. Don't use /g/, but maybe that should go without saying.

Modules: (I checked all of them because it's unclear what they do. Everything was checked by default except 1024 which seems to be a new addition)

Layer Structure: 1, 1.5, 1.5, 1.5, 1. This is called a 'deep network' as opposed to default or wide. Default is good for most things, particularly if you have a low amount of images (20ish was mentioned). Wide is for specific things like an animal, character or object. Deep is for style, which most people seem to be using hypernetworks for, with embeds for characters. It doesn't have to be, but that seems to be the pattern forming.

Activation Function: Softsign. Lots of math talk and graphs I don't understand, so I just went with the recommendation.

Weight initialization: XavierNormal. Same reasoning as above.

Layer normalization: No. I haven't seen anything informative about it, but no one seems to use it.

Use Dropout: Yes. I heard it's good if you have a "larger hypernetwork". I think that means the numbers in the Modules up there and also the amount of training images used. I had 90ish images and did the mirror image thing to turn it into 180ish, but that's definitely not as good as 180 unique images. I don't know if it was good or bad that I used Dropout, but it didn't ruin anything.
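The mirror trick mentioned above is just horizontal flipping to double the dataset. Sketched here on a tiny 2-D grid of pixel values; a real script would run something like Pillow's ImageOps.mirror over each training image instead:

```python
# Double a dataset by adding a horizontally flipped copy of each image.
# "Images" here are plain 2-D lists of pixel values for illustration.
def mirror(image):
    return [list(reversed(row)) for row in image]

def augment(dataset):
    return dataset + [mirror(img) for img in dataset]

imgs = [[[1, 2, 3],
         [4, 5, 6]]]
doubled = augment(imgs)
print(len(doubled))   # 2
print(doubled[1])     # [[3, 2, 1], [6, 5, 4]]
```

As noted, 90 images mirrored into 180 still only carries the information of 90 unique poses, so it's a cheap boost rather than a substitute for more source images.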


File:firefox_sIU6yWlBwt.png (83.95 KB,949x1062)

And once you get to the Training tab you can load the hyper you just created (or one you've downloaded maybe? that part seems questionable)

This tab is for training embeds or hypernetworks, but I've only done hypernetworks so I can only talk about that.

Batch size: I haven't been able to find conclusive information on this, since 'batch size' is a phrase that shows up everywhere so you can't just search for it by name. It uses more VRAM, but might not necessarily be better at training. The ONE comment I've found on it says that you could increase it instead of lowering the learning rate later on. I'm already at my VRAM limit when training while having a video and Photoshop open, so I don't touch this.

Learning Rate: I think people start with the default for these. Only the hypernetwork number matters for hypernetworks. I see people add a decimal point in front of the 5 as the training steps reach 5000 to 10000, so I copied that. It sounds like the lower number is better for finer detail once you've established things.

Gradient accumulation: A newer thing, supposed to assist the training rate somehow, but I don't know how. It mentions something like "learning in parallel" or something. I don't know. People say to use it and set it to something like 5, so I have it at 5.
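My loose understanding of what that setting does, sketched with made-up numbers (this is an assumption about the mechanism, not the webui's actual code): instead of updating the weights after every batch, you sum gradients over N batches and apply one averaged update, which simulates a bigger batch size without the extra VRAM.

```python
ACCUM_STEPS = 5
LEARNING_RATE = 5e-5

def fake_gradient(batch):
    # stand-in for the gradient that backprop would produce on one batch
    return sum(batch) / len(batch)

weight = 1.0
accumulated = 0.0
batches = [[0.2, 0.4], [0.1, 0.3], [0.5, 0.5], [0.2, 0.2], [0.4, 0.6]]

for step, batch in enumerate(batches, start=1):
    accumulated += fake_gradient(batch)           # no weight update yet
    if step % ACCUM_STEPS == 0:                   # every N batches...
        weight -= LEARNING_RATE * (accumulated / ACCUM_STEPS)
        accumulated = 0.0                         # ...apply one averaged step

print(weight)
```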

Dataset Directory: The image with the folders. I could talk about images, but I'd mostly just be repeating this: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/2670

Prompt template file: This is a list of basic prompts that are added to previews alongside the included tags attached to each image. People say it's fine as default, but might be something to mess with if you want to check for specific stuff?

Width/Height: Keep it at 512/512

Max Steps: How far the training will go. This is stuff that takes days, though, so I'm not sure how useful this is, because of something I'll talk about in a sec. I suppose it's good if you only want to run it for a set amount of time.

Save an image every n steps: It saves an image as if you prompted it with random tags included in your training folder, but it can make freaky combinations that you wouldn't normally use, so keep that in mind.

Save a copy of embedding every n steps: This is an important one and why I didn't care about the Max Steps thing above. It saves the hypernetwork with the step count in its filename to the folder automatically. By default it's at 500, which is where I have it.
This means the folder fills up with numbered copies as it trains for longer periods of time.
There's an option in settings under Training to Save Optimizer state, which allows you to resume training from these saved files. VERY important!

Note: To use the hypernetwork (or resume training it from file) you need to move it from the save directory (default textual_inversion) to the models/hypernetworks folder.
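The save-a-copy behavior is easy to picture: every N steps a snapshot lands in the folder with the step count in its name. A faked version of the loop (assuming the name-steps naming pattern; empty files stand in for the real saved networks):

```python
import os
import tempfile

SAVE_EVERY = 500  # the default "save a copy every n steps"

def train(name: str, max_steps: int, out_dir: str) -> list:
    """Fake training loop that only does the periodic checkpoint saves."""
    saved = []
    for step in range(1, max_steps + 1):
        # ...a real training step would happen here...
        if step % SAVE_EVERY == 0:
            path = os.path.join(out_dir, f"{name}-{step}.pt")
            open(path, "w").close()  # stand-in for torch.save(...)
            saved.append(path)
    return saved

out = tempfile.mkdtemp()
print([os.path.basename(p) for p in train("aquaplus", 2000, out)])
# ['aquaplus-500.pt', 'aquaplus-1000.pt', 'aquaplus-1500.pt', 'aquaplus-2000.pt']
```

This is why Max Steps matters less: if things go haywire at 30000 steps, you just grab the 20000-step copy and use or resume from that.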

Save images with embedding in PNG chunks: I think it lets you use PNG info like normal generated images. I kept it on.

Read parameters from txt2img: For preview images it takes what you typed in the txt2img tab. I never used this since I wanted a variety of images, but it could be useful? I read to never use tags like 'masterpiece' or 'quality' there, though.

Shuffle tags: Yes. It adds more variety to the images by changing priority or something.

Drop out tags with prompts: I think it drops a certain percentage of prompts per generated preview image. I kept it off, but I'm not sure. It's just preview images and not the actual training itself, so I guess it could improve or hinder perceived accuracy there.

Latent Sampling Method: I only hear people mention deterministic, so that's what I went with.


File:explorer_dWd3RdfklC.jpg (424.05 KB,1395x1111)

I've been cropping and positioning images the past few days for the Amaduyu/Aquaplus thing I plan to train. I started rotating some of them, too, and might go back and do that with some of the images I've already done. I've been kind of OCD about focusing on this because it will take a long time to train since I want it to be extremely thorough and I'm sure I will make mistakes. I'm starting to worry a bit about how much of it isn't Kuon. I think I might need to make one specifically for Kuon or use an embedding to pair with it somehow? Not sure how they work together.


File:firefox_glhn53KTf1.png (347.91 KB,1474x754)

I really didn't expect it to get this one this well. Man, this stuff is good.
It definitely shows that you have to clean the tags manually, though, as her tail isn't there. I got a bunch of images that weren't on a booru, so they don't have tags.
But, I'm not sure if I'll use it because it looks too complex and would probably mess things up


File:explorer_fhrs7k1XEO.png (1.69 MB,1673x1107)

So much effort...
People have made other progress with cool extensions and stuff, but I can't remark on it yet since I haven't messed with them


File:BooruDatasetTagManager_MgO….png (924.43 KB,1898x618)

cleaning up tags
the auto tagger does incredible work, but it's not perfect. For example I had to add the furrowed brow, :o, and portrait tags


File:kuon31.png (458.26 KB,512x512)

Apparently having a bunch of plain backgrounds is bad, so now I have to go back and add backgrounds. I guess it's good that I have an Utawarerumono artbook with some backgrounds on it, although I had to scale them up with waifu2x. This is so much work and I regret doing it, but I'm too far now.


File:explorer_tXnkh1IWRr.jpg (460.15 KB,1643x1116)

Okay, I'm finally going to bump this thread because I'm now finally training it! This new method is like 20x faster somehow. The prompts are randomized during this so some of the abominations you see are because weird tags are being combined like 'leg' and 'bed', but no '1girl'
YES! IT'S ALL PAYING OFF! MY HOURS AND HOURS OF LOSERDOM ARE REACHING FRUITION! All the wasteful pruning, the tag specification and elimination!

(hope it learns the proper position of utawarerumono ears though...)


they all seem kind of kuon like


File:aqua-1450.png (340.19 KB,512x512)

Well, it's the same artist so they should look somewhat similar, yeah.
But in that image I posted I can identify different characters. I'm not sure how exactly it works because sometimes it's very random, but other times it's obvious, like how this is an attempt at Kamyu (even though I never labeled her and it wouldn't understand it anyway)


File:aqua-3325.png (405.31 KB,512x512)

it's learning... IT'S LEARNING!


i think it can get more perfect


File:02064-4265108999-1girl, an….png (10.24 MB,2816x2688)

Hmm... after a full night I'm not sure if it can. At least overall, I think getting a perfect one is still going to be rare. I see what people are saying now and that you're probably not going to notice gains after 20000 steps or so. But, I think it still needs improvement somewhere.
I need to look at it and see if there's stuff I can improve upon, which basically means I'll train it again at a different rate. When I tested making a hypernetwork a few weeks ago one night was like 2000 "steps", but now I just did 58000. (meant to do 50000, but forgot I resumed from 8000). It saves copies of its progress as it goes, so if for instance it made the best result at around 20000 steps and then went haywire, you can grab the backup it created at 20000 and either just use that as the completed product or resume training from it.

Well, now that I've done the style hypernetwork I should try making 'embeds' which I'll use to teach it characters. It still doesn't know who any of these girls are so I can't actually call them directly and instead need to use their traits and hope it arranges it correctly. For instance it'll never know Kuon's proper clothing or ears unless I create an embed which I can invoke from the prompt. From what I've read, when training an embed you label everything EXCEPT what you want it to call.

Maid Kuon at the computer! It really can't do hands or keyboards, but that's not specific to this.


File:02077-4265108999-1girl, an….png (9.17 MB,2816x2688)

...and here is the exact same prompt and seed and everything else, but with my hypernetwork disabled. Well, maybe it IS messing up hands and arms a little bit. I'm not sure if that's something I can fix. Do I go back and pay lots of attention to hand tags? I guess I could try that...


File:02132-2730948104-1girl, je….png (945.23 KB,768x1024)

Hmm, so this is an instance of improper tagging showing up. Karalou's slave collar is technically a collar, but I must not have eliminated its generic 'collar' tag, so it's pulling from Karalou when bride Kuon's 'detached collar' is prompted here. This stuff is so crazy


File:02124-169421348-1girl, jew….png (895.63 KB,768x1024)

and this would be a much more accurate representation of what it's supposed to look like


dont remove it, kuon looks sexy with a slave collar


File:BooruDatasetTagManager_3iu….png (1.05 MB,1287x715)

It wouldn't be removed exactly, just require an actual 'metal collar' prompt to show up. I guess I should remove 'breasts' from everything, too. I'm not sure why boorus have redundant tags like that.
... I think?
'detached collar' isn't even here, so maybe this is something I can't avoid at all. Or maybe this is the base data... I guess I should do some testing... either way it probably wouldn't hurt to specify things more


File:00031-3720141358-1girl, so….png (648 KB,576x768)

There's an extension called DAAM that creates a heatmap of the effect of a prompt on the resulting image. It's really quite amazing that this exists.
This is the result for "low-tied long hair". It's supposed to add, well, long hair that's tied. However, it's broken and seems like it's applying to her clothing instead and is adding tied ropes to it. This is a tag to avoid and maybe I should purge it from my training thing.


File:00032-476515586-1girl, sol….png (608.4 KB,576x768)

The heatmap for 'smile' is exactly where you'd expect it to be


File:02167-99510245-1girl, anim….png (12.86 MB,2688x6286)

God damn. Okay, I can say that switching from a 1, 1.5, 1.5, 1 neural net thing (whatever that means) to a 1, 1.5, 1.5, 1.5, 1 one was a massive upgrade. I don't know why. Oh, and I think I MIGHT have clip skip set to 1 instead of 2, but that wasn't supposed to be a big deal. Hmmm.
The first one here, 'aquaplus' was my training 2 nights ago whereas the two others are different checkpoints from the one I trained last night. I just don't understand how it's such a massive improvement.


File:02168-2730948104-1girl, je….png (3.17 MB,1344x3142)

Okay, this is weird. I did the exact prompt here and the new ones look bad with this DPM 2M Karras sampler...


File:02169-2730948104-1girl, je….png (2.95 MB,1344x3142)

but on the default Euler a sampler they look fine. The first one seems like it's probably the best, but still within the normal variation you'd expect.
more testing needed....


File:explorer_ICmVTo8wHE.jpg (600.31 KB,2158x1055)

Now training the Kuon embed. I will combine this with the artstyle hypernetwork and it should have great results. That's the hope at least.
What this means is that instead of typing "yellow eyes, swept bangs, etc" and hoping it assembles them correctly I will invoke the name of the embed and it will fill that information in in a way that would be impossible to attempt manually. I don't expect the hair ornaments or clothes to look perfect, but it should definitely get her hairstyle and ears right. There's about a 0.0003% chance that it will get her skirt pattern right.


File:00163-4181300038-solo, smi….png (720.31 KB,768x768)



File:tmpldpue00e.png (15.32 MB,3840x3072)

This was the batch
solo, smile, 1girl, thekuon, :d, sitting, looking at viewer, pov


File:00168-908452894-solo, 1gir….png (971.81 KB,768x1024)

It works alongside other words, like 'swimsuit'. Man, this stuff is truly amazing.


damn that looks really good too
at a glance it's hard to tell those are AI


Congrats, seriously. Looks damn good.


File:1367881241884.jpg (91.58 KB,700x700)

Looks like it turned out very well, congrats!


File:00236-3931658932-1girl, so….png (514.65 KB,832x576)

Thanks, guys! This is really a new world and I'm not sure how to feel about it. I feel guilt over artists, but at the same time I'm really enjoying messing around with this stuff. It's been taking up too much of my time, though, so I need to get back to doing other stuff...
I can envision a 3D>2D>3D kind of pipeline in my head right now. If only my tablet wasn't broken... hope I can RMA it soon.


File:FmWCnRfakAE9boP.jpg (237.43 KB,1024x1536)

Saw some really neat AI art that was good enough for me to save. Is it getting even better or something?


File:00501-3541163114-1girl, so….png (954.29 KB,840x840)

By now the default with the newer fine-tuned models is far more impressive than the NAI leak stuff, but it's still all built on that. You're actually in a far worse position if you're paying for it now.
This high detail one is based on a mixture of real life and 2D art so it can do things pretty well, but you have poor control of it. It looks extremely impressive, but it's not obeying my very strict training of Kuon's outfit, so imagine trying to get something that you haven't trained.

I've been wondering if I should start grifting since I know what I'm doing and everyone else that knows what they're doing is, well... kinda normal. But, being normal is what gets you the most exposure and success. I can't name gacha characters, for example. But, I could corner an AI fetish market, especially if I combine the training with my 3D models. This would be a good time to have the motivation to do things.


Sounds like such a bad way to put it... but I get what you mean. I think if you consider yourself capable and have some desire to do so, you should try it out. I mean, training your own stuff is probably a long and arduous process, more than most are willing to invest in.


I'm very proud of your progress.


It's kind of crazy that if you put the effort into training one of these things you could have unlimited fetish porn?


I haven't played with NovelAI outside of porn, but I may generate non-ero OG Fallout fiction with it to see its quality


>unlimited fetish porn
That is exactly what it is. If people think there's "porn addiction" now, wait until normal people get a hold of this stuff. I have a blog of the pornographic progress I've made on /megu/.
Still, I'd prefer human-made stuff if it existed, and stuff that's more mental like incest isn't really something you could satisfy with an image alone. You can't tell stories with this and stories are really, really, REALLY good. A good doujin is easily better than this stuff, but when you're dealing with specific tastes then yeah, it's the best option available.


NovelAI really likes futa


File:FmB4eORaYAAEejg1.jpg (725.81 KB,2829x2237)

Came across this and thought that it was neat enough to inquire what the heck is with the coloring ability of these AI, it's pretty great.



What's interesting is it doesn't necessarily make the same mistakes a human makes when drawing hands. It makes its own sort of mistakes you don't see in real art.


The most popular checkpoint models these days (for those doing it offline) are a mix of 2D art with conventional photography. It increases hand quality a bit, but it's still far from passable most of the time without a bunch of "inpainting", which is basically like selectively retrying a part of an image. Some of the models people use look more like real life photos with an advanced filter on top, which can be very creepy and also takes away from some of the appeal since it introduces 3D limitations in perspective and such
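Inpainting in that sense can be sketched as a mask blend: keep the original pixels outside the mask, take the regenerated ones inside it. A toy 1-D version (real inpainting blends in the model's latent space, not raw pixels like this):

```python
def inpaint(original, regenerated, mask):
    """mask[i] == 1 means 'redo this pixel', 0 means 'keep the original'."""
    return [new if m else old
            for old, new, m in zip(original, regenerated, mask)]

print(inpaint([10, 20, 30, 40], [99, 98, 97, 96], [0, 1, 1, 0]))
# [10, 98, 97, 40]
```

So a botched hand can be re-rolled over and over while the rest of the picture stays untouched.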


Colouring shouldn't be hard for an AI. It could just pick colour palettes from existing images in the same pose and apply them.

Even though the line work is done, the lips of the girl on the top right are weird. Also, I'm not sure if I ever noticed this before, but the hair on these AI girls is quite bizarre (the ones on the right): not only is the fringe not symmetrical, but the far side looks weird.

The backgrounds are odd too, the sunset kind of drops suddenly in one image and the floor boards in another are all different widths.


I mean top left not right...


File:00981-919244970-1girl, sol….png (969.76 KB,816x920)

As an example, as I continually attempt to refine my custom merged checkpoint for, uhh... /megu/ reasons you can see the effect of one of the models already having some RL stuff mixed into it. The shading is absurdly good, but I really have to fight it to create clothes that aren't modern and it feels very "real" which can be a good thing and a bad thing depending on one's tastes. (also as a side note need to figure out why it's ignoring tags)
And look at that hand. I didn't do any editing here. But, it definitely looks like a real human hand. I don't know how to feel about it. I guess maybe for now it's a sacrifice to make if you don't want to do edits, but I like style over reality.


File:00991-1957439408-1girl, so….png (786.16 KB,712x816)

Another example of hands in logical position


really amazing for ai hands


I wonder if it's possible for AI to make manga or if that's far too many variables to be solved in a realistic timeframe


Have you ever tried using doodles you make as a base for the AI to build off of? Wondering if that's more effective than just generating a bunch of images that may vary in posture/position each time.


File:01464-2023-02-03.png (1.04 MB,768x864)

I did a whole bunch of testing with various RL models to see if I could understand how exactly people are making them assist in 2D hands/poses while not giving them a massive hit in quality and I really could not find any pattern. Although, my tolerance for spending hours making small merge differences is getting pretty low and I need to spend some time doing other stuff before getting back into it.
However, I do think I have an idea of how to bandage it. The LORA things are basically like "plugins" for a checkpoint model, and for example the Amaduyu/Aquaplus one I made is pretty good at fixing the faces, but then of course they will always have at least a hint of Amaduyu/Aquaplus, so I'd need to mix them with other LORAs.
It's also useful to use a thing called kohya, which is normally used to create LORAs, to separate merged checkpoints into their base ingredients. This means you can more easily control the intensity of something without needing to create a bunch of 4-8GB merge files.
Seems like there aren't any new amazing models recently, just merges of existing stuff (although some of them are quite impressive)
So, I can't think of any notable breakthroughs in the past month, just refinement.

Still, I continue to be annoyed by all the people using "waifus" in these things. I know, I know, it's a generational difference and they don't know any better. But it still annoys me.


File:saltsypre_502.ogg (26.19 KB)

With the popularity of AI voice cloning and ElevenLabs going paid, I decided to look into some of the offline-runnable alternatives. The most popular one, or at least the easiest to set up, seems to be Tortoise-TTS. It works okay enough and has some pretrained models the author directs you to use. There's a guide and git repo someone set up that provides this with a web interface: https://rentry.org/AI-Voice-Cloning
The biggest issue I and many others have with Tortoise is that the main author won't release a guide or overview of the process he went through to train his model, simply saying that if you're smart enough you can figure it out. That kind of leaves people at an impasse for actually using this program as an alternative to ElevenLabs.

I've had some minor success with one of the (unofficial) alternatives, VALL-E: https://github.com/enhuiz/vall-e
It's taken a bit of dependency chasing and cobbling together a separate PC to install Linux on (the DeepSpeed dependency has been a nightmare to get working on Windows), but I've actually been able to get a "decent" output with a 3060 12GB card and about a day of training on ~7.4k couple-second audio files ripped from Vermintide 2. I'm not an expert in ML, but the result I got from training a model from scratch with this limited dataset and a "low-powered" card makes me optimistic about VALL-E's potential. I didn't really have to know much about machine learning, just how to install various dependencies and third-party utilities.

VALL-E is based on phonemes so the text to be synthesized is meant to be sounded out, I think. I don't know if there is a whole lot of prompt engineering that can be done with this program, though my current model is probably too limited and untrained to really test that out.
Attached is the voice I wanted to clone.


Here is the output.
The prompt text was "Blackrat spotted! Keep your guard up!"


>refine my custom merged checkpoint
I haven't been closely following your posts, more just watching your results. When you talk about custom checkpoints, are you training your own model (a custom dataset of /megu/ images) starting from some base model as a checkpoint? How are you doing that for Stable Diffusion, and what sort of time sink is it / what hardware are you using?


File:01962-sfw,_painting,_(J.M.….png (1.3 MB,896x896)

Mm, how to explain...
The "custom model" I've been talking about recently is a merge of existing checkpoint models, which is something like NovelAI or Stable Diffusion. My most recent one using Stable Diffusion, NovelAI, Yiffy (for genitals), Anything (that's its name), AbyssOrange2/Grapemix and a couple others that I'm trying to switch in. (GrapeMix doesn't seem to have any RL image data, so I add in some of the basil mix myself that AbyssOrange2 does, that guy was onto something for sure)
They're large (2-8GB) files that contain a whole lot of training data and you need a really powerful GPU to train them. I'm not sure you can even make them without something like 24GB of VRAM at minimum, and then you need a whole lot of time (weeks of constant processing) if you don't have like $50,000 worth of processing power sitting around.
However, someone like me can create merges of them with custom settings that hopefully take the desired parts of A with the desired parts of B. But, you definitely make sacrifices when you do it and the trick is to try and counteract them. It's a really annoying process, though, because there's no guide to see what each setting does so it's a bunch of trial and error. Every time I think I noticed a pattern, I change a different slider and it completely invalidates what I thought I knew. Also each merge takes like 30 seconds to create, 10 seconds to switch to, and then however long the generation takes on your current settings. Also when switching between them your VRAM can get corrupted somehow and you need to restart the program so you don't get false results. Each merge is also 2-8GB so you have to routinely delete them and take screenshots/notes of what you've learned, if anything, from the merge results.
The main training data I've done myself is for Kuon and the Amaduyu (Aquaplus) hypernetwork/LORA things, although I've done some other artists to mixed results. They rely on getting layered on top of a checkpoint model, so they're heavily influenced by it.

What kind of timesink is it? Weeks, but I do other stuff while it's merging and generating. I can't imagine most people will want to do it. But, I've also been doing this stuff since early October so I guess it might be a slower learning process for other people.
As for hardware, I got a "good" (less absurd) price on a 3080 12GB for Black Friday


File:firefox_vJasASdG3I.png (113.39 KB,1668x1155)

And this is what the "Merge Block Weighted" panel looks like to make make merges that are better than just a brute force "30% of this, 70% of that". Pretty self-explanatory, right?


File:BooruDatasetTagManager_0Il….png (739.6 KB,1162x533)

I think I can describe the difference better. The "checkpoint" model is the database that has the actual definitions and data on the information of a tag. When I trained my Utawarerumono, Kuon, and other things I was training them against the NovelAI model. The images have tags like "ainu clothes" or "from side" because those are specifically the booru tags that NovelAI trained on. I'm not defining what those are, I'm providing information on what they look like when drawn by a specific artist, and the training process compares it to the information that NovelAI has. There's a huge gulf between defining the tag itself and merely referencing it.
People, including myself, have trained concepts (which is what a tag is), but it's just one at a time.

The horrendously named "waifu diffusion" has been undergoing training on its new version for over a month now, but it was just at Epoch 2 when I last checked a couple weeks ago so it might be at 3 now. One epoch seems to take about 12 or so days to complete? People said the first epoch sucked and 2 might have shown that the finished product could be good, potentially, but we'll have to wait and see. It will probably not be something to test out for real until Epoch 6 or so?
But, I haven't been paying attention to any news about this stuff


what is a checkpoint model?


nevermind, it's a database with the tags associated with images


File:02137-v1-5-pruned1girl,_so….png (703.97 KB,704x768)

Basically that, yeah. It's the skeleton that everything is built upon. The most famous one is Stable Diffusion (SD) and everything I'm aware of for offline AI image generation makes use of it. The 2D models still have the SD data in them so you can use words that boorus have no knowledge of and get results.
It's worth noting that most people using the offline method are using the older (1.4 and 1.5) versions of Stable Diffusion because the ones after that started aggressively purging nudity (but not gore) and potentially other things, from the training data. This had the effect of breaking the things trained on the older models, which includes stuff like NovelAI which nearly all 2D models make use of.
The last time I checked, people were not impressed enough with the newer SD models to sacrifice a "pure" data scrape in favor of one curated to make it more attractive to investors


File:kagami.png (244.16 KB,800x781)

This seems kinda funny, since now that the cat's already out of the bag and people have access to the older models in their entirety, there's no real reason for people to use the newer models, and the loss of functionality just means they'll become irrelevant as people improve the current local models. It may seem like a good move for investors at first since all the other AI companies are doing it, but the one thing those companies have that SD doesn't is that people don't have their hands on the code to use it unneutered.


File:pose.png (1.14 MB,2304x767)

The newest technology just came out a few days ago!
ControlNet lets you control the generated image by pose and composition through normal and depth maps, edge detection, pose detection, segmentation, etc. This is much easier and finer control compared to regular img2img.
An extension for webui also allows you to adjust pose as you wish.

Guide: https://rentry.org/dummycontrolnet


hm, so they gave up and decided that this is where humans need to come in and give the images context.


That's quite the leap.
Could you make a leaping Megu?


File:jump.png (1.15 MB,2304x768)

i tried


File:[Rom & Rem] Urusei Yatsura….png (727.29 KB,960x540)

Dang, that's cool. This is what happens when you take a break from checking for AI news in /h/, huh.
Seems neat, but it's also introducing more effort into generation which isn't really my thing. I had tried to use depth maps about a month ago, but learned that it was limited to Stable Diffusion 2 and above, which kills any desire that the majority of people on imageboards would have for it. So any extension that makes use of depths maps, but not requiring the neutered corporate-friendly SD is great.
I'm not sure I'll use this, but it's cool to see in action nonetheless


Angry boobs


Wonky legs, but still impressive.


File:grid-0076-1girl,_solo,_(mi….png (5.12 MB,2560x1664)

Someone on IRC asked me about how to go about generating images of Miyako from Hidamari Sketch, done in the original Ume Aoki style. We already know that doing a style is impossible without training, since artist tags were purged for NovelAI and that's what most 2D models (including everything I use) are based on. So, the question becomes whether it recognizes Miyako. Unfortunately, as you can see, while it does seem to somewhat know of her blonde hair color, everything else is a mess.
Conclusion: Miyako has to be trained as well.


File:grid-0080-1girl,_solo,_(mi….png (5.18 MB,2560x1664)

I searched through some 4chan pages and found that someone did create an Ume Aoki LORA. It seems to work pretty well at capturing the style and also seems to capture Miyako to a degree, but it's still not accurate.
It's in here if you want to download it yourself (use Ctrl+F) https://gitgud.io/gayshit/makesomefuckingporn#lora-list
So, I told the guy to start amassing Miyako images which will be combined with the Ume Aoki style LORA.
Things to note for good training images for a character:
1. Solo
2. No complications like text overlaid upon her
3. Text elsewhere in the image should ideally be edited out
4. Limited outfits. Ideally it'd be maybe 3 or less, depending on how many images you have. When I trained my Kuon stuff I did not bother since she is portrayed in only one outfit about 95% of the time. Each outfit will need to be tagged in the training process and called upon manually with a custom tag of your own choosing during image generation later on. She can still be portrayed in other outfits, but if you specifically want her in her own original clothing you need to train for it.
5. Different angles and "camera" distance. The more variety of angles you have, the more accurately it can portray them later on during image generation, although it does a pretty good job of filling in the blanks since it already knows how human characters should look from different angles.

Then the images themselves should be cropped to be somewhere squarish. Unlike the old days of late 2022, they don't need to be exactly 512x512 pixels, but you should avoid images that are too tall or wide (heh), at like a 1:3 ratio or something. I'll talk about the other stuff after I get the images.
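The screening rules from the posts in this thread (nothing under 512 pixels on a side, nothing too far from square) can be sketched as a simple filter. This is a hypothetical helper, not part of any actual training script; the 512 minimum and the rough 1:3 cap are just the numbers mentioned here:

```python
def usable_for_training(width: int, height: int,
                        min_side: int = 512, max_ratio: float = 3.0) -> bool:
    """Rough screen for training crops: no side under min_side,
    and not too far from square (long side / short side capped at max_ratio)."""
    if min(width, height) < min_side:
        return False
    ratio = max(width, height) / min(width, height)
    return ratio <= max_ratio

# Examples
print(usable_for_training(512, 512))   # square, exactly at the minimum -> True
print(usable_for_training(640, 480))   # one side below 512 -> False
print(usable_for_training(600, 2000))  # taller than 1:3 -> False
```

Blurriness and graininess still need a human eye, of course; this only catches the dimension problems.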


File:Sunshine Sketch - c129x1 (….png (1019.87 KB,900x1291)

Miyakofag here, yoroshiku onegaishimasu, and my deepest thanks to yotgo for his help.

A very important factor to consider is how easily the characters go from chibi to normal and back, as seen in pic. In their non-chibi style their head shape is somewhat hexagonal, with fairly sharp angles, while their chibi head shape is usually either between a full oval and a curved rectangle, or a mix of the two with a pointy side bit like Yuno has in the middle; the first has regular eyes and features while the latter two are (✖╹◡╹✖). Also visible in pic is how the wides are presented in a variety of outfits, like Miyako getting a change of clothes in the middle panel and then immediately returning to the first one.

I've downloaded the manga, but it's monochrome and fairly crammed, so it doesn't look like it'll be of much use. Seems like I'll have to take a few thousand screenshots of the anime again, but that's fine by me. I'll also begin to comb through boorus for useful art, and there's this other meguca stuff I'll be downloading in case it turns out to be of help for setting up Ume's general style:


File:explorer_F0NcBv8fyY.jpg (382.57 KB,1424x1091)

Hmm, keep in mind for a character that you want the character to be the focus and not the artist's style. It's better to have a more varied collection from various artists than a limited number from the official one. You're not training the shape of her head or how her mouth is drawn, you're training the combination of her outfit and eye color and hairstyle and the visual traits that identify her.
I can generate images of Kuon in different styles because it's not constrained to a specific style itself.
When I generate an image of Kuon to look like her original Utawarerumono appearance, I activate my Kuon LORA (Kuon herself) and also my Amaduyu LORA (the Utawarerumono style). Combining them into one would be severely limiting.


File:grid-0084-1girl,_solo,_gra….png (5.95 MB,2560x1664)

Kuon with Kuon LORA and Amaduyu LORA and the downloaded Ume Aoki LORA. This is to show what a merged character and style LORA would look like together with another style.


File:grid-0083-1girl,_solo,_gra….png (4.49 MB,2560x1664)

but with the character and artist separated, I can apply Kuon and the Ume Aoki style together without the influence of Amaduyu.
Hmm... not sure if this style will work.


File:grid-0360-[Grape! Base]1gi….png (3.1 MB,2048x1408)

Bleh. I had trouble training, and when I finally got it to work it came out like THIS. I really should have kept my old settings, but noooo, I had to see what the new stuff was like.
I noticed that some of the images you gave me were small and I think I'll have to exclude those. They should at least be 512x512 and I think that's the main reason why it looks so blurry and low quality here despite being relatively accurate in some images.


nice, hexagon headed kuon


It already looks really, really good.
The small crops are my bad; I had taken "does not need to be exactly 512x512 pixels" to mean "smaller pics are okay". There should be a dozen to a dozen and a half pics to remove then, maybe a few more. There's also one where she has her top but not her shirt, which may explain the result on the top-right.


File:grid-0454-Anything-V3.01gi….png (3.94 MB,2304x1664)



File:grid-0455-Anything-V3.01gi….png (3.95 MB,2304x1664)



File:grid-0456-Anything-V3.01gi….png (3.37 MB,2304x1664)

Miyako-00006 (final)


File:grid-0458-Anything-V3.01gi….png (3.92 MB,2304x1664)

Miyako-00006 + Aoki Ume both at 90% strength.
Mmm, I feel like it's still not very good. The eyes are especially too shiny, but at least the clothes are good. Also, artists seem to depict her with different eye colors. The training data is really not ideal, but you didn't really know what to look for. I had to throw out nearly everything that was below 512 pixels, and of the images remaining some were still too blurry or grainy, but I wanted to see if we could get away with it.


File:grid-0468-Anything-V3.01gi….png (4.34 MB,2304x1664)

Hmm.. yeah it seems like there's some corruption or data loss or whatever you'd call it. The image is too "noisy" and looks over-exposed.
Here it is with my Aquaplus lora. It's funny how it put a dog there in the top left because I put "animal ears"


File:grid-0469-Anything-V3.01gi….png (4.3 MB,2304x1664)

I reduce the strength of the Miyako LORA and the image clears up, but then it becomes less accurate.
Bleh. Yeah, I need to train it again with better images.


File:cb885d1935.png (1.31 MB,3232x1569)

does this stuff actually work or are you just drawing the 6 images and pretending it's an AI


File:02950-Anything-V3.01girl,_….png (2.95 MB,1280x1792)

It works, but finding the right prompt can be exhausting. It looks like you're using some online model and those have some pretty severe limitations. I don't really know how to best use the real life models that use verbose text rather than booru tags. There are prompt repository sites like:
but also personal pages of research people have done like https://zele.st/NovelAI/

After a bunch of testing, I think I'm satisfied with this Miyako LORA. It seems to work best with the AnythingV3 model, although I haven't done hours of tinkering. This reminds me that I really need to create a good SFW 2D merge of my own, but I keep struggling to make it look good with multiple different prompts and LORAs.
I also know now how to 'host' it and allow people to connect to it, but my upload is capped at 1MB/s so the limitation is there...


File:1646704983644.png (499.65 KB,640x480)

So I'm trying to do this on my PC again after reinstalling, and now I forget how I initially solved the "No module 'xformers' found, continuing without it" thing before. Also, I think I may have cancelled the taming-transformers clone after 2 hours, but it hasn't tried to reinstall, so maybe it worked?


File:Hidamari Sketch x Honeycom….png (8.23 MB,1920x1080)

Something that was very interesting about collecting a bunch of screencaps of her is that it helped me appreciate the amount of variety in the girls' wardrobes.
Since training material for a specific character requires consistency in their looks, we decided to go with her standard school uniform. However, they regularly spend around half of an episode outside of Yamabuki wearing their casual outfits (of which each wide has maybe a couple dozen or more), in some cases they don't go to school at all, there are Winter episodes where they're wearing a coat, and at one point she has a hair bun like Hiro's, I assume simply because she felt like it. Add to this Shaft's abstract cuts decreasing their screentime, how due to her character she has what is perhaps the highest regular:chibi appearance ratio, and needing her to stand alone without overlapping with other people, and I ended up only managing to take 62 usable captures out of the entirety of Honeycomb + Graduation. Far, far fewer than I initially expected, like the max ~100 taken from 1171 fanarts of her. Thankfully, it was still more than usable.
Very late reply, but when I first saw this my heart skipped a beat. It's incredible, warm. She makes me very happy and I'm overjoyed to see it work so well. Very thankful for this.


File:grid-0838-Anything-V3.01gi….png (2.08 MB,1280x1664)



X |||____________________________________________||| X


bottom left is a JRPG protagonist


Maybe if we combine it with >>104530, we'll create the legendary「Shin Hiroi Yuusha」。


Can you link what guide you're following and what step you're at? It might be best to find where the stuff is installed and wipe it or something. I'm not sure...
You could try googling the error message in a 4chan archive maybe


File:C-1677995987510.png (3.38 KB,831x32)

I did wipe my VENV and either it's not in there or something went wrong maybe (although maybe it's fine?)

I'm using https://rentry.org/voldy and I'm just tweaking the asuka image right now. My current issue with it is "vae weights not loaded. make sure vae filename matches the checkpoint, replacing "ckpt" extension with "vae.pt"." and I'm a bit confused about what to do to fix this one, but maybe since I'm getting a known error the taming-transformers thing worked? I dunno. However, what I'm wondering about right now is that I'm getting this; I forget if xformers is important or not and, if it is, how to install it.


Also, why is it that sometimes my generation lags just because ff is open even though I'm using chrome...


File:Screenshot 2023-03-05 0111….png (42.96 KB,726x299)

>My current issue with it is "vae weights not loaded. make sure vae filename matches the checkpoint, replacing "ckpt" extension with "vae.pt"." and I'm a bit confused about what to do to fix this one
It means that if you're using a model with a VAE, you should have a matching file name to go along with it. For example, "Anything-V3.0.ckpt" and "Anything-V3.0.vae.pt". I'm pretty sure it should work fine if the model you're using doesn't have one.
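The naming rule is mechanical enough to sketch. This is a hypothetical stdlib-only helper just to show the expected pattern; webui does this matching internally:

```python
from pathlib import Path

def expected_vae_name(checkpoint: str) -> str:
    """For auto-matching, the VAE file should share the checkpoint's base
    name, with the ".ckpt" extension replaced by ".vae.pt"."""
    return str(Path(checkpoint).with_suffix("")) + ".vae.pt"

print(expected_vae_name("Anything-V3.0.ckpt"))  # Anything-V3.0.vae.pt
```

So if the VAE you downloaded is named anything else, renaming it to match the checkpoint should make the warning go away.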


File:cmd_1Gzj7OumUa.png (6.83 KB,682x99)

Nevermind, that's specific to windows 7.
Uhh... hmmm...
Well, there's a message when I launch about updating xformers (I will someday maybe) and I think it gives a hint?
Run that commandline thing... I think?


I've seen people recommend that you put VAEs in a subfolder. I.E:
>blah/models/stable diffusion/vae
The vae mostly determines color and you can select them manually or switch automatically if the name matches, as you said. I don't remember where those options are in Settings.

You can put this into the Quicksettings list under "User interface" in options and then the main screen will let you switch these around without needing to go into the Settings every time:
sd_model_checkpoint, sd_vae, CLIP_stop_at_last_layers


File:00064-1021740396.png (2.09 MB,1536x1024)

it's strange that almost all images itt don't use latent upscale, considering it lets you get much higher resolution and quality


File:03369-[AnyGrape Furrymix C….png (770.61 KB,816x1024)

Do you mean taking a generated image into the img2img tab, or the scaling "postprocessing" that happens automatically during generation? I don't do the first one, but I do the latter sometimes. The problem is that it's a total VRAM killer, so I go from generating 8 images at once to 2 or sometimes even 1.
Maybe I should try the "manual" scaling sometime, but I just haven't felt the desire to do so. I like seeing the final image and not doing anything to it afterwards, because then it begins to resemble work, since this stuff doesn't really satisfy the creative urges. I like setting it to generate a bunch of images and then doing something else, too.

I just spent the past 2 days downloading and organizing LORAs, so I'm going to be generating a lot more Kuons soon. Hehehehe.
One day soon I might redo my Kuon and Amaduyu LORAs, particularly the Amaduyu one that controls the art style because it tends to produce a lot of errors that aren't otherwise present. No idea what I did wrong with it.


File:[MoyaiSubs] Mewkledreamy -….jpg (341.7 KB,1920x1080)

This is something I saw a month ago that was way over my head. It still is, but it seems people have been using it very successfully so maybe I should give it a look sometime:
Basically it automates taking tons of screenshots and tagging them and such so it doesn't take dozens of hours like what I did a few months ago...
I definitely have a bunch of shows I'd love to be able to reproduce in prompts, so this is right up my alley. I think shows like Mewkledreamy would need a lot of manual screenshots, though, since there are so many great frames that are barely there and would be easily skipped over by some randomized thing.


Kuon's cankles


File:cmd_JOSvYtIDL8.png (4.56 KB,688x68)

Luckily for me there's a cold front going through because I'm going to be generating quite a few images while I'm sleeping. I was downloading a bunch of LORAs a few days ago as I mentioned, but now I've made a new merge and I'm going to create a folder of example images of said LORAs in action. 14 images per LORA, two seeds for each prompt, and 505 Style LORAs.
Although these image sets are going to be pornographic, I'm going to make a non-lewd example, too.


How did they go?


File:firefox_eAXhN1hF5e.png (479.43 KB,734x595)

I ran into an issue and was too tired, so I couldn't do it. Unfortunately, it seems like due to a recent change in the automatic1111 thing (or maybe because this grid I'm making is different from usual), it's making all the batch image files at the very end. I don't really trust it to properly create 505 large images after many hours of work (where is it storing the data?), so I need to do it in batches, which is REALLY annoying.
But, I've learned that it's also going to take much longer than I thought, at about 5 minutes per Style. If only I didn't need to generate one at a time to make this nice grid pattern with 7 different prompt sets, 2 seeds, and then the LORA change itself.
I guess I'm not playing the Nosuri game until this is finished. Oh well.


Done with about 120 of them so far. However, the question I now ask: How the hell do I organize all these images so I can easily determine the proper style for a thing?
I guess I can give them names like [Name][High Quality][Western][Realistic][Colorful][Big Breasts] or something?
How on Earth am I going to do this...


make the names into tags i guess, then use regex in the file explorer
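Something like this, maybe. A hypothetical sketch assuming the bracketed-tag naming scheme from the post above, matched with the stdlib `re` module:

```python
import re

# Filenames carry bracketed tags, e.g. "[Name][High Quality][Western][Colorful].png"
TAG = re.compile(r"\[([^\]]+)\]")

def tags_of(filename: str) -> set[str]:
    """Extract every [bracketed] tag from a filename."""
    return set(TAG.findall(filename))

def has_tags(filename: str, *wanted: str) -> bool:
    """True if the filename carries all of the requested tags."""
    return set(wanted) <= tags_of(filename)

name = "[SomeArtist][High Quality][Western][Colorful][Big Breasts].png"
print(has_tags(name, "Western", "Colorful"))  # True
print(has_tags(name, "Realistic"))            # False
```

The same pattern works as a search string in any file manager that takes regex, e.g. `\[Western\].*\[Colorful\]` (order-sensitive there, though).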


File:firefox_OKQ8lp2CbU.png (991.8 KB,1862x974)

I did some basic organization of the Styles LORAs, giving them basic trait names like 'Cute', 'Shaded', or 'Fleshy' (for stuff with detailed skin) and other stuff.
But it looks like another addition I hadn't noticed is the "Additional Networks" tab for the LORA extension, which gives you space to add an image and description and stuff. A lot of these LORAs people have made require keywords for the character, which I guess in theory lets you give new outfits to characters more reliably. I might do that to my old stuff... maybe.
Pretty neat, but this will be tedious to set up.


File:firefox_Qkhb4PGUTe.png (24.22 KB,576x793)

Civit.ai, a place that people have been using to host some models instead of mega or other file upload places just did some sort of overhaul. I'm now presented with this consent form and it makes me think that they're ready to sell it off, since a userbase has been established to give the site value before the great neutering. I mean, come on, a setting to hide the middle finger and bare male chests? This is definitely heading in a terrible direction for a site that grew specifically because of porn.
This is after tags like 'loli' were purged over a month ago, of course, which had some issues with hololive.
I hope this leads to people abandoning it.


>A good checkpoint model (mine is a custom merge of like 5 of them that are themselves merges that other people made)

So do you constantly merge models and stuff, or is it one model you use for most things? Also, is it possible to upload this one? I'd really like to check it out myself.


Thought it'd be better to ask here instead of cluttering up the other thread


File:notepad _40RGJPuWHt.png (87.98 KB,1203x1169)

Yeah. I talked about it a bit here >>103583 and the post immediately after that is the UI for creating a more involved "Layered" merge between models. I still don't understand it much, it's just a bunch of trial and error and I can't say I've learned much after looking through papers and notes from other people who similarly seem to theorize things only to have it change later. Pic related is a glimpse into my nonsensical rationality in trying to find patterns in the first merging experiments I did in trying to create furry-quality penises with anime visuals. The video at >>/megu/538 is related. VERY NSFW!
I have "formulas" saved that I test in all future merges I make, but they rarely carry over their benefits when making future merges with different models or even if you keep the old model and add a new one to it. It seems like they were specific to the merge at the time. If I make a note of "Slider IN07 gives great faces when set to 1" it does not necessarily carry over to merges between different checkpoint models.
Since that post was made someone had an extension where you can do "live" merging with models that lets you test it before creating a new 3-7GB file each time, so that helps a lot.

I usually go a few weeks between testing new merges because it's really exhausting. I create thousands of images while adjusting sliders and waiting for it generate and it's an all-day or multi-day affair.
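The basic weighted-sum merge underneath all this slider-fiddling is just per-key interpolation between the two checkpoints' weights. A toy sketch with plain floats standing in for tensors (in practice these are torch state dicts, and the "Layered"/block-weighted merging above uses a different alpha per layer, which is what those IN/OUT sliders control):

```python
def merge_checkpoints(a: dict, b: dict, alpha: float) -> dict:
    """Weighted-sum merge: result = (1 - alpha) * A + alpha * B, key by key.
    alpha = 0.0 returns model A unchanged, alpha = 1.0 returns model B."""
    assert a.keys() == b.keys(), "checkpoints must share the same architecture"
    return {k: (1.0 - alpha) * a[k] + alpha * b[k] for k in a}

# Toy "state dicts" with one scalar per layer
model_a = {"IN07.weight": 1.0, "OUT05.weight": 4.0}
model_b = {"IN07.weight": 3.0, "OUT05.weight": 0.0}
print(merge_checkpoints(model_a, model_b, 0.5))
# {'IN07.weight': 2.0, 'OUT05.weight': 2.0}
```

This also shows why merge "formulas" don't transfer between base models: the arithmetic is the same, but the weights being averaged are completely different each time.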

>Also is it possible to upload this one, I'd really like to check it out myself.
Yeah, I could try to upload it somewhere, although my upload speed is terrible. First I need to give it a real name, though. Hmm... I guess I could do a bit of publicity and name it after kissu somehow.
Uploading the LORAS? I downloaded them all and it was exhausting, but they're 120gb...


File:explorer_EQAYcXaoRh.png (1.64 MB,1549x758)

Lala is probably not in a Miyako situation that would require screenshots. Miyako's fan art is very inconsistent due to the source material itself being inconsistent.
Lala... well, I think she could be separated into "regular" and "precure" forms for clothing and hair, but she still has the same body and head shape. My favorite art of her is very "noisy" so I don't think it can be used, but I could try.


File:00440-1girl,_sitting,_read….png (693.16 KB,672x864)

Alright, here is the link to my current model for use by kissu friends. (but I also made sure to include kissu advertisements in the files and password so even if linked elsewhere people will know hehehe)
I call it... *drumroll*
The <[Kissu Megamix]>
The compressed RAR is 3.5gb and my upload is 1MB/s, so I can't really upload a bunch of these, not that I would anyway since I can say this is the best version I have. While this model is focused on NSFW stuff, it can still handle cute. I don't know if it's the best checkpoint overall, but it's the best for my personal desires. My model lacks most of the haze that most of the RL mixes have, although it's not completely eliminated. You can see the benefit of the RL models by looking at the hands here: I didn't do anything to them, it's straight from the prompt. The password is in the text file, but I'll also post it here. Without quotations: "www.kissu.moe - my friends are here"


Oh, I forgot to answer the question about multiple models. Yeah, I have a few I keep around but I overwhelmingly only use the most recent one I've created. The model I just linked is the normal version (which I'm using in that thread) while the other model sacrifices face quality and booru tag recognition to better generate a certain body part. (in other words it's closer to the furry model)
The others I don't really use much, but are there for comparisons sometimes. I have to keep the stuff around that I make merges with, too, of course.
In total, I've probably made about 200 merges, with 99% of them being deleted shortly after creation. If you count the merges I've done after the "real-time merging", then it's probably more like 500. It's really an amazing extension.

I never did make my pixiv into an AI account. Alas, such is the price of having no motivation to interact with the wider world.


File:00441-1girl,_jewelry,_spre….png (1.41 MB,1008x1152)

Forgot to mention that I put (furry:1.3) in the prompt to demonstrate that while it has some benefits from the furry model, it's not overly contaminated by it. Patchy is still a human there. The Kissu Megamix can do various bodies better than the majority of 2D models out there, such as 'gigantic breasts' and squishy plump bellies! (and the male anatomy attached to females, of course)
My personal preferences:

I generally use the "1.5 ema-pruned" VAE, which I'm uploading right now to the same upload folder.
It makes things colorful (and sometimes "over-baked"; if that happens, use the default NovelAI VAE). The other 2D VAEs are too colorful on this, but you could try them.

I have also included the "4x-UltraSharp" upscaler in the mega folder, which you should put into stable diffusion\models\ESRGAN folder (create the folder if it's not there). I did a bunch of testing and found that I like it the most, although the differences aren't major.

DPM++ 2S a Karras. I'm not entirely sure on these samplers, but when testing different artist LORAs this one seems to have the most compatibility. I don't know why. Something to research more, I guess, but at the same time I don't really want to.
I have it at 26 steps, as going higher than the mid-20s is supposed to be overkill. The rule for upscaling is to do half the number of steps of the base generation, so 26 normal steps and then 13 Hires steps.
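That rule of thumb is just halving, rounded down; a trivial sketch using my numbers:

```python
def hires_steps(base_steps: int) -> int:
    """Rule of thumb: the Hires pass gets half the base sampling steps."""
    return base_steps // 2

print(hires_steps(26))  # 13
```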

My default negative prompt is:
(worst quality, low quality:1.4), realistic, nose, 3d, greyscale, monochrome, text, title, logo, signature
If you somehow end up generating furry properties, try putting "furry" or "anthro" in there.
I don't use any of the old positive quality prompts since they don't seem to do anything noteworthy. (I.E masterpiece, highest quality, etc)


Thanks for putting this together.


I think one of the most extreme hurdles I have yet to see AI overcome, and I can't even fathom how it would overcome, is creating images that involve specific details of two or more characters. It just can't figure out how to assign differing aspects to separate characters.


MultiDiffusion and Latent Couple let you use different prompts for different regions and are available as plugins for webui

The MultiDiffusion extension also has Tiled VAE which lets you create much larger images without going out of VRAM


File:[MoyaiSubs] Reiwa no Di Gi….jpg (265.85 KB,1920x1080)

There's attempts at it, but it's more work than I'm comfortable doing with my VRAM limitations. (This stuff is a total resource hog)

Also apparently there's some major problem with the civitai site right now and anything downloaded is massively corrupt and can't be used. Whoops.
I guess I should go share this info with that /h/ thread since they've been helpful to me in the past.


File:multisubject test.png (1.85 MB,768x1728)

Here is an example of the methods in action.
The top uses the naive prompt "2girls, cirno, megumin". As you can see, the character details got intermixed.
The middle is the MultiDiffusion method. I set the prompt of each half to one character. Now the character details are separated correctly. It needs a little tweaking to let the two halves fuse together better.
The bottom is the Latent Couple method. It also separates the character details well and looks a little more natural than MultiDiffusion.


Also, there is a new extension that can do both.

Other techniques for fine control of images can be found on the SD wiki.


File:00572-(Alexander_Jansson_1….png (1.1 MB,1136x656)

Heh, "Creativity". Well, as long as they're providing tools they can have their delusions.
I guess I'll try those with the Patchy adventure. Making scenery is really difficult with my current limitations and my desire to not spend effort doing something to avoid spending effort.


File:00582-2girls,_sitting,_rea….png (1.07 MB,1296x768)

Hmm yes. Furry Patchy Adventure just got a little bit better.
There seems to be a quality hit here, but it's still really impressive.


File:index.mp4 (981.82 KB,256x256)

There's been video stuff available, but I never thought to try it until I saw some random /v/ thread mentioning it.
It reminds me of where image generation was a couple of years ago. You have to download models, so it's like base stable diffusion where you can't really do anything other than basic stuff. No cute girls doing cute things here.
This is "Luigi beating Mario with a baseball bat until he explodes". I was angry that it wasn't recognizing things so I went with something violent with popular characters


Can't SD already do video in some way? I've seen anime girl videos made with mocap and controlnet
It may be a lot more effort compared to generating with nothing but a prompt, though.


There have been ways, but this one is a simple text prompt with no other work involved, as you said. At 256x256 I couldn't make more than about 70 frames at once before running out of VRAM, but I didn't look at the settings much. To me this stuff is only as interesting as its ability to fill in the blanks; the more work I need to put into it, the less interest I have, because at that point someone should learn to draw or animate, in my opinion.


It's gotten very good at realistic image generation.



Looks like Ness


Ironically, it looks like it perfectly replicated the feeling of rage when you get pissed off that stuff isn't working

