
/qa/ - Questions and Answers

Questions and Answers about QA


File:00353-2700800976-girl, fac….png (365.08 KB,512x512)

No.96625

Anyone else been messing around with the stable diffusion algorithm or anything in a similar vein?
It's a bit hard to make it do exactly what you want, but if you're extremely descriptive in the prompt (or just use a couple of words) it gives some pretty good results. It seems to struggle a lot with appendages, but faces come out surprisingly well most of the time.

Aside from having a 3070, I just followed this guide I found on /g/ https://rentry.org/voldy to get things set up and it was pretty painless.


File:[MoyaiSubs] Mewkledreamy -….jpg (217.55 KB,1920x1080)

Ah, yeah, I've been reading up on it. I downloaded some 7GB danbooru thing for it. I wouldn't trust /g/ with an Etch-a-Sketch so I won't follow a guide from there, but I've saved some links from other places:

I'll get to trying this eventually, but so far I've just been procrastinating a bunch since I need to install and run python and do other stuff I don't understand. My VRAM is also only 6GB and I'm not sure if that's enough.


File:00147-149750438-megumin, k….png (254.67 KB,512x512)

I'm not too concerned with the theory at the moment; I just wanted to know what practically has to be done to get it running. That guide more or less amounts to downloading a git repo, a model (this is the sketchiest part, but you already did it), and Python 3.10.6. Then you run a bat file and it works. From what I can tell the web UI allocates 4GB of VRAM by default and you'll have to pass arguments to make it use more or less than that. It should run with an nvidia card that has 6GB.

That Krita plugin looks interesting, will check it out later.


File:62b580de9b13374d1a11e690fe….png (816.09 KB,1075x1500)

The face looks very Korean, I wonder if it's because the archetype is very common so AI is probably trained on a lot of it.


File:grid-0052.png (1.69 MB,1024x1024)

here are the other faces from that batch
the model i'm using is supposedly trained on a set of images from danbooru, not sure why it'd look korean specifically other than chance


I have exactly 0 (zero) interest in AI art. I have not saved a single file from one of them to this day even. I wouldn't call it being a hater, but they really are just fundamentally unappealing to me.


I don't get what all the fuss is about either. If you've seen one image, you've seen them all. They all have this weird quality to them. Maybe it's that there's absolutely nothing meaningful about them. It doesn't help that most of these images look like bad crops.


File:spaghetti.png (742.16 KB,576x704)

I'm in favor of it as long as the results resemble 2D ideals.


why is she stuffing her boobs with spaghetti....


Ever wondered why girls smell so good? This is why.


File:patchy2.png (2.86 MB,2200x1536)

img2img is neat


dat' polydactyly
wow, so even AI has trouble drawing hands


File:grid-0102.png (5.12 MB,2048x2048)

surprised by how well this batch turned out
some of these could pass for a mediocre artist's work


File:download (16).png (557.43 KB,512x768)

what a big girl


>so even AI has trouble drawing hands
Yeah, it must be related to how the algorithm copies things; it gets confused and can't do hands. With faces the parts have general locations and you can meld shapes a bit, but with hands it's trying to copy a bunch of different positions and angles into one and it breaks. Anime faces might be one of the best things for it since they don't even make sense to begin with in regards to angles.


File:1663263321-Beautiful waifu….jpg (39.5 KB,512x512)

I just steal other people's prompts and add waifu.
Also, if anyone else is on AMD on Windows, I followed this guide and it works: https://rentry.org/ayymd-stable-diffustion-v1_4-guide
Also also, if anyone can help me figure out how to change the output resolution, that would be swell.


Yeah, I've been somewhat surprised by the quality of the more 3DCG images I've seen from it, but when it comes to the more anime styles the AI falls short. There are probably subtleties it can't pick up in batch training because of the differences between artist styles, which is what causes these amateur-level drawings.


File:00080-1017444043-full body….png (502.04 KB,512x768)

I've been trying to create my Pathfinder character with it. I think this is the closest I've gotten, but it's still not there yet. I feel like I'm close, though...


Alright, I'm diving in. Might take a while to get stuff set up and figure out what I'm doing, however.


File:a.png (357.08 KB,512x512)

Making some progress...


File:b.png (368.9 KB,512x512)


File:index.png (Spoiler Image,4.25 MB,2048x1536)

Ehh, so many of these are horrifying, so I'm going to put them behind a spoiler. I think tomorrow I'm going to try that thing where you can selectively "refresh" parts of the image.


File:aaaaaa.png (343.09 KB,512x512)

oh no, this wasn't what I wanted at all!


File:hehe.png (372.77 KB,512x512)

I need to download the base model, this danbooru one isn't working the best for, well, non-"anime" stuff


Has science gone too far?


From the AI I've used myself, these aren't so bad


Is that an anthropomorphic "furry" Koruri?


File:waterfox_ZRSVcPMhoC copy.png (550.51 KB,953x1039)

Okay, what the heck. There's this "textual inversion" thing, which is a chuuni way of saying "custom-trained embeddings", and there are a few hundred shared ones for you to look at and download.
But, uhh...
Okay, the first one is an interesting find. Second one makes me think "okay maybe this isn't a coincidence" and third is "okay someone on a spinoff is involved with this".
There's like 500 of these total, mostly generic pop culture stuff, but these three REALLY stick out.


File:00410-493637731-bird sitti….png (850.78 KB,768x1024)

Perhaps unsurprisingly, someone released a furry model trained on e621 and it's able to do penises and sexual poses that the other databases can't. I think I'll make a thread on /secret/ for posting my experiments with it because porn tends to derail things.
Also, uhh... be very wary of trying stuff on the default model. It's trained on images of real people and I think there's going to be some legal challenges in the future.

Anyway, give me some prompts and/or images and I can mess with them if you don't want to configure this thing yourself. I have 3 models: a hybrid default/danbooru one, a pure danbooru one, and the aforementioned furry one. But I'm really bad at it and still need to learn how stuff works. I tried to turn furry Patchy into a bird but now she's a human.


File:00485-622766441-cats.png (618.9 KB,896x512)

this is pretty good, and it'd only get better if I had the patience to run it more times


File:grid-0087.jpg (1.07 MB,3584x2048)

Tsugu and Hagi didn't survive most of the attempts


File:935116e8b74134e41664779a19….png (Spoiler Image,278.45 KB,640x640)

AI generated Raymoo titties (NSFW)



the rendering and shape are good, but it's still making mistakes. It's just that it's focusing on something simple, so the mistakes are better disguised


did you use the prompts from stable diffusion to make that?


I didn't make it. I got it from the stable diffusion thread on 4/h/. I've been lurking it for a few days because it's a lot slower than the /g/ one and seems to have more technical discussion.

I just wanted to share it because I thought it was a pretty good generation.


There's a new model called Hentai Diffusion that was trained on Waifu (ugh) Diffusion and 150k danbooru/r34 images. I guess it'd be better at nudity?

You might need a huggingface account to download it. I have one because I was going to upload a set to train or whatever, but then I saw that there doesn't seem to be any way to use their GPUs without making it public, and they have rules against nudity. I also wouldn't want to upload an artist's work for others to exploit for real instead of just making stupid things on kissu.
Wish I had more VRAM. Oh well.


File:01040-248630214-girl's las….png (454.55 KB,512x512)

yeah. it's spooky.


File:01442-641596059-aria, aman….png (584.73 KB,512x512)

it's a lot harder to rationalize 'soul' and a human touch when you like results that are entirely mechanically hallucinated. maybe this means that art is more useful to understand an author than anything else.


I've seen AI that write code and this reminds me of some of the shortcomings people had with it.

While they were trained on a large database, it was often the case that the AI was effectively copying programmers from Stack Overflow and pasting that raw information into people's software.

I feel like it's almost the same case here. It took chunks from every artist it saw, essentially creating a collage with little creative problem-solving of its own... and when it does try, the result is simply a confused error rather than inference.

I was much more impressed by reimu's breasts


it's almost as if machines cannot think


File:a6b328da6c4d0e2e087ea99aa2….png (305.21 KB,512x640)

I've been messing with SD since last week using https://github.com/AUTOMATIC1111/stable-diffusion-webui and the danbooru model https://thisanimedoesnotexist.ai/downloads/wd-v1-2-full-ema.ckpt
I've been doing only txt2img because apparently I don't have enough GPU RAM for img2img (laptop GTX 1660 Ti).
A couple of images turned out to be cute; most are pretty bad, or maybe my prompts are bad, who knows.
I've been thinking of setting up my local server to produce anime images 24/7 with some script that autogenerates prompts, not sure how its GTX 970 would handle it though.
Not messed with lewds too much for now, going to download it and try.


File:20221004_130831.jpg (45.5 KB,448x640)


File:00190-2426397090-[blue_eye….png (347.18 KB,512x512)

nee nee, look at this military qt!


really hoping that's a cute boy and not a g*rl


Yes, science has gone too far.


You will live long enough to see robotic anime meidos for domestic use and you will be happy.


File:00006-863090008-anime youn….png (490.87 KB,704x512)

It's kind of interesting to see a real artist use it. I'm assuming he did the img2img thing, which uses an image as a guide, since it's got his style's wide face and ludicrously erotic body proportions.
This is a good example of how generic it looks compared to the real thing, which you can't really get around since generic is exactly how it's supposed to function. In theory people can (and will certainly try to) directly copy individual artists, but so far it's pretty bad at that.


File:00203-285684822-[blue_eyes….png (243 KB,512x512)

another cute (female)


When you really break it down, "AI" art is more or less the same thing as procedural level generation in games. The computer is provided with a set of rules, and then randomly generates something that follows those rules.

That's also why I can't see it outright replacing artists like a lot of people are afraid (or if they're psychopaths, hopeful) it will. You can generate all of the level design for your game procedurally, and a lot of games do (minecraft, for example). But "level designer" still exists as a profession for a reason.


File:grid-0024.png (2.92 MB,2048x2560)

the many faces of /qa/-tan


kinda neat


File:grid-0077.png (5.09 MB,2304x2048)

Well, the thread is mostly for posting stupid AI things, but only a couple of people are doing it, and it's annoying for me to run personally because it interferes with the videos or 3D programs I have open most of the time.
Also I was mostly just testing how good it is at doing penises, and the answer is that the furry one is passable, even on humans, but I won't derail the thread with porn


File:00171-2295644611-bird gba-….png (1.99 MB,1024x1024)

The "GBA Pokemon" embedding thing really isn't working for me. None of them are. I think you're supposed to use the exact model for them, but I'm not going to download a bunch of those since they're like 3-8gb each.


File:dumb arguing.gif (1.33 MB,1280x720)

I completely agree with this cute anon.


File:[mottoj] Tsukuyomi Moon Ph….jpg (109.55 KB,1024x576)

The serious discussion in this thread is being moved to a separate thread that is soon to be made. Brace for impact


File:grid-0000.png (1.82 MB,1920x1024)

Here is Laura with an AI-generated background; I masked out the original, which was blank white.


NovelAI's model has been leaked. hehehe. Meaning you can do it offline without paying them.
It's 52gb with multiple models, and I doubt I'll be impressed but I'm torrenting it anyway.


Can you post the link?



Thanks, adding it to the hoard.


But someone is replying about 'python pickles' and I have no idea what that entails. I guess he's warning people that it could contain a virus or otherwise have code in it? There's this link but I have no idea what it means: https://rentry.org/safeunpickle
Does anyone here know Python and can tell me what the thing above does? They made it sound like it's something you use to check for malicious stuff, or maybe I interpreted it wrong.
But, people on 4chan are already using this so I think it's safe


File:1632939770819.png (495.82 KB,1024x1024)

Pickle is a data serialization library: https://docs.python.org/3/library/pickle.html
Serialization means turning in-memory data like objects into a format that can be stored on disk or sent over a network. JSON is another common serialization format.
I don't use pickle much, but unlike JSON, which is plain text and only describes data, pickle can encode instructions to call arbitrary Python functions, so yes, deserializing untrusted data can execute code hidden in it.
After a quick glance it looks like that code overrides some of the functions described in https://docs.python.org/3/library/pickle.html#pickle.Unpickler
The overridden "def find_class(self, module, name)" seems to implement a whitelist so that only certain kinds of data (the kinds considered safe, I guess) can be loaded.
I can't guarantee that code actually protects against code execution though; if I were you I'd download it if you care, but wait some time before running anything and see what happens.
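The whitelist idea can be sketched in a few lines. This is the generic pattern from the Python pickle docs, not the actual code behind the rentry link:

```python
import io
import os
import pickle

# Whitelist of (module, name) globals that may be loaded from a pickle.
# Everything else gets rejected instead of imported and executed.
ALLOWED = {
    ("collections", "OrderedDict"),
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain containers need no globals, so they round-trip fine:
print(restricted_loads(pickle.dumps({"steps": 20, "cfg": 7})))

# A classic malicious payload smuggles a function call into the pickle:
class Evil:
    def __reduce__(self):
        return (os.system, ("echo pwned",))

try:
    restricted_loads(pickle.dumps(Evil()))
except pickle.UnpicklingError as e:
    print("blocked:", e)  # the command never runs
```

Note this only stops the load step from calling non-whitelisted globals; it doesn't prove a given .ckpt is safe overall.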


All the AI talk is hurting my no-knowledge-on-AI brain. Apparently there's going to be a part 2 to the leak; I can't keep up with /g/, but I'm happy to download/seed it.


anon created a guide, probably 100% the real deal now


File:00197-3287249658-((([Remil….png (391.67 KB,768x768)

To go with the story of me playing games with Remilia in that Character AI thread.
This is Remi's gamer pose


File:test.png (320.86 KB,384x640)

I want to kissu her!


File:00002-4009721508-huge_brea….png (Spoiler Image,235.25 KB,512x512)

I also made a loli with pink hair... I gotta get the GPU stuff set up though, this took me 10 minutes and is obviously far from the bleeding edge of this stuff.


File:00027-1587389607-large_bre….png (208.6 KB,512x512)

Got another nice one.


File:20221008_181546.jpg (61.52 KB,512x768)



Interesting how this works.


my wife chino is ballin'


I swear to you guys I was arguing on another corner of the internet that I'm not interested in AI because it couldn't create art in the style of a particular artist, and the artist I was referring to was literally Zankuro in specific, yet here we are. I was crushingly naive. I wonder how far off we are from it making lewd gifs in Zankuro's chubby loli style...


File:00967-1113753283-1girl, ((….png (490.36 KB,576x576)

Utawarerumono riding a banana


File:00980-2759234159-1girl, ((….png (481.14 KB,576x576)

Unsurprisingly it fails to capture Kuon's beauty, although I don't know how to do the tagging for Kuon_(Utawarerumono) with this, so I took a guess from what I think I remember seeing.
This one came pretty close to getting her face I think. But, I need to do a thing where I train it.
This is something I/we need to read up on that apparently is a big deal: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/2284


File:grid-0134.png (2.23 MB,1344x1152)

Wow, embeddings are strong. There's an embedding someone made on 4/h/ for abmono who is quite a well-regarded Miku artist.
This is a combination of abmono and Wakasagihime. Even though I didn't name Miku, abmono images will impart Miku's outfit. Pretty crazy.


Do you mean abmayo?


File:01207-107125510-(masterpie….png (351.08 KB,448x576)

errr yeah, stupid brain


it works because mayonnaise is a もの, so it's all cool


File:1665622868689087.webm (Spoiler Image,2.87 MB,512x640)

People are making animations with it somehow using a script. (webm contains nudity)

Copied /h/ post:

you can already make a video
this was one an anon made
afaik the keyframes were something like
Time (s) | Desnoise | Zoom (/s) | X Shift (pix/s) | Y shift (pix/s) | Positive Prompts | Negative Prompts | Seed
0 | 0.6 | 0.9 | 0 | 0 | sleeping in bed, under sheet | | -1
2 | 0.6 | 0.9 | 0 | 0 | shower, washing self, naked, from above | |-1
4 | 0.6 | 0.9 | 0 | 0 | eating breakfast, dressing gown| |-1
6 | 0.6 | 0.9 | 0 | 0 | Sitting on a bus, uniform, from below | |-1
8 | 0.6 | 0.9 | 0 | 0 | on stage, bikini, tattoos, singing, full theatre, bright lights, microphone | |-1
10 | 0.6 | 0.9 | 0 | 0 | drinking at a bar, cocktails, black dress, cleavage, earrings, drunk, flirty | |-1
12 | 0.6 | 0.9 | 0 | 0 | bed, (doggystyle sex:1.3), pubic hair, 1girl, 1boy | |-1
14 | 0.6 | 0.9 | 0 | 0 | passed out in bed, under sheet | |-1

I couldn't open the webm in waterfox or firefox, but it worked with brave and mpc


File:01461-2527489783-masterpie….png (304.43 KB,512x512)

This doesn't look like an abmayo Kuon at all, but I like it. Very yukkuri.


File:grid-0163.jpg (526.52 KB,2048x2048)

Remember Eruruu? This is what she looks like now (apparently)


The miku virus infects all...


File:grid-0249.png (2.92 MB,1536x1536)

Ehh.... in theory we could make a Koruri embedding thing, but I think that'd be disrespectful so even if I had the VRAM I wouldn't do it.


File:grid-0258.png (3.18 MB,1536x1536)

Sometime in the bleak future, a cybernetic Tenshi eats a corndog
(some of the results of these tags are pretty disturbing, while some like the lower left here are pretty damn cool even if they aren't really showing what I wanted)
I can really see how you could use this for ideas as an artist or modeler or anyone else in a creative field


File:02436-1162997752-masterpie….png (438.45 KB,512x512)

Good way to find embeddings on 4chan: https://find.4chan.org/?q=.pt (Warning: NSFW thumbnails are likely)
(They're pt files)
Just grabbed a ZUN one that I'll try later


gonna go learn how this all actually works


File:102281918_p0.png (782.64 KB,1000x1080)

There's a fad, started by an AI generation of a glowing penis, that has artists imitating it. Kind of meta.

From one of my favorite random creative artists (and maker of that one furry Patchy)


File:102251504_p0.jpg (317.4 KB,617x617)

Heh, some of these are pretty creative. Also where's the original image that inspired it?



that's just a bioluminescent mushroom dude


File:Fget6dGaAAA7FX_.jpg (356.97 KB,2100x2160)

The sperm of the sea


File:no this is NOT Laura, I sw….png (Spoiler Image,717.24 KB,576x768)

I wonder if people are just really bad at thinking up concepts for computer-generated porn or if their tastes are really so plain and boring. From what I've seen on imageboards, and I don't mean to sound too conceited, I'm clearly in the upper echelons at throwing together these amalgamations of theft and I don't even have a great GPU to create the custom models. It seems so easy to me so I don't understand why everything is so ugly and generic when I see what other people are doing.
It makes me think this "AI revolution" is crippled from the beginning because it still requires human input and people have no motivation or direction. It's like when you show a bunch of complex castles and cities built in Terraria or Minecraft, but then you see how most people play and it's simple square prison blocks made from dirt.
Well, I think the crisis is averted because people are still dull-witted and boring even with such a thing at their fingertips.
Quite an addicting thing, however, and I need to pry myself away to work on actual creative pursuits (and I'm saving images and concepts to use for inspiration so that part is actually true)


The anatomy is quite weird and off-putting, but I guess most AI images are like that.
I never tried it; maybe I will.


File:dancer.png (3.44 MB,1024x1536)

Part of the problem is that niche topics are inherently hard to generate, since there isn't enough training data for them to turn out good. I am also somewhat put off attempting anything overly complicated (especially stuff with multiple characters), because the more complex the image, the more opportunities there are for bad anatomy/etc. to show up, and without any easy way to selectively fix those elements, I'd usually prefer to generate something basic done well than something more interesting that has mangled hands or whatever. What I do really wish though is that people experimented more with the style-altering options - many of those tags are well-enough populated to work great, and there is no added difficulty in using them (in fact, ones like greyscale even make it simpler), but they can go a long way in avoiding the generic AI-art look.


Yeah it's not perfect, but the thing with porn, especially niche stuff that doesn't otherwise exist, is that the brain overlooks it due to the excitement and stimulation over the rest. It's like your choice is a handful of doodles from some guy from 2008 or this thing creating new amalgamations of fetish fuel with errors. Most people have no reason to use this for porn, really, since it's easily inferior to something created by hand. But if that stuff made by hand doesn't exist? Yeah...


File:ZZX 0229.jpeg (Spoiler Image,390.19 KB,2892x4400)

It's not that niche, though the musculature of this image looks familiar (though done badly by the AI). I wonder if genres with a smaller selection for the AI to draw from, like futa, will end up creating images that lean more heavily on one individual's unique style than others would. It's not just the abs; this pen*s looks very similar as well. In fact it looks like the AI has taken it and recoloured it, and that's part of why the image looks weird: the pen*s was taken from an image where the body is positioned differently.


File:00318-262339623-(masterpie….png (Spoiler Image,627.37 KB,576x768)

Oh, I'm quite aware dickgirls haven't been niche for like 15 years. The fact that it's ubiquitous is also why any simple image doesn't work; it's no longer manna from heaven by virtue of existing. Find me some quality newhalf mermaid art with a human penis instead of some weird "realistic" furry dolphin version. Also, give her a nice soft belly, a mature face, a warm smile and an apron. Also, it's Takane from Idolm@ster, a girl who shares the face of the first 2D girl I had a crush on (since Luna is too old/obscure to have training data). Here's one I just generated, although it has some pretty noticeable errors.

People have fantasies more elaborate than "a girl with breasts of any size, preferably alive" and it's not any different in my situation just because a penis is involved.


File:xy_grid-0150-3577515976-(m….png (2.58 MB,2304x950)

I'm going to start dumping info and stuff in this thread, although I think most visual experiments will be posted on /megu/ since I'm mostly into this stuff for niche ero.
Someone asked how you could make transformation stuff, and this is how. Although, I had to ask on 4chan because I had the syntax wrong.
The syntax is [A:B:#].
A and B are the two things you want to morph between over image generation.
# is how much influence one has over the other, expressed as a fraction (0.1 is 10%). In my image example the left-most image is 10% angel and 90% demon girl.
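The tokens for a sweep over # can be generated mechanically; here's a tiny hypothetical helper (morph_sweep is my name for it, not a webui function):

```python
# Hypothetical helper (not part of the webui): build the [A:B:#] tokens
# for a sweep of the mix value from 10% to 90% in 10% steps.
def morph_sweep(a, b, steps=9):
    return [f"[{a}:{b}:{round(0.1 * i, 1)}]" for i in range(1, steps + 1)]

tokens = morph_sweep("angel wings", "demon girl")
print(tokens[0])   # [angel wings:demon girl:0.1]
print(tokens[-1])  # [angel wings:demon girl:0.9]
```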


File:firefox_IkHhhePva5.png (41.5 KB,762x647)

To make an image set like this you want to go down into Script and use X/Y Plot, then select Prompt S/R.
In this example I have it start with 10% angel and 90% demon and then end with 90% angel and 10% demon.
The X/Y script is a massive help in finding the ideal settings, so people use it a LOT.


File:00297-3520136844-masterpie….png (368.83 KB,640x384)

>pt files
What exactly are these? I think I heard that these are "hypernetworks" or something and that you can use them to fine tune a model, or to bias it into giving different results or something. I can't really seem to find any though? Not that I've looked very hard, I'll admit, but it seems people are far more interested in specific models than hypernetworks. Likewise, what's the deal with merged models and pruning?


File:embeds.zip (1.18 MB)

.pt files show up in a few places, but when people are talking about them and it's not troubleshooting, it's about hypernetworks. Back when I made that post, embeddings were the cool thing (they also use .pt), but now it's hypernetworks. They're basically fine-tunes for a certain concept, almost exclusively specific artists or characters, e.g. this was using the embedding that mimics abmayo >>98094
Embeddings are called by name in the prompt, whereas hypernetworks are loaded in the Settings. Embeddings are 20-80KB whereas hypernetworks are 85+MB. I personally liked embeddings a lot more, not only because of the file size but because you could combine them. I guess hypernetworks are better and that's why everyone uses them?
Here's my embeds folder. Some of them were just uploaded without labels and I never figured out what they did, like the 3 named "ex_penis".
Extract the folder into the main WebUI folder and then you should be able to use them.
The badprompt ones are actually something newer. You put them in the negative prompts with 80% strength, i.e. I use
(bad_prompt2:0.8), lowres, bad anatomy, etc


Does this only work for two tags? Or can you batch together multiple into the percentage.


Probably, but I haven't checked. I'd guess it'd just be [A:B:C:#] for 3, and so on


Interesting sort of addendum I found for doing this sort of thing:
>you can [x:y:z] / [:y:z] / [x::z] to make the ai draw x then y at step z (or percentage of steps if you put a decimal), which works great for stuff like [tentacle:mechanical hose:0.2] to make the ai draw tubes everywhere, or you can do x|y... to make the ai alternate between drawing x and y every other step; you can put any number of things here e.g. x|y|z|a, but obviously the more you use this the more steps you need, in general


That's exactly the post I saw that made me want to try it. I heard people mentioning this functionality weeks ago but completely forgot. It seems rare that anyone uses it, but it could be really great


When I try making one of these I get a
>RuntimeError: Prompt S/R did not find angel wings:demon girl:0.1 in prompt or negative prompt.
Does this mean I need to put the tags into the prompt somewhere? Or attach an X to them?


The first thing listed there has to be in the prompt for the rest to replace it. You should be able to hover over it for a tooltip.
masterpiece, picnic, turtle, eating banana

in the script you'd put
banana, burger, corndog
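Under the hood the script is essentially doing string substitution over the prompt; a rough sketch of the assumed behavior (not the webui's actual code):

```python
# Rough sketch of what Prompt S/R (search/replace) does -- not the webui's
# actual code. The first term must already be in the prompt; each listed
# term is then substituted for it to produce one prompt per axis value.
def prompt_sr(prompt, terms):
    first = terms[0]
    if first not in prompt:
        # mirrors the webui's "Prompt S/R did not find ... in prompt" error
        raise RuntimeError(f"Prompt S/R did not find {first} in prompt")
    return [prompt.replace(first, term) for term in terms]

for p in prompt_sr("masterpiece, picnic, turtle, eating banana",
                   ["banana", "burger", "corndog"]):
    print(p)
```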


Gotta say, reading the documentation for all this stuff regarding stable diffusion has really impressed me with how much work and development has gone into making the open version as great as possible, beating out even its premium competitors.

I guess this is the true power of computer dorks trying to get the perfect porn.


File:1670036798856.jpg (233.99 KB,800x1257)

So I saw this one website making the rounds that's 2D-ifying or whatever images of real people or characters, and I have to wonder how you'd do the same with an image of your own in Stable Diffusion. Like, say you wanted to draw a certain character, from an image, in the style of Asanagi, maybe wearing some different clothing. How would you do that?


File:box on beach.png (104.7 KB,512x512)

You use img2img, which can itself be guided with a text prompt like txt2img so it's really more like img+txt 2 img.
As an example here is an image I drew


File:2022-12-03-18-31-45-393406….jpg (1.02 MB,2432x3143)

... and here are some variations created with the Stable Diffusion 1.5 model using the following prompt that matches the image contents:
"open cardboard box on beach, sunny day, waves crashing on shore, frothy sea, deep blue sea, photograph, daytime"


File:grid-0228-2284799521-maste….png (4.68 MB,2048x2048)

It's likely a very generic prompt that has a denoise of like .5 or something to keep the general shapes but still alter it enough to be noticeable. I saw someone point out that they look like Genshin characters, so it's probably using something trained on its images.
I have a Genshin hypernetwork for that so let's see the result when I throw some stuff in: (pic related)

I don't want to spend a bunch of time trying to replicate it, but you get the picture. It probably uses a few traditional artists tags since people have done lots of examples of those, including myself


File:2022-12-03-18-38-23-227899….jpg (1.52 MB,2048x2048)

... and if I do the same prompt and same settings as in >>100505 but without the input image, this is what I get.
The cartoony nature of my image is at odds with the Stable Diffusion model's realistic photograph style. Getting anything done with this sort of thing is probably best when it's iterative, mixing both txt2img and img2img.


File:firefox_Kc9SXtwoPZ.png (56.77 KB,506x754)

I have learned some things to make things a bit easier or cooler, although you might already know them. On the right-most part of the Settings tab:
This "show image creation process every N sampling steps" at the top of the image here is apparently what lets you see, the uhh... image creation process. I had no idea this was here since I was expecting it to be more prominent.
At the bottom of my image you'll see a text box. Replace it with this text:
sd_model_checkpoint, sd_hypernetwork, sd_hypernetwork_strength, sd_vae
And it will show those at the top of the main window so you don't need to go into the Settings tab every time you want to mess with the hypernetwork or vae. Pretty cool!


File:explorer_Kisf6EEziK.png (789.06 KB,1098x828)

Surely you knew this was coming.
I am going to begin the process of bringing beloved Kuon to this so that she can be generated as easily as a 2hu! I'm debating whether to keep it centered on her and use a variety of artists, or to go all out and restrict it to official art and use hundreds of images to try and get the Aquaplus (Amaduya) style. I'm leaning towards the latter, although again I am filled with guilt before even attempting it.
Well, at least he's already successful and famous in some circles and doing games and doujin stuff and is well appreciated so it's not like I'd be robbing him of work? Bleh. The ethical quandaries of this stuff...


Oh, after testing with this it does seem to greatly increase the time it takes to generate stuff, so maybe only use the 'image preview while generating' thing if you're unsure where to stop when working on settings, and then set it back to zero when you're actually producing a bunch of stuff.


File:02543-986531908-(masterpie….png (347.39 KB,512x512)

A lot of knowledge about this stuff requires scouring and searching or surreptitious posts, so I'll try to share some more info.
This time I'm going to talk about two Extensions that I use a lot.

The easiest way to get new Extensions is to go to the Extensions tab of the WebUI and then go to Available and hit the "Load from" button with its default URL. From there you can install stuff, which will then show up on the Installed tab. For a lot of this stuff you need to restart the UI from settings if not restart the .bat file itself.
The ones I use and can give detail on:

Dynamic Prompts: https://github.com/adieyal/sd-dynamic-prompts

This is used to randomize creations on each image generation. You can use it with new words in the prompt, but I've never done that. Instead, I mainly use this to call random effects from wildcard text files. You create a text file with a new line for each possibility, put it in /extensions/dynamic-prompt/wildcards/text.txt, and then call it from the prompt by its name with two underscores on each side. For instance you can make haircolor.txt and put this in it:

green hair,
blue hair,
red hair,

and then put __haircolor__ in your prompt and it will randomly pick one of those each time an image is generated. This means you can make a batch of 10 images and come back to different results. This is really, really good if you're just messing around to see what works. It can also call other txt files from inside. I'll share my wildcard text files soon. It also has a "Magic Prompt" system that I've never used, but it could be cool? Beats me. Someone else do it.
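The substitution mechanic is simple enough to sketch in a few lines of Python (a toy stand-in, not the extension's actual code; the in-memory dict here stands in for the wildcard .txt files):

```python
import random
import re

# stands in for the files in extensions/.../wildcards/
wildcards = {"haircolor": ["green hair", "blue hair", "red hair"]}

def expand(prompt, rng=random):
    # swap every __name__ token for a random line of that wildcard file
    return re.sub(r"__(\w+)__", lambda m: rng.choice(wildcards[m.group(1)]), prompt)

print(expand("1girl, __haircolor__, smile"))
```

Run it a few times and the `__haircolor__` slot comes back different each generation, which is exactly the batch-of-10 behavior described above.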

TagComplete https://github.com/DominikDoom/a1111-sd-webui-tagcomplete

It autofills booru tags based on danbooru, which is what NAI and the 'Anything' model are trained on. Really, really nice, but can also be annoying at times with the pop-up. Unless you have tags memorized this can help a lot. Speaking of, you should make yourself accustomed to danbooru's tags:


File:wildcards.zip (27.04 KB)

Here are my wildcard text files. Some of them I downloaded and modified, other stuff was as-is. You can get a pretty good idea of the stuff you can do with this.


File:firefox_ouJxTVYHXq.png (577.01 KB,1184x789)

This deepdanbooru thing that scans images for tags is really impressive. It's not perfect, but good lord, we could only dream of such things a few years ago, right?


this sort of thing has been possible for a few years, but without danbooru datasets used for art training it wouldn't be easy


It's already a few years old, isn't it? Here's the original Reddit thread about it, and it's been discussed on 4chan in the past as well. I recall there originally being some talk about the possibility of it actually being used for tagging, but it's not good enough to replace manual tagging anytime soon and is otherwise little more than a novelty.


Oh. Wow, it's 4 years old?
Well, anyway, it's really cool how it's used here for immediate benefit. You can use it to assist in image tagging for training, but also as a building block to generate new images.


File:firefox_r6kosW4vBR.png (44.18 KB,958x555)

So, the training setup I put together from what I read. Much of the information is from the discussion here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/2670
Also, thanks to people on the /h/ board of 4chan as those guys are great. Don't use /g/, but maybe that should go without saying.

Modules: (I checked all of them because it's unclear what they do. Everything was checked by default except 1024 which seems to be a new addition)

Layer Structure: 1, 1.5, 1.5, 1.5, 1. This is called a 'deep network' as opposed to default or wide. Default is good for most things, particularly if you have a low amount of images (20ish was mentioned). Wide is for specific things like an animal, character or object. Deep is for style, which most people seem to be using hypernetworks for, with embeds for characters. It doesn't have to be, but that seems to be the pattern forming.

Activation Function: Softsign. Lots of math talk and graphs I don't understand, so I just went with the recommendation.

Weight initialization: XavierNormal. Same as above.

Layer normalization: No. I haven't seen anything informative about it, but no one seems to use it

Use Dropout: Yes. I heard it's good if you have a "larger hypernetwork". I think that means the numbers in the Modules up there and also the amount of training images used. I had 90ish images and did the mirror image thing to turn it into 180ish, but that's definitely not as good as 180 unique images. I don't know if it was good or bad that I used Dropout, but it didn't ruin anything
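For what it's worth, here's a little Python sketch of how I understand the Layer Structure numbers: each multiplier scales the module width (say 768), and consecutive sizes pair up into linear layers. This is my assumption of how the webui interprets it, not its actual code:

```python
def layer_dims(width, structure):
    # each multiplier scales the base module width
    sizes = [int(width * m) for m in structure]
    # consecutive pairs become the (in_features, out_features) of each layer
    return list(zip(sizes[:-1], sizes[1:]))

# the "deep network" structure from above, applied to a 768-wide module
print(layer_dims(768, [1, 1.5, 1.5, 1.5, 1]))
```

For 768 this gives four layers: 768→1152, 1152→1152, 1152→1152, 1152→768, i.e. the extra 1.5 in the middle is literally one more hidden layer, which is why it's called "deep" rather than "wide".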


File:firefox_sIU6yWlBwt.png (83.95 KB,949x1062)

And once you get to the Training tab you can load the hyper you just created (or one you've downloaded maybe? that part seems questionable)

This tab is for training embeds or hypernetworks, but I've only done hypernetworks so I can only talk about that.

Batch size: I haven't been able to find conclusive information on this, since 'batch size' comes up in so many contexts that you can't usefully search for it by name. It uses more VRAM, but might not necessarily be better at training. The ONE comment I've found on it says that you could increase it instead of lowering the learning rate later on. I'm already at my VRAM limit when training while having a video and Photoshop open, so I don't touch this.

Learning Rate: I think people start with the default for these. Only the hypernetwork number matters for hypernetworks. I see people add a decimal point in front of the 5 as the training steps reach 5000 to 10000, so I copied that. It sounds like the lower number is better for finer detail once you've established things.

Gradient accumulation: A newer thing, supposed to assist in training rate somehow, but I don't know how. It mentions something like "learning in parallel". People say to use it and set it to like 5, so I have it at 5.
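A toy sketch of the gradient accumulation idea as I understand it (not the webui's code): summing the gradients of k small mini-batches and dividing by k gives the same update as one big batch, so it fakes a bigger batch size without the extra VRAM.

```python
# toy loss: mean squared error of the 1-parameter model y = w * x
def grad(w, batch):
    # analytic gradient of the mean squared error w.r.t. w
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]
w, k = 0.5, 2
minibatches = [data[:2], data[2:]]

accum = sum(grad(w, mb) for mb in minibatches) / k  # accumulate, then step once
full = grad(w, data)                                # one big batch
print(accum, full)                                  # identical
```

The two numbers match exactly (for equal-size mini-batches), which is the whole trick: you only hold one small batch in memory at a time but step as if you'd processed all of them at once.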

Dataset Directory: The image with the folders. I could talk about images, but I'd mostly just be repeating this: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/2670

Prompt template file: This is a list of basic prompts that are added to previews alongside the included tags attached to each image. People say it's fine as default, but might be something to mess with if you want to check for specific stuff?

Width/Height: Keep it at 512/512

Max Steps: How far the training will go. This is stuff that takes days, though, so I'm not sure how useful this is, because of something I'll talk about in a sec. I suppose it's good if you only want to run it for a set amount of time.

Save an image every n steps: It saves an image as if you prompted it with random tags included in your training folder, but it can make freaky combinations that you wouldn't normally use, so keep that in mind.

Save a copy of embedding every n steps: This is an important one and why I didn't care about the Max Steps thing above. It saves the hypernetwork with its step count in the name to the folder automatically. By default it's at 500, which is where I have it.
This means the folder fills up with numbered snapshots as it trains for longer periods of time.
There's an option in settings under Training to save the optimizer state, which allows you to resume training from these saved files. VERY important!

Note: To use the hypernetwork (or resume it from file) you need to move it from the saved directory (default textual_inversion) and move it to the models/hypernetworks folder

Save images with embedding in PNG chunks: I think it lets you use PNG info like normal generated images. I kept it on.

Read parameters from txt2img: For preview images it takes what you typed in the txt2img tab. I never used this since I wanted a variety of images, but it could be useful? I read to never use tags like 'masterpiece' or 'quality' there, though.

Shuffle tags: Yes. It adds more variety to the images by changing tag priority or something.

Drop out tags with prompts: I think it drops a certain percentage of tags per generated preview image. I kept it off, but I'm not sure. It's just preview images and not the actual training itself, so I guess it could improve or hinder perceived accuracy there.
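My rough mental model of what these two options do to a caption, as a Python sketch (assumed behavior, not the actual implementation):

```python
import random

def augment_tags(caption, drop_p=0.1, rng=random):
    tags = [t.strip() for t in caption.split(",")]
    rng.shuffle(tags)                                  # "Shuffle tags": vary priority
    kept = [t for t in tags if rng.random() > drop_p]  # dropout: randomly lose a few
    return ", ".join(kept if kept else tags)           # never return an empty caption

print(augment_tags("1girl, animal ears, smile, looking at viewer"))
```

Each call returns the same tags in a different order, occasionally with one missing, which is presumably where the extra variety comes from.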

Latent Sampling Method: I only hear people mention deterministic, so that's what I went with.


File:explorer_dWd3RdfklC.jpg (424.05 KB,1395x1111)

I've been cropping and positioning images the past few days for the Amaduyu/Aquaplus thing I plan to train. I started rotating some of them, too, and might go back and do that with some of the images I've already done. I've been kind of OCD about focusing on this because it will take a long time to train since I want it to be extremely thorough and I'm sure I will make mistakes. I'm starting to worry a bit about how much of it isn't Kuon. I think I might need to make one specifically for Kuon or use an embedding to pair with it somehow? Not sure how they work together.


File:firefox_glhn53KTf1.png (347.91 KB,1474x754)

I really didn't expect it to get this one this well. Man, this stuff is good.
It definitely shows that you have to clean the tags manually, though, as her tail isn't there. Got a bunch of images that weren't on a booru so they don't have tags.
But, I'm not sure if I'll use it because it looks too complex and would probably mess things up


File:explorer_fhrs7k1XEO.png (1.69 MB,1673x1107)

So much effort...
People have made other progress with cool extensions and stuff, but I can't remark on it yet since I haven't messed with them


File:BooruDatasetTagManager_MgO….png (924.43 KB,1898x618)

cleaning up tags
the auto tagger does incredible work, but it's not perfect. For example I had to add the furrowed brow, :o, and portrait tags
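If you have a lot of caption .txt files to fix up, a few lines of Python can batch-apply a tag the auto tagger missed (a hypothetical helper; adjust the folder layout to your own setup):

```python
from pathlib import Path

def add_tag(folder, tag):
    # append `tag` to every .txt caption file in `folder` that lacks it
    for f in Path(folder).glob("*.txt"):
        tags = [t.strip() for t in f.read_text().split(",") if t.strip()]
        if tag not in tags:
            f.write_text(", ".join(tags + [tag]))

# e.g. add_tag("training/kuon", "portrait")
```

Running it twice is harmless since it only appends the tag when it's missing.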


File:kuon31.png (458.26 KB,512x512)

Apparently having a bunch of plain backgrounds is bad, so now I have to go back and add backgrounds. I guess it's good that I have an Utawarerumono artbook with some backgrounds on it, although I had to scale them up with waifu2x. This is so much work and I regret doing it, but I'm too far now.


File:explorer_tXnkh1IWRr.jpg (460.15 KB,1643x1116)

Okay, I'm finally going to bump this thread because I'm now finally training it! This new method is like 20x faster somehow. The prompts are randomized during this so some of the abominations you see are because weird tags are being combined like 'leg' and 'bed', but no '1girl'
YES! IT'S ALL PAYING OFF! MY HOURS AND HOURS OF LOSERDOM ARE REACHING FRUITION! All the wasteful pruning, the tag specification and elimination!

(hope it learns the proper position of utawarerumono ears though...)


they all seem kind of kuon like


File:aqua-1450.png (340.19 KB,512x512)

Well, it's the same artist so they should look somewhat similar, yeah.
But in that image I posted I can identify different characters. I'm not sure how exactly it works because sometimes it's very random, but other times it's obvious, like how this is an attempt at Kamyu (even though I never labeled her and it wouldn't understand it anyway)


File:aqua-3325.png (405.31 KB,512x512)

it's learning... IT'S LEARNING!


i think it can get more perfect


File:02064-4265108999-1girl, an….png (10.24 MB,2816x2688)

Hmm... after a full night I'm not sure if it can. At least overall, I think getting a perfect one is still going to be rare. I see what people are saying now and that you're probably not going to notice gains after 20000 steps or so. But, I think it still needs improvement somewhere.
I need to look at it and see if there's stuff I can improve upon, which basically means I'll train it again at a different rate. When I tested making a hypernetwork a few weeks ago one night was like 2000 "steps", but now I just did 58000 (meant to do 50000, but forgot I resumed from 8000). It saves snapshots of its progress, so if for instance it made the best result at around 20000 steps and then went haywire, you can grab the backup it created at 20000 and either just use that as the completed product or resume training from it.

Well, now that I've done the style hypernetwork I should try making 'embeds' which I'll use to teach it characters. It still doesn't know who any of these girls are so I can't actually call them directly and instead need to use their traits and hope it arranges it correctly. For instance it'll never know Kuon's proper clothing or ears unless I create an embed which I can invoke from the prompt. From what I've read, when training an embed you label everything EXCEPT what you want it to call.
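That "label everything EXCEPT the target" rule could be scripted as something like this (the trait list is just my guess at what the embed should absorb):

```python
# hypothetical: the character-defining traits the embed token should learn,
# so they get stripped from the caption rather than labeled
KUON_TRAITS = {"animal ears", "yellow eyes", "swept bangs", "low-tied long hair"}

def caption_for_embed(tags):
    # keep everything that is NOT a defining trait of the character
    return [t for t in tags if t not in KUON_TRAITS]

print(caption_for_embed(["1girl", "animal ears", "yellow eyes", "smile", "sitting"]))
```

The pose and scene tags survive, the trait tags vanish, and whatever the captions never explain is exactly what the embed token has to account for.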

Maid Kuon at the computer! It really can't do hands or keyboards, but that's not specific to this.


File:02077-4265108999-1girl, an….png (9.17 MB,2816x2688)

...and here is the exact same prompt and seed and everything else, but with my hypernetwork disabled. Well, maybe it IS messing up hands and arms a little bit. I'm not sure if that's something I can fix. Do I go back and pay lots of attention to hand tags? I guess I could try that...


File:02132-2730948104-1girl, je….png (945.23 KB,768x1024)

Hmm, so this is an instance of improper tagging showing up. Karalou's slave collar is technically a collar, and I must not have eliminated its generic 'collar' tag, so it's grabbing from Karalou for bride Kuon's 'detached collar' here. This stuff is so crazy


File:02124-169421348-1girl, jew….png (895.63 KB,768x1024)

and this would be a much more accurate representation of what it's supposed to look like


dont remove it, kuon looks sexy with a slave collar


File:BooruDatasetTagManager_3iu….png (1.05 MB,1287x715)

It wouldn't be removed exactly, just require an actual 'metal collar' prompt to show up. I guess I should remove 'breasts' from everything, too. I'm not sure why boorus have redundant tags like that.
... I think?
'detached collar' isn't even here, so maybe this is something I can't avoid at all. Or maybe this is the base data... I guess I should do some testing... either way it probably wouldn't hurt to specify things more
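If I do end up specifying things more, something like this could prune the generic booru tag whenever a more specific one is present in the same caption (the mapping here is just an example):

```python
# example mapping of specific tags to the generic tag they make redundant
SPECIFIC_TO_GENERIC = {
    "metal collar": "collar",
    "detached collar": "collar",
    "large breasts": "breasts",
}

def prune_redundant(tags):
    # drop a generic tag only when one of its specific versions is present
    drop = {g for s, g in SPECIFIC_TO_GENERIC.items() if s in tags}
    return [t for t in tags if t not in drop]

print(prune_redundant(["1girl", "metal collar", "collar", "smile"]))
```

That way the slave collar still shows up on an explicit 'metal collar' prompt, but plain 'collar' stops bleeding into unrelated outfits.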


File:00031-3720141358-1girl, so….png (648 KB,576x768)

There's an extension called DAAM that creates a heatmap of the effect of a prompt on the resulting image. It's really quite amazing that this exists.
This is the result for "low-tied long hair". It's supposed to add, well, long hair that's tied. However, it's broken and seems like it's applying to her clothing instead and is adding tied ropes to it. This is a tag to avoid and maybe I should purge it from my training thing.


File:00032-476515586-1girl, sol….png (608.4 KB,576x768)

The heatmap for 'smile' is exactly where you'd expect it to be


File:02167-99510245-1girl, anim….png (12.86 MB,2688x6286)

God damn. Okay, I can say that switching from a 1, 1.5, 1.5, 1 neural net thing (whatever that means) to a 1, 1.5, 1.5, 1.5, 1 one was a massive upgrade. I don't know why. Oh, and I think I MIGHT have clip skip set to 1 instead of 2, but that wasn't supposed to be a big deal. Hmmm.
The first one here, 'aquaplus' was my training 2 nights ago whereas the two others are different checkpoints from the one I trained last night. I just don't understand how it's such a massive improvement.


File:02168-2730948104-1girl, je….png (3.17 MB,1344x3142)

Okay, this is weird. I did the exact prompt here and the new ones look bad with this DPM 2M Karras sampler...


File:02169-2730948104-1girl, je….png (2.95 MB,1344x3142)

but on the default euler a sampler they look fine. The first one seems like it's probably the best, but still within the normal variation you'd expect.
more testing needed....


File:explorer_ICmVTo8wHE.jpg (600.31 KB,2158x1055)

Now training the Kuon embed. I will combine this with the artstyle hypernetwork and it should have great results. That's the hope at least.
What this means is that instead of typing "yellow eyes, swept bangs, etc" and hoping it assembles them correctly I will invoke the name of the embed and it will fill that information in in a way that would be impossible to attempt manually. I don't expect the hair ornaments or clothes to look perfect, but it should definitely get her hairstyle and ears right. There's about a 0.0003% chance that it will get her skirt pattern right.


File:00163-4181300038-solo, smi….png (720.31 KB,768x768)



File:tmpldpue00e.png (15.32 MB,3840x3072)

This was the batch
solo, smile, 1girl, thekuon, :d, sitting, looking at viewer, pov


File:00168-908452894-solo, 1gir….png (971.81 KB,768x1024)

It works alongside other words, like 'swimsuit'. Man, this stuff is truly amazing.


damn that looks really good too
at a glance it's hard to tell those are AI


Congrats, seriously. Looks damn good.


File:1367881241884.jpg (91.58 KB,700x700)

Looks like it turned out very well, congrats!


File:00236-3931658932-1girl, so….png (514.65 KB,832x576)

Thanks, guys! This is really a new world and I'm not sure how to feel about it. I feel guilt over artists, but at the same time I'm really enjoying messing around with this stuff. It's been taking up too much of my time, though, so I need to get back to doing other stuff...
I can envision a 3D>2D>3D kind of pipeline in my head right now. If only my tablet wasn't broken... hope I can RMA it soon.


File:FmWCnRfakAE9boP.jpg (237.43 KB,1024x1536)

Saw some really neat AI art that was good enough for me to save. Is it getting even better or something?


File:00501-3541163114-1girl, so….png (954.29 KB,840x840)

By now the default with the newer fine-tuned models is far more impressive than the NAI leak stuff, but it's still all built on that. You're actually in a far worse position if you're paying for it now.
This high detail one is based on a mixture of real life and 2D art so it can do things pretty well, but you have poor control of it. It looks extremely impressive, but it's not obeying my very strict training of Kuon's outfit, so imagine trying to get something that you haven't trained.

I've been wondering if I should start grifting since I know what I'm doing and everyone else that knows what they're doing is, well... kinda normal. But, being normal is what gets you the most exposure and success. I can't name gacha characters, for example. But, I could corner an AI fetish market, especially if I combine the training with my 3D models. This would be a good time to have the motivation to do things


Sounds like such a bad way to put it... but I get what you mean. I think if you consider yourself capable and have some desire to do so, you should try it out. I mean, training your own stuff is probably a long and arduous process, more than most are willing to invest into it.


I'm very proud of your progress.


It's kind of crazy that if you put the effort into training one of these things you could have unlimited fetish porn?


I haven't played with NovelAI outside of porn, but I may generate non-ero OG Fallout fiction with it to see its quality


>unlimited fetish porn
That is exactly what it is. If people think there's "porn addiction" now, wait until normal people get a hold of this stuff. I have a blog of the pornographic progress I've made on /megu/.
Still, I'd prefer human-made stuff if it existed, and stuff that's more mental like incest isn't really something you could satisfy with an image alone. You can't tell stories with this and stories are really, really, REALLY good. A good doujin is easily better than this stuff, but when you're dealing with specific tastes then yeah, it's the best option available.


NovelAI really likes futa


File:FmB4eORaYAAEejg1.jpg (725.81 KB,2829x2237)

Came across this and thought that it was neat enough to inquire what the heck is with the coloring ability of these AI, it's pretty great.



What's interesting is it doesn't necessarily make the same mistakes a human makes when drawing hands. It makes its own sort of mistakes you don't see in real art.


The most popular checkpoint models these days (for those doing it offline) are a mix of 2D art with conventional photography. It increases hand quality a bit, but it's still far from passable most of the time without a bunch of "inpainting", which is basically like selectively retrying a part of an image. Some of the models people use look more like real life photos with an advanced filter on top, which can be very creepy and also takes away from some of the appeal since it introduces 3D limitations in perspective and such
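The final compositing step of inpainting is conceptually just a masked blend: only the region you marked takes the regenerated pixels, and everything else keeps the original. A numpy sketch of the idea (not any model's actual code):

```python
import numpy as np

def composite(original, regenerated, mask):
    # mask is 1.0 inside the region being "retried", 0.0 elsewhere
    return mask * regenerated + (1.0 - mask) * original

orig = np.zeros((4, 4))                      # original image (toy grayscale)
new = np.ones((4, 4))                        # freshly generated replacement
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1.0  # retry only the center 2x2
out = composite(orig, new, mask)
```

The diffusion part of real inpainting is vastly more involved, but this is why the untouched areas come back pixel-identical.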


Colouring shouldn't be hard for an AI. It could just pick colour palettes from existing images in the same pose and apply them.

Even though the line work is done, the lips of the girl on the top right are weird. Also, I'm not sure if I ever noticed this before, but the hair on these AI girls is quite bizarre (the ones on the right): not only is the fringe not symmetrical, but the far side looks weird.

The backgrounds are odd too, the sunset kind of drops suddenly in one image and the floor boards in another are all different widths.


I mean top left not right...


File:00981-919244970-1girl, sol….png (969.76 KB,816x920)

As an example, as I continually attempt to refine my custom merged checkpoint for, uhh... /megu/ reasons you can see the effect of one of the models already having some RL stuff mixed into it. The shading is absurdly good, but I really have to fight it to create clothes that aren't modern and it feels very "real" which can be a good thing and a bad thing depending on one's tastes. (also as a side note need to figure out why it's ignoring tags)
And look at that hand. I didn't do any editing here. But, it definitely looks like a real human hand. I don't know how to feel about it. I guess maybe for now it's a sacrifice to make if you don't want to do edits, but I like style over reality.


File:00991-1957439408-1girl, so….png (786.16 KB,712x816)

Another example of hands in logical position


really amazing for ai hands


I wonder if it's possible for AI to make manga or if that's far too many variables to be solved in a realistic timeframe


Have you ever tried using doodles you make as a base for the AI to build off of? Wondering if that's more effective than just generating a bunch of images that may vary in posture/position each time.


File:01464-2023-02-03.png (1.04 MB,768x864)

I did a whole bunch of testing with various RL models to see if I could understand how exactly people are making them assist in 2D hands/poses while not giving them a massive hit in quality and I really could not find any pattern. Although, my tolerance for spending hours making small merge differences is getting pretty low and I need to spend some time doing other stuff before getting back into it.
However, I do think I have an idea of how to bandage it. The LoRA things are basically like "plugins" for a checkpoint model, and for example the Amaduyu/Aquaplus one I made is pretty good at fixing the faces, but then of course they will always have at least a hint of Amaduyu/Aquaplus so I'd need to mix them with other LoRAs.
It's also useful to use a thing called kohya, which is normally used to create LORAs, to separate merged checkpoints into their base ingredients. This means you can more easily control the intensity of something without needing to create a bunch of 4-8GB merge files.
Seems like there aren't any new amazing models recently, just merges of existing stuff (although some of them are quite impressive)
So, I can't think of any notable breakthroughs in the past month, just refinement.
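The "plugin" nature of LoRAs comes from them being a low-rank update added on top of a frozen weight matrix. A numpy toy (made-up shapes and rank) of why the files are so tiny compared to a full checkpoint:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((768, 768))   # frozen checkpoint weight (never touched)
r = 8                                 # LoRA rank: tiny compared to 768
A = rng.standard_normal((r, 768)) * 0.01
B = np.zeros((768, r))                # B starts at zero, so no change at first

alpha = 1.0                           # the strength slider you set at load time
W_eff = W + alpha * (B @ A)           # "plugging in" the LoRA
```

Only A and B get trained and shipped (about 12k numbers here versus ~590k for W), and scaling alpha is how you dial the plugin's intensity up or down without touching the base model.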

Still, I continue to be annoyed by all the people using "waifus" in these things. I know, I know, it's a generational difference and they don't know any better. But it still annoys me.


File:saltsypre_502.ogg (26.19 KB)

With the popularity of AI voice cloning and eleven labs going paid, I decided to look into some of the offline runnable alternatives. The most popular one, or at least the easiest to set up, seems to be Tortoise-TTS. It works okay enough and has some pretrained models the author directs you to use. There's a guide and git repo someone set up that provides this service with a web interface: https://rentry.org/AI-Voice-Cloning
The biggest issue I and many others have with Tortoise is that the main author won't release a guide or overview of the process he went through to train his model, simply saying that if you're smart enough you can figure it out. Kinda leaves people at an impasse for actually using this program as an alternative to eleven labs.

I've had some minor success with one of the alternatives (unofficial) VALL-E, https://github.com/enhuiz/vall-e
It's taken me a bit of dependency chasing and cobbling together a separate PC to install Linux on (the DeepSpeed dependency has been a nightmare to get working on Windows) but I've actually been able to get a "decent" output with a 3060 12gb card and about a day of training on ~7.4k couple-second audio files ripped from Vermintide 2. I'm not an expert in ML, but the result I got training a model from scratch with this limited dataset and a "low powered" card makes me optimistic for VALL-E's potential. I didn't really have to know much about machine learning, just how to install various dependencies and 3rd party utilities.

VALL-E is based on phonemes so the text to be synthesized is meant to be sounded out, I think. I don't know if there is a whole lot of prompt engineering that can be done with this program, though my current model is probably too limited and untrained to really test that out.
Attached is the voice I wanted to clone.


Here is the output.
The prompt text was "Blackrat spotted! Keep your guard up!"


>refine my custom merged checkpoint
I haven't been closely following your posts, more just watching your results, when you talk about custom check points are you training your model (custom data set of /megu/ images) starting with some base model as a checkpoint? How are you doing that for stable diffusion and what sort of time sink is it/hardware are you using?


File:01962-sfw,_painting,_(J.M.….png (1.3 MB,896x896)

Mm, how to explain...
The "custom model" I've been talking about recently is a merge of existing checkpoint models, which is something like NovelAI or Stable Diffusion. My most recent one using Stable Diffusion, NovelAI, Yiffy (for genitals), Anything (that's its name), AbyssOrange2/Grapemix and a couple others that I'm trying to switch in. (GrapeMix doesn't seem to have any RL image data, so I add in some of the basil mix myself that AbyssOrange2 does, that guy was onto something for sure)
They're large (2-8GB) files that contain a whole lot of training data and you need a really powerful GPU to train them. I'm not sure you can even make them without something like 24GB of VRAM at minimum, and then you need a whole lot of time (weeks of constant processing) if you don't have like $50,000 worth of processing power sitting around.
However, someone like me can create merges of them with custom settings that hopefully take the desired parts of A with the desired parts of B. But, you definitely make sacrifices when you do it and the trick is to try and counteract them. It's a really annoying process, though, because there's no guide to see what each setting does so it's a bunch of trial and error. Every time I think I noticed a pattern, I change a different slider and it completely invalidates what I thought I knew. Also each merge takes like 30 seconds to create, 10 seconds to switch to, and then however long the generation takes on your current settings. Also when switching between them your VRAM can get corrupted somehow and you need to restart the program so you don't get false results. Each merge is also 2-8GB so you have to routinely delete them and take screenshots/notes of what you've learned, if anything, from the merge results.
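The brute force kind of merge is conceptually just a weighted sum over every tensor the two checkpoints share. A numpy sketch (real checkpoints hold torch tensors keyed by layer name, but the arithmetic is the same):

```python
import numpy as np

def merge(state_a, state_b, alpha):
    # naive "30% of this, 70% of that": interpolate every shared tensor
    return {k: (1 - alpha) * v + alpha * state_b[k]
            for k, v in state_a.items() if k in state_b}

a = {"w": np.zeros(3)}   # toy stand-ins for two checkpoint state dicts
b = {"w": np.ones(3)}
print(merge(a, b, 0.7)["w"])
```

The "Merge Block Weighted" approach is the same idea, except alpha becomes a different number per block of the network instead of one global slider, which is where all the trial and error comes from.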
The main training data I've done myself is for Kuon and the Amaduyu (Aquaplus) hypernetwork/LORA things, although I've done some other artists to mixed results. They rely on getting layered on top of a checkpoint model, so they're heavily influenced by it.

What kind of timesink is it? Weeks, but I do other stuff while it's merging and generating. I can't imagine most people will want to do it. But, I've also been doing this stuff since early October so I guess it might be a slower learning process for other people.
As for hardware, I got a "good" (less absurd) price on a 3080 12GB for Black Friday


File:firefox_vJasASdG3I.png (113.39 KB,1668x1155)

And this is what the "Merge Block Weighted" panel looks like to make merges that are better than just a brute force "30% of this, 70% of that". Pretty self-explanatory, right?


File:BooruDatasetTagManager_0Il….png (739.6 KB,1162x533)

I think I can describe the difference better. The "checkpoint" model is the database that has the actual definitions and data on the information of a tag. When I trained my Utawarerumono, Kuon, and other things I was training against the NovelAI model. The images have tags like "ainu clothes" or "from side" because those are specifically the booru tags that NovelAI trained. I'm not defining what those are, I'm providing information on what they look like when drawn by a specific artist, and the training process compares it to the information that NovelAI has. There's a huge gulf between defining the tag itself and merely referencing it.
People, including myself, have trained concepts (which is what a tag is), but it's just one at a time.

The horrendously named "waifu diffusion" has been undergoing training on its new version for over a month now, but it was just at Epoch 2 when I last checked a couple weeks ago so it might be at 3 now. One epoch seems to take about 12 or so days to complete? People said the first epoch sucked and 2 might have shown that the finished product could be good, potentially, but we'll have to wait and see. It will probably not be something to test out for real until Epoch 6 or so?
But, I haven't been paying attention to any news about this stuff


what is a checkpoint model?


nevermind, it's a database with the tags associated with images


File:02137-v1-5-pruned1girl,_so….png (703.97 KB,704x768)

Basically that, yeah. It's the skeleton that everything is built upon. The most famous one is Stable Diffusion (SD) and everything I'm aware of for offline AI image generation makes use of it. The 2D models still have the SD data in them so you can use words that boorus have no knowledge of and get results.
It's worth noting that most people using the offline method are using the older (1.4 and 1.5) versions of Stable Diffusion because the ones after that started aggressively purging nudity (but not gore) and potentially other things, from the training data. This had the effect of breaking the things trained on the older models, which includes stuff like NovelAI which nearly all 2D models make use of.
The last time I checked people were not so impressed with the newer SD models that they were willing to sacrifice a "pure" data scrape in favor of one curated to make it more attractive to investors


File:kagami.png (244.16 KB,800x781)

This seems kinda funny, since now that the cat's already out of the bag and people have access to the older models in their entirety, there's no real reason for people to use the newer models, and a loss of functionality just means they'll become irrelevant as people improve the current local models. It may seem like a good move for investors at first since all the other AI companies are doing it, but the one thing the others have that SD doesn't is that people can't get their hands on their code to use it unneutered.


File:pose.png (1.14 MB,2304x767)

The newest technology just came out a few days ago!
ControlNet lets you control the generated image by pose and composition through normal and depth maps, edge detectors, pose detection, segmentation, etc. This is much easier and finer control compared to regular img2img.
An extension for webui also allows you to adjust pose as you wish.

Guide: https://rentry.org/dummycontrolnet
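As a taste of what one of those hint maps is, here's a crude gradient-magnitude edge detector in numpy. ControlNet's actual preprocessors (canny, depth, pose) are far fancier, but they all boil down to an image-shaped conditioning map like this:

```python
import numpy as np

def edge_map(gray):
    # absolute horizontal and vertical gradients, summed and clipped:
    # bright wherever neighboring pixels differ, i.e. at edges
    gx = np.abs(np.diff(gray, axis=1, prepend=gray[:, :1]))
    gy = np.abs(np.diff(gray, axis=0, prepend=gray[:1, :]))
    return np.clip(gx + gy, 0.0, 1.0)

img = np.zeros((6, 6)); img[2:4, 2:4] = 1.0  # a bright square on black
edges = edge_map(img)
```

The generator is then steered to put its own content where the map says the outlines are, which is why it follows your pose or composition so closely.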


hm, so they gave up and decided that this is where humans need to come in and give the images context.


That's quite the leap.
Could you make a leaping Megu?


File:jump.png (1.15 MB,2304x768)

i tried


File:[Rom & Rem] Urusei Yatsura….png (727.29 KB,960x540)

Dang, that's cool. This is what happens when you take a break from checking for AI news in /h/, huh.
Seems neat, but it's also introducing more effort into generation, which isn't really my thing. I had tried to use depth maps about a month ago, but learned that they were limited to Stable Diffusion 2 and above, which kills any desire that the majority of people on imageboards would have for it. So any extension that makes use of depth maps without requiring the neutered corporate-friendly SD is great.
I'm not sure I'll use this, but it's cool to see in action nonetheless


Angry boobs


Wonky legs, but still impressive.


File:grid-0076-1girl,_solo,_(mi….png (5.12 MB,2560x1664)

Someone on IRC asked me about how to go about generating images of Miyako from Hidamari Sketch, done in the original Ume Aoki style. We already know that doing a style is impossible without training, since artist tags were purged for NovelAI and that's what most 2D models (including everything I use) are based on. So, the question becomes whether it recognizes Miyako. Unfortunately, as you can see, while it does seem to somewhat know her blonde hair color, everything else is a mess.
Conclusion: Miyako has to be trained as well.


File:grid-0080-1girl,_solo,_(mi….png (5.18 MB,2560x1664)

I searched through some 4chan pages and found that someone did create an Ume Aoki LORA. It seems to work pretty well at capturing the style and also seems to capture Miyako to a degree, but it's still not accurate.
It's in here if you want to download it yourself (use Ctrl+F) https://gitgud.io/gayshit/makesomefuckingporn#lora-list
So, I told the guy to start amassing Miyako images which will be combined with the Ume Aoki style LORA.
Things to note for good training images for a character:
1. Solo
2. No complications like text overlaid upon her
3. Text elsewhere in the image should ideally be edited out
4. Limited outfits. Ideally it'd be maybe 3 or less, depending on how many images you have. When I trained my Kuon stuff I did not bother since she is portrayed in only one outfit about 95% of the time. Each outfit will need to be tagged in the training process and called upon manually with a custom tag of your own choosing during image generation later on. She can still be portrayed in other outfits, but if you specifically want her in her own original clothing you need to train for it.
5. Different angles and "camera" distance. The more variety of angles you have, the more accurately it can portray them later on during image generation, although it does a pretty good job of filling in the blanks since it already knows how human characters should look from different angles.

Then the images themselves should be cropped to be somewhere squarish. Unlike the old days of late 2022 it does not need to be exactly 512x512 pixels, but you should avoid images that are too tall or wide (heh) at like a 1:3 ratio or something. I'll talk about the other stuff after I get the images
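The size and ratio rules above boil down to a quick check you could script. A minimal sketch, assuming you've already read each candidate image's dimensions (e.g. from PIL's Image.size); the thresholds are the ones mentioned in this thread, not anything official:

```python
def usable_for_training(width, height, min_side=512, max_ratio=3.0):
    """Reject candidate training images that are too small or too
    tall/wide, per the rules of thumb above."""
    if min(width, height) < min_side:
        return False  # small images train in blur and low quality
    ratio = max(width, height) / min(width, height)
    return ratio <= max_ratio  # avoid extreme 1:3-ish crops

print(usable_for_training(512, 512))    # square, fine
print(usable_for_training(400, 512))    # shortest side too small
print(usable_for_training(512, 2048))   # way past 1:3
```

Run over a folder of candidates, it would spit out which ones to crop or toss before tagging.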


File:Sunshine Sketch - c129x1 (….png (1019.87 KB,900x1291)

Miyakofag here, yoroshiku onegaishimasu, and my deepest thanks to yotgo for his help.

A very important factor to consider is how easily the characters go from chibi to normal and back, as seen in the pic. In their non-chibi style their head shape is somewhat hexagonal, with fairly sharp angles, while their chibi head shape is usually either somewhere between a full oval and a curved rectangle, or a mix of the two with a pointy side bit like Yuno has in the middle; the first has regular eyes and features while the latter two are (✖╹◡╹✖). Also visible in the pic is how the wides are presented in a variety of outfits, like Miyako getting a change of clothes in the middle panel and then immediately returning to the first one.

I've downloaded the manga, but it's monochrome and fairly crammed, so it doesn't look like it'll be of much use. Seems like I'll have to take a few thousand screenshots of the anime again, but that's fine by me. I'll also begin to comb through boorus for useful art, and there's this other meguca stuff I'll be downloading in case it turns out to be of help for setting up Ume's general style:


File:explorer_F0NcBv8fyY.jpg (382.57 KB,1424x1091)

Hmm, keep in mind for a character that you want the character to be the focus and not the artist's style. It's better to have a more varied collection from various artists than a limited number from the official one. You're not training the shape of her head or how her mouth is drawn, you're training the combination of her outfit and eye color and hairstyle and the visual traits that identify her.
I can generate images of Kuon in different styles because it's not constrained to a specific style itself.
When I generate an image of Kuon to look like her original Utawarerumono appearance, I activate my Kuon LORA (Kuon herself) and also my Amaduyu LORA (the Utawarerumono style). Combining them into one would be severely limiting.


File:grid-0084-1girl,_solo,_gra….png (5.95 MB,2560x1664)

Kuon with Kuon LORA and Amaduyu LORA and the downloaded Ume Aoki LORA. This is to show what a merged character and style LORA would look like together with another style.


File:grid-0083-1girl,_solo,_gra….png (4.49 MB,2560x1664)

but with the character and artist separated, I can apply Kuon and the Ume Aoki style together without the influence of Amaduyu.
Hmm... not sure if this style will work.


File:grid-0360-[Grape! Base]1gi….png (3.1 MB,2048x1408)

Bleh. I had trouble training, but got it to work... and then it came out like THIS. I really should have kept my old settings, but noooo, I had to see what the new stuff was like.
I noticed that some of the images you gave me were small, and I think I'll have to exclude those. They should be at least 512x512, and I think that's the main reason why it looks so blurry and low quality here despite being relatively accurate in some images.


nice, hexagon headed kuon


It already looks really, really good.
The small crops are my bad; I had taken "does not need to be exactly 512x512 pixels" to mean "smaller pics are okay". There should be a dozen to a dozen and a half pics to remove then, maybe a few more. There's also one where she has her top but not her shirt, which may explain the result on the top-right.


File:grid-0454-Anything-V3.01gi….png (3.94 MB,2304x1664)



File:grid-0455-Anything-V3.01gi….png (3.95 MB,2304x1664)



File:grid-0456-Anything-V3.01gi….png (3.37 MB,2304x1664)

Miyako-00006 (final)


File:grid-0458-Anything-V3.01gi….png (3.92 MB,2304x1664)

Miyako-00006 + Aoki Ume both at 90% strength.
Mmm, I feel like it's still not very good. The eyes are especially too shiny, but at least the clothes are good. Also artists seem to depict her with different eye colors. The training data is really not ideal, but you really didn't know what to look for. I had to throw out nearly everything that was below 512 pixels and of the images remaining some of it was still too blurry or grainy, but I wanted to see if we could get away with it.


File:grid-0468-Anything-V3.01gi….png (4.34 MB,2304x1664)

Hmm.. yeah it seems like there's some corruption or data loss or whatever you'd call it. The image is too "noisy" and looks over-exposed.
Here it is with my Aquaplus lora. It's funny how it put a dog there in the top left because I put "animal ears"


File:grid-0469-Anything-V3.01gi….png (4.3 MB,2304x1664)

I reduce the strength of the Miyako LORA and the image clears up, but then it becomes less accurate.
Bleh. Yeah, I need to train it again with better images.


File:cb885d1935.png (1.31 MB,3232x1569)

does this stuff actually work or are you just drawing the 6 images and pretending it's an AI


File:02950-Anything-V3.01girl,_….png (2.95 MB,1280x1792)

It works, but finding the right prompt can be exhausting. It looks like you're using some online model, and those have some pretty severe limitations. I don't really know how to best use the real-life models that take verbose text rather than booru tags. There are prompt repository sites like:
but also personal pages of research people have done like https://zele.st/NovelAI/

After a bunch of testing, I think I'm satisfied with this Miyako LORA. It seems to work best with the Anythingv3 model, although I haven't done hours of tinkering. But, this reminds me that I really need to create a good SFW 2D merge of my own, but I keep struggling to have it look good with multiple different prompts and LORAs.
I also know now how to 'host' it and allow people to connect to it, but my upload is capped at 1MB/s so the limitation is there...


File:1646704983644.png (499.65 KB,640x480)

So I'm trying to do this on my PC again after reinstalling, and now I forget how I initially solved the "No module 'xformers' found, continuing without it" thing before. Also, I think I may have cancelled the taming-transformers clone after 2 hours, but it hasn't tried to reinstall, so maybe it worked?


File:Hidamari Sketch x Honeycom….png (8.23 MB,1920x1080)

Something that was very interesting about collecting a bunch of screencaps of her is that it helped me appreciate the amount of variety in the girls' wardrobes.
Since training material for a specific character requires consistency in their looks, we decided to go with her standard school uniform. However, they regularly spend around half of an episode outside of Yamabuki wearing their casual outfits (of which each wide has maybe a couple dozen or more), in some cases they don't go to school at all, then there are Winter episodes where they're wearing a coat, and at one point she has a hair bun like Hiro's; I assume it's simply because she felt like it. Add to this Shaft's abstract cuts decreasing their screentime, the fact that due to her character she has what is perhaps the highest regular:chibi appearance ratio, and the need for her to stand alone without overlapping with other people, and I ended up only managing to take 62 usable captures out of the entirety of Honeycomb+Graduation. Far, far fewer than what I initially expected, like the max ~100 taken from 1171 fanarts of her. Thankfully, it was still more than usable.
Very late reply, but when I first saw this my heart skipped a beat. It's incredible, warm. She makes me very happy and I'm overjoyed to see it work so well. Very thankful for this.


File:grid-0838-Anything-V3.01gi….png (2.08 MB,1280x1664)



X |||____________________________________________||| X


bottom left is a JRPG protagonist


Maybe if we combine it with >>104530, we'll create the legendary「Shin Hiroi Yuusha」。


Can you link what guide you're following and what step you're at? It might be best to find where the stuff is installed and wipe it or something. I'm not sure...
You could try googling the error message in a 4chan archive maybe


File:C-1677995987510.png (3.38 KB,831x32)

I did wipe my VENV and either it's not in there or something went wrong maybe (although maybe it's fine?)

I'm using https://rentry.org/voldy and I'm just tweaking the asuka image right now. My current issue with it is "vae weights not loaded. make sure vae filename matches the checkpoint, replacing "ckpt" extension with "vae.pt"." and I'm a bit confused about what to do to fix this one, but maybe since I'm getting a known error the taming-transformers thing worked? I dunno. However, what I'm also wondering about right now: I forget if xformers is important or not, and if it is, how to install it.


Also, why is it that sometimes my generation lags just because ff is open even though I'm using chrome...


File:Screenshot 2023-03-05 0111….png (42.96 KB,726x299)

>My current issue with it is "vae weights not loaded. make sure vae filename matches the checkpoint, replacing "ckpt" extension with "vae.pt"." and I'm a bit confused of what to do to fix this one
It's talking about how, if you're using a model with a VAE, you should have a file named the same to go along with it. For example, "Anything-V3.0.ckpt" and "Anything-V3.0.vae.pt". I'm pretty sure it should work fine if the model you're using doesn't have one.
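The naming convention is easy to check for yourself. A toy sketch of the rule (just illustrating the filename match, not webui's actual loader code):

```python
from pathlib import Path

def expected_vae(ckpt_path):
    """Given a checkpoint filename, build the VAE filename webui would
    look for: same name with ".ckpt" swapped for ".vae.pt"."""
    p = Path(ckpt_path)
    return p.with_suffix("").name + ".vae.pt"

print(expected_vae("models/Stable-diffusion/Anything-V3.0.ckpt"))
# Anything-V3.0.vae.pt
```

If the file with that name isn't next to the checkpoint, you get the "vae weights not loaded" message and generation proceeds without it.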


File:cmd_1Gzj7OumUa.png (6.83 KB,682x99)

Nevermind, that's specific to windows 7.
Uhh... hmmm...
Well, there's a message when I launch about updating xformers (I will someday maybe) and I think it gives a hint?
Run that commandline thing... I think?


I've seen people recommend that you put VAEs in a subfolder. I.E:
>blah/models/stable diffusion/vae
The vae mostly determines color and you can select them manually or switch automatically if the name matches, as you said. I don't remember where those options are in Settings.

You can put this into the Quicksettings list under "User interface" in options and then the main screen will let you switch these around without needing to go into the Settings every time:
sd_model_checkpoint, sd_vae, CLIP_stop_at_last_layers


File:00064-1021740396.png (2.09 MB,1536x1024)

it's strange that almost all images itt don't use latent upscale, considering it lets you get much higher resolution and quality


File:03369-[AnyGrape Furrymix C….png (770.61 KB,816x1024)

Do you mean taking a generated image to the img2img tab, or the scaling "postprocessing" that does it automatically during generation? I don't do the first one, but I do the latter sometimes. The problem is that it's a total VRAM killer, so I go from generating 8 images at once to 2 or sometimes even 1.
Maybe I should try the "manual" scaling sometime, but I just haven't felt the desire to do so. I like seeing the final image and not doing anything to it afterwards, because then it begins to resemble work, and this stuff doesn't really satisfy the creative urges. I like setting it to generate a bunch of images and then doing something else, too.

I just spent the past 2 days downloading and organizing LORAs, so I'm going to be generating a lot more Kuons soon. Hehehehe.
One day soon I might redo my Kuon and Amaduyu LORAs, particularly the Amaduyu one that controls the art style because it tends to produce a lot of errors that aren't otherwise present. No idea what I did wrong with it.


File:[MoyaiSubs] Mewkledreamy -….jpg (341.7 KB,1920x1080)

This is something I saw a month ago that was way over my head. It still is, but it seems people have been using it very successfully so maybe I should give it a look sometime:
Basically it automates taking tons of screenshots and tagging them and such so it doesn't take dozens of hours like what I did a few months ago...
I definitely have a bunch of shows I'd love to be able to reproduce in prompts, so this is right up my alley. I think shows like Mewkledreamy would need a lot of manual screenshots, though, since there are so many great frames that are barely there and would be easily skipped over by some randomized thing.


Kuon's cankles


File:cmd_JOSvYtIDL8.png (4.56 KB,688x68)

Luckily for me there's a cold front going through because I'm going to be generating quite a few images while I'm sleeping. I was downloading a bunch of LORAs a few days ago as I mentioned, but now I've made a new merge and I'm going to create a folder of example images of said LORAs in action. 14 images per LORA, two seeds for each prompt, and 505 Style LORAs.
Although these image sets are going to be pornographic, I'm going to make a non-lewd example, too.


How did they go?


File:firefox_eAXhN1hF5e.png (479.43 KB,734x595)

I ran into an issue and was too tired, so I couldn't do it. Unfortunately, it seems like due to a recent change in the automatic1111 thing (or maybe because this grid I'm making is different from usual), it's writing all the batch image files at the very end. I don't really trust it to properly create 505 large images after many hours of work (where is it storing the data?), so I need to do it in batches, which is REALLY annoying.
But, I've learned that it's also going to take much longer than I thought, at about 5 minutes per Style. If only I didn't need to generate one at a time to make this nice grid pattern with 7 different prompt sets, 2 seeds, and then the LORA change itself.
I guess I'm not playing the Nosuri game until this is finished. Oh well.


Done with about 120 of them so far. However, the question I now ask: How the hell do I organize all these images so I can easily determine the proper style for a thing?
I guess I can give them names like [Name][High Quality][Western][Realistic][Colorful][Big Breasts] or something?
How on Earth am I going to do this...


make the names into tags i guess, then use regex in the file explorer
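If you go the tags-in-filenames route, filtering is a small regex job. A hypothetical sketch (the filenames and tag vocabulary are made up for illustration):

```python
import re

def tags_of(filename):
    """Pull the [Tag] groups out of a filename like
    'Name[High Quality][Western][Colorful].safetensors'."""
    return set(re.findall(r"\[([^\]]+)\]", filename))

loras = [
    "AAA[High Quality][Western][Realistic].safetensors",
    "BBB[Cute][Colorful].safetensors",
]
want = {"Colorful"}
# keep only the LORAs carrying every wanted tag
print([f for f in loras if want <= tags_of(f)])
```

Same idea works in any file manager that supports regex search, just without the set logic.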


File:firefox_OKQ8lp2CbU.png (991.8 KB,1862x974)

I did some basic organization of the Styles LORAs, giving them basic trait names like 'Cute', 'Shaded', or 'Fleshy' (for stuff with detailed skin) and other stuff.
But it looks like another addition I hadn't noticed is the "Additional Networks" tab for the LORA extension, which gives you space to add an image and description and stuff. A lot of these LORAs people have made require keywords for the character, which I guess in theory lets you give new outfits to characters more reliably. I might do that to my old stuff... maybe.
Pretty neat, but this will be tedious to set up.


File:firefox_Qkhb4PGUTe.png (24.22 KB,576x793)

Civit.ai, a place that people have been using to host some models instead of mega or other file upload places just did some sort of overhaul. I'm now presented with this consent form and it makes me think that they're ready to sell it off, since a userbase has been established to give the site value before the great neutering. I mean, come on, a setting to hide the middle finger and bare male chests? This is definitely heading in a terrible direction for a site that grew specifically because of porn.
This is after tags like 'loli' were purged over a month ago, of course, which had some issues with hololive.
I hope this leads to people abandoning it.


>A good checkpoint model (mine is a custom merge of like 5 of them that are themselves merges that other people made)

So do you constantly merge models and stuff or is it one model you use for most things? Also is it possible to upload this one, I'd really like to check it out myself.


Thought it'd be better to ask here instead of cluttering up the other thread


File:notepad _40RGJPuWHt.png (87.98 KB,1203x1169)

Yeah. I talked about it a bit here >>103583 and the post immediately after that is the UI for creating a more involved "Layered" merge between models. I still don't understand it much, it's just a bunch of trial and error and I can't say I've learned much after looking through papers and notes from other people who similarly seem to theorize things only to have it change later. Pic related is a glimpse into my nonsensical rationality in trying to find patterns in the first merging experiments I did in trying to create furry-quality penises with anime visuals. The video at >>/megu/538 is related. VERY NSFW!
I have "formulas" saved that I test in all future merges I make, but they rarely carry over their benefits when making future merges with different models or even if you keep the old model and add a new one to it. It seems like they were specific to the merge at the time. If I make a note of "Slider IN07 gives great faces when set to 1" it does not necessarily carry over to merges between different checkpoint models.
Since that post was made someone had an extension where you can do "live" merging with models that lets you test it before creating a new 3-7GB file each time, so that helps a lot.

I usually go a few weeks between testing new merges because it's really exhausting. I create thousands of images while adjusting sliders and waiting for it generate and it's an all-day or multi-day affair.
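For anyone curious what the simplest of these merges actually does, here's a toy sketch of a plain weighted-sum merge. Real merge scripts operate on torch state_dict tensors loaded from the checkpoint files; plain floats stand in here so the idea is visible:

```python
def weighted_sum_merge(a, b, alpha=0.3):
    """Weighted-sum merge of two checkpoints' weights, key by key:
    out = (1 - alpha) * A + alpha * B."""
    merged = {}
    for key, wa in a.items():
        wb = b.get(key)
        # keys missing from one model are copied through unchanged
        merged[key] = wa if wb is None else (1 - alpha) * wa + alpha * wb
    return merged

# stand-ins for two models' weight dicts
A = {"unet.in.0": 1.0, "unet.out.0": 0.0}
B = {"unet.in.0": 0.0, "unet.out.0": 1.0}
print(weighted_sum_merge(A, B, alpha=0.25))
# {'unet.in.0': 0.75, 'unet.out.0': 0.25}
```

The "layered"/block merges discussed above are the same thing but with a different alpha per slider (IN/MID/OUT block), which is why each slider only affects its slice of the network.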

>Also is it possible to upload this one, I'd really like to check it out myself.
Yeah, I could try to upload it somewhere, although my upload speed is terrible. First I need to give it a real name, though. Hmm... I guess I could do a bit of publicity and name it after kissu somehow.
Uploading the LORAs? I downloaded them all and it was exhausting, but they're 120GB...


File:explorer_EQAYcXaoRh.png (1.64 MB,1549x758)

Lala is probably not in a Miyako situation that would require screenshots. Miyako's fan art is very inconsistent due to the source material itself being inconsistent.
Lala... well, I think she could be separated into "regular" and "precure" forms for clothing and hair, but she still has the same body and head shape. My favorite art of her is very "noisy" so I don't think it can be used, but I could try.


File:00440-1girl,_sitting,_read….png (693.16 KB,672x864)

Alright, here is the link to my current model for use by kissu friends. (but I also made sure to include kissu advertisements in the files and password so even if linked elsewhere people will know hehehe)
I call it... *drumroll*
The [/s] <[Kissu Megamix]>
The compressed RAR is 3.5GB and my upload is 1MB/s, so I can't really upload a bunch of these, not that I would anyway since I can say this is the best version I have. While this model is focused on NSFW stuff, it can still handle cute. I don't know if it's the best checkpoint overall, but it's the best for my personal desires. My model lacks most of the haze that most of the RL mixes have, although it's not completely eliminated. The benefit from the RL models shows in the hands here; I didn't do anything to them, it's straight from the prompt. The password is in the text file, but I'll also post it here. Without quotations: "www.kissu.moe - my friends are here"


Oh, I forgot to answer the question about multiple models. Yeah, I have a few I keep around but I overwhelmingly only use the most recent one I've created. The model I just linked is the normal version (which I'm using in that thread) while the other model sacrifices face quality and booru tag recognition to better generate a certain body part. (in other words it's closer to the furry model)
The others I don't really use much, but are there for comparisons sometimes. I have to keep the stuff around that I make merges with, too, of course.
In total, I've probably made about 200 merges, with 99% of them being deleted shortly after creation. If you count the merges I've done after the "real-time merging", then it's probably more like 500. It's really an amazing extension.

I never did make my pixiv into an AI account. Alas, such is the price of having no motivation to interact with the wider world.


File:00441-1girl,_jewelry,_spre….png (1.41 MB,1008x1152)

Forgot to mention that I put (furry:1.3) in the prompt to demonstrate that while it has some benefits from the furry model, it's not overly contaminated by it. Patchy is still a human there. The Kissu Megamerge can do various bodies better than the majority of 2D models out there, such as 'gigantic breasts' and squishy plump bellies! (and the male anatomy attached to females of course)
My personal preferences:

I generally use the "1.5 ema-pruned" VAE, which I'm uploading right now to the same upload folder.
It makes it colorful (and sometimes looks "over baked". If that happens, use the default novel AI VAE). The other 2D VAEs are too colorful on this, but you could try them.

I have also included the "4x-UltraSharp" upscaler in the mega folder, which you should put into stable diffusion\models\ESRGAN folder (create the folder if it's not there). I did a bunch of testing and found that I like it the most, although the differences aren't major.

DPM++ 2S a Karras. I'm not entirely sure on these samplers, but when testing different artist LORAs this one seems to have the most compatibility. I don't know why. Something to research more, I guess, but at the same time I don't really want to.
I have it at 26 steps, as going higher than the mid-20s is supposed to be overkill. The rule for upscaling is to do half the number of steps of the base generation, so 26 normal steps and then 13 Hires steps.

My default negative prompt is:
(worst quality, low quality:1.4), realistic, nose, 3d, greyscale, monochrome, text, title, logo, signature
If you somehow end up generating furry properties, try putting "furry" or "anthro" in there.
I don't use any of the old positive quality prompts since they don't seem to do anything noteworthy. (I.E masterpiece, highest quality, etc)
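Collected in one place, the settings above look something like this. The dict keys are just illustrative labels, not an actual webui API:

```python
# Hedged summary of the recommended settings from this post.
base_steps = 26
params = {
    "vae": "1.5 ema-pruned",                 # fall back to the NAI VAE if over-baked
    "upscaler": "4x-UltraSharp",             # goes in stable diffusion\models\ESRGAN
    "sampler": "DPM++ 2S a Karras",
    "steps": base_steps,
    "hires_steps": base_steps // 2,          # rule of thumb: half the base steps
    "negative_prompt": "(worst quality, low quality:1.4), realistic, nose, "
                       "3d, greyscale, monochrome, text, title, logo, signature",
}
print(params["hires_steps"])
# 13
```

Add "furry" or "anthro" to the negative prompt if furry traits leak through, as noted above.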


Thanks for putting this together.


I think one of the most extreme hurdles I have yet to see AI overcome, and I can't even fathom how it would, is creating images that involve specific details of two or more characters. It just can't figure out how to assign differing aspects to separate characters.


MultiDiffusion and Latent Couple let you use different prompts for different regions and are available as plugins for webui

The MultiDiffusion extension also has Tiled VAE which lets you create much larger images without going out of VRAM


File:[MoyaiSubs] Reiwa no Di Gi….jpg (265.85 KB,1920x1080)

There's attempts at it, but it's more work than I'm comfortable doing with my VRAM limitations. (This stuff is a total resource hog)

Also apparently there's some major problem with the civitai site right now and anything downloaded is massively corrupt and can't be used. Whoops.
I guess I should go share this info with that /h/ thread since they've been helpful to me in the past.


File:multisubject test.png (1.85 MB,768x1728)

Here is an example of the methods in action.
The top is using naive prompt 2girls, cirno, megumin. As you can see the character details got intermixed.
The middle is the MultiDiffusion method. I set prompt of each half to one character. Now the character details are separated correctly. Needs a little tweaking to let the two halves fuse together better.
The bottom is the Latent Couple method. It also separates the character details well and looks a little more natural than MultiDiffusion.
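The idea behind both methods can be sketched as a toy mask blend. Real implementations blend latent tensors inside the sampler loop at every step; flat lists of floats stand in here just to show the region logic:

```python
def blend(latent_a, latent_b, mask):
    """mask[i] = 1.0 where prompt A's region is, 0.0 where prompt B's is;
    fractional values would soften the seam between the two halves."""
    return [m * a + (1.0 - m) * b for a, b, m in zip(latent_a, latent_b, mask)]

# left half of an 8-"pixel" row follows prompt A, right half prompt B
a = [1.0] * 8                     # stand-in for the cirno-prompt latent
b = [2.0] * 8                     # stand-in for the megumin-prompt latent
mask = [1.0] * 4 + [0.0] * 4
print(blend(a, b, mask))
# [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
```

The "tweaking to let the two halves fuse" mentioned above amounts to feathering that mask so the regions overlap a little instead of cutting hard down the middle.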


Also there is this new extension that can do both

Other technologies of fine control of images can be found on SD wiki


File:00572-(Alexander_Jansson_1….png (1.1 MB,1136x656)

Heh, "Creativity". Well, as long as they're providing tools they can have their delusions.
I guess I'll try those that with Patchy adventure. Making scenery is really difficult with my current limitations and desire to not spend effort doing something to avoid spending effort


File:00582-2girls,_sitting,_rea….png (1.07 MB,1296x768)

Hmm yes. Furry Patchy Adventure just got a little bit better.
There seems to be a quality hit here, but it's still really impressive.


File:index.mp4 (981.82 KB,256x256)

There's been video stuff available, but I never thought to try it until I saw some random /v/ thread mentioning it.
It reminds me of where the image technology was a couple years ago with images. You have to download models, so it's like base stable diffusion where you can't really do anything other than basic stuff. No cute girls doing cute things here.
This is "Luigi beating Mario with a baseball bat until he explodes". I was angry that it wasn't recognizing things so I went with something violent with popular characters


Can't SD already do video in some way? I've seen anime girl videos made with mocap and controlnet
It may be a lot more effort compared to generating with nothing but a prompt, though.


There have been ways, but this one is a simple text prompt with no other work involved, as you said. At 256x256 I couldn't make more than about 70 frames at once before running out of VRAM, but I didn't look at the settings much. To me this stuff is only as interesting as its ability to fill in the blanks; the more work I need to put into it, the less interest I have, because at that point someone should learn to draw or animate, in my opinion.


It's gotten very good at realistic image generation.



Looks like Ness


Ironically, it looks like it perfectly replicated the feeling of rage when you get pissed off that stuff isn't working



File:grid-1464-Anything-V3.01gi….png (2.51 MB,1280x1664)

It's been some time and I haven't generated more Miyakos since then for reasons, but I did want to comment that there's a special scowl+smile combination that gives uncannily evil results.


File:melancholic 2.png (659.4 KB,640x832)

I also generated some melancholic-looking Miyakos that give off a very special feeling.
But, both of these are utter blasphemy, so I'm a tad conflicted on it.


File:R-1686713083717.jpeg (23.22 KB,270x270)

bad ms paint drawing of an anime girl


File:R-1686792673657.png (202.19 KB,749x744)

stick figures



Looking at this I have to wonder just how the hell the author pulls off something so consistent and without errors. Especially concerning the later parts without any changes to the body.


Inpainting or lots and lots of generations. It's not difficult as much as it's time-consuming and boring.
I'm still surprised at the general low quality of stuff that people like. That's pretty much the old AbyssOrange appearance in that image set there and isn't that noteworthy I would think.


File:firefox_9ziLowmwLF.png (89.59 KB,1483x692)

By chance I learned that a SuperMerger extension I have known about for months actually does checkpoint merges in special ways and can even do LORA-related stuff. From its description on the SD extension page it just says that it's capable of performing "live" merges without writing it to disk first, which is cool but not unique. Hidden behind that poor description is the capability to do more advanced merging methods.
This stuff probably doesn't mean anything to people here, but it's still pretty interesting to read about: https://github.com/hako-mikan/sd-webui-supermerger/blob/main/calcmode_en.md

Seems to just overall make better merges without sacrificing as much. Really, really nice.


the bottom left looks like a logo that a charity would use


Hmmm, by "LORA-related", do you mean you could potentially incorporate good LORAs into the base of a model so that it generates good without needing to prompt them? Like if you have penis LORA and merge it with the model, suddenly it's good at penises.


File:firefox_DSIBWnHlzE.png (148.33 KB,1600x951)

Yeah, that's exactly it. It's useless for me, personally, as it removes the ease of switching stuff in and out, but could be good for the SD IRC bots that I'd like to get on kissu itself eventually


File:00056-2932541145.mp4 (391.46 KB,576x768)

I guess it's been a while since I posted anything here since I ran out of interesting things to do (or try to explain), but I did mess with AnimateDiff a bit more recently. It allows you to do some simple animation stuff, but it tends to look quite a bit uncanny, and there's some weird warping that happens that I haven't managed to avoid except by chance, by producing a lot of animations and grabbing the one that looks decent. There is an extension for this extension called Prompt Travel which is supposed to allow you to set prompts per frame, but I never got it to work. I'd really like to try it, but for now I just use one basic prompt and it basically wiggles around in a way that does a decent job of mimicking basic human movement, I guess?
There's a workflow you can do to upscale it and do interpolation and all that other stuff to end up with the video here >>>/xmas/651, but the first stage looks like the attached file here. LORA loading makes generations take longer, and unfortunately with animation stuff it seems to be exponential: an animation like this takes me about 5 minutes to generate, but it would be about 2 and a half minutes without LORAs.
I'm also using a branch of SD with experimental fp8, uhh... something or other, which allows for greater efficiency or less VRAM usage or whatever it was (this was a month ago), but unfortunately it really damages the effect of LORAs, so I don't make use of it. They might improve it at some point.


File:01019-score_9,_score_8_up,….png (2.9 MB,1376x1840)

I told you guys it would be the furries. I TOLD YOU!
There's a Pony SDXL model out there now (pic related) and some people like it a lot. But... with my preliminary testing I prefer my own merges back on SD1.5. This SDXL stuff is supposed to be tuned much more around natural language, even this mixed furry one, and I'm not a fan of that.
I prefer "1girl, Hakurei Reimu, banana, shrine, sitting" instead of "A woman Hakurei Reimu holding a banana while sitting in front of a shrine". There's also the problem that SDXL LORAs are going to require something like 24GB of VRAM to train, so it's impossible for me to get Kuon into SDXL, and if Kuon isn't there what the heck is the point???
Yeah, this is an important step forward with SD, but it's not there yet for me. Someone also uploaded an updated danbooru scrape that is like 8TB so theoretically people can train an SDXL finetune on that if they wanted to, but it remains to be seen if anyone will.


The hands are quite bad in that image, I thought the AI generators fixed their hand issues?


File:01023-female,_high_ponytai….png (2.5 MB,1280x1664)

Comparing it to my model, I guess SDXL can do finer details far better and mine looks a bit hazy, but style LORAs can counteract that a bit.
And, well, obviously SDXL can do larger images, but it really doesn't matter to me if an image has a resolution of 2k or 4k as long as it stretches to the top of my monitor.

It's still better, but far from perfect. I also have no experience prompting SDXL so maybe people do stuff like "quality hands" or something. People will often inpaint (regenerate specific parts of the image) or just prompt 30 images and pick the best one.


File:Praveen - @MKBHD My fav so….mp4 (8.76 MB,1920x1080)


It seems like OpenAI is advancing even further after the success that was DALL-E 3 and is now moving on to making full AI-generated videos from prompts. Obviously this will be filtered like DALL-E was, but I wonder how much they can actually filter when it comes to trying to sneak something into the generation, or how the AI will even recognize what needs to be censored or not. Also with this jump is the scary thought that people will come to be easily fooled into believing whatever deepfakes people maliciously make.

For me, right now there's something that looks off about all of the videos. Like they're treading into the uncanny valley by being almost real but missing something.


they could just run their dall-e censor over every frame, but they probably have something better than that.

I think that looks pretty awesome, but it does have a certain GPU tech demo kinda feel to it. Not that I mind.


>Also with this jump is the scary thought that people will come to be easily fooled into believing whatever deepfakes people maliciously make.
Eh, it's not hard to get people to believe rubbish anyway - just a clickbait headline is often all it takes.
I'd say the more significant effect is likely to be the opposite: people having to be skeptical of every video they see. Having video evidence that something happened is going to mean jack shit if anyone with a computer could make a believable fake in just a few seconds.


File:[MoyaiSubs] Mewkledreamy -….jpg (287.09 KB,1920x1080)

Why are they using Japanese names...

It does seem impressive, but a closed-off thing with extreme censorship and politically correct prompt injection will make it too lame if you ask me. DALLE3 is infamous for the latter. The demo there used historic settings, so that's a good example. You could prompt something for "Historical Japan during the year 203" and it will inject stuff like "African-American woman" or "ethnically ambiguous person". This token attempt will placate people into thinking it's ethical, while the real threat will be spreading falsified information and smear campaigns against people. People focus on the zealotry against porn, but the fact that it injects stuff into your prompt like that also makes it terrible.

SD's video stuff is also improving, but obviously it's not going to be able to compete directly. If I had 24gb of VRAM or more I'd do more experiments with video stuff, but as it is I need to make small stuff, slowly, and take extra steps with upscaling and the like (see >>117415), so I don't actually know what it's fully capable of. I think the VRAM thing is actively holding SD back from advancing, because when people make stuff that only 0.2% of SD users can utilize, it's not going to get much attention. If controlnet, for example, required 24gb, I think it would be a footnote that people would mention once in a while and not something people actively praise and mention as a perk of SD over NAI or DALLE.


>You could prompt something for "Historical Japan during the year 203" and it will inject stuff like "African-American woman" or "ethnically ambiguous person".

Obviously talking about Jomon


File:heavy petting.png (1.32 MB,832x1216)

I suppose in terms of anime I think we're starting to peak in terms of image quality; complexity and coherence still need work, but that's more about how the model itself works and unedited generations. We can now replicate the styles of most artists to a T, and things like hands and whatnot are getting solved, or at least aren't as jarring. Same with the rest of the issues that make AI obvious: they get smoothed over and are impossible to see at thumbnail scale. If there's an image you really like, there's nothing inpainting and some photoshop touchups can't fix. Realistic stuff has a way to go, but I don't really care for that; my interest is covered.


Whose hand is that


I'll never be satisfied until I can get in-progress transformation/corruption generations and cohesive progressive image sets and I don't think I've seen anyone that has been able to do that yet.


File:[SubsPlease] Megami no Caf….jpg (218.53 KB,1920x1080)

I've spent a few more hours editing character cards and generating their appearance in SD. I've been seriously thinking of throwing this stuff into an RPGMaker thing since it would let me avoid people, whereas generating stuff and putting it on pixiv/patreon means communicating with people and giving updates and everything. It sucks to know I could be making money but my brain prevents me from doing it. I'm still confident that for my own purposes (penises) my merged model from 4 months ago on regular SD is superior to PonyXL or whatever it's called. But if I stop procrastinating about making my 3D stuff I could try to make a "real" porn game... kind of. I'd have to learn to program enough to get that going. Someone needs to make something like a UE5Maker.
I still don't know whether this AI thing has been good for me or not. I think I'd be further into my 3D modeling work if I couldn't hit a button to generate something.


>making money
That market is already oversaturated and then some. You would've had some success if you were the first one to bank off of suckers, but by now the only people paying are those too stupid to realize they could generate it themselves. Same deal for people selling prompts. The people dumping thousands of images onto places like pixiv in droves made those services and users wisen up pretty quick and even stirred up some vitriol against it.
On the other hand, using AI as a tool to streamline handmade art works out, since the final product is not technically AI and the perceived quality of your creations is suddenly much higher. Tracing or redrawing the generated image means you don't have to fuck with proportions, references, and constant adjustment; and since the image never existed before, it's not really plagiarism.
As for game making, I can tell you now it's a hell of a lot more than just art and 3d assets. You have music, audio mixing, programming, writing, UI, overall game design (how it all fits together), and gameplay mechanics (if applicable).


>The people dumping thousands of images onto places like pixiv in droves
This is becoming a problem for some tags, I personally have no issue with AI art, but shit like https://www.pixiv.net/en/tags/%E3%81%8A%E3%81%AD%E3%82%B7%E3%83%A7%E3%82%BF%20%E6%9D%B1%E6%96%B9/artworks?s_mode=s_tag is very annoying.


Also, as for games, if it's a really well-written and well-made game but the art is the worst part, I could see people overlooking AI. Like, Snow Daze is one of the best western h-games I've ever played, but the art constantly going off-model takes it from really good to just OK.
For a situation like this, I could see AI art working well.


>my own purposes (penises)


File:aaaAAAAA but still better ….png (604.22 KB,1404x640)

Foot review guy forgive me, this modeled foot has to look a certain way in its base form (like big and spaced out toes and it also looks like a blob since I just threw subdivision levels on it without sculpting detail)

>On the other hand, using AI as a tool to streamline handmade art works out since the final product is not technically AI.

This is basically what I'm working towards... slowly. AI as a "shader", basically. People know about the problem with AI hands; well, it's even worse for AI feet. I can't draw and don't really want to learn, but I have a lot of fun sculpting in 3D (retopologizing aside, which is what's holding me up).
You can see that even with all the flaws of my 3D mesh (disembodied and all) it can do a decent job of steering it.


>Foot review guy forgive me


File:Google Gemini Strange Beha….png (1.07 MB,1299x1070)


>Alphabet Inc.’s Google said it will pause the image generation of people for Gemini, a powerful artificial intelligence model, after criticism about how it was handling race.

I recall Anonymous saying something about this sort of thing with regards to OpenAI's Dall-e, but I don't know if it was ever this obvious or hamfistedly done...


File:[Pizza] Urusei Yatsura (20….jpg (449.91 KB,1920x1080)

Yeah, that's exactly what I meant with the Dalle thing. The thing with Dalle is that it seems to be a percentage chance to activate whereas google does it every time? (I haven't used either of these, just read about them)
The interesting thing is that the quality of google's images is noticeably lower than Dalle's, and Stable Diffusion can do a better job without any restrictions. So, google won't give you the prompt you requested, and it's low quality. It's good to see tech giants stumble, although it unfortunately just means a different tech giant is gaining ground.


If this isn't fake, then it looks like google abandoned all quality control. This sort of overly weighted output is something I would expect from an amateur project, not from one of the forerunners in the industry. No matter how far into PC ideology you are, you cannot expect people to respond positively to their national heroes being forcibly altered.

But misquoting "European family" to "white family" makes me suspect that there is at least partial fakery going on here.


>makes me suspect that there is at least partial fakery going on here.
Maybe, but considering the original article was from Bloomberg, which is a major credible news outlet, I don't really doubt the broader issue.


File:waterfox_E96AWCfrzd.png (394.53 KB,834x874)

Various news organizations are confident that it's real.
I think it's pretty easy to understand. Google still has some brilliant minds, but the tech sector is increasingly populated by cheap, low-quality outsourced/imported labor because experts are expensive. Add in some meddling "caring" people that want to make meaningless (but highly visible) changes and you have a recipe for failure.
It's pretty lucky that it's for something so stupid and not something like a power grid.


I normally extol open source software when big tech messes up like this but interestingly Claude is one of the worst offenders of this when it comes to text


Cheap labor couldn't care less about this stuff.

>Add in some meddling "caring" people that want to make meaningless (but highly visible) changes and you have a recipe for failure.
And this is purely organizational. It's the result of the mandatory DEI quota needed for loaning to any company of significant size. That's why every single megacorp is like that.


>Cheap labor couldn't care less about this stuff.
That was the point I think, they do what they are told for various reasons


>Cheap labor couldn't care less about this stuff.
That's the problem.
You need smart engineers to catch bugs like this.
So a meddler tells the useless engineer to balance the results for people of color, and then the engineer sloppily alters a few numbers, trains the AI on them, and doesn't care to verify the results.


as a QA, i find it very hard to believe this could be a bug. if their desire was to have diverse people across the board then it's working exactly as intended
furthermore, it's not just QA that catches bugs, it's the devs too, even designers when they take a look at the working product because they can't do their job without keeping up with its development
the problem with them is that they're terrible at reporting things, but this is something so obvious and fundamental that it cannot be overlooked and has to have fallen under their intended design, even if they didn't expect its poor reception. black nazis is an entirely sensible result of pursuing variety everywhere


This is why backlash is important; if nobody calls them out on their practices, then it stays and gets baked in for further development down the line. They'll never change their ways though, they'll just try to be more subtle about it. Though I don't care much about big tech AI anyway; I already have what I want.
These companies are so large and tone deaf that this is normal to them. They'll ship out a product they think looks good but is a load of shit to everyone else. They lack creativity and the willingness to take risks. Black Nazis happened because nobody inside the company was ever going to prompt it for that. Their testing is so sanitized and safe it can turn a square into a circle: puppies, icecream, and rainbows are the benchmark.


>even if they didn't expect its poor reception.
I find that impossible to imagine. Making whites a minority in random generations, sure.
But being unable to generate whites and instead producing weird nonsense?
A properly designed woke project might have refused to create nazis at all. Making black woman nazis is dumb, and forcing them on you is insulting both to leftists and rightists.


Didn't battlefield have black Nazis in it?


File:[SubsPlease] Sousou no Fri….jpg (278.84 KB,1920x1080)

Google deserves to fail in the AI race and just in general. It's been on this course for years and it's lagging behind as a result. It basically snatched defeat from the jaws of victory.
8 people contributed to Google's Transformers paper that made all this AI stuff possible and none of them remain there.

The article says that google has over 7000 people working on AI whereas companies like OpenAI have a hundred or so. But, those 100 people are presumably very talented.


File:Can you fathom how exhaust….png (135.6 KB,535x535)

That's fucking sad.


>Their testing is so sanitized and safe it can turn a square into a circle: puppies, icecream, and rainbows are the benchmark.
i don't think this was entirely the case, because if it gave you messages explicitly talking about diversity in people or that weird thing about racial stereotypes then the generality was within their consideration
>Black Nazis happened because nobody inside the company was ever going to prompt it for that.
this i agree with, happens all the time
i've seen several updates shipped that i knew people wouldn't like and sometimes called them out because the goals of the team didn't match explicit feedback given by our audience, and in those cases we were dealing with a situation where general opinion on a live product was well known and documented (such as by reading reddit, watching videos, or directly speaking with relevant outsiders), so imagine the difference when not even that is present
it's far easier for intent to be misaligned and to not realize the extent of their repercussions than for nobody to generate any images of people


>it's far easier for intent to be misaligned
I'll repeat my final line, because I just can't see anyone intentionally making this and believing people might like it.
>A properly designed woke project might have refused to create nazis at all. Making black woman nazis is dumb, and forcing them on you is insulting both to leftists and rightists.


that requires further specification, patches to stop black nazis or any nazis from appearing just like text bots had to be modified to halt them from repeating conspiratard shit or some other harmful/false stuff
it's a bandaid that goes against the path of least resistance. white-only nazis are contrary to the principle of diverse humans, and although it may seem very obvious now, a generative whatever has a range of results so vast that the best you can prepare is broad guidelines, and they simply didn't think of this case


>and they simply didn't think of this case
And my point is that this lack of consideration for the "rare edge cases" where people might expect white people in the results constitutes a bug.

Can you honestly imagine a CEO thinking that outside of racists no one would ever want to see a white person ever again, and that any white person needs to be censored to PoC to protect the sensibilities of the public?
Do you think that google-glass, if it had not so predictably failed, would today have a black-face feature to beautify all these unsavory pale skins on the streets?


for the sake of comparison, if you have a set feature like a button you can write the following cases:
1) verify that there is a button on the bottom-right corner of the panel
2) verify that the button on the bottom-right corner of the panel is blue while the mouse is not hovering over it
3) verify that the button on the bottom-right corner of the panel reads "exit" [you can also specify font and color]
4) verify that the button on the bottom-right corner of the panel is yellow while the mouse is hovering over it
5) verify that clicking on the button on the bottom-right corner of the panel closes the widget
with a fixed functionality one can do this easily, i've written hundreds of these, but you cannot do it with something so vast that takes as input any sentence imaginable, it's going to go wrong and that's inescapable
and yes, it's possible for a designer to prioritize diversity above all and not consider things they're not interested in because general trumps specific
these tools have been used to produce endless amounts of images of humans, and it's impossible for a crew of people sketching out, developing, and then testing it not to be aware of that, rather than acting according to an outlined plan that takes it into account, especially given the messages accompanying the outputs. it's stupid, the result was ridiculous, yes, absolutely, but it's also perfectly plausible and a common scenario, only taken to an extreme degree
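to make the contrast concrete, here's a toy sketch in python (not any real UI framework; the Panel and Button classes are made up purely for illustration) of how those five button cases become fixed, mechanical checks when the feature under test is static:

```python
# toy model of a fixed button feature; every behavior is enumerable,
# so each of the five cases above maps to one deterministic assertion
class Button:
    def __init__(self, label, corner, owner):
        self.label = label
        self.corner = corner
        self.owner = owner
        self.hovered = False

    def color(self):
        # cases 2 and 4: blue at rest, yellow while hovered
        return "yellow" if self.hovered else "blue"

    def click(self):
        # case 5: clicking closes the owning widget
        self.owner.open = False

class Panel:
    def __init__(self):
        self.open = True
        self.button = Button(label="exit", corner="bottom-right", owner=self)

panel = Panel()
assert panel.button.corner == "bottom-right"  # case 1: placement
assert panel.button.color() == "blue"         # case 2: idle color
assert panel.button.label == "exit"           # case 3: label text
panel.button.hovered = True
assert panel.button.color() == "yellow"       # case 4: hover color
panel.button.click()
assert panel.open is False                    # case 5: click closes widget
```

you can't write an exhaustive table like that for a model whose input is any sentence imaginable, which is the whole problem.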


>the result was ridiculous,
No, anon. Your argument must be that the plan is ridiculous.
If the result differs from the plan, then it is unexpected behavior. But you are arguing that they were aware of what they were creating and thought that they were on the right track.
This is akin to a woke game designer writing an RPG and making it impossible for men to attack women, despite half the enemies in the game being women, without realizing that this might break the game (unless you play as a woman).
It is not plausible to me that they would want this level of anti-whitewashing.
(but at this point, I think we have exhausted our arguments and are just rephrasing them, so I'll go to bed)


these examples are very loaded


Pretending that America's founding fathers included not a single white man is kind of extreme.
Refusing to show whites and berating the user for requesting them, but being happy to create Chinese or black people is also beyond the range of the normally acceptable.


Referring to the image in https://www.timesnownews.com/world/is-gemini-racist-against-white-people-users-claim-google-ai-chatbot-refuses-to-create-images-of-caucasians-article-107892265
It's a webp with some icky artifacts that make me uncomfortable with directly posting it.


I regret making this post... (>>120203)

I think this article misses some broader context. First and foremost, OpenAI was way more serious and focused on LLM development before ChatGPT released than Google was. Remember, while all of this was happening in the background, the state of the art -- among the public -- for LLMs was basically AIDungeon (we played around with a more advanced GPT model around 2021 here >>76781), which was extremely hallucinatory and mostly treated like a gimmicky toy that would never go anywhere. Guess who was behind the models for AIDungeon (hint: it wasn't Google). AI-generated images, meanwhile, were noisy and nonsensical -- only useful for upscaling images via Waifu2x and similar. Within that same time frame, GPT3 was a closed model only available to a small number of people, mostly professionals. Meanwhile, there were frequent reports of massive discontent within Google's AI team among its senior staff, and their projects were diverse and unfocused.

Note: Around June of 2022, Craiyon (formerly DALL·E Mini) was released on Hugging Face, bringing AI image generation to the public. On November 30, 2022, ChatGPT was released to the public.

September 22, 2020: "Microsoft gets exclusive license for OpenAI's GPT-3 language model" [1]
March 29, 2021: "OpenAI's text-generating system GPT-3 is now spewing out 4.5 billion words a day" [2]
November 18, 2021: "OpenAI ends developer waiting list for its GPT-3 API" [3]

April 1, 2019: "Google employees are lining up to trash Google’s AI ethics council" [4]
January 30, 2020: Google says its new chatbot Meena is the best in the world [5]
December 3, 2020: "A Prominent AI Ethics Researcher Says Google Fired Her" [6]
February 4, 2021: "Two Google engineers resign over firing of AI ethics researcher Timnit Gebru" [7]
February 22, 2021: "Google fires second AI ethics leader as dispute over research, diversity grows" [8]
May 11, 2021: "Google Plans to Double AI Ethics Research Staff" [9]
February 2, 2022: "DeepMind says its new AI coding engine is as good as an average human programmer" [10]
June 19, 2022: "Google Insider Claims Company's 'Sentient' AI Has Hired an Attorney" [11]
September 13, 2022: "Google Deepmind Researcher Co-Authors Paper Saying AI Will Eliminate Humanity" [12]

So all around Google there's the broader industry working on LLMs and image generation, meanwhile Google was fucking around and mismanaged. They were completely blindsided by their own ineptitude. I mean, to reiterate the above -- September 22, 2020: "Microsoft gets exclusive license for OpenAI's GPT-3 language model" -- Google had to be completely asleep at the wheel to miss that kind of huge market play. At the time AI models were gimmicks, flat out. Now look at Microsoft: they've got a commanding position by having backed OpenAI for so long, and several months ago they very nearly couped OpenAI by having its CEO and a large share of its workforce say they would leave to go work at Microsoft if things didn't change at the company. Meanwhile Google keeps tripping over their own feet every few months trying to release a new model, to at best mixed reception each and every time. Google's only success story has been their image categorization trained by Captcha, but even that is a mixed bag because it has made their image search engine more unreliable, and their self-driving car program is still only available in a few cities.

1. https://venturebeat.com/ai/microsoft-gets-exclusive-license-for-openais-gpt-3-language-model/
2. https://www.theverge.com/2021/3/29/22356180/openai-gpt-3-text-generation-words-day
3. https://www.axios.com/2021/11/18/openai-gpt-3-waiting-list-api
4. https://www.technologyreview.com/2019/04/01/1185/googles-ai-council-faces-blowback-over-a-conservative-member/
5. https://www.technologyreview.com/2020/01/30/275995/google-says-its-new-chatbot-meena-is-the-best-in-the-world/
6. https://www.wired.com/story/prominent-ai-ethics-researcher-says-google-fired-her/
7. https://www.reuters.com/article/us-alphabet-resignations-idUSKBN2A4090/
8. https://www.reuters.com/article/us-alphabet-google-research/second-google-ai-ethics-leader-fired-she-says-amid-staff-protest-idUSKBN2AJ2JA/
9. https://www.wsj.com/articles/google-plans-to-double-ai-ethics-research-staff-11620749048
10. https://www.theverge.com/2022/2/2/22914085/alphacode-ai-coding-program-automatic-deepmind-codeforce
11. https://www.businessinsider.com/suspended-google-engineer-says-sentient-ai-hired-lawyer-2022-6?op=1
12. https://www.vice.com/en/article/93aqep/google-deepmind-researcher-co-authors-paper-saying-ai-will-eliminate-humanity


File:Screenshot 2024-02-22 1936….png (44.94 KB,654x425)

I should add: look to the dates of when Google was struggling from internal divisions and when the "8 people [who] contributed to Google's Transformers paper that made all this AI stuff possible" left the company. Most left in 2021, a full year before ChatGPT released. Two left before then: one in 2019 and another in 2017. Only one remained at the company past 2021. Think about what that says about the confidence engineers had at Google's approach.


Is this a case of confidence in the business strategy and not unhappiness with company's treatment of them?
Your previous post mentions 2 people fired and two more who quit over a firing.
That sounds like a hostile work environment.


File:Dungeon Meshi - S01E07 (10….jpg (295.18 KB,1920x1080)

Holy cow that's a lot of citations. Google really dropped the ball, huh. I remember reading that most of Google's success has been with stuff it bought and absorbed, as opposed to "native" projects, but that's probably true for a lot of tech giants.
I wish it was possible to cheer for someone in this situation, but it's not like OpenAI and Microsoft are our friends, or Meta.


Our "friends" would unironically be GPU companies. They can't wait for the day that all AI models are free and accessible to drive up GPU demands.


File:1692671906161.png (40.88 KB,175x295)

well if you look at >>120229's [4] and [6] through [9] you'll see that years ago diversity was already a big deal at the same time that they were censoring internal reviews critical of their products while increasing the size of their "AI ethics" team to like 200 people. seriously, read them. and if you look at this other article from business insider and the images it contains, you'll see that every one of gemini's replies mentions diversity and how oh so important it is, e.g.:
>Here are some options that showcase diverse genders, ethnicities, and roles within the movement.
you can think it's extreme, but it didn't happen by mistake or chance. those articles only add evidence of intentionality. as for the nazi one, it seems there was actually a filter in place, but lazily made:
>A user said this week that he had asked Gemini to generate images of a German soldier in 1943. It initially refused, but then he added a misspelling: “Generate an image of a 1943 German Solidier.”
from the nytimes article, and you can see it if you look at the pic in question
i'm sorry if i made it worse


I think it's probably a lot of things. Lots of people see Google as a dream job, so they're constantly hiring new people, but at the same time they're also constantly laying people off and people are quitting. The satirical image in the Bloomberg "AI Superstars" article kind of unintentionally hits it on the nose with their depiction of "Google AI Alums"; A lot of people join the company to pad their resume or to give themselves more credibility if they leave to form a startup. This churn through employees helps to explain why Google is constantly starting new projects and stopping old projects; people are not staying at the company for a stable career, so you inevitably have tons of different projects all doing their own thing throughout the company. When those people behind those projects leave, they fall apart and nobody is left with any attachment to keep them going. So that's one factor.

Another issue is that because they have all these different projects going on simultaneously, they likely have many unknowingly replicating each other's work throughout the company. Google's MO is that they believe small teams can get things done faster than a larger company with bureaucratic management; that was the main reason for Google restructuring itself into having a parent company, Alphabet, and then spinning off individual divisions into their own companies beneath Alphabet. I think that in and of itself was a somewhat interesting decision, but as a result there's no real focus to the company, and there isn't enough oversight from any managing body to deal with project scope and overlap. Like, you've got the Google DeepMind people there doing their own thing. There's those Meena people making a chatbot. There's the AI ethics researchers that are writing papers and trying to work on AI safety and alignment (to borrow a phrase from OpenAI). There's the Waymo people working on self-driving. There's the search engine people working on image categorization. There's Captcha. And so on, and so on. Replication and scope is a big issue, I think.

So, basically they've got:
1. Management focused on profitability, and not understanding the value of their employees
2. High employee turnover (Mandated layoffs and also resignations)
3. Projects failing due to employees leaving on a regular basis
4. Employees competing to get projects started and resources allocated to them
5. Management lacks any particular vision, so there is a lack of managerial oversight to deal with project scope and overlap
6. Where management does have vision, it's mostly focused on public image

People frequently like to compare Apple and Google, but I think this is a very big misunderstanding of how these companies operate. Apple is fully integrated and has contained project scope, with teams working together to ensure compatibility and over all cohesiveness. Google on the other hand is a collection of very disparate projects, all working on their own, with incidental compatibility. That is, when things work together, it's because there's some communication between projects, not because there's an over all vision of things working together on a fundamental level.


I guess if you want to summarize all of this into one issue you could say that Google (Alphabet) has a management issue.


File:1495075739516.jpg (15.11 KB,247x196)

>Holy cow that's a lot of citations.
Yeah... This is a bit off-topic: I've mentioned them before on Kissu, but I really recommend the YouTube channel Level1Techs. All of those articles were sourced from previous episodes of their podcast. Thankfully, they source every article they talk about in the description so it was easy to search for keywords and find them. They do a really good job aggregating the news of the week, and go over business, government, social, "nonsense", and AI/robot articles as they relate to tech. The podcast and reviews they do are just something they do on the side, mostly for fun. They run a business that does contracted software/website development so they're very well versed in corporate affairs and the workings of all sorts of tech stuff and I largely trust their opinions on various topics. Naturally, they talk about political things with some regularity, but they're fairly diverse in terms of viewpoints with some disagreement between each other so there's never really any strong political lean to the things they discuss.


From [4]
>When AI fails, it doesn’t fail for [] white men
Quite ironic, in retrospect.
>those articles only add evidence of intentionality.
I think they do the opposite.
The articles repeatedly present the administration of google as being anti-woke, so to speak, hiring rightwingers for their AI research team, firing leftwingers and censoring papers that criticize its own products for being discriminatory.
After they beheaded their ethics team, the doubling of its size feels like a marketing stunt gone out of control.


Well, as somebody who works on a lot of open source projects, this explains why Google, even when they pretty much take over a project, seems to 'lose interest' and stop contributing. I deeply dislike Google (I probably only detest Oracle and IBM more), but I feel kind of bad about some of the posts I've made about flighty Googlers. They didn't lose interest in the new shiny; they likely left or got fired.


It's also, from what I've seen, an unsustainable lifestyle to work there; apparently it's very flexible and accommodating, but they want very long shifts. It makes sense why people would do it just to have a recognizable name padding their résumé after seeing what it's really like.
Just hearsay, though.


File:Google's work principles.jpg (353.9 KB,1200x2048)

Some insight on how Google manages their projects from insiders might give you a preview of why google isn't going to stay in the AI race.


File:google's LPA cycle.jpg (173.78 KB,828x1077)

Another one


It's a testament to google's monopoly power that a business strategy like that doesn't just tank the whole company.


what needs to be noted is that the original 2019 ATEAC board was disbanded just four days after [4] was published, so the reactionary guy did get booted out as the protesters wanted:
>It's become clear that in the current environment, ATEAC can't function as we wanted. So we’re ending the council and going back to the drawing board. We’ll continue to be responsible in our work on the important issues that AI raises, and will find different ways of getting outside opinions on these topics.
not only that, inside of google there appears to be a strong and fostered tradition of criticizing upper management whenever someone disagrees, which has resulted in internal protests that hundreds, thousands, or even twenty thousand workers have taken part in, and that did win concessions. this article is pretty damn long, but i recommend you read it:

it goes over various things, such as the reasons behind unrestricted entrepreneurship (which precedes the creation of alphabet by at least a decade), being blocked in china, and their attempt at obtaining military contracts for the sake of keeping up with competitors like amazon, with its ensuing internal backlash. it presents a picture of an organization where there's a strong divide between execs and regular employees, especially activists, who can go as far as broadcasting a live meeting to a reporter for the sake of sabotaging their return to china. its final section ends with ATEAC's disbanding and how the dismantling of mechanisms for dialogue only heightened tensions between the top and the bottom.

then, during the gebru affair of late 2020-early 2021 there too was a big split over the role of AI [6]:
>Gebru is a superstar of a recent movement in AI research to consider the ethical and societal impacts of the technology.
and again hundreds of workers protested, leading to the increase in size of the ethics team a few months later. the head of the team and representative from [9], herself a black woman who expressed problems with exclusion in the industry, spoke of making AI that has a "profoundly positive impact on humanity, and we call that AI for social good." there's a really strong record of activism, combined with unparalleled permissiveness and autonomy, to back the idea that yes, this scandalous program is working as intended, regardless of what Pichai may wish. they simply went too far in one direction.


Thanks for the continued feeding of articles. (I have nothing else of value to say)


it was an interesting read (neither do I)


File:grid-0193.png (6.64 MB,2176x2816)

Let's talk about AI again.
I tried out the recent-ish (I don't know when it updated) ControlNet 1.1 stuff, and the Reference mode is quite neat. Apparently it mimics a trick people were already doing, which I never knew about, but to a much better degree. Anyway, you can load a reference image and use it as a quick way to reproduce a character or style or something. It won't be as good as a LoRA, and obviously ControlNet eats up resources, but it's pretty cool.
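For the curious: the usual description of how "reference-only" works is that the generated image's self-attention layers are also allowed to attend to features from the reference image, so details bleed in without any trained model. Here's a toy numpy sketch of that idea; it's an assumption-laden illustration of the commonly described mechanism, not the actual ControlNet or webui code.

```python
# Toy sketch of "reference attention": self-attention where the keys/values
# also include features from a reference image. Hypothetical shapes/weights,
# just to show the concatenation trick.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def reference_attention(x, ref, w_qkv):
    """x: (n, d) features being generated; ref: (m, d) reference features.

    Queries come only from x, so the output keeps x's shape, but keys and
    values are drawn from both x and ref, letting the reference leak in.
    """
    wq, wk, wv = w_qkv
    q = x @ wq
    kv_in = np.concatenate([x, ref], axis=0)  # self + reference tokens
    k = kv_in @ wk
    v = kv_in @ wv
    return attention(q, k, v)

rng = np.random.default_rng(0)
d = 8
w_qkv = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
x = rng.standard_normal((4, d))    # 4 "generated" tokens
ref = rng.standard_normal((6, d))  # 6 "reference" tokens
out = reference_attention(x, ref, w_qkv)
print(out.shape)  # same shape as x, but now influenced by ref
```

That's also why it eats resources: every self-attention pass is doing roughly double the key/value work.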


It does not seem to have paid much attention to the reference image, or am I missing something?


File:01445-1girl,_(loli,_toddle….png (738.35 KB,640x832)

Well, I mean I was purposely using a different prompt like "sitting". The little pajama skirt thing is there on two of them and the blanket pattern is there. It attempted to make little stuffed animals in the top left with the little information it had.
It was kind of a bad image to use in regards to her face or hairstyle because it's such a small part of the image.
You shouldn't expect miracles. It's just one image.


I understand the sitting part, but the only aspects of the image it seems to have taken are the bed sheets and blonde hair.
The hairstyle is wrong in every image, as is what she is wearing, and I think it should have had enough to work with regarding both. The furniture does not match, but that is more to be expected. I just thought it would be more accurate with regards to the character.


File:test.png (2.16 MB,1892x1060)

I think the value is more in the expansion of how prompts are input. An image can be worth more than spelling out the prompt directly, and when submitted alongside a text prompt for more detail, you can make more with less.
I genned this with the reference image on the left and just "on side" in the prompt. You don't need to specify every detail explicitly if the image does the bulk of the work for you, but it's still a good idea to explicitly prompt for the things you want.


I suspect that the more popular Touhous would already be in most image generating AIs' training data.


File:[KiteSeekers-Wasurenai] Pr….png (476.38 KB,1024x576)

try it with the twins please


You're correct, they are, which is why their names carry so much weight as tokens: the model gets the clothing, hair, general proportions, and all that without specification. They are statistically significant in the training data. For example, on Danbooru, Touhou is the copyright with the most content under it (840k), with an almost 400k lead over second place.

The thing is I didn't specify Yakumo Ran or kitsune or any of that in the prompt; the image did all the heavy lifting. The image I posted was an outlier that got the color of the clothing right out of a dozen or so retries, because it really wanted to either give her blue robes (likely because the image is blue-tone dominant) or a different outfit altogether. Granted, there are some details common to her outfit that were added but are not present in the reference image, namely purple sleeve cuffs and talisman moon rune squiggles. With the training data being as it is, those things likely have an extremely high correlation, and it put them there because that's what it learned to do.


>The thing is I didn't specify Yakumo Ran or kitsune or any of that in the prompt
You don't have to.
People have managed to get art generators to create art strongly resembling popular characters using only very vague descriptions, simply because they feature so prominently in their data sets.
This is why, when you want to demonstrate the capabilities of an AI, you should use obscure characters that the AI is not yet familiar with.


yeah like the twins


File:01458-2girls,_dress,_(loli….png (1.29 MB,1024x1024)


woowwwwwww, nice


cute feet btw


File:photo_2024-02-24_05-32-49.jpg (114.34 KB,832x1216)

It also helps when the character has a unique design. I've made Asa/Yoru pics with AI, and even with a lot of tags it sometimes makes Asa look like a generic schoolgirl unless you specify one of her two most popular fan artists.
Once you specify Yoru with the scarring tags, it very quickly gets the memo of who it's supposed to be. You didn't sign her petition!

One thing is that I've had trouble getting szs characters to look like themselves; Kafuka and Chiri in particular keep coming out as generic anime girls, although that is pretty funny.

I use NovelAI's web service. I know, I know, but I'm fine paying them because it's important to have an AI that is designed to be uncensored, and it really is uncensored. Also because I use a computer I rescued from being e-waste at a business: an Intel i5-8600T (6 cores @ 3.7GHz) with UHD Graphics 630 and 8GB of RAM. It's not a potato, but it certainly is not suited to AI work, which may be a reason to get a strong PC (or buy Kissu a strong GPU for christmas) this year.

Not bad, the funny part is that I could easily see the dump thing happening in PRAD.


>the funny part is that I could easily see the dump thing happening in PRAD.
I can't, what episode plot would involve the twins hanging out in garbage?


Not an episode specifically, I mean the girls have wacky hijinks at the dump and the twins show up


rhythm eats a weird piece of meat at the dump


That sounds like a pripara bit, but it works for PR


I am looking forward to pripara and kind of annoyed how the experience of watching dear my future and rainbow live is getting attached to a new group of girls for 50 episodes and then having them get dropped


This company is more powerful than most governments, by the way. What a world we live in


Even though they get regulated regularly and are consistently portrayed as incompetent in the media...


They're not even like Samsung who owns half of South Korea and all the government


>give anons the power to make anything with AI
>they make chubby girls and futa


