Anyone else been messing around with the stable diffusion algorithm or anything in a similar vein?
It's a bit hard to make it do exactly what you want but if you're extremely descriptive in the prompt or just use a couple words it gives some pretty good results. It seems to struggle a lot with appendages but faces come out surprisingly well most of the time.
Aside from having a 3070, I just followed this guide I found on /g/ to get things set up, and it was pretty painless: https://rentry.org/voldy
Ah, yeah, I've been reading up on it. I downloaded some 7GB danbooru thing for it. I wouldn't trust /g/ with an Etch-a-Sketch so I won't follow a guide from there, but I've saved some links from other places:
https://moritz.pm/posts/parameters
https://github.com/sddebz/stable-diffusion-krita-plugin
https://lexica.art/
https://docs.google.com/document/d/1gSaw378uDgCfn6Gzn3u_o6u2y_G69ZFPLmGOkmM-Ptk/edit
I'll get to trying this eventually, but so far I've just been procrastinating a bunch since I need to install and run python and do other stuff I don't understand. My VRAM is also only 6GB and I'm not sure if that's enough.
I'm not too concerned with the theory at the moment and more just wanted to know what practically has to be done to get it running. That guide more or less amounts to downloading some git repo, a model (this is the sketchiest part but you already did it), and Python 3.10.6. Then you run a bat file and it works. From what I can tell the web-ui allocates 4GB of VRAM by default and you have to pass arguments to make it use more or less than that. It should run with an Nvidia card that has 6GB.
That Krita plugin looks interesting, will check it out later.
here are the other faces from that batch
the model i'm using is supposedly trained on a set of images from danbooru, not sure why it'd look korean specifically other than chance
I have exactly 0 (zero) interest in AI art. I have not saved a single file from one of them to this day even. I wouldn't call it being a hater, but they really are just fundamentally unappealing to me.
I don't get what all the fuss is about either. If you've seen one image, you've seen them all. They all have this weird quality to them. Maybe it's that there's absolutely nothing meaningful about them. Doesn't help that most of these images look like bad crops.
why is she stuffing her boobs with spaghetti....
Ever wondered why girls smell so good? This is why.
wow, so even AI has trouble drawing hands
>>96641
>so even AI has trouble drawing hands
Yeah, it must be related to how the algorithm copies things: it gets confused and can't do hands. With faces the parts have general locations and you can meld shapes a bit, but with hands it's trying to copy a bunch of different positions and angles into one and it breaks. Anime faces might be one of the best things for it since they don't even make sense to begin with in regards to angles.
I just steal other people's prompts and add waifu.
Also if anyone else is on AMD on Windows, I followed this guide and it works https://rentry.org/ayymd-stable-diffustion-v1_4-guide
Also Also if anyone can help me figure out how to change output resolution, that would be swell.
Yeah, I've been somewhat surprised by the quality of the more 3DCG drawings I've seen from it, but when it comes to more anime style the AI falls short. There are probably more subtleties that it can't pick up in batch because of differences in artist styles, and that's what causes these amateur-level drawings.
Alright, I'm diving in. Might take a while to get stuff set up and figure out what I'm doing, however.
File:a.png (357.08 KB,512x512)
Making some progress...
File:index.png (Spoiler Image,4.25 MB,2048x1536)
Ehh, so many of these are horrifying so I'm going to put them behind a spoiler. I think I'm going to try that thing tomorrow where you can selectively "refresh" parts of the image
Has science gone too far?
Compared to the AI I've used myself, these aren't so bad
Is that an anthropomorphic "furry" Koruri?
the rendering and shape is good, but it's still making mistakes. Just that it's focusing on something simple so the mistakes are better disguised
did you use the prompts from stable diffusion to make that?
I didn't make it. I got it from the stable diffusion thread on 4/h/. I've been lurking it for a few days because it's a lot slower than the /g/ one and seems to have more technical discussion.
I just wanted to share it because I thought it was a pretty good generation.
There's a new model called Hentai Diffusion that was trained on Waifu (ugh) Diffusion and 150k danbooru/r34 images. I guess it'd be better at nudity?
https://huggingface.co/Deltaadams/Hentai-Diffusion/tree/main
You might need a huggingface account to download it. I have one because I was going to upload a set to train or whatever, but then I saw that there doesn't seem to be any way to use their GPUs without making it public, and they have rules against nudity, and I also wouldn't want to upload an artist's work for others to exploit for real instead of making stupid things on kissu.
Wish I had more VRAM. Oh well.
I've seen AI that write code and this reminds me of some of the shortcomings people had with it.
While they were trained on a large database, it would often be the case that the AI was technically copying programmers from Stack Overflow and pasting their raw code into people's software.
I feel like it's almost the same case here. It took chunks from every artist it saw, creating essentially a collage with little creative problem-solving of its own... and when it does, it's simply a confused error rather than inference.
I was much more impressed by reimu's breasts
it's almost as if machines cannot think
I've been messing with SD since last week using https://github.com/AUTOMATIC1111/stable-diffusion-webui
and the danbooru model https://thisanimedoesnotexist.ai/downloads/wd-v1-2-full-ema.ckpt
I've been doing only txt2img cause apparently I don't have enough GPU RAM for img2img (laptop GTX 1660Ti).
A couple of images turned out to be cute, but most are pretty bad, or maybe my prompts are bad, who knows.
I've been thinking of setting up my local server to produce anime images 24/7 with some script that autogenerates prompts, not sure how its GTX 970 would handle it though.
>>97467
Not messed with lewds too much for now, going to download it and try.
really hoping that's a cute boy and not a g*rl
Yes, science has gone too far.
You will live long enough to see robotic anime meidos for domestic use and you will be happy.
It's kind of interesting to see a real artist use it. I'm assuming he did the img2img thing which uses an image as a guide since it's got his style's wide face and ludicrously erotic body proportions.
This is a good example of how generic it looks when compared to the real thing, which you can't really get around since generic is exactly how it's supposed to function. In theory people can (and certainly will try to) directly copy individual artists, but so far it's pretty bad at that.
When you really break it down, "AI" art is more or less the same thing as procedural level generation in games. The computer is provided with a set of rules, and then randomly generates something that follows those rules.
That's also why I can't see it outright replacing artists like a lot of people are afraid (or if they're psychopaths, hopeful) it will. You can generate all of the level design for your game procedurally, and a lot of games do (minecraft, for example). But "level designer" still exists as a profession for a reason.
NovelAI's model has been leaked. hehehe. Meaning you can do it offline without paying them.
It's 52gb with multiple models, and I doubt I'll be impressed but I'm torrenting it anyway.
Can you post the link?
Thanks, adding it to the hoard.
But someone is replying about 'python pickles' and I have no idea what that entails. I guess he's telling people that it could contain a virus or something or otherwise have code in it? There's this link but I have no idea what it means: https://rentry.org/safeunpickle
Does anyone here know python and can tell what the thing above does? They made it sound like it's something to use to check for malicious stuff or maybe I interpreted it wrong.
But, people on 4chan are already using this so I think it's safe
Pickle is a data serialization library: https://docs.python.org/3/library/pickle.html
Serialization means turning in-memory data like objects into a format that can be stored on disk or sent through a network. JSON is another common serialization format.
I don't use pickle much, but unlike JSON, which only describes plain data, a pickle stream can instruct the loader to import and call Python functions, so yes, it's possible that arbitrary code hidden in the data gets executed when you deserialize it.
>https://rentry.org/safeunpickle
After a quick glance it looks like that code overrides some of the functions described in https://docs.python.org/3/library/pickle.html#pickle.Unpickler
The overridden "def find_class(self, module, name)" seems to implement some kind of whitelist so that only certain kinds of data (I guess the ones considered safe) can be loaded.
I can't guarantee that code actually protects against possible code execution though, if I were you I would download it if you care but wait some time before executing it and see what happens.
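For anyone wondering what a find_class whitelist looks like in practice, here is a minimal sketch modeled on the "Restricting Globals" pattern in the Python pickle docs. It is not the code from the rentry link. The SAFE_BUILTINS set and the Exploit class are made up for illustration, and a real checkpoint loader would need a much longer allowlist.

```python
import builtins
import io
import pickle

# Hypothetical allowlist, for illustration only.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global (class or function).
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but refuses globals outside the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Exploit:
    # __reduce__ tells pickle: "to rebuild me, call this function with these args".
    def __reduce__(self):
        import os
        return (os.system, ("echo this would run a command",))


# Creating the bytes does not run anything; only loading them would.
harmless = pickle.dumps({"steps": 20, "sizes": [512, 512]})
malicious = pickle.dumps(Exploit())

print(restricted_loads(harmless))
try:
    restricted_loads(malicious)
except pickle.UnpicklingError as err:
    print("blocked:", err)
```

Plain dicts, lists, numbers and strings never go through find_class, which is why the harmless payload loads fine while the one that references os.system is rejected before anything runs.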
All the AI talk is hurting my no-knowledge-on-AI brain. Apparently there's going to be a part 2 to the leak, can't keep up with /g/ but am happy to download/seed it though.
anon created a guide, probably 100% the real deal now
https://rentry.org/sdg_FAQ
my wife chino is ballin'
I swear to you guys I was arguing on another corner of the internet that I'm not interested in AI because it couldn't create art in the style of a particular artist, and the artist I was referring to was literally Zankuro in specific, yet here we are. I was crushingly naive. I wonder how far off we are from it making lewd gifs in Zankuro's chubby loli style...
Unsurprisingly it fails to capture Kuon's beauty, although I don't know how to do the tagging with this for Kuon_(Utawarerumono) so I took a guess from what I think I remember seeing.
This one came pretty close to getting her face I think. But, I need to do a thing where I train it.
This is something I/we need to read up on that apparently is a big deal: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/2284
Do you mean abmayo?
it works because mayonnaise is a もの, so it's all cool
People are making animations with it somehow using a script. (webm contains nudity)
Copied /h/ post:
you can already make a video
this was one an anon made
afaik the keyframes were something like
Time (s) | Denoise | Zoom (/s) | X Shift (pix/s) | Y shift (pix/s) | Positive Prompts | Negative Prompts | Seed
0 | 0.6 | 0.9 | 0 | 0 | sleeping in bed, under sheet | | -1
2 | 0.6 | 0.9 | 0 | 0 | shower, washing self, naked, from above | |-1
4 | 0.6 | 0.9 | 0 | 0 | eating breakfast, dressing gown| |-1
6 | 0.6 | 0.9 | 0 | 0 | Sitting on a bus, uniform, from below | |-1
8 | 0.6 | 0.9 | 0 | 0 | on stage, bikini, tattoos, singing, full theatre, bright lights, microphone | |-1
10 | 0.6 | 0.9 | 0 | 0 | drinking at a bar, cocktails, black dress, cleavage, earrings, drunk, flirty | |-1
12 | 0.6 | 0.9 | 0 | 0 | bed, (doggystyle sex:1.3), pubic hair, 1girl, 1boy | |-1
14 | 0.6 | 0.9 | 0 | 0 | passed out in bed, under sheet | |-1
I couldn't open the webm in waterfox or firefox, but it worked with brave and mpc
The miku virus infects all...
Good way to find embeddings on 4chan: https://find.4chan.org/?q=.pt (Warning: NSFW thumbnail images are likely)
(They're pt files)
Just grabbed a ZUN one that I'll try later
There's a fad that started with an AI generation thing with a glowing penis that has artists imitating it. Kind of meta.
https://www.pixiv.net/en/tags/%E3%82%B2%E3%83%BC%E3%83%9F%E3%83%B3%E3%82%B0%E3%81%A1%E3%82%93%E3%81%BD%E8%8F%AF%E9%81%93%E9%83%A8/artworks
From one of my favorite random creative artists (and maker of that one furry Patchy)
that's just a bioluminescent mushroom dude
The anatomy is quite weird and off-putting but I guess most AI images are like that.
I never tried it, maybe I will.
Yeah it's not perfect, but the thing with porn, especially niche stuff that doesn't otherwise exist, is that the brain overlooks it due to the excitement and stimulation over the rest. It's like your choice is a handful of doodles from some guy from 2008 or this thing creating new amalgamations of fetish fuel with errors. Most people have no reason to use this for porn, really, since it's easily inferior to something created by hand. But if that stuff made by hand doesn't exist? Yeah...
Oh, I'm quite aware dickgirls haven't been niche for like 15 years. The fact that it's ubiquitous is also why any simple image doesn't work, it's no longer manna from the heavens by virtue of existing. Find me some quality newhalf mermaid art with a human penis instead of some weird "realistic" furry dolphin version. Also, give her a nice soft belly, a mature face, a warm smile and an apron. Also it's Takane from iDOLM@STER, a girl that shares the face of the first 2D girl I had a crush on (since Luna is too old/obscure to have training data). Here's one I just generated, although it has some pretty noticeable errors.
People have fantasies more elaborate than "a girl with breasts of any size, preferably alive" and it's not any different in my situation just because a penis is involved.
To make an image set like this you want to go down into Script and use X/Y Plot, then select Prompt S/R.
In this example I have it start with 10% angel and 90% demon and then end with 90% angel and 10% demon.
The X/Y script is a massive help in finding the ideal settings, so people use it a LOT
.pt files show up in a few places, but when people are talking about it and it's not troubleshooting it's about hypernetworks. Back when I made that post embeddings were the cool thing (and they also use .pt), but now it's hypernetworks. They're basically fine tuning things for a certain concept, but it's almost exclusively specific artists or characters. IE this was using the embedding that mimics abmayo >>98094
Embeddings are called by name in the prompt, whereas hypernetworks are loaded in the Settings. Embeddings are 20-80KB whereas hypernetworks are 85+MB. I personally liked embeddings a lot more not only because of the file size but because you could combine them. I guess hypernetworks are better and that's why everyone uses them?
Here's my embeds folder. Some of them were just uploaded without labels and I never figured out what they did, like the 3 named "ex_penis".
Extract the folder in the main WebUI folder so it's like:
and then you should be able to use them.
The bad_prompt ones are actually something newer. You put one in the negative prompt at 80% strength, i.e. I use (bad_prompt2:0.8), lowres, bad anatomy, etc
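In case the (name:0.8) notation isn't obvious, here's a rough sketch of how a prompt like that breaks down into terms and strengths. This is a simplified reading written for this thread, not the WebUI's actual parser: it assumes terms are separated by commas and ignores nesting and the other bracket forms. parse_weights is a made-up helper name.

```python
import re

# Matches a whole term of the form "(some text:0.8)".
WEIGHTED = re.compile(r"^\((.+):([0-9]*\.?[0-9]+)\)$")


def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a comma-separated prompt into (term, strength) pairs."""
    parsed = []
    for token in prompt.split(","):
        token = token.strip()
        if not token:
            continue
        match = WEIGHTED.match(token)
        if match:
            parsed.append((match.group(1).strip(), float(match.group(2))))
        else:
            # Terms without an explicit number count as normal strength.
            parsed.append((token, 1.0))
    return parsed


for term, strength in parse_weights("(bad_prompt2:0.8), lowres, bad anatomy"):
    print(f"{term}: {strength}")
```

So the embedding gets 0.8 and the plain tags stay at 1.0.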
Does this only work for two tags? Or can you batch together multiple into the percentage.
Probably, but I haven't checked. I guess it'd just be A:B:C:# for 3 and so on
Interesting sort of addendum I found for doing this sort of thing:
>you can [x:y:z] / [:y:z] / [x::z] to make the ai draw x then y at step z (or percentage of steps if you put a decimal), which works great for stuff like [tentacle:mechanical hose:0.2] to make the ai draw tubes everywhere, or you can do x|y... to make the ai alternate between drawing x and y every other step; you can put any number of things here e.g. x|y|z|a, but obviously the more you use this the more steps you need, in general
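To make the quoted description concrete, here's a small sketch of which text would be active at each sampling step under that reading. It only illustrates the behaviour as described in the quote and is not taken from the WebUI source. The function names are made up, steps are counted from 1, and the exact rounding at the switch point is an assumption.

```python
def scheduled(before: str, after: str, when: float, step: int, total_steps: int) -> str:
    """[before:after:when] -> the text used at a given 1-based step."""
    # Below 1 is read as a fraction of the total steps, otherwise as a step number.
    switch_step = int(when * total_steps) if when < 1 else int(when)
    return before if step <= switch_step else after


def alternating(options: list[str], step: int) -> str:
    """x|y|z -> cycle through the options, one per 1-based step."""
    return options[(step - 1) % len(options)]


total = 20
for step in (1, 4, 5, 20):
    print(step, scheduled("tentacle", "mechanical hose", 0.2, step, total))

for step in range(1, 5):
    print(step, alternating(["x", "y"], step))
```

With 20 steps and 0.2, the first 4 steps use the first term and the remaining 16 use the second, which is why the early shapes come from one concept and the detail from the other.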
That's exactly the post I saw that made me want to try it. I heard people mentioning this functionality weeks ago but completely forgot. It seems rare that anyone uses it, but it could be really great
When I try making one of these I get a >RuntimeError: Prompt S/R did not find angel wings:demon girl:0.1 in prompt or negative prompt.
Does this mean I need to put the tags into the prompt somewhere? Or attach an X to them?
The first thing listed there has to be in the prompt for the rest to replace it. You should be able to hover over it for a tooltip. For example, if your prompt is
masterpiece, picnic, turtle, eating banana
in the script you'd put
banana, burger, corndog
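The search-and-replace behaviour described here can be sketched in a few lines. This is just an illustration of the idea, not the script's real code. prompt_sr is a made-up name, and the error message only imitates the one quoted earlier in the thread.

```python
def prompt_sr(prompt: str, values: list[str]) -> list[str]:
    """Return one prompt per value; the first value is the text to search for."""
    search = values[0]
    if search not in prompt:
        # Same situation as the RuntimeError quoted above: nothing to replace.
        raise ValueError(f"Prompt S/R did not find {search} in prompt")
    return [prompt.replace(search, value) for value in values]


for variant in prompt_sr(
    "masterpiece, picnic, turtle, eating banana",
    ["banana", "burger", "corndog"],
):
    print(variant)
```

The first image uses the prompt unchanged (banana replaced with banana), and each later one swaps in the next entry.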
Gotta say, reading the documentation for all this stuff regarding stable diffusion has really impressed me with how much work and development has gone into making the open version as great as possible, beating out even its premium competitors.
I guess this is the true power of computer dorks trying to get the perfect porn.
You use img2img, which can itself be guided with a text prompt like txt2img so it's really more like img+txt 2 img.
As an example here is an image I drew
It's likely a very generic prompt that has a denoise of like .5 or something to keep the general shapes but still alter it enough to be noticeable. I saw someone point out that they look like Genshin characters, so it's probably using something trained on its images.
I have a Genshin hypernetwork for that so let's see the result when I throw some stuff in: (pic related)
I don't want to spend a bunch of time trying to replicate it, but you get the picture. It probably uses a few traditional artists tags since people have done lots of examples of those, including myself
... and if I do the same prompt and same settings as in >>100505
but without the input image, this is what I get.
The cartoony nature of my image is at odds with the Stable Diffusion model's realistic photograph style. Getting anything done with this sort of thing is probably best when it's iterative, mixing both txt2img and img2img.
Oh, after testing with this it does seem to greatly increase the time it takes to generate stuff, so maybe only use the 'image preview while generating' thing if you're unsure where to stop when working on settings, and then set it back to zero when you're actually producing a bunch of stuff.
A lot of knowledge about this stuff requires scouring and searching or surreptitious posts, so I'll try to share some more info.
This time I'm going to talk about two Extensions that I use a lot.
The easiest way to get new Extensions is to go to the Extensions tab of the WebUI and then go to Available and hit the "Load from" button with its default URL. From there you can install stuff, which will then show up on the Installed tab. For a lot of this stuff you need to restart the UI from settings if not restart the .bat file itself.
The ones I use and can give detail on:
Dynamic Prompts
This is used to randomize creations on each image generation. You can use it with new words in the prompt, but I've never done that. Instead, I mainly use this to call random effects from wildcard text files. You create a text file with a new line for each possibility, put the text file in /extensions/dynamic-prompt/wildcards/text.txt and then call it from the prompt by its name with two underscores on each side. For instance you can make haircolor.txt and put this in it:
green hair,
and then put __haircolor__ in your prompt and it will randomly pick one of those each time an image is generated. This means you can make a batch of 10 images and come back to different results. This is really, really good if you're just messing around to see what works. It can also call other txt files from inside. I'll share my wildcard text files soon. It also has a "Magic Prompt" system that I've never used, but it could be cool? Beats me. Someone else do it.
TagComplete https://github.com/DominikDoom/a1111-sd-webui-tagcomplete
It autofills booru tags for stuff based on danbooru, which NAI and the 'Anything' model are. Really, really nice, but can also be annoying at times with the pop-up. Unless you have tags memorized this can help a lot. Speaking of, you should make yourself accustomed to danbooru's tags:
https://danbooru.donmai.us/wiki_pages/tag_group%3Aposture
https://danbooru.donmai.us/wiki_pages/tag_group%3Aimage_composition
https://danbooru.donmai.us/wiki_pages/tag_group%3Aface_tags
https://danbooru.donmai.us/wiki_pages/tag_group%3Ahair_styles
https://danbooru.donmai.us/wiki_pages/tag_group%3Aattire
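Going back to the Dynamic Prompts wildcards for a second, the __name__ lookup can be sketched roughly like this. It's a toy version and not the extension's code: a dict stands in for the text files, and every entry other than "green hair" is a made-up placeholder. Nested wildcards are handled by repeating the substitution.

```python
import random
import re

WILDCARD = re.compile(r"__(\w+?)__")


def expand_wildcards(prompt: str, wildcards: dict[str, list[str]],
                     rng: random.Random, max_depth: int = 10) -> str:
    """Replace each __name__ with a random line from the matching list."""
    def pick(match: re.Match) -> str:
        name = match.group(1)
        # Unknown names are left as they are instead of raising an error.
        return rng.choice(wildcards[name]) if name in wildcards else match.group(0)

    for _ in range(max_depth):  # repeat so wildcards inside wildcards resolve too
        expanded = WILDCARD.sub(pick, prompt)
        if expanded == prompt:
            break
        prompt = expanded
    return prompt


# Stand-ins for haircolor.txt and a second file that calls the first one.
wildcards = {
    "haircolor": ["green hair", "blue hair", "pink hair"],
    "look": ["__haircolor__, short hair", "__haircolor__, long hair"],
}

rng = random.Random()
for _ in range(3):
    print(expand_wildcards("1girl, __look__", wildcards, rng))
```

Each call can come back different, which is the whole point when you queue up a batch and walk away.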
this sort of thing has been possible for a few years, but without danbooru datasets used for art training it wouldn't be easy
It's already a few years old, isn't it? Here's the original Reddit thread about it, and it's been discussed on 4chan in the past as well. I recall there originally being some talk about the possibility of it actually being used for tagging, but it's not good enough to replace manual tagging anytime soon and is otherwise little more than a novelty.
Oh. Wow, it's 4 years old?
Well, anyway, it's really cool how it's used here for immediate benefit. You can use it to assist in image tagging for training, but also as a building block to generate new images.
So, the training setup I put together from what I read. Much of the information is from the discussion here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/2670
Also, thanks to people on the /h/ board of 4chan as those guys are great. Don't use /g/, but maybe that should go without saying.
Modules: I checked all of them because it's unclear what they do. Everything was checked by default except 1024, which seems to be a new addition.
Layer Structure: 1, 1.5, 1.5, 1.5, 1. This is called a 'deep network' as opposed to default or wide. Default is good for most things, particularly if you have a low number of images (20ish was mentioned). Wide is for specific things like an animal, character or object. Deep is for style, which most people seem to be using hypernetworks for, with embeds for characters. It doesn't have to be that way, but that seems to be the pattern forming.
Activation Function: Softsign. There's lots of math talk and graphs I don't understand, so I just went with the recommendation.
Weight initialization: XavierNormal. Same thing as above.
Layer normalization: No. I haven't seen anything informative about it, but no one seems to use it.
Use Dropout: Yes. I heard it's good if you have a "larger hypernetwork". I think that means the numbers in the Modules up there and also the amount of training images used. I had 90ish images and did the mirror image thing to turn it into 180ish, but that's definitely not as good as 180 unique images. I don't know if it was good or bad that I used Dropout, but it didn't ruin anything.
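For a rough feel of what those settings mean numerically, here's a small sketch. The base width of 768 is an assumption picked for the example, and the helper names are made up. The softsign formula and the Xavier normal standard deviation are the standard textbook definitions, not something read out of the WebUI.

```python
import math


def layer_sizes(dim: int, multipliers: list[float]) -> list[int]:
    """Layer widths you get by scaling a base width with each multiplier."""
    return [int(dim * m) for m in multipliers]


def softsign(x: float) -> float:
    """Softsign activation: squashes any value into the range (-1, 1)."""
    return x / (1 + abs(x))


def xavier_normal_std(fan_in: int, fan_out: int) -> float:
    """Standard deviation used by Xavier (Glorot) normal init with gain 1."""
    return math.sqrt(2.0 / (fan_in + fan_out))


sizes = layer_sizes(768, [1, 1.5, 1.5, 1.5, 1])  # 768 is an assumed base width
print("layer widths:", sizes)
print("softsign(3) =", softsign(3.0))
for fan_in, fan_out in zip(sizes, sizes[1:]):
    print(f"{fan_in} -> {fan_out}: init std {xavier_normal_std(fan_in, fan_out):.4f}")
```

The "deep" structure simply has one more hidden layer than 1, 1.5, 1.5, 1, so there are more weights to fit. That is presumably why it comes up for style training, but treat that as a guess.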
And once you get to the Training tab you can load the hyper you just created (or one you've downloaded maybe? that part seems questionable)
This tab is for training embeds or hypernetworks, but I've only done hypernetworks so I can only talk about that.
Batch size: I haven't been able to find conclusive information on this since 'batch size' is text that is shared in every prompt so you can't just search for it by name. A higher value uses more VRAM, but might not necessarily be better at training. The ONE comment I've found on it says that you could increase it instead of lowering the learning rate later on. I'm already at my VRAM limit when training with a video and Photoshop open, so I don't touch this.
Learning Rate: I think people start with the default for these. Only the hypernetwork number matters for hypernetworks. I see people add a decimal point in front of the 5 as the training steps reach 5000 to 10000, so I copied that. It sounds like the lower number is better for finer detail once you've established things.
Gradient accumulation: A newer thing, supposed to assist in training rate somehow, but I don't know how. It mentions something like "learning in parallel" or something. I don't know. People say to use it and set it to like 5 so I have it at 5.
Dataset Directory: The folder with the images. I could talk about images, but I'd mostly just be repeating this: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/2670
Prompt template file: This is a list of basic prompts that are added to previews alongside the included tags attached to each image. People say it's fine as default, but might be something to mess with if you want to check for specific stuff?
Width/Height: Keep it at 512/512.
Max Steps: How far the training will go. This is stuff that takes days, though, so I'm not sure how useful this is, because of something I'll talk about in a sec. I suppose it's good if you only want to run it for a set amount of time.
Save an image every n steps: It saves an image as if you prompted it with random tags included in your training folder, but it can make freaky combinations that you wouldn't normally use so keep that in mind.
Save a copy of embedding every n steps: This is an important one and why I didn't care about the Max Steps thing above. It saves the hypernetwork with the number of steps to the folder automatically. By default it's at 500 which is where I have it.
This means the folder will look like:
as it trains for longer periods of time.
There's an option in settings under Training to Save Optimizer state, which allows you to resume training from these saved files. VERY important!
Note: To use the hypernetwork (or resume it from file) you need to move it from the saved directory (default textual_inversion) to the models/hypernetworks folder.
Save images with embedding in PNG chunks: I think it lets you use PNG info like normal generated images. I kept it on.
Read parameters from txt2img: For preview images it takes what you typed in the txt2img tab. I never used this since I wanted a variety of images, but it could be useful? I read to never use tags like 'masterpiece' or 'quality' there, though.
Shuffle tags: Yes. It adds more variety to the images by changing priority or something.
Drop out tags with prompts: I think it drops a certain percentage of prompts per generated preview image. I kept it off, but not sure. It's just preview images and not the actual training itself, so I guess it could improve or hinder perceived accuracy there.
Latent Sampling Method: I only hear people mention deterministic so that's what I went with.
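On the gradient accumulation setting: as a general technique (this is not taken from the WebUI code) it means adding up the gradients from several small batches and applying one weight update at the end, so you get the effect of a bigger batch without the extra VRAM. Here's a toy sketch with a one-weight model. Everything in it is made up for illustration.

```python
def grad(w: float, x: float, y: float) -> float:
    """Gradient of the loss 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x


def train(data, lr: float, accumulation_steps: int, w: float = 0.0) -> float:
    accumulated = 0.0
    for i, (x, y) in enumerate(data, start=1):
        # Process one sample at a time, but only remember its scaled gradient.
        accumulated += grad(w, x, y) / accumulation_steps
        if i % accumulation_steps == 0:
            w -= lr * accumulated  # one update per accumulation_steps samples
            accumulated = 0.0
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0), (5.0, 10.0)]

# Accumulating over 5 single samples gives the same update as one batch of 5.
print(train(data, lr=0.01, accumulation_steps=5))
# Updating after every sample follows a different, noisier path.
print(train(data, lr=0.01, accumulation_steps=1))
```

With batch size 1 and accumulation 5, each update is based on 5 images, which lines up with the advice to raise it instead of the batch size when VRAM is tight.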
they all seem kind of kuon like
Well, it's the same artist so they should look somewhat similar, yeah.
But in that image I posted I can identify different characters. I'm not sure how exactly it works because sometimes it's very random, but other times it's obvious, like how this is an attempt at Kamyu (even though I never labeled her and it wouldn't understand it anyway)
i think it can get more perfect
Hmm... after a full night I'm not sure if it can. At least overall, I think getting a perfect one is still going to be rare. I see what people are saying now and that you're probably not going to notice gains after 20000 steps or so. But, I think it still needs improvement somewhere.
I need to look at it and see if there's stuff I can improve upon, which basically means I'll train it again at a different rate. When I tested making a hypernetwork a few weeks ago one night was like 2000 "steps", but now I just did 58000 (meant to do 50000, but forgot I resumed from 8000). It saves backups of its progress as it goes, so if for instance it made the best result at around 20000 steps and then went haywire, you can grab the backup it created at 20000 and either just use that as the completed product or resume training from it.
Well, now that I've done the style hypernetwork I should try making 'embeds' which I'll use to teach it characters. It still doesn't know who any of these girls are so I can't actually call them directly and instead need to use their traits and hope it arranges it correctly. For instance it'll never know Kuon's proper clothing or ears unless I create an embed which I can invoke from the prompt. From what I've read, when training an embed you label everything EXCEPT what you want it to call. Maid Kuon at the computer!
It really can't do hands or keyboards, but that's not specific to this.
dont remove it, kuon looks sexy with a slave collar
It wouldn't be removed exactly, just require an actual 'metal collar' prompt to show up. I guess I should remove 'breasts' from everything, too. I'm not sure why boorus have redundant tags like that.
... I think?
'detached collar' isn't even here, so maybe this is something I can't avoid at all. Or maybe this is the base data... I guess I should do some testing... either way it probably wouldn't hurt to specify things more
God damn. Okay, I can say that switching from a 1, 1.5, 1.5, 1 neural net thing (whatever that means) to a 1, 1.5, 1.5, 1.5, 1 one was a massive upgrade. I don't know why. Oh, and I think I MIGHT have clip skip set to 1 instead of 2, but that wasn't supposed to be a big deal. Hmmm.
The first one here, 'aquaplus' was my training 2 nights ago whereas the two others are different checkpoints from the one I trained last night. I just don't understand how it's such a massive improvement.
but on the default euler a sampler they look fine, but the first one seems like it's probably the best, but still within the normal variation you'd expect.
more testing needed....
damn that looks really good too
at a glance it's hard to tell those are AI
Congrats, seriously. Looks damn good.
By now the default with the newer fine-tuned models is far more impressive than the NAI leak stuff, but it's still all built on that. You're actually in a far worse position if you're paying for it now.
This high detail one is based on a mixture of real life and 2D art so it can do things pretty well, but you have poor control of it. It looks extremely impressive, but it's not obeying my very strict training of Kuon's outfit, so imagine trying to get something that you haven't trained.
I've been wondering if I should start grifting since I know what I'm doing and everyone else that knows what they're doing is, well... kinda normal. But, being normal is what gets you the most exposure and success. I can't name gacha characters, for example. But, I could corner an AI fetish market especially if I combine the training with my 3D models. This is when it'd be a good time to have the motivation to do things
Sounds like such a bad way to put it... but I get what you mean. I think if you consider yourself capable and with some desire to do so you should try it out. I mean training your own stuff is probably a long and arduous process, more than most are willing to invest into it.
I'm very proud of your progress.
It's kind of crazy that if you put the effort into training one of these things you could have unlimited fetish porn?
I haven't played with NovelAI outside of porn but I may generate non-ero OG Fallout fiction with it to see its quality
>>102405
>unlimited fetish porn
That is exactly what it is. If people think there's "porn addiction" now, wait until normal people get a hold of this stuff. I have a blog of the pornographic progress I've made on /megu/.
Still, I'd prefer human-made stuff if it existed, and stuff that's more mental like incest isn't really something you could satisfy with an image alone. You can't tell stories with this and stories are really, really good. A good doujin is easily better than this stuff, but when you're dealing with specific tastes then yeah, it's the best option available.
NovelAI really likes futa
What's interesting is it doesn't necessarily make the same mistakes a human makes when drawing hands. It makes its own sort of mistakes you don't see in real art.
The most popular checkpoint models these days (for those doing it offline) are a mix of 2D art with conventional photography. It increases hand quality a bit, but it's still far from passable most of the time without a bunch of "inpainting", which is basically like selectively retrying a part of an image. Some of the models people use look more like real life photos with an advanced filter on top, which can be very creepy and also takes away from some of the appeal since it introduces 3D limitations in perspective and such
Colouring shouldn't be hard for an AI. It could just pick colour palettes from existing images in the same pose and apply them.
Even though the line work is done, the lips of the girl on the top right are weird. Also I am not sure if I ever noticed this before but the hair on these AI girls is quite bizarre (the ones on the right): not only is the fringe not symmetrical but the far side looks weird.
The backgrounds are odd too, the sunset kind of drops suddenly in one image and the floor boards in another are all different widths.
I mean top left not right...
As an example, as I continually attempt to refine my custom merged checkpoint for, uhh... /megu/ reasons you can see the effect of one of the models already having some RL stuff mixed into it. The shading is absurdly good, but I really have to fight it to create clothes that aren't modern and it feels very "real" which can be a good thing and a bad thing depending on one's tastes. (also as a side note need to figure out why it's ignoring tags)
And look at that hand. I didn't do any editing here. But, it definitely looks like a real human hand. I don't know how to feel about it. I guess maybe for now it's a sacrifice to make if you don't want to do edits, but I like style over reality.
really amazing for ai hands
I wonder if it's possible for AI to make manga or if that's far too many variables to be solved in a realistic timeframe
Have you ever tried using doodles you make as a base for the AI to build off of? Wondering if that's more effective than just generating a bunch of images that may vary in posture/position each time.
I did a whole bunch of testing with various RL models to see if I could understand how exactly people are making them assist in 2D hands/poses while not giving them a massive hit in quality and I really could not find any pattern. Although, my tolerance for spending hours making small merge differences is getting pretty low and I need to spend some time doing other stuff before getting back into it.
However, I do think I have an idea of how to bandage it. The LORA things are basically like "plugins" for a checkpoint model, and for example the Amaduyu/Aquaplus one I made is pretty good at fixing the faces, but then of course they will always have at least a hint of Amaduyu/Aquaplus so I'd need to mix them with other LORAs.
It's also useful to use a thing called kohya, which is normally used to create LORAs, to separate merged checkpoints into their base ingredients. This means you can more easily control the intensity of something without needing to create a bunch of 4-8GB merge files.
Seems like there aren't any new amazing models recently, just merges of existing stuff (although some of them are quite impressive)
So, I can't think of any notable breakthroughs in the past month, just refinement.
Still, I continue to be annoyed by all the people using "waifus" in these things. I know, I know, it's a generational difference and they don't know any better. But it still annoys me.
With the popularity of AI voice cloning and eleven labs AI going paid, I decided to look into some of the offline runnable alternatives. The most popular one, or at least the easiest to set up, seems to be Tortoise-TTS. It works okay enough and has some pretrained models the author directs you to use. There's a guide and git repo someone set up that provides this service with a web interface https://rentry.org/AI-Voice-Cloning
The biggest issue I and many others have with tortoise is that the main author won't release a guide or overview of the process he went through to train his model, simply saying if you're smart enough you can figure it out. Kinda leaves people at an impasse for actually using this program as an alternative to eleven labs.
I've had some minor success with one of the alternatives (unofficial) VALL-E, https://github.com/enhuiz/vall-e
It's taken me a bit of dependency chasing and cobbling together a separate PC to install linux on (the DeepSpeed dependency has been a nightmare to get working on windows) but I've actually been able to get a "decent" output with a 3060 12gb card and about a day of training on ~7.4k couple sec audio files ripped from Vermintide 2. I'm not an expert in ML but the result I got with training a model from scratch with this limited data set and a "low powered" card makes me optimistic for VALL-E's potential. I didn't really have to know much about machine learning, just how to install various dependencies and 3rd party utilities.
VALL-E is based on phonemes so the text to be synthesized is meant to be sounded out, I think. I don't know if there is a whole lot of prompt engineering that can be done with this program, though my current model is probably too limited and untrained to really test that out.
Attached is the voice I wanted to clone.
Here is the output.
The prompt text was "Blackrat spotted! Keep your guard up!"
>>102756
>>102801
>refine my custom merged checkpoint
I haven't been closely following your posts, more just watching your results, when you talk about custom check points are you training your model (custom data set of /megu/ images) starting with some base model as a checkpoint? How are you doing that for stable diffusion and what sort of time sink is it/hardware are you using?
Mm, how to explain...
The "custom model" I've been talking about recently is a merge of existing checkpoint models, which are things like NovelAI or Stable Diffusion. My most recent one uses Stable Diffusion, NovelAI, Yiffy (for genitals), Anything (that's its name), AbyssOrange2/Grapemix and a couple others that I'm trying to switch in. (GrapeMix doesn't seem to have any RL image data, so I add in some of the basil mix myself that AbyssOrange2 does, that guy was onto something for sure)
They're large (2-8GB) files that contain a whole lot of training data and you need a really powerful GPU to train them. I'm not sure you can even make them without something like 24GB of VRAM at minimum, and then you need a whole lot of time (weeks of constant processing) if you don't have like $50,000 worth of processing power sitting around.
However, someone like me can create merges of them with custom settings that hopefully take the desired parts of A with the desired parts of B. But, you definitely make sacrifices when you do it and the trick is to try and counteract them. It's a really annoying process, though, because there's no guide to see what each setting does so it's a bunch of trial and error. Every time I think I noticed a pattern, I change a different slider and it completely invalidates what I thought I knew. Also each merge takes like 30 seconds to create, 10 seconds to switch to, and then however long the generation takes on your current settings. Also when switching between them your VRAM can get corrupted somehow and you need to restart the program so you don't get false results. Each merge is also 2-8GB so you have to routinely delete them and take screenshots/notes of what you've learned, if anything, from the merge results.
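To make the idea of a merge a bit more concrete, here is a minimal sketch of the simplest kind, a plain weighted sum of two models. This is an assumption about the general technique and not a description of what the merge sliders in the UI actually do. The dict-of-lists "checkpoints", the layer names, and the `alpha` parameter are all made up for illustration; real checkpoints hold large tensors.

```python
def weighted_merge(model_a, model_b, alpha):
    """Blend two toy checkpoints stored as {layer_name: [weights]} dicts.

    alpha = 0 keeps model A as-is, alpha = 1 takes model B entirely.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    merged = {}
    for name, weights_a in model_a.items():
        weights_b = model_b.get(name)
        if weights_b is None:
            # Layer only exists in A, so carry it over unchanged.
            merged[name] = list(weights_a)
            continue
        merged[name] = [
            (1.0 - alpha) * a + alpha * b
            for a, b in zip(weights_a, weights_b)
        ]
    return merged


# Hypothetical two-layer "models" standing in for multi-GB files.
model_a = {"face": [1.0, 2.0], "hands": [0.0, 4.0]}
model_b = {"face": [3.0, 6.0], "hands": [2.0, 0.0]}
merged = weighted_merge(model_a, model_b, alpha=0.25)
print(merged)
```

A "layered" merge would presumably use a different `alpha` per group of layers instead of one global value, which is where all the trial and error comes from.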
The main training data I've done myself is for Kuon and the Amaduyu (Aquaplus) hypernetwork/LORA things, although I've done some other artists to mixed results. They rely on getting layered on top of a checkpoint model, so they're heavily influenced by it.
What kind of timesink is it? Weeks, but I do other stuff while it's merging and generating. I can't imagine most people will want to do it. But, I've also been doing this stuff since early October so I guess it might be a slower learning process for other people.
As for hardware, I got a "good" (less absurd) price on a 3080 12GB for Black Friday
I think I can describe the difference better. The "checkpoint" model is the database that has the actual definitions and data on the information of a tag. When I trained my Utawarerumono, Kuon, and other things I was training it against the NovelAI model. The images have tags like "ainu clothes" or "from side" because that is specifically the booru tags that NovelAI trained. I'm not defining what those are, I'm providing information on what they look like when drawn by a specific artist, and the training process compares it to the information that NovelAI has. There's a huge gulf in defining the tag itself and merely referencing it.
People, including myself, have trained concepts (which is what a tag is), but it's just one at a time.
The horrendously named "waifu diffusion" has been undergoing training on its new version for over a month now, but it was just at Epoch 2 when I last checked a couple weeks ago so it might be at 3 now. One epoch seems to take about 12 or so days to complete? People said the first epoch sucked and 2 might have shown that the finished product could be good, potentially, but we'll have to wait and see. It will probably not be something to test out for real until Epoch 6 or so?
But, I haven't been paying attention to any news about this stuff
what is a checkpoint model?
nevermind, it's a database with the tags associated with images
Basically that, yeah. It's the skeleton that everything is built upon. The most famous one is Stable Diffusion (SD) and everything I'm aware of for offline AI image generation makes use of it. The 2D models still have the SD data in them so you can use words that boorus have no knowledge of and get results.
It's worth noting that most people using the offline method are using the older (1.4 and 1.5) versions of Stable Diffusion because the ones after that started aggressively purging nudity (but not gore) and potentially other things, from the training data. This had the effect of breaking the things trained on the older models, which includes stuff like NovelAI which nearly all 2D models make use of.
The last time I checked people were not so impressed with the newer SD models that they were willing to sacrifice a "pure" data scrape in favor of one curated to make it more attractive to investors
The newest technology just came out a few days ago!
ControlNet lets you control the generated image by pose and composition through normal and depth maps, edge detectors, pose detection, segmentation etc. This is much easier and finer control compared to regular img2img.
An extension for webui also allows you to adjust pose as you wish.
hm, so they gave up and decided that this is where humans need to come in and give the images context.
That's quite the leap.
Could you make a leaping Megu?
Dang, that's cool. This is what happens when you take a break from checking for AI news in /h/, huh.
Seems neat, but it's also introducing more effort into generation which isn't really my thing. I had tried to use depth maps about a month ago, but learned that it was limited to Stable Diffusion 2 and above, which kills any desire that the majority of people on imageboards would have for it. So any extension that makes use of depth maps without requiring the neutered corporate-friendly SD is great.
I'm not sure I'll use this, but it's cool to see in action nonetheless
Wonky legs, but still impressive.
I searched through some 4chan pages and found that someone did create an Ume Aoki LORA. It seems to work pretty well at capturing the style and also seems to capture Miyako to a degree, but it's still not accurate.
It's in here if you want to download it yourself (use Ctrl+F) https://gitgud.io/gayshit/makesomefuckingporn#lora-list
So, I told the guy to start amassing Miyako images which will be combined with the Ume Aoki style LORA.
Things to note for good training images for a character:
2. No complications like text overlaid upon her
3. Text elsewhere in the image should ideally be edited out
4. Limited outfits. Ideally it'd be maybe 3 or less, depending on how many images you have. When I trained my Kuon stuff I did not bother since she is portrayed in only one outfit about 95% of the time. Each outfit will need to be tagged in the training process and called upon manually with a custom tag of your own choosing during image generation later on. She can still be portrayed in other outfits, but if you specifically want her in her own original clothing you need to train for it.
5. Different angles and "camera" distance. The more variety of angles you have, the more accurately it can portray them later on during image generation, although it does a pretty good job of filling in the blanks since it already knows how human characters should look from different angles.
Then the images themselves should be cropped to be somewhere squarish. Unlike the old days of late 2022 it does not need to be exactly 512x512 pixels, but you should avoid images that are too tall or wide (heh) at like a 1:3 ratio or something. I'll talk about the other stuff after I get the images
Miyakofag here, yoroshiku onegaishimasu, and my deepest thanks to yotgo for his help.
A very important factor to consider is how easily the characters go from chibi to normal and back, as seen in pic. In their non-chibi style their head shape is somewhat hexagonal, with fairly sharp angles, while their chibi form head shape is usually either between a full oval and a curved rectangle, or a mix of the two with a pointy side bit like Yuno has in the middle, and the first has regular eyes and features while the latter two are (✖╹◡╹✖). Also visible in pic is how the wides are presented in variety of outfits, like Miyako getting a change of clothes in the middle panel, and then immediately returning to the first one.
I've downloaded the manga, but it's monochrome and fairly crammed, so it doesn't look like it'll be of much use. Seems like I'll have to take a few thousand screenshots of the anime again, but that's fine by me. I'll also begin to comb through boorus for useful art, and there's this other meguca stuff I'll be downloading in case they can turn out to be of help for setting up Ume's general style:
https://exhentai.org/g/2191043/e80d477043/
https://exhentai.org/g/2262418/2d88611a04/
Hmm, keep in mind for a character that you want the character to be the focus and not the artist's style. It's better to have a more varied collection from various artists than a limited number from the official one. You're not training the shape of her head or how her mouth is drawn, you're training the combination of her outfit and eye color and hairstyle and the visual traits that identify her.
I can generate images of Kuon in different styles because it's not constrained to a specific style itself.
When I generate an image of Kuon to look like her original Utawarerumono appearance, I activate my Kuon LORA (Kuon herself) and also my Amaduyu LORA (the Utawarerumono style). Combining them into one would be severely limiting.
But with the character and artist separated, I can apply Kuon and the Ume Aoki style together without the influence of Amaduyu.
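As a rough sketch of what "activating" a character LORA and a style LORA together looks like in a prompt: the `<lora:name:weight>` form below is the tag syntax the automatic1111 webui uses, as far as I know. The LORA file names, the weights, and the `build_prompt` helper are hypothetical and only there for illustration.

```python
def lora_tag(name, weight=1.0):
    """Format one LORA activation tag with its strength."""
    return f"<lora:{name}:{weight:g}>"


def build_prompt(tags, loras):
    """Join booru-style tags and LORA activations into one prompt string."""
    parts = list(tags) + [lora_tag(name, weight) for name, weight in loras]
    return ", ".join(parts)


# Hypothetical file names: one LORA for the character, one for the style.
prompt = build_prompt(
    ["1girl", "from side"],
    [("kuon_character", 0.8), ("ume_aoki_style", 0.6)],
)
print(prompt)
```

Keeping the two as separate entries is the whole point: swapping the style is just a matter of changing the second pair while the character entry stays the same.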
Hmm... not sure if this style will work.
Bleh. I had trouble training, but got it to work but then it came out like THIS. I really should have kept my old settings, but noooo I had to see what the new stuff was like.
I noticed that some of the images you gave me were small and I think I'll have to exclude those. They should at least be 512x512 and I think that's the main reason why it looks so blurry and low quality here despite being relatively accurate in some images.
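The size rules mentioned so far (at least 512 on each side, nothing too tall or wide) could be checked up front with something like the sketch below. The function name and the exact ratio cutoff are my own assumptions; the only numbers from this thread are the 512 minimum and "like a 1:3 ratio" as something to avoid.

```python
def is_usable(width, height, min_side=512, max_ratio=3.0):
    """Return True if an image is big enough and not too stretched.

    min_side: smallest allowed width/height in pixels.
    max_ratio: long side divided by short side must stay below this
               (3.0 is an assumed cutoff based on the 1:3 remark).
    """
    short, long_side = min(width, height), max(width, height)
    if short < min_side:
        return False
    return long_side / short < max_ratio


# Hypothetical image dimensions, as (width, height).
sizes = [(512, 512), (300, 400), (600, 1800), (768, 1024)]
usable = [size for size in sizes if is_usable(*size)]
print(usable)
```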
nice, hexagon headed kuon
It already looks really, really good.
The small crops are my bad, I had taken "does not need to be exactly 512x512 pixels" to mean "smaller pics are okay", there should be a dozen, dozen and half pics to remove then, maybe a few more. There's also one where she has her top but not her shirt, which may explain the result on the top-right.
I reduce the strength of the Miyako LORA and the image clears up, but then it becomes less accurate.
Bleh. Yeah, I need to train it again with better images.
It works, but finding the right prompt can be exhausting. It looks like you're using some online model and those have some pretty severe limitations. I don't really know how to best use the real life models that use verbose text rather than booru tags. There are prompt repository sites like:
https://lexica.art/
https://docs.google.com/document/d/1ZtNwY1PragKITY0F4R-f8CarwHojc9Wrf37d0NONHDg/edit#
but also personal pages of research people have done like https://zele.st/NovelAI/
After a bunch of testing, I think I'm satisfied with this Miyako LORA. It seems to work best with the Anythingv3 model, although I haven't done hours of tinkering. But, this reminds me that I really need to create a good SFW 2D merge of my own, but I keep struggling to have it look good with multiple different prompts and LORAs.
I also know now how to 'host' it and allow people to connect to it, but my upload is capped at 1MB/s so the limitation is there...
Something that was very interesting about collecting a bunch of screencaps of her is that it helped me appreciate the amount of variety in the girls' wardrobes.
Since training material for a specific character requires consistency in their looks we decided to go with her standard school uniform, however, they regularly spend around half of an episode outside of Yamabuki wearing their casual outfits (of which each wide has maybe a couple dozen or more), in some cases they don't go to school at all, then there's Winter episodes where they're wearing a coat, and at one point she has a hair bun like Hiro's, I assume it's simply because she felt like it. Add to this Shaft's abstract cuts decreasing their screentime, how due to her character she has what is perhaps the highest regular:chibi appearance ratio, on top of needing her to stand alone without overlapping with other people, and I ended up only managing to take 62 usable captures out of the entirety of Honeycomb+Graduation. Far, far less than what I initially expected, like the max ~100 taken from 1171 fanarts of her. Thankfully, it was still more than usable.
>>104046
Very late reply, but when I first saw this my heart skipped a beat. It's incredible, warm. She makes me very happy and I'm overjoyed to see it work so well. Very thankful for this.
X |||____________________________________________||| X
bottom left is a JRPG protagonist
Maybe if we combine it with >>104530, we'll create the legendary「Shin Hiroi Yuusha」。
Can you link what guide you're following and what step you're at? It might be best to find where the stuff is installed and wipe it or something. I'm not sure...
You could try googling the error message in a 4chan archive maybe
I did wipe my VENV and either it's not in there or something went wrong maybe (although maybe it's fine?)
I'm using https://rentry.org/voldy
and I'm just tweaking the asuka image right now. My current issue with it is "vae weights not loaded. make sure vae filename matches the checkpoint, replacing "ckpt" extension with "vae.pt"." and I'm a bit confused of what to do to fix this one, but maybe since I'm getting a known error the taming transformers thing worked? I dunno. However, what I'm wondering about right now is getting this, I forget if xformers is important or not and if it is, how to install it.
Also, why is it that sometimes my generation lags just because ff is open even though I'm using chrome...
>>104604
>My current issue with it is "vae weights not loaded. make sure vae filename matches the checkpoint, replacing "ckpt" extension with "vae.pt"." and I'm a bit confused of what to do to fix this one
It's saying that if you're using a model with a vae, you should have a file with the same name to go along with it. For example, "Anything-V3.0.ckpt" and "Anything-V3.0.vae.pt". I'm pretty sure it should work fine if the model you're using doesn't have one.
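The naming rule from that error message (swap the "ckpt" extension for "vae.pt") can be written out as a small check. This is only a sketch of the rule as quoted, not the webui's actual loading code, and both function names are made up.

```python
def expected_vae_name(checkpoint_filename):
    """Return the VAE filename that would match a .ckpt checkpoint."""
    suffix = ".ckpt"
    if not checkpoint_filename.endswith(suffix):
        raise ValueError("expected a filename ending in .ckpt")
    return checkpoint_filename[: -len(suffix)] + ".vae.pt"


def has_matching_vae(checkpoint_filename, filenames_in_folder):
    """Check whether the matching VAE file sits next to the checkpoint."""
    return expected_vae_name(checkpoint_filename) in filenames_in_folder


folder = ["Anything-V3.0.ckpt", "Anything-V3.0.vae.pt", "other-model.ckpt"]
print(expected_vae_name("Anything-V3.0.ckpt"))
print(has_matching_vae("other-model.ckpt", folder))
```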
I've seen people recommend that you put VAEs in a subfolder. I.E:
>blah/models/stable diffusion/vae
The vae mostly determines color and you can select them manually or switch automatically if the name matches, as you said. I don't remember where those options are in Settings.
You can put this into the Quicksettings list under "User interface" in options and then the main screen will let you switch these around without needing to go into the Settings every time:
sd_model_checkpoint, sd_vae, CLIP_stop_at_last_layers
Do you mean when you take a generated image and take it to the img2img tab, or do you mean the scaling "postprocessing" that does that automatically during generation? I don't do the first one, but I do the latter sometimes. The problem is that it's a total VRAM killer, so I go from generating 8 images at once to 2 or sometimes even 1.
Maybe I should try the "manual" scaling sometimes, but I just haven't felt the desire to do so. I like seeing the final image and not doing anything to it afterwards because then it begins to resemble work since this stuff doesn't really satisfy the creative urges. I like setting it to generate a bunch of images and then doing something else, too.
I just spent the past 2 days downloading and organizing LORAs, so I'm going to be generating a lot more Kuons soon. Hehehehe.
One day soon I might redo my Kuon and Amaduyu LORAs, particularly the Amaduyu one that controls the art style because it tends to produce a lot of errors that aren't otherwise present. No idea what I did wrong with it.
This is something I saw a month ago that was way over my head. It still is, but it seems people have been using it very successfully so maybe I should give it a look sometime:
https://github.com/cyber-meow/anime_screenshot_pipeline
Basically it automates taking tons of screenshots and tagging them and such so it doesn't take dozens of hours like what I did a few months ago...
I definitely have a bunch of shows I'd love to be able to reproduce in prompts, so this is right up my alley. I think shows like Mewkledreamy would need a lot of manual screenshots, though, since there are so many great frames that are barely there and would be easily skipped over by some randomized thing.
I ran into an issue and was too tired so I couldn't do it. Unfortunately it seems like, because of a recent change in the automatic1111 thing (or maybe because this grid I'm making is different from usual), it's making all the batch image files at the very end. I don't really trust it to properly create 505 large images after many hours of work (where is it storing the data?), so I need to do it in batches which is REALLY annoying.
But, I've learned that it's also going to take much longer than I thought, at about 5 minutes per Style. If only I didn't need to generate one at a time to make this nice grid pattern with 7 different prompt sets, 2 seeds, and then the LORA change itself.
I guess I'm not playing the Nosuri game until this is finished. Oh well.
Done with about 120 of them so far. However, the question I now ask: How the hell do I organize all these images so I can easily determine the proper style for a thing?
I guess I can give them names like [Name][High Quality][Western][Realistic][Colorful][Big Breasts] or something?
How on Earth am I going to do this...
make the names into tags i guess, then use regex in the file explorer
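A rough sketch of that idea, assuming the bracketed naming scheme suggested a couple of posts up. The file names below are invented, and this does the search in a small script instead of in the file explorer itself.

```python
import re

# Matches each [Tag] group in a filename like "[Name][High Quality].png".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def parse_tags(filename):
    """Return the list of bracketed tags found in a filename."""
    return TAG_PATTERN.findall(filename)


def find_by_tags(filenames, wanted):
    """Return the filenames that carry every wanted tag (case-insensitive)."""
    wanted = {tag.lower() for tag in wanted}
    return [
        name for name in filenames
        if wanted <= {tag.lower() for tag in parse_tags(name)}
    ]


# Hypothetical style-test images named with the bracket scheme.
files = [
    "[StyleA][High Quality][Western][Colorful].png",
    "[StyleB][Realistic][Colorful].png",
    "[StyleC][High Quality][Colorful].png",
]
matches = find_by_tags(files, ["high quality", "colorful"])
print(matches)
```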
>>105795
>A good checkpoint model (mine is a custom merge of like 5 of them that are themselves merges that other people made)
So do you constantly merge models and stuff or is it one model you use for most things? Also is it possible to upload this one, I'd really like to check it out myself.
Thought it'd be better to ask here instead of cluttering up the other thread
Yeah. I talked about it a bit here >>103583
and the post immediately after that is the UI for creating a more involved "Layered" merge between models. I still don't understand it much, it's just a bunch of trial and error and I can't say I've learned much after looking through papers and notes from other people who similarly seem to theorize things only to have it change later. Pic related is a glimpse into my nonsensical rationality in trying to find patterns in the first merging experiments I did in trying to create furry-quality penises with anime visuals. The video at >>/megu/538 is related. VERY NSFW!
I have "formulas" saved that I test in all future merges I make, but they rarely carry over their benefits when making future merges with different models or even if you keep the old model and add a new one to it. It seems like they were specific to the merge at the time. If I make a note of "Slider IN07 gives great faces when set to 1" it does not necessarily carry over to merges between different checkpoint models.
Since that post was made someone had an extension where you can do "live" merging with models that lets you test it before creating a new 3-7GB file each time, so that helps a lot.
I usually go a few weeks between testing new merges because it's really exhausting. I create thousands of images while adjusting sliders and waiting for it to generate and it's an all-day or multi-day affair.
>Also is it possible to upload this one, I'd really like to check it out myself.
Yeah, I could try to upload it somewhere, although my upload speed is terrible. First I need to give it a real name, though. Hmm... I guess I could do a bit of publicity and name it after kissu somehow.
Uploading the LORAS? I downloaded them all and it was exhausting, but they're 120gb...
Lala is probably not in a Miyako situation that would require screenshots. Miyako's fan art is very inconsistent due to the source material itself being inconsistent.
Lala... well, I think she could be separated into "regular" and "precure" forms for clothing and hair, but she still has the same body and head shape. My favorite art of her is very "noisy" so I don't think it can be used, but I could try.
Alright, here is the link to my current model for use by kissu friends. (but I also made sure to include kissu advertisements in the files and password so even if linked elsewhere people will know hehehe)
I call it... *drumroll*
The <[Kissu Megamix]>
The compressed RAR is 3.5gb and my upload is 1MB/s, so I can't really upload a bunch of these, not that I would anyway since I can say this is the best version I have. While this model is focused on NSFW stuff, it can still handle cute. I don't know if it's the best checkpoint overall, but it's the best for my personal desires. My model lacks most of the haze that most of the RL mixes do, although it's not completely eliminated. The benefit of the RL models is from looking at the hands here. I didn't do anything to them, it's straight from the prompt. The password is in the text file, but I'll also post it here. Without quotations: "www.kissu.moe - my friends are here"
https://mega.nz/folder/3OoAgSoZ#eqaY3KFat784_BPgk_ApbQ
Oh, I forgot to answer the question about multiple models. Yeah, I have a few I keep around but I overwhelmingly only use the most recent one I've created. The model I just linked is the normal version (which I'm using in that thread) while the other model sacrifices face quality and booru tag recognition to better generate a certain body part. (in other words it's closer to the furry model)
The others I don't really use much, but are there for comparisons sometimes. I have to keep the stuff around that I make merges with, too, of course.
In total, I've probably made about 200 merges, with 99% of them being deleted shortly after creation. If you count the merges I've done after the "real-time merging", then it's probably more like 500. It's really an amazing extension.
I never did make my pixiv into an AI account. Alas, such is the price of having no motivation to interact with the wider world.
Forgot to mention that I put (furry:1.3) in the prompt to demonstrate that while it has some benefits from the furry model, it's not overly contaminated by it. Patchy is still a human there. The Kissu Megamerge can do various bodies better than the majority of 2D models out there, such as 'gigantic breasts' and squishy plump bellies! (and the male anatomy attached to females of course)
My personal preferences:
VAE:
I generally use the "1.5 ema-pruned" VAE, which I'm uploading right now to the same upload folder.
It makes it colorful (and sometimes looks "over baked". If that happens, use the default novel AI VAE). The other 2D VAEs are too colorful on this, but you could try them.
Upscaler:
I have also included the "4x-UltraSharp" upscaler in the mega folder, which you should put into the stable diffusion\models\ESRGAN folder (create the folder if it's not there). I did a bunch of testing and found that I like it the most, although the differences aren't major.
Sampler:
DPM++ 2S a Karras. I'm not entirely sure on these samplers, but when testing different artist LORAs this one seems to have the most compatibility. I don't know why. Something to research more, I guess, but at the same time I don't really want to.
I have it at 26 steps, as going higher than mid 20s is supposed to be overkill. The rule for upscaling is to do half the number of steps in the base generation, so 26 normal steps and then 13 Hires steps.
My default negative prompt is:
(worst quality, low quality:1.4), realistic, nose, 3d, greyscale, monochrome, text, title, logo, signature
If you somehow end up generating furry properties, try putting "furry" or "anthro" in there.
I don't use any of the old positive quality prompts since they don't seem to do anything noteworthy. (I.E masterpiece, highest quality, etc)
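For anyone who wants these preferences in one place, here is a small sketch that bundles them up, including the "half the base steps for Hires" rule. The class and field names are made up for illustration and are not the webui's own setting names; only the values come from the post above.

```python
from dataclasses import dataclass


@dataclass
class GenerationSettings:
    """Personal defaults from this post, under hypothetical field names."""
    sampler: str = "DPM++ 2S a Karras"
    steps: int = 26
    negative_prompt: str = (
        "(worst quality, low quality:1.4), realistic, nose, 3d, "
        "greyscale, monochrome, text, title, logo, signature"
    )

    @property
    def hires_steps(self):
        # Rule of thumb from the post: half the base step count.
        return self.steps // 2


settings = GenerationSettings()
print(settings.steps, settings.hires_steps)
```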
Thanks for putting this together.
I think one of the most extreme hurdles I have yet to see AI overcome, and I can't even fathom how it would overcome, is creating images that involve specific details of two or more characters. It just can't figure out how to assign differing aspects to separate characters.
MultiDiffusion and Latent Couple let you use different prompts for different regions and are available as plugins for webui
https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111
https://github.com/ashen-sensored/stable-diffusion-webui-two-shot
The MultiDiffusion extension also has Tiled VAE which lets you create much larger images without going out of VRAM
There's attempts at it, but it's more work than I'm comfortable doing with my VRAM limitations. (This stuff is a total resource hog)
https://github.com/Extraltodeus/multi-subject-render
https://github.com/hnmr293/sd-webui-cutoff
Also apparently there's some major problem with the civitai site right now and anything downloaded is massively corrupt and can't be used. Whoops.
I guess I should go share this info with that /h/ thread since they've been helpful to me in the past.
Here is an example of the methods in action.
The top is using naive prompt 2girls, cirno, megumin. As you can see the character details got intermixed.
The middle is the MultiDiffusion method. I set prompt of each half to one character. Now the character details are separated correctly. Needs a little tweaking to let the two halves fuse together better.
The bottom is the Latent Couple method. It also separates the character details well and looks a little more natural than MultiDiffusion.
Heh, "Creativity". Well, as long as they're providing tools they can have their delusions.
I guess I'll try those with the Patchy adventure. Making scenery is really difficult with my current limitations and desire to not spend effort doing something to avoid spending effort
Can't SD already do video in some way? I've seen anime girl videos made with mocap and controlnet
It may be a lot more effort compared to generating with nothing but a prompt though
There's been ways, but this one is a simple text prompt with no other work involved as you said. At 256x256 I couldn't make more than about 70 frames at once before running out of VRAM, but I didn't look at settings much. To me this stuff is only as interesting as its ability to fill in the blanks, the more work I need to put into it the less interest I have because at that point someone should learn to draw or animate in my opinion
It's gotten very good at realistic image generation.
https://twitter.com/AIkawa_AIko_jp2
Ironically, it looks like it perfectly replicated the feeling of rage when you get pissed off that stuff isn't working