No.2334
There's been a lot of chatter lately about Deepseek. In the online circles I'm in, people have a politics-colored understanding, more or less saying "American tech companies couldn't do this, but an opensource Chinese company could and American tech companies are in 'damage control'". Which... I really don't understand. If it's an open source model, like Llama was, for example, I don't see how this doesn't just cause there to be a proliferation of much more efficient and performant models -- the same way that, after Llama became available, there was suddenly Phi from Microsoft, Gemma from Google, Mistral, and others.
What does /maho/ think?
No.2516
>>2515
I have no idea, I've never asked it programming stuff. I feel like it wouldn't be able to help at all with older VNs. People made some extractors for specific companies, but it's a mixed bag.
No.2519
>>2510
I've got 4x48GB of RAM arriving later this evening. I'll update with my results, but for now I can answer a few things:
>My choice of RAM is 2x32 or 2x48
I currently have 2x32GB with a 4090, and DeepSeek R1 671b-UD-IQ1_S ran at ~1 token/s (>>2445).
>with a $100 premium on the 48
If you're not too worried about the speed, these are the kits I'm ordering:
https://www.amazon.com/dp/B0D286TQHV
https://www.amazon.com/dp/B0D2888BLV
>Do you need like 200gb across GPU and RAM to load the great model?
It depends what you mean by "the great model". For the most heavily quantized versions you only need ~88GB total between RAM and VRAM if you're using llama.cpp. If you want to use Ollama, you need to merge the weights into a single file, and then you need RAM + VRAM to be quite a bit more than the filesize of the model (>>2433). For the less quantized (higher precision) versions you need a lot more memory, and if you want to run the giant unquantized model, you need like >1TB of RAM.
I'll be trying 2x48GB and 4x48GB and seeing how much of a difference (if any) they make to generation speed. I have a feeling that when I was running DeepSeek R1 before, it was spilling into the pagefile on my SSD, since I had like 500MB of RAM left over while running it.
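For anyone who wants to try the same setup, this is roughly what running the split GGUF looks like from Python (a sketch, and the filename and layer split are guesses for a 24GB card; tune for your hardware):

# pip install llama-cpp-python (built with CUDA)
from llama_cpp import Llama

# Point at the first shard of the split GGUF; llama.cpp finds the rest.
# The exact filename is illustrative, check what the quant provider ships.
llm = Llama(
    model_path="DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf",
    n_gpu_layers=7,   # how many layers fit in 24GB VRAM is a guess, adjust to taste
    n_ctx=4096,       # keep the context small so the KV cache doesn't eat your RAM
)

out = llm("Explain mixture-of-experts in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])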
No.2520
>>2515
Maybe?... The people who made the quantized version of DeepSeek R1 showed in their blogpost that even their most heavily quantized version was still capable of creating a Flappy Bird-style game from scratch (>>2417).
No.2523
>>2519
I ended up ordering a 2x48 kit, but I feel immensely ripped off by it. This is the computer I'm going to be using for the next 4 or so years, so I justified the price premium to myself, since I don't buy things like real clothes. The whole reason I'm doing the RAM thing is local text gen, so I'm eager to hear about your progress. (saw in IRC that you're having issues though...)
>It depends what you mean by "the great model".
The famous one that's a little bit below GPT4o or whatever. My assumption is that I can't run that one, but maybe the one under it? 1 token/s generation is rough, but part of the appeal of all this VRAM and RAM stuff is multitasking.
No.2524
The API is still fucked, huh? I assume results are slightly worse than usual because of the DDoSing or whatever.
No.2850
>>2334
I think it's based that they made an open-source ChatGPT challenger. Open source is one of those few things I feel wide-eyed optimism about; it's great. (Even though open-source devs are often unhinged in my experience)
>Which... I really don't understand. If it's an open source model, like Llama was, for example, I don't see how this doesn't just cause there to be a proliferation of much more efficient and performant models
They don't care about that; they're American corporations that prefer closed, proprietary stuff they can sell. They're Cold War-esque paranoid about the Chinese surpassing America at anything. I watched the news when this was current, and there was a swirling torrent of whining from American talking heads.
>>2335
I don't think that matters, but you could explain it really easily.
No.3236
>>2358
I use NotebookLM to great effect; check it out. Its analysis power is unmatched, and unlike ChatGPT, for example, it's not blind to "hateful content".
I upload chatlogs from Discord and ask it to draw up psychological profiles of various participants. I also ask questions such as "how to destroy this user psychologically", and it draws up a detailed plan with only a small disclaimer at the end: "btw this is unethical".
No.3255
Rumors about Deepseek R2 have started going around from websites like this:
https://www.jiuyangongshe.com/a/1h4gq724su0
Someone summed up all the rumors in this picture, but it's unsubstantiated until it releases properly. Seems like it will be bigger: 1.2 trillion parameters with 78B active at any one time.
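For a sense of scale, here's the back-of-the-envelope memory math on those rumored numbers (a sketch; the only inputs are the figures above, the rest is arithmetic):

# Rumored R2 specs: 1.2T total params, 78B active per token.
# MoE still has to hold ALL the weights in memory; only the compute
# per token scales with the active count.
total_params = 1.2e12

for name, bits in [("FP8", 8), ("Q4", 4), ("IQ1_S-ish", 1.58)]:
    gb = total_params * bits / 8 / 1e9
    print(f"{name}: ~{gb:,.0f} GB of weights")
# -> FP8 ~1,200 GB, Q4 ~600 GB, even a 1.58-bit quant ~237 GB, all before KV cache.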
No.3265
>>2541
Is it open source yet? My google-fu is too weak to find a conclusive answer.
No.3295
>>2343
>whereas generalized models can do much more complex things like "Format a socratic dialogue on the nature of kinematics in pirate speak" ... "[Characters discussing kinematics in pirate speak]"
Asked Copilot for that. Had a chuckle. Asked for Warhammer 40k ork speak, had another chuckle.
No.3781
>>3780
I asked it about spoilers after this and it got it all wrong. I guess these 70b and smaller models just don't have any important knowledge.
My prompt setup does seem messed up on some of these, but I don't think that would cause it to be so wrong... right? Dunno. I'll look into it later, I guess.
No.3806
Deepchina allows me to do violent roleplaying without it going "erm actually, that's too violent uwu" so that's really nice.
No.3983
Actual news. The rumors I posted in >>3255 didn't end up panning out, but Deepseek themselves announced in their official WeChat group a minor date-versioned upgrade to R1. Weights may be released soon after that upgrade completes.
No.3990
>>3983
Anyone have a $50k GPU setup lying around so we can try loading it locally?
That's cool, though. Lately I've been more interested in the also-Chinese QWEN models because they seem to be more focused on sizes that people can actually run locally. Seems like Deepseek has been settled upon as the go-to model for people doing ERP since it's significantly cheaper and we're no longer in the age of cracked company keys raining from the sky.
It's sad how much the West has dropped the ball on this when it had such a huge lead. That's what greed and complacency gets you, I guess.
No.3992
>>3991
even the closed eyes aren't safe...
No.4102
After some hours of frustration (I hate ComfyUI, but I'm slowly getting used to it) I have Joycaption set up, which is a visual caption model or whatever it's called. You give it a picture and it captions it.
HOWEVER, it's a text model that you can prompt. You tell it what to do with the visual information. While it's not on the level of big models or, obviously, the online stuff, having the image itself commented on is quite amazing.
Oh, and it's uncensored. It comments on body parts, including nipples and penises and all that other fun stuff. Lots of fun (or ero) to be had with prompt adjustments.
The model (which is auto-downloaded by ComfyUI if you use the workflow below):
https://github.com/fpgaminer/joycaption
The workflow I finally got to work:
https://github.com/judian17/ComfyUI-joycaption-beta-one-GGUF
It's pretty damn amazing if I do say so myself. I think I'm going to go around kissu captioning stuff hehehe.
The full-quality version of the model is like 14GB or so, so unfortunately it's in that territory where you need a 3090 or above. This workflow loads the model twice, so I have the upper loader set to the low-quant (inferior quality, smaller size) version. I tried deleting that node, but then the prompt was rejected, so I need to figure out how to remove it. I'm sure it's something simple I missed. Anyway... imagine the possibilities.
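If anyone else hates ComfyUI enough, it should also be drivable straight from Python with transformers. A sketch, assuming the project ships a standard LLaVA-style HF checkpoint (the model id is my guess from the project page, double-check it):

# pip install transformers torch pillow
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "fancyfeast/llama-joycaption-beta-one-hf-llava"  # assumption, verify on HF

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("input.png")
# It's a promptable captioner: you tell it what kind of caption you want.
convo = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Write a detailed caption for this image."},
]}]
prompt = processor.apply_chat_template(convo, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=300)
print(processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))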
No.4123
Yeah, deepseek is the best for porn if you want it. Not having to deal with censorship is a golden ticket to a good fap session. But it's still not perfect and can't replace writing it yourself since you do need to steer it heavily towards a perfect narrative.
No.4352
A new model that functions like Deepseek is out, this time by a Chinese startup called Moonshot. It's available for free on Openrouter, similarly to Deepseek. I haven't used it yet, but I hear good things and it might be better than Deepseek.
>Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.
It's another model like Deepseek that you could theoretically run locally.
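Trying it is the usual OpenAI-compatible call against Openrouter (a sketch; the model slug is my guess at the free tier, check the Openrouter page):

# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your Openrouter key
)

# Slug is an assumption; ":free" is Openrouter's convention for no-cost variants.
resp = client.chat.completions.create(
    model="moonshotai/kimi-k2:free",
    messages=[{"role": "user", "content": "Who trained you, and what are you good at?"}],
)
print(resp.choices[0].message.content)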
Meanwhile OpenAI says the open source model they planned to release is delayed indefinitely again for "safety concerns". Which is to say the same BS they always say. It really feels like America has utterly neutered its ability to compete in this field for the sake of enriching a few people to maintain a fleeting monopoly that feels months away from collapse.
No.4355
>>4352
It is better if you look at the benchmarks: it's tied at #1 on LMArena for hard prompts, but the roleplay performance leaves quite a bit to be desired in my experience. It's much better as a research/programming assistant for work, though.
No.4590
Yet another very impressive LLM has come out of China. Z.ai has released a pair of LLMs: GLM-4.5 and GLM-4.5-Air.
>GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
>Both GLM-4.5 and GLM-4.5-Air are hybrid reasoning models that provide two modes: thinking mode for complex reasoning and tool usage, and non-thinking mode for immediate responses.
Both are open source and available for download from Huggingface. GLM-4.5-Air, with how comparatively small it is, looks like a perfect candidate for local AI.
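Grabbing the weights is the usual Huggingface routine (a sketch; the repo id is my guess at Z.ai's org naming, confirm it on Huggingface):

# pip install huggingface_hub
from huggingface_hub import snapshot_download

# Repo id is an assumption; search HF for the official Z.ai org.
path = snapshot_download("zai-org/GLM-4.5-Air", local_dir="GLM-4.5-Air")
print(f"weights in {path}")
# 106B params means even a 4-bit quant is ~53GB of weights, so plan RAM/VRAM accordingly.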
No.4591
GLM-4.5-Air in particular is extremely impressive for how relatively small it is. 106B parameters is positively minuscule. That said, GLM-4.5 is also very impressive considering it has half the parameters of Deepseek R1 and performs at approximately the same level in benchmarks. GLM-4.5 very much looks to be to Deepseek R1 what Deepseek R1 was to OpenAI GPT4o. After just a bit of chatting with GLM-4.5 on their website, I can easily say that it's subjectively extremely similar to GPT4o (sans its excessive use of emojis nowadays).
They have a blogpost on their website about GLM-4.5 here for those who are curious.
No.4595
OpenAI, for the first time in a long while, has released and open-sourced two models:
>gpt-oss-120b — for production, general purpose, high reasoning use cases that fits into a single H100 GPU (117B parameters with 5.1B active parameters)
>The gpt-oss-120b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU.
>gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters)
>The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure.
>Similar to the OpenAI o-series reasoning models in the API, the two open-weight models support three reasoning efforts—low, medium, and high—which trade off latency vs. performance. Developers can easily set the reasoning effort with one sentence in the system message.
More information can be found in their blogpost about gpt-oss here.
No.4596
>We’ve designed these models to be flexible and easy to run anywhere—locally, on-device, or through third-party inference providers. To support this, we partnered ahead of launch with leading deployment platforms such as Azure, Hugging Face, vLLM, Ollama, llama.cpp, LM Studio, AWS, Fireworks, Together AI, Baseten, Databricks, Vercel, Cloudflare, and OpenRouter to make the models broadly accessible to developers.
https://huggingface.co/openai/gpt-oss-120b
https://huggingface.co/openai/gpt-oss-20b
https://ollama.com/library/gpt-oss
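Since Ollama is on that list, the reasoning-effort thing should be testable locally through its OpenAI-compatible endpoint (a sketch; the model tag and the exact "Reasoning: high" wording are my reading of the blogpost and the Ollama library page, verify both):

# pip install openai; assumes `ollama pull gpt-oss:20b` and ollama running on the default port
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is ignored locally

resp = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        # Per the blogpost, effort (low/medium/high) is set with one sentence in the system message.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "How many r's are in 'strawberry'?"},
    ],
)
print(resp.choices[0].message.content)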
No.4601
>>4597
>I have very little faith in the OpenAI one
You were right not to... Apparently it's pretty terribad compared to the Chinese models coming out. For one, it does the old "I cannot respond to this". At least the Chinese ones are only censored online, and the models themselves work fine (within reason).
No.4602
Ouch. I remember this guy in the LLM general doing this test, and, well, a local censored model isn't that appealing. Maybe they'll find a way to jailbreak it, but using a jailbreak on a local model, eating up context, is very unfortunate.
>>4601
Yeah, the 'cockbench' here confirms that.
No.4604
>>4601
What does "within reason" mean? That it won't provide info on things you shouldn't even ask about on imageboards?
No.4657
I've been trying, flailing, failing, and am about to give up on getting a QWEN2-based model named Toriigate to load in ComfyUI. It's supposed to be good at tagging 2D images, including NSFW, and I want to mess around with it. He has a rentry of example tagging:
https://rentry.co/9wranqty
The creator said you could just use a workflow made for QWEN2, but there's no option to actually load a different model, and I tried connecting other nodes into it and tried other workflows and blahblabhaunfsaijondasoindasionsaindasionjdasioasdad I HATE COMFYUI
I wonder what the guy meant here:
https://huggingface.co/Minthy/ToriiGate-v0.4-7B/discussions/3
I might just give up. It would be fun to mess with, but not worth the headaches.
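Before giving up entirely, it might be worth bypassing ComfyUI and loading it with plain transformers. A sketch, assuming v0.4 is a standard Qwen2-VL-style checkpoint (that's my read of "QWEN2-based"; the prompt wording is also a guess):

# pip install transformers qwen-vl-utils torch
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

MODEL_ID = "Minthy/ToriiGate-v0.4-7B"

model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

messages = [{"role": "user", "content": [
    {"type": "image", "image": "file:///path/to/image.png"},
    {"type": "text", "text": "Describe this image using booru-style tags."},  # prompt format is a guess
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
trimmed = out[:, inputs.input_ids.shape[1]:]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])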
No.4700
>>2334
Back on topic. There were rumors swirling around about an R2, and about it getting delayed because of national semiconductor chips and concerns about training, but Deepseek updated again out of nowhere.
https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base
This is only the base model, so you can only use prefill with it, but an instruct model should be coming soon too.
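For anyone unfamiliar, "prefill only" just means it's a raw completion model: no chat template, you hand it a prefix and it continues the text. A purely illustrative sketch (actually loading this thing takes something like 700GB of memory):

from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "deepseek-ai/DeepSeek-V3.1-Base"

tok = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True, device_map="auto")

# No chat roles, no instruction: just a prefix the model keeps writing from.
prefix = "The main tradeoffs when quantizing a large language model are"
ids = tok(prefix, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=100)
print(tok.decode(out[0], skip_special_tokens=True))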
No.4718
>>4700
Also, very interestingly, defying current norms, they merged reasoning and non-reasoning together in v3.1, as opposed to Qwen splitting them.
https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base/discussions/25
No.4725
>>4718
Hmmm, sounds interesting. I'll give it a try later if/when it appears on openrouter. Well, assuming it won't require paid credits per prompt.
No.4726
https://huggingface.co/deepseek-ai/DeepSeek-V3.1
The proper model came out. They didn't tack "Instruct" onto the name, but it basically is an instruct model, as they trained on top of the base model I linked. And as people have investigated, it has hybrid thinking tacked onto it. They worked mostly on making it better at using tools and making its thinking more efficient (fewer tokens to answer a question while thinking, for the same quality of answer). They also moved it to a weird new FP8 scaling format.
>>4725
The new post-trained model will probably be what shows up on cloud services like openrouter.
No.4728
>>4727
This thread is about Deepseek, you newfag