
/maho/ - Magical Circuitboards

Advanced technology is indistinguishable from magic


File:Deppseek.png (44.92 KB,750x750)

 No.2334

There's been a lot of chatter lately about Deepseek. In the online circles I'm in, people have a politics-colored understanding, more or less saying "American tech companies couldn't do this, but an open-source Chinese company could, and American tech companies are in 'damage control'". Which... I really don't understand. If it's an open-source model, like Llama was, for example, I don't see how this doesn't just cause a proliferation of much more efficient and performant models -- the same way that after Llama became available, suddenly there were Phi from Microsoft, Gemma from Google, Mistral, and others.

What does /maho/ think?
62 posts and 19 image replies omitted.

 No.2516

>>2515
I have no idea; I've never asked it programming stuff. I feel like it wouldn't be able to help at all with older VNs. People made some extractors for specific companies, but it's a mixed bag.

 No.2519

>>2510
I'm getting in 4x48GB of RAM later this evening. I'll update with my results, but for now I can answer a few things:

>My choice of RAM is 2x32 or 2x48
I currently have 2x32GB with a 4090 and DeepSeek R1 671b-UD-IQ1_S ran at ~1 token/s (>>2445).

>with a $100 premium on the 48
If you're not too worried about the speed, these are the kits I'm ordering:
https://www.amazon.com/dp/B0D286TQHV
https://www.amazon.com/dp/B0D2888BLV

>Do you need like 200gb across GPU and RAM to load the great model?
It depends what you mean by "the great model". For the most aggressive quantizations you only need ~88GB total between RAM and VRAM if you're using llama.cpp. If you want to use Ollama, you need to merge the weights into a single file, and then RAM + VRAM needs to be quite a bit more than the filesize of the model (>>2433). For higher-bit quantizations you need a lot more memory, and if you want to run the giant unquantized model, you need like >1TB of RAM.

I'll be trying 2x48GB and 4x48GB and seeing how much of a difference (if any) they make to generation speed. I have a feeling that when I was running DeepSeek R1 before, it was paging out to my SSD, since I had like 500MB of RAM left over while running it.
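
For reference, this is roughly how the RAM/VRAM split works with llama-cpp-python, in case anyone wants to replicate it. Just a sketch: the filename and layer count are placeholders for whatever you actually download, and you tune n_gpu_layers until your VRAM is nearly full.

# Minimal sketch: splitting a huge GGUF between a GPU and system RAM.
# pip install llama-cpp-python (with CUDA support compiled in).
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf",  # placeholder filename
    n_gpu_layers=7,   # layers offloaded to the GPU; raise until VRAM is nearly full
    n_ctx=2048,       # small context window to save memory
    use_mmap=True,    # memory-map the file so layers can stay on disk until touched
)

out = llm("What is 1+1?", max_tokens=64)
print(out["choices"][0]["text"])

With use_mmap on, pages that don't fit in RAM get read back from the SSD on demand, which would explain the ~1 token/s I was seeing.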

 No.2520

File:2025-02-16 07-52-56.mp4 (6.61 MB,1280x720)

>>2515
Maybe?... The people who made the quantized version of DeepSeek R1 provided an example in their blogpost showing that even their most heavily quantized version was still capable of creating a flappy-bird-style game from scratch (>>2417).

 No.2523

File:[Erai-raws] Medalist - 06 ….jpg (221.15 KB,1920x1080)

>>2519
I ended up ordering a 2x48 kit, but I feel immensely ripped off by it. This is the computer I'm going to be using for the next 4 or so years, so I justified the price premium to myself, since I don't buy things like real clothes. The whole reason I'm doing the RAM thing is local text gen, so I'm eager to hear of your progress with it. (saw on IRC that you're having issues though...)

>It depends what you mean by "the great model".
The famous one that is a little bit below GPT-4o or whatever. My assumption is that I can't run that one, but maybe the one under it? Slow 1 token/s generation is rough, but part of the appeal of all this VRAM and RAM stuff is multitasking.

 No.2524

The API is still fucked, huh? I assume results are slightly worse than usual because of the DDoSing or whatever.

 No.2541


 No.2850

>>2334
I think it's based that they made an open-source ChatGPT challenger. Open source is one of those few things I feel wide-eyed optimism about; it's great. (Even though open source devs are often unhinged, in my experience.)
>Which... I really don't understand. If it's an open source model, like Llama was, for example, I don't see how this doesn't just cause there to be a proliferation of much more efficient and performant models
They don't care about that; they're American corporations that prefer closed, proprietary stuff they can sell. They're Cold War-esque paranoid about the Chinese surpassing America at anything. I watched the news when this was current, and there was a swirling torrent of whining from American talking heads.
>>2335
I don't think that matters, but you could explain it really easily.

 No.2854

File:[Erai-raws] Chuuzenji-sens….jpg (220.69 KB,1920x1080)

Heh, interesting bump, since I've been looking at local text gen stuff again. It doesn't seem like I'd be running anything related to Deepseek: for the big Deepseek model you need a server motherboard to fit all the RAM slots, and buying all that RAM would be way too expensive for me for a single purpose like this.
It seems like the local text gen hobbyists are waiting for "QWEN3" to release sometime in the coming weeks, although I'm not sure how good at roleplaying it will be. Seems like the era of random merges to attempt to get good uncensored RP is mostly a thing of the past.
The good news is that the models they run fit inside 24GB of VRAM; the bad news is I don't have that much, since my plan was to get a 5090 and that will simply be an impossibility now.

 No.3236

>>2358
I use NotebookLM to great effect; check it out. Its analysis power is unmatched, and unlike ChatGPT, for example, it's not blind to "hateful content".

I upload chatlogs from Discord and ask it to draw up psychological profiles of various participants. I also ask questions such as "how to destroy user psychologically", and it draws me a detailed plan with only a small disclaimer at the end: "btw this is unethical".

 No.3255

File:C-1745753169213.png (2.59 MB,2596x1574)

Rumors about Deepseek R2 have started going around from websites like this one:
https://www.jiuyangongshe.com/a/1h4gq724su0
Someone summed up all the rumors in this picture, but they're unsubstantiated until it releases properly. Seems like it will be bigger: 1.2 trillion parameters, with 78B active at any one time.
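
Napkin math on why nobody here would be running that (if the 1.2T figure is even real): the MoE routing only saves compute, not memory, since all the weights still have to be resident somewhere.

# Rough memory needed just to hold 1.2T parameters at various precisions
params = 1.2e12
for name, bytes_per_param in [("FP16", 2), ("FP8", 1), ("~1.58-bit quant", 0.2)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB")

# FP16: ~2400 GB, FP8: ~1200 GB, ~1.58-bit quant: ~240 GB

So even the most brutal quantization would still want a server board full of RAM.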

 No.3265

>>2541
Is it open source yet? My google-fu is too weak to find a conclusive answer.

 No.3295

>>2343
>whereas generalized models can do much more complex things like "Format a socratic dialogue on the nature of kinematics in pirate speak" ... "[Characters discussing kinematics in pirate speak]"
Asked Copilot for that. Had a chuckle. Asked for Warhammer 40k ork speak, had another chuckle.

 No.3773

File:Utawarerumono_Mask_of_Dece….png (3.03 MB,1671x1275)

I'm going to treat this as the 'local text gen' thread, since it's heavily focused on that, while the older thread is more about online ones.
It's time for some important benchmarking as I download and test these models out!

 No.3774

File:firefox_cxXBQHH7C5.png (258.72 KB,1059x1214)

These names are a pain to type out so I'm just going to hover over the icon. I think some of the settings are wrong on these, but I'm not going to research 20 different sets of instructions.
The question is "Do you know Kuon from Utawarerumono?"

Qwen2.5 Instruct? FAIL!

 No.3775

File:firefox_gGAH6HpE9U.png (75.17 KB,1040x396)

This one picked a very generic response, but I guess it's not...wrong?
KUON CAME TO VISIT THE CHAT WOW! I guess this one gets a point for that.

 No.3776

File:firefox_13RLUTziEt.png (48.48 KB,1006x299)

>>3775
I'll see what the regular, not "uncensored" version says. Okay, that's not right at all...

 No.3777

File:firefox_O239U8ZxDu.png (60.15 KB,1044x279)

Uhhh...

 No.3778

File:firefox_5yc2yOi5TQ.png (204.63 KB,1041x1224)

No...

 No.3779

File:firefox_v2YoHEsmVD.png (80.73 KB,1029x365)

Oh the one above was QWEN3 30B-A3B.
AI hallucination sure is something.

 No.3780

File:firefox_ijlXhudZNV.png (166.51 KB,1042x997)

Just one more to test after this. It's not looking good. This one was slow and still ignorant and the next one will be the slowest of all.

 No.3781

File:firefox_sgYyZlzUsu.png (128 KB,1055x681)

>>3780
I asked it about spoilers after this and it got it all wrong. I guess these 70b and smaller models just don't have any important knowledge.
My prompt system does seem messed up on some of these, but I don't think it would cause it to be so wrong... right? Dunno. I'll look it up later I guess.

 No.3806

Deepchina allows me to do violent roleplaying without it going "erm actually, that's too violent uwu" so that's really nice.

 No.3983

File:1748428602139378.png (100.35 KB,1080x436)

Actual news: the rumors I posted in >>3255 didn't end up panning out, but Deepseek themselves announced a minor date-version upgrade to R1 on their official WeChat group. Weights may be released soon after the upgrade completes.

 No.3988


 No.3990

File:[SubsPlease] Apocalypse Ho….jpg (239.59 KB,1920x1080)

>>3983
Anyone have a 50k GPU setup lying around so we can try loading it locally?
That's cool, though. Lately I've been more interested in the also-Chinese QWEN models, because they seem more focused on sizes that people can actually run locally. Seems like Deepseek has been settled on as the go-to model for people doing ERP, since it's significantly cheaper and we're no longer in the age of cracked company keys raining from the sky.
It's sad how much the West has dropped the ball on this when it had such a huge lead. That's what greed and complacency get you, I guess.

 No.3991

File:C-1748512545652.png (1.4 MB,1600x900)


 No.3992

>>3991
even the closed eyes aren't safe...

 No.4020

File:C-1748856951455.png (459.74 KB,3200x2216)

>>3990
Qwen is the best all-purpose model for local use at sizes below Deepseek, except for translation, where Gemma 3 has everyone beat. It's gotten to the point where a 27B-parameter LLM on average beats DeepL, which was the go-to machine translator for Japanese for years after Google Translate fell out of favor. And you can run a quantized version they trained to match the full version, at only a slight quality hit, on a 16GB GPU.
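
If anyone wants to try that locally, here's a rough sketch against Ollama's REST API. The model tag is my guess at the QAT build; swap in whatever you actually pulled.

# JP->EN translation through a local Ollama server (ollama.com).
# Assumes something like `ollama pull gemma3:27b-it-qat` was run first.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma3:27b-it-qat",  # assumed tag for the quantization-aware build
        "prompt": "Translate this Japanese into natural English:\n「魔法はまだ信じてる？」",
        "stream": False,  # one JSON object back instead of a token stream
    },
)
print(resp.json()["response"])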

 No.4102

File:firefox_4Fyo7O91QP.png (714.82 KB,2883x1152)

After some hours of frustration (I hate ComfyUI, but I'm slowly getting used to it) I have Joycaption set up, which is a visual captioning model or whatever it's called. You give it a picture and it captions it.
HOWEVER, it's a text model that you can prompt: you tell it what to do with the visual information. While it's not on the level of big models or, obviously, the online stuff, having the image itself commented on is quite amazing.
Oh, and it's uncensored. It comments on body parts including nipples and penises and all that other fun stuff. Lots of fun (or ero) to be had with prompt adjustments.

The model (which is auto-downloaded by ComfyUI if you use the workflow below):
https://github.com/fpgaminer/joycaption

The workflow I finally got to work: https://github.com/judian17/ComfyUI-joycaption-beta-one-GGUF

It's pretty damn amazing if I do say so myself. I think I'm going to go around kissu captioning stuff hehehe.
The full-quality version of the model is 14GB or so, so unfortunately it's occupying that territory where you need a 3090 or above. This workflow loads the model twice, so I have the upper loader set to the low quant (inferior quality, smaller size) version. I tried deleting it, but then the prompt was rejected, so I need to figure out how to remove it; I'm sure it's something simple I missed. Anyway... imagine the possibilities.
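
If you'd rather skip ComfyUI entirely, something like this should work through plain transformers. A sketch only: the repo id is my best guess at the HF-format upload, so check the github page for the real one.

# Captioning an image with JoyCaption via transformers, no ComfyUI.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

repo = "fancyfeast/llama-joycaption-beta-one-hf-llava"  # assumed repo id
processor = AutoProcessor.from_pretrained(repo)
model = LlavaForConditionalGeneration.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("test.png")  # any local image
convo = [{"role": "user", "content": "Write a descriptive caption for this image."}]
text = processor.apply_chat_template(convo, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

ids = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

The prompt string is where the fun happens; you can ask for booru tags, a story, whatever.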

 No.4116

File:overview_performance.png (252.82 KB,1846x1058)

>>4102
There are other local VLM models that can do better if you can tolerate censorship; Qwen-VL is less censored than InternVL. But yeah, for simpler things and NSFW, Joycaption is better. Keep in mind both of the model families mentioned go way bigger than your 14GB at their largest sizes.

 No.4119

File:Go.Go.Loser.Ranger.S02E05.….jpg (281.78 KB,1920x1080)

>>4116
I'm not some venture capitalist seeking to eliminate livelihoods; I just want escapism and masturbation. If other models are better at describing pictures of toasters in a professional way, that really doesn't mean much to me. Thanks for the info, though.

 No.4123

Yeah, Deepseek is the best for porn if you want it. Not having to deal with censorship is a golden ticket to a good fap session. But it's still not perfect, and it can't replace writing it yourself, since you do need to steer it heavily towards a perfect narrative.

 No.4352

File:[SubsPlease] Ruri no House….jpg (332.92 KB,1920x1080)

A new model that functions like Deepseek is out, this time from a Chinese startup called Moonshot. It's available for free on OpenRouter, just like Deepseek. I haven't used it yet, but I hear good things, and it might be better than Deepseek.
>Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.

It's another model like Deepseek that you could theoretically run locally.
Meanwhile, OpenAI says the open-source model they planned to release is delayed indefinitely again for "safety concerns", which is to say the same BS they always say. It really feels like America has utterly neutered its ability to compete in this field for the sake of enriching a few people to maintain a fleeting monopoly that feels months away from collapse.
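
For the curious, trying it costs about five lines, since OpenRouter speaks the OpenAI API. The ":free" slug is an assumption based on the free tier; check the site for the exact id.

# Poking at Kimi K2 through OpenRouter's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)
resp = client.chat.completions.create(
    model="moonshotai/kimi-k2:free",  # assumed model slug
    messages=[{"role": "user", "content": "Explain MoE routing in two sentences."}],
)
print(resp.choices[0].message.content)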

 No.4353

File:C-1752741496841.png (1.97 MB,1600x900)


 No.4355

>>4352
It is better if you go by the benchmarks; it's tied at #1 on LMArena for hard prompts. But the roleplay performance leaves quite a bit to be desired in my experience. It's much better as a research/programming assistant for work, though.

 No.4590

File:bench.jpg (1.43 MB,4464x3177)

Yet another very impressive LLM has come out of China. Z.ai has released a pair of LLMs: GLM 4.5 and GLM-4.5-Air.

>GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
>Both GLM-4.5 and GLM-4.5-Air are hybrid reasoning models that provide two modes: thinking mode for complex reasoning and tool usage, and non-thinking mode for immediate responses.

Both are open source and available for download from Huggingface. GLM-4.5-Air, with how comparatively small it is, looks like a perfect candidate for local AI.
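
Grabbing the weights is a one-liner if you want to poke at them. The repo id is my guess at Z.ai's org name, so verify it on Hugging Face first.

# Download GLM-4.5-Air's weights for local serving.
from huggingface_hub import snapshot_download

path = snapshot_download("zai-org/GLM-4.5-Air")  # assumed repo id
print("weights at:", path)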

 No.4591

File:S1tGskrPge.jpg (631.76 KB,3595x2184)

GLM-4.5-Air in particular is extremely impressive for how relatively small it is; 106B parameters is positively minuscule. That said, GLM-4.5 is also very impressive considering it has half the parameters of Deepseek R1 and performs at approximately the same level in benchmarks. GLM-4.5 very much looks to be to Deepseek R1 what Deepseek R1 was to OpenAI's GPT-4o. After just a bit of chatting with GLM-4.5 on their website, I can easily say that it's subjectively extremely similar to GPT-4o (sans its excessive use of emojis nowadays).

They have a blogpost on their website about GLM-4.5 here for those that are curious.

 No.4595

File:chart(1).png (70.31 KB,1110x886)

OpenAI, for the first time in a long while, has released and open-sourced two models:

>gpt-oss-120b — for production, general purpose, high reasoning use cases that fits into a single H100 GPU (117B parameters with 5.1B active parameters)
>The gpt-oss-120b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU.

>gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters)
>The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure.

>Similar to the OpenAI o-series reasoning models in the API, the two open-weight models support three reasoning efforts—low, medium, and high—which trade off latency vs. performance. Developers can easily set the reasoning effort with one sentence in the system message.

More information can be found on their blogpost about gpt-oss here.
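
The reasoning-effort thing from the last quote really is just a sentence in the system message. A sketch with the ollama Python client, assuming the 20B tag from their library page:

# Setting gpt-oss reasoning effort via the system message (per the blogpost).
# pip install ollama; assumes `ollama pull gpt-oss:20b` was run first.
import ollama

resp = ollama.chat(
    model="gpt-oss:20b",
    messages=[
        {"role": "system", "content": "Reasoning: high"},  # low / medium / high
        {"role": "user", "content": "How many r's are in 'strawberry'?"},
    ],
)
print(resp["message"]["content"])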

 No.4596

File:e9da5025-e172-441d-9f06-8d….png (216.96 KB,1000x200)

>We’ve designed these models to be flexible and easy to run anywhere—locally, on-device, or through third-party inference providers. To support this, we partnered ahead of launch with leading deployment platforms such as Azure, Hugging Face, vLLM, Ollama, llama.cpp, LM Studio, AWS, Fireworks, Together AI, Baseten, Databricks, Vercel, Cloudflare, and OpenRouter to make the models broadly accessible to developers.

https://huggingface.co/openai/gpt-oss-120b
https://huggingface.co/openai/gpt-oss-20b
https://ollama.com/library/gpt-oss

 No.4597

File:[SubsPlease] Game Center S….jpg (318.61 KB,1920x1080)

I have very little faith in the OpenAI one, but the other Chinese one sounds interesting! For me personally, though, I've become accustomed to having image generation open, and even a game, so local text models are less appealing as long as I can use Deepseek online without consuming credits.
I need a second computer. Someone create a seed fund for Kissu AI and let's all get AI computers.

 No.4601

>>4597
>I have very little faith in the OpenAI one
You were right not to... Apparently it's pretty terribad compared to the Chinese models coming out. For one, it does the old "I cannot respond to this" routine. At least the Chinese ones are only censored online, and the models themselves work fine (within reason).

 No.4602

File:1754416030795629.png (691.64 KB,1131x2956)

Ouch. I remember this guy in the LLM general doing this test, and, well, a censored local model isn't that appealing. Maybe they'll find a way to jailbreak it, but using a jailbreak on a local model, eating up context, is very unfortunate.

>>4601
Yeah, the 'cockbench' here confirms that.

 No.4603

File:Screenshot 2025-08-05 at 2….png (167.21 KB,963x2357)

>>4602
>cockbench
Heh... Nice to see GLM 4.5 and GLM 4.5 Air up there. I'd like to self-host one of them when I get the chance. They seem incredibly competent for open-source models. I like how it occasionally creates flow charts.

 No.4604

>>4601
What does "within reason" mean, it won't provide info on things you shouldn't even ask for on imageboards?

 No.4657

File:firefox_mlmE0P4xWl.png (80.82 KB,921x906)

I've been trying, flailing, failing, and am about to give up on getting a QWEN2-based model named Toriigate to load into ComfyUI. It's supposed to be good at tagging 2D images, including NSFW, and I want to mess around with it. The creator has a rentry of example tagging: https://rentry.co/9wranqty
He said you could just use a workflow made for QWEN2, but there's no option to actually load a different model, and I tried connecting other nodes into it and tried other workflows and blahblabhaunfsaijondasoindasionsaindasionjdasioasdad I HATE COMFYUI
I wonder what the guy meant here: https://huggingface.co/Minthy/ToriiGate-v0.4-7B/discussions/3

I might just give up. It would be fun to mess with, but not worth the headaches.
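
Before giving up entirely, it might be worth bypassing ComfyUI and loading it with plain transformers. A sketch, assuming ToriiGate really does keep the stock Qwen2-VL interface (which is what that HF discussion seems to imply):

# Tagging an image with ToriiGate via transformers, skipping ComfyUI.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

repo = "Minthy/ToriiGate-v0.4-7B"
processor = AutoProcessor.from_pretrained(repo)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("some_image.png")  # any local 2D image
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe the picture using booru tags."},
    ],
}]
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])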

 No.4700

>>2334
Back on topic. There were rumors swirling around about an R2, and about it getting delayed over domestic semiconductor chips and concerns about training, but Deepseek updated again out of nowhere.
https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base
This is only the base model, so you can only use prefill with it, but an instruct model should be coming soon as well.

 No.4718

>>4700
Also, very interestingly, defying the current norms: with v3.1 they merged reasoning and non-reasoning into one model, as opposed to Qwen splitting them.
https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base/discussions/25

 No.4725

>>4718
Hmmm, sounds interesting. I'll give it a try later if/when it appears on openrouter. Well, assuming it won't require paid credits per prompt.

 No.4726

https://huggingface.co/deepseek-ai/DeepSeek-V3.1
The proper model came out. They didn't tack "Instruct" onto the name, but it basically is an instruct model, since they trained on top of the base model I linked. And as people have investigated, it has hybrid thinking tacked onto it. They worked mostly on making it better at using tools and on making its thinking more efficient (fewer tokens to answer a question while thinking, for the same quality of answer). They also moved it to a weird new FP8 scaling format.
>>4725
The new post-trained model will probably be what shows up on cloud services like OpenRouter.

 No.4728

>>4727
This thread is about Deepseek, you newfag



