>>2510
My 4x48GB of RAM arrives later this evening. I'll update with my results, but for now I can answer a few things:
>My choice of RAM is 2x32 or 2x48
I currently have 2x32GB with a 4090, and DeepSeek R1 671b-UD-IQ1_S ran at ~1 token/s (>>2445).
>with a $100 premium on the 48
If you're not too worried about the speed, these are the kits I'm ordering:
https://www.amazon.com/dp/B0D286TQHV
https://www.amazon.com/dp/B0D2888BLV
>Do you need like 200gb across GPU and RAM to load the great model?
It depends what you mean by "the great model". For the most aggressive (lowest-bit) quantizations you only need ~88GB total between RAM and VRAM
if using llama.cpp. If you want to use Ollama, you need to merge the weights into a single file and then you need RAM + VRAM to be quite a bit more than the filesize of the model (
>>2433). For less aggressive (higher-bit) quantizations you need a lot more memory, and if you want to run the giant unquantized model, you need like >1TB of RAM.
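For reference, this is the back-of-envelope math I'm going by. Treat it as a rough sketch: the overhead number is a guess, and real usage also depends on context length and KV cache size.

# Rough check: does a given GGUF quant fit across system RAM + VRAM?
def fits_in_memory(model_size_gb: float, ram_gb: float, vram_gb: float,
                   overhead_gb: float = 8.0) -> bool:
    # overhead_gb is a guess covering KV cache, compute buffers, and the OS
    return model_size_gb + overhead_gb <= ram_gb + vram_gb

# My current box: 2x32GB RAM plus a 24GB 4090 = ~88GB total.
# model_size_gb here is hypothetical; plug in the actual GGUF file size.
print(fits_in_memory(model_size_gb=80, ram_gb=64, vram_gb=24))                   # True, barely
print(fits_in_memory(model_size_gb=80, ram_gb=64, vram_gb=24, overhead_gb=16))   # False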
I'll be trying both 2x48GB and 4x48GB and seeing how much of a difference (if any) they make to generation speed. I have a feeling that when I ran DeepSeek R1 before, it was paging out to the pagefile on my SSD, since I had only about 500MB of RAM left over while it was loaded.