/maho/ - Magical Circuitboards

Advanced technology is indistinguishable from magic


File:Screenshot 2023-06-04 1850….png (366.59 KB,1598x1059)

 No.1260

I spent a really long time trying to get this working recently, so I figured I'd document what I did to get GPU passthrough working on my laptop. The steps I went through might be a bit different on other distros given that I am using Proxmox, but the broad strokes should apply. Bear in mind, this is with regards to using a Windows 11 virtual machine. Certain steps may be different or unnecessary for Linux-based virtual machines.

First, why might you want to do this? Well, the most obvious reason is that virtual machines are slooow. So, by passing through a GPU you can improve their speed considerably. Another possibility is that you want to use the GPU for a task like GPU transcoding for Plex, as a render host, or for AI workloads that rely on the GPU. Alternatively, you may just want a virtual machine that you can host Steam on or something like that (bear in mind, some games and applications will not run under virtual machines, or will not run while you are using Remote Desktop).

0. Enable virtualization-specific settings in the BIOS, such as Intel VT-x and VT-d or AMD-V and AMD IOMMU, and disable Secure Boot (after installing your OS of choice if it requires UEFI)
1. Create a virtual machine
- BIOS should be OVMF (UEFI)
- Machine type should be q35
- SCSI Controller should be VirtIO SCSI or VirtIO SCSI single; others may work, these are just what I have tested
- Display should be VirtIO-GPU (virtio); other display emulators either will not work with Proxmox's built-in VNC console or may cause the VM to crash on launch.
- CPU may need to be of type host and hidden from the VM
2. Edit GRUB config line beginning with "GRUB_CMDLINE_LINUX_DEFAULT"
- These settings worked for me: "quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset"
- For AMD CPUs, change 'intel_iommu' to 'amd_iommu'
- Save the changes and then run 'update-grub'
- Reboot
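- For reference, with the settings above the full line in /etc/default/grub ends up looking something like this (Intel example):
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset"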
3. Run 'dmesg | grep -e DMAR -e IOMMU'
- You should see a line like "DMAR: IOMMU enabled"
4. Add the following to /etc/modules :
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

5. Run "dmesg | grep 'remapping'"
- You should see something like the following:
"AMD-Vi: Interrupt remapping enabled"
"DMAR-IR: Enabled IRQ remapping in x2apic mode" ('x2apic' can be different on old CPUs, but should still work)

5.1 If not, run:
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
6. Run "dmesg | grep iommu"
- You need proper IOMMU groups for the PCI device you want to assign to your VM. This means that the GPU isn't arbitrarily grouped with some other PCI devices but has a group of its own. In my case, this returns something like this:
[ 5.398008] pci 0000:00:00.0: Adding to iommu group 0
[ 5.398019] pci 0000:00:01.0: Adding to iommu group 1
[ 5.398028] pci 0000:00:02.0: Adding to iommu group 2
[ 5.398038] pci 0000:00:08.0: Adding to iommu group 3
[ 5.398054] pci 0000:00:14.0: Adding to iommu group 4
[ 5.398062] pci 0000:00:14.2: Adding to iommu group 4
[ 5.398076] pci 0000:00:15.0: Adding to iommu group 5
[ 5.398088] pci 0000:00:16.0: Adding to iommu group 6
[ 5.398097] pci 0000:00:17.0: Adding to iommu group 7
[ 5.398108] pci 0000:00:1b.0: Adding to iommu group 8
[ 5.398120] pci 0000:00:1c.0: Adding to iommu group 9
[ 5.398136] pci 0000:00:1c.2: Adding to iommu group 10
[ 5.398148] pci 0000:00:1c.4: Adding to iommu group 11
[ 5.398160] pci 0000:00:1d.0: Adding to iommu group 12
[ 5.398172] pci 0000:00:1d.4: Adding to iommu group 13
[ 5.398197] pci 0000:00:1f.0: Adding to iommu group 14
[ 5.398207] pci 0000:00:1f.2: Adding to iommu group 14
[ 5.398215] pci 0000:00:1f.3: Adding to iommu group 14
[ 5.398224] pci 0000:00:1f.4: Adding to iommu group 14
[ 5.398233] pci 0000:00:1f.6: Adding to iommu group 14
[ 5.398245] pci 0000:01:00.0: Adding to iommu group 15
[ 5.398256] pci 0000:01:00.1: Adding to iommu group 16
[ 5.398267] pci 0000:02:00.0: Adding to iommu group 17
[ 5.398279] pci 0000:04:00.0: Adding to iommu group 18
[ 5.398290] pci 0000:05:00.0: Adding to iommu group 19
[ 5.398313] pci 0000:06:00.0: Adding to iommu group 20
[ 5.398336] pci 0000:06:01.0: Adding to iommu group 21
[ 5.398358] pci 0000:06:02.0: Adding to iommu group 22
[ 5.398382] pci 0000:06:04.0: Adding to iommu group 23
[ 5.398415] pci 0000:3b:00.0: Adding to iommu group 24
[ 5.398427] pci 0000:71:00.0: Adding to iommu group 25
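If you want a cleaner view of which devices share which group, a small shell loop over the standard sysfs layout works too (just a convenience sketch, not required for the guide):
for g in /sys/kernel/iommu_groups/*; do echo "IOMMU group ${g##*/}:"; for d in "$g"/devices/*; do lspci -nns "${d##*/}"; done; done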

6.1 If you don't have dedicated IOMMU groups, you can add "pcie_acs_override=downstream" to your GRUB launch arguments if you didn't already do that.
7. Run lspci to determine the address of the GPU or other PCI device you want to pass through. For a discrete GPU this will generally be something like "01:00.0"
8. Run "lspci -nnk -s 01:00"
- You should see something like this:
01:00.0 3D controller [0302]: NVIDIA Corporation GP104GLM [Quadro P4000 Mobile] [10de:1bb7] (rev a1)
Subsystem: Lenovo GP104GLM [Quadro P4000 Mobile] [17aa:224c]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
01:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel

- In the bracketed pair, the first 4 characters are the Vendor ID; in this case "10de" represents Nvidia. The 4 characters after the colon are the Device ID; in this case "1bb7" represents an Nvidia Quadro P4000
9. (Proxmox-specific, but generally applies) Add a PCI Device under Hardware for your virtual machine
- Select the ID for your Device, enabling "All Functions", "Primary GPU", "ROM-Bar", and "PCI-Express"
- Fill in the Vendor ID, Device ID, Sub-Vendor ID, and Sub-Device ID. In my case, the Vendor ID and Device ID are "0x10de" and "0x1bb7", and the Sub-Vendor ID and Sub-Device ID are "0x17aa" and "0x224c"
- If you edit the virtual machine config file located at "/etc/pve/qemu-server/vmid.conf" (replace vmid.conf with your Virtual Machine ID, like 101.conf), that would look like hostpci0: 0000:01:00,device-id=0x1bb7,pcie=1,sub-device-id=0x224c,sub-vendor-id=0x17aa,vendor-id=0x10de,x-vga=1
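- Alternatively, the same thing can be done from the Proxmox shell with qm set instead of editing the file by hand (a sketch assuming a VM ID of 101; adjust the address and IDs to match your GPU):
qm set 101 --hostpci0 0000:01:00,pcie=1,x-vga=1,vendor-id=0x10de,device-id=0x1bb7,sub-vendor-id=0x17aa,sub-device-id=0x224c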
10. Run the following, making sure to replace the IDs with the IDs for your specific GPU or PCI device.
echo "options vfio-pci ids=10de:1bb7,10de:10f0 disable_vga=1" > /etc/modprobe.d/vfio.conf
11. Disable GPU drivers so that the host machine does not try to use the GPU by running the following:
echo "blacklist amdgpu" >> /etc/modprobe.d/blacklist.conf
echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf

12. (Nvidia-specific) Run the following to prevent applications from crashing the virtual machine:
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
12.1 You may want to add "report_ignored_msrs=0" if you see a lot of warnings in your dmesg system log
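- Combined with step 12, that would look like:
echo "options kvm ignore_msrs=1 report_ignored_msrs=0" > /etc/modprobe.d/kvm.conf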
12.2 Kepler K80 GPUs require the following in the vmid.conf:
args: -machine pc,max-ram-below-4g=1G
13. Run the following:
echo "softdep nouveau pre: vfio-pci" >> /etc/modprobe.d/nvidia.conf
echo "softdep nvidia pre: vfio-pci" >> /etc/modprobe.d/nvidia.conf
echo "softdep nvidia* pre: vfio-pci" >> /etc/modprobe.d/nvidia.conf

14. [Skip this step unless you have errors beyond this point] Note: At this point, you may read that you need to dump your GPU's vBIOS. In my experience, this was completely unnecessary and, above all, did not work. Other guides typically give instructions like the following:
cd /sys/bus/pci/devices/0000:01:00.0/
echo 1 > rom
cat rom > /usr/share/kvm/vbios.bin
echo 0 > rom

In my experience, attempting to run "cat rom > /usr/share/kvm/vbios.bin" would result in an Input/Output error and the vBIOS could not be dumped. If you really do end up needing to dump the vBIOS, I would strongly recommend installing Windows onto your host machine and then installing and running GPU-Z. GPU-Z has a "share" button that allows you to easily dump the vBIOS for your GPU.
To add the vBIOS to your virtual machine, place the vBIOS file that you dumped at "/usr/share/kvm/" and then add ",romfile=vbios.bin" to the PCI device line in your vmid.conf (replacing vbios.bin with the name of the vBIOS file you dumped). That would look something like the following:
hostpci0: 0000:01:00,device-id=0x1bb7,pcie=1,sub-device-id=0x224c,sub-vendor-id=0x17aa,vendor-id=0x10de,x-vga=1,romfile=vbios.bin
15. Reboot. At this point, when you start your virtual machine you should be able to see in Windows Device Manager that your GPU was detected under display adapters. At this point, try installing your GPU device drivers and then reboot your virtual machine once they've installed. If all goes well, you should have a functioning GPU passed through to your virtual machine. If not... You'll likely see "Code 43" under the properties for your GPU in Device Manager.
16. Going back to your vmid.conf, add ",hidden=1,flags=+pcid" to your cpu options. You should end up with a line that looks like this:
cpu: host,hidden=1,flags=+pcid
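The equivalent from the Proxmox shell would be something like this (again assuming VM ID 101):
qm set 101 --cpu host,hidden=1,flags=+pcid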
17. Nvidia drivers can be very picky. You may need to add an ACPI table to emulate having a battery. You can do this by downloading such a table (the ssdt1.dat referenced below) and then adding it to your vmid.conf with a line like so:
args: -acpitable file="/root/ssdt1.dat"
18. If you're still having a code 43 issue, you can go back to step 14 and try adding your vBIOS.


At this point, you're done. Your virtual machine should be successfully detecting your GPU or PCI device and you should be able to use it mostly normally. For obvious reasons, you may still not be able to run every program you would like, since they are running under a virtual machine; however, the core functionality of the GPU or PCI device should be fully accessible to the virtual machine.


A few of my resources:
https://pve.proxmox.com/wiki/PCI_Passthrough
https://gist.github.com/Misairu-G/616f7b2756c488148b7309addc940b28#update-attention-for-muxless-laptop
https://lantian.pub/en/article/modify-computer/laptop-intel-nvidia-optimus-passthrough.lantian/
https://forum.proxmox.com/threads/successful-experience-with-laptop-gpu-passthrough.95683/

 No.1261

File:ff4413ba8878f9db9b2a27342….jpeg (155.04 KB,1731x381)

Now, you might wonder, "Why should I care?" or "Why is this useful?" Well, laptop GPUs do some MUXing trickery. In theory, the GPU would show up as a VGA controller on a MUXed laptop, and as a 3D controller on MUXless laptops. From what I understand, in laptops that are MUXless, the discrete GPU passes information to the iGPU frame buffer and then that goes to the display out, which in some cases may result in issues because then you may not be able to use the GPU as a GPU... However, on my laptop, which I believe is MUXless given that the GPU shows up as a 3D controller instead of as a VGA controller, I was able to succeed in passing through the GPU and using it.

 No.1262

File:Screenshot 2023-06-04 2232….png (243.41 KB,1038x706)

Just to use the GPU, you may need to emulate having a battery in the VM, pass the GPU's vBIOS to the virtual machine, hide that the virtual machine is in fact a virtual machine, and you may even potentially need the VM's GPU PCI address to match what it would be on a real laptop. And this is all on a workstation laptop with a Quadro GPU that should support hardware virtualization out of the box, if Nvidia's marketing is to be believed. Supposedly, they updated their drivers to stop requiring a lot of this bullshit, but that was a lie.

But you know what? Despite all of this run around, it works. Fuck you Nvidia.

 No.1263

>>1261
>However, on my laptop, which I believe is MUXless given that the GPU shows up as a 3D controller instead of as a VGA controller
Granted... Lenovo may have fucked the BIOS at some point. There's a setting in the BIOS to enable display out during BIOS, which would make sense for a MUXed laptop setup, but this setting does not work in my case. So, although it's possible that workstation laptops from the likes of HP/Dell/Lenovo may use MUXed designs, I don't think this is a guarantee given my own experience and albeit limited understanding.

 No.1264

File:[MoyaiSubs] Mewkledreamy -….jpg (392.82 KB,1920x1080)

Man, this stuff is way over my head. I get the gist of the complaint in these two posts, though >>1261
>>1262
Basically you've saved yourself a bunch of money by bypassing some artificial restriction from one of the tech monopolies?

 No.1265

>>1260
>or you may want to use it for something like AI workloads that rely on the GPU
A bit of a question with this, but can't you run AI on non-virtual machines through linux? Is there some special reason to do it on a virtual machine, or is there something about GPU passthrough that I don't understand? From what I gathered it seems like nvidia blocks VMs, so this is how to bypass that block and have a fully functional GPU for the VM. Does that mean you can pretty much run windows with all the features while having a linux host machine? Or is there some other issue you'd be running into.

 No.1266

>>1265
>Does that mean you can pretty much run windows with all the features while having a linux host machine?
Yes, exactly. If you care about gaming in particular, one of the use cases could be playing games in a Windows virtual machine when they won't run under wine.

 No.1267

>>1264
Yep. Going to try later and see if I can use vGPU unlock so that I can split my GPU across multiple virtual machines as well.

https://gitlab.com/polloloco/vgpu-proxmox

 No.1268

>>1266
Oh well that's pretty amazing then. Keeping the best of both worlds has always seemed like the dream scenario for linux, but I didn't think it was possible.

Huh, this kinda actually makes windows obsolete then.

 No.1269

File:2ed63a0a619da09f2d7729b261….jpg (816.63 KB,1000x800)

Can you play eroge on it?

 No.1270

what are you using it for nowadays?

 No.1271

File:Dell VRTX.jpg (141.93 KB,945x709)

>>1270
Passing through GPU to virtual machines and then further passing it to docker containers for things like GPU transcoding for Jellyfin. Also, I managed to get vGPU working (>>1267) and it works fine. The main kinda bleh part about it is that my GPU only has 8GB of VRAM and you can only create vGPU profiles of equal memory size. e.g. 2x 4GB vGPUs, 4x 2GB vGPUs, 8x 1GB vGPUs, and so on.

Because of this I've been looking into migrating from my laptop setup into an actual server chassis. The benefits from doing that would be pretty multi-faceted: ability to use different PCIe devices like GPUs, ability to use lots of RAM, far more cores (though typically at the cost of worse single threaded performance), and far greater storage options. Downsides are pretty obvious though... Increased power consumption (~230W vs ~500W to ~2000W), loudness (35 dB vs. 45dB or greater), and increased use of space.

I've been looking at one blade server in particular very intently, the Dell VRTX, as it would allow me to run up to four half-width server blades at once, each with dual socket CPUs, or use two larger full-width blades with quad socket CPUs, or any combination thereof. Unlike most other blade server chassis, it doesn't require 240V 3-phase AC power, and can run off of standard 120V AC which is very nice. Likewise, it should be fairly quiet since it was intended to be used within branch offices. 4 servers, 25x SFF drive bays, GPU and PCIe support all sound great, but it's got some drawbacks due to its age. What's been holding me back is that the PCIe slots are limited to PCIe 2.0 x8... For reference, that's ~1/4 of the bandwidth of PCIe 3.0 x16 and ~1/8 of the bandwidth of PCIe 4.0 x16. So... I'm kind of seesawing back and forth between "should I?" and "should I not?". For typical, not very demanding workloads, such as GPU transcoding or hardware acceleration in virtual machines, I don't think the bandwidth limitation would be such a big deal, but... Ideally, I would want to run at least 1 Windows virtual machine to run games off of, but I don't think it's going to really be possible to do with such limited bandwidth...

In terms of GPUs, the chassis is limited to either 3x 150W single slot GPUs or 1x 250W dual slot GPU. I've given it a bit of thought and come to three main options: Nvidia Tesla P4 (75W, single slot, 8GB, ~$110, 9120 Passmark score), Nvidia Tesla M40 24GB (250W, double slot, 24GB, ~$130, 10475 Passmark score), and Nvidia Tesla P40 (250W, double slot, 24GB, ~$230, 18514 Passmark score). Honorable mention would be the RTX A4000 (150W, single slot, 16GB, ~$600, 19238 Passmark score). The RTX A4000 would be the clear choice out of the bunch, but its price is just too high. Between the P4, M40 24GB, and P40, I can't decide what's best: 3x okay performance GPUs with only 8GB of VRAM, 1x GPU with decent performance and 24GB of VRAM, or 1x GPU with pretty good performance and 24GB of VRAM. I'm leaning towards the Tesla P4 because it would allow me to dedicate an entire GPU to a single blade; I'm unsure whether the GPUs in question would support being used by multiple blades at once, but my guess is absolutely not.

Anyways, the VRTX is also just kind of cool as an all-in-one solution since it can double as a SAN for the blades, but is kind of knee-capped by the fact that A. it requires SAS drives, and B. the VRTX chassis does not allow passthrough of the disks to the blades. You can only pass "virtual disks" to the blades. At best, you can create RAID0 virtual disks that contain only a single disk so that each virtual disk maps to a single physical disk, but this still kind of sucks. I think this could probably be gotten around by getting a PCIe HBA and then re-routing the SAS cables coming off of the hard drive backplane into it instead, and then passing the PCIe HBA to a single server blade, but I have a feeling that the CMC (Chassis Management Controller, the online interface for managing the whole enclosure) would start screaming about being disconnected from the hard drive backplane. Anyways, the requirement for SAS over SATA, from what I've gathered from forum posts, has to do with the fact that SATA does not support multi-node (multi-initiator) access the way SAS does. Dell probably could have gotten around this by using a custom RAID card, but they didn't, so that's the state of things...

At least in regards to CPU performance, however, the VRTX has a lot of breathing room for growth. You can use hardware that's nearly a decade old, such as the PowerEdge M520 which has rather old Xeon E5-2400 CPUs and DDR3 memory, all the way up to blades like the M640 which supports up to second generation Xeon Scalable CPUs with DDR4. Granted, the M640/M840s are very expensive in comparison. Roughly $500 for a barebones M640 blade compared to ~$100 for a M630/M620 blade that may include a CPU and RAM.

There is one other thing... If I were to get one, I would absolutely want the wheel assembly as shown in pic related, but it seems like an incredibly rare accessory. I could find only two references to its existence: the manual describing putting it on and taking it off, and a broken Mexican website that claims to have 1x in stock, but lists its price as "$0 MXN" and has Mexico-specific questions in the checkout form. I thought about emailing them, but I kind of doubt I would get a reply, or would otherwise get a reply in 2 weeks that says, "Sorry, the item you were looking for is no longer available and was showing up due to a stocking error."

 No.1272

very silly that it's easy to allocate specific CPU and storage resources, but GPU is a big black box... Probably has to do with how GPUs use a different architecture than CPUs, I guess..

 No.1273

File:Screenshot 2023-06-27 1448….png (146.55 KB,1035x531)

>>1271
>What's been holding me back is that the PCIe slots are limited to PCIe 2.0 x8... For reference, that's ~1/4 of the bandwidth of PCIe 3.0 x16 and ~1/8 of the bandwidth of PCIe 4.0 x16.
Huh. I take it back. Apparently newer PCIe generations don't really drastically improve performance, which I guess kind of makes sense since the connector is physically the same for each generation, so the improvements have to be with signalling rather than raw throughput.

 No.1274

>>1273
You meant gaming performance. For AI stuff PCIe speed is a major bottleneck and the reason why direct interconnects like NVLink exist.

 No.1275

>>1274
We shall see. I have now acquired a VRTX and a Tesla P4, like I was mentioning in >>1271. Gonna use it to learn about high availability hyper-converged server environments. Hopefully the power consumption isn't too brutal... It has some blades with older Xeons that are rather power hungry. 95W max TDP per socket, multiplied by 2 per blade, and 3 blades total. Granted, I doubt I'll have much reason to use more than 1 or 2 at once in general. If they're too much, I'll have to see if I can somehow sell the older blades (although I'm not sure who would want them...) and then upgrade to M620's or M630's with more efficient Xeon v3's or v4's. Unfortunately it can't share PCIe devices with multiple blades. I was at first thinking about getting 2 SFP+ 10Gb cards, but then I realized I would need like 3 or 4 if I wanted each one to have a full 10Gb speed. Annoyingly the 10Gb SFP+ switch module for the VRTX goes for like $1300 ish :<

 No.1276

File:858.gif (2.04 MB,448x252)

>>1275
The seller put the 134 lbs VRTX into a box with only two layers of bubble wrap. Servers of this size and weight should ideally be shipped with 6 inches of expanding foam all around. The bubble wrap was half an inch at most.

The chassis is basically unusable. Does not close, drive bays are shifted, 3/4 corners got hit, and one got hit extremely badly. 2/4 PSU handles are damaged and one no longer can be re-inserted due to the damage at the corner. GPU door is broken. Built-in switch is damaged. 2/4 blowers are stuck due to the corner damage. Drive cage backplane is warped from the corner damage. Fan assembly does not seat because the side of the chassis is bent outwards. 2/3 Blade server handles are damaged -- 1 warped, and 1 broken -- and 2/3 are jammed shut and will not open.

I'm expecting half of the functionality of the VRTX to be completely unusable because of the damage to the drive bays. They're all shifted so I'm expecting that the SAS connectors will not line up.

 No.1277

>>1276
Did you make sure to get the evidence on photo/video and send it back for a refund?

 No.1278

>>1276
That sounds absolutely atrocious. Is it possible to get a refund?

 No.1279

>>1277
Yeah. Door cam footage and pictures showing that the box was already in a damaged state and many photos showing the various spots it was damaged. The seller does not accept returns so I'm hoping I get a pretty significant refund. A full refund would be best, but I'm not going to get my hopes up.

>>1278
From what I understand, ebay has a money back guarantee if the item received was materially different from what was shown. Considering it was very damaged, if the seller does not respond I will automatically get a full refund from ebay I believe.

 No.1280

>>1279
>Door cam footage
Very nice, should be no debate there with that on your side. Man, those things have been a blessing for people buying off online retailers since they just catch everything. Maybe my memory is hazy but they used to fight you on every little thing to try and not give refunds but now I almost never have an issue getting one.

 No.1281

>>1278
Was able to get a full refund without having to ship it back, but geez... I would have much rather gotten a server that wasn't damaged in shipping instead of getting a refund.

 No.1282

File:Screenshot 2023-07-05 1826….png (35.64 KB,1295x374)

It is alive! Also, not _that_ power hungry considering it's got 3 servers in it. Neat that it tells you what the BTU/h equivalency of the input power is. Can tell how much of a space heater it is! I was able to buff out some of the damage, but the drive cages are still kinda foobar'ed. Of everything, it seems to only have one hardware issue: one of the blower fans does not appear to work, but I'm unsure if I can fix it because it's jammed in and I can't get it out (it's near one of the corners that got bent in).

So far I've mostly just played with some basic stuff like updating the system firmware, resetting the iDRAC controllers on the blades, as well as testing real-time failover of the PSUs and CMC. It felt very wrong being able to just suddenly unplug the power supply cable and everything be fine. In the coming days, I should get in hard drives for it, and I'll try banging it up a bit more to see if I can fix the drive bay issues.

I did notice one thing that really pisses me off and makes me wish I could change my ebay feedback from neutral to negative though... The seller had mentioned taking the rack ears off to protect them from damage. Well, they did, but they did so by literally breaking the rack ears off of the chassis. They literally broke one metal rack ear off, didn't think anything of it, and then did it a second time. That's malicious levels of incompetence.

 No.1283

>>1282
Actually, what are you planning to do with this server setup? Don't think I saw any hints as to your intentions in this thread so far and I'm pretty interested.

 No.1284

>>1283
I'm planning on having one blade dedicated to TrueNAS, for managing the software RAID for the system, and the other two dedicated to Proxmox for running virtual machines.

Assuming I get the hard drives to work, my plan is to create a ZFS pool that is comprised of a stripe of two RAIDz3 vdevs of 11 disks each, with the remaining 3 (25 drives total) as hotspares in case of disk failure. In theory, a standard 10K HDD should have read/writes of around 150MB/s in typical usage, so theoretically that means the array will have a speed of around 2.4GB/s. Needless to say, that's much faster than standard gigabit ethernet, so I've been planning on upgrading my home network to either 10 or 40Gbps. Probably 40Gbps on the switch side, and just standard 10Gbps on my PC since that's what my motherboard supports.
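At the zpool level, that layout would translate to roughly the following (a sketch with placeholder pool/disk names; TrueNAS would normally build this through its UI):
zpool create tank raidz3 sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk raidz3 sdl sdm sdn sdo sdp sdq sdr sds sdt sdu sdv spare sdw sdx sdy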

The proxmox blades will (hopefully) each have one Nvidia Tesla P4 dedicated to them, with both using Nvidia vGPU to create virtual vGPUs to split among the virtual machines. I'd maybe like to get a third blade so that I can play with Proxmox high availability, which allows a node (a server) within a cluster to fail (turn off, die, whatever) and have another node within that cluster pick up those virtual machines and start running them so that there is minimal downtime.

As far as what virtual machines I'll actually be running, I plan on mostly just running a bunch of web services for my home network.
- Jellyfin (Essentially self-hosted Netflix alternative)
- Sonarr + Radarr + Lidarr (Torrent grabbers)
- Pihole + Unbound (Blocks blacklisted domains like advertising URLs, and acts as a self-hosted DNS server so that I'm not tracked by Google or my ISP from their DNS resolvers)
- Uguu (Literally just Uguu on my home network so that I can share files between devices easily)
- An email server (Sorta undecided which)
- Portainer (GUI for managing Docker containers -- Linux jails that are somewhat like virtual machines, but are lightweight and rely on the host kernel)
- ZNC IRC bouncer
- AI stuff
- Vaultwarden (self-hosted Bitwarden for autogenerating passwords and storing them)
- Home Assistant (manages IOT devices)
- Obligatory game servers for things like Minecraft / Terraria / Etc.

Beyond that, I'll probably spin up virtual machines for experimenting with things like Windows virtual machines and seeing if I can run games on them, just for fun.

The thing that drew me to the VRTX in particular, which I think I sort of mentioned before, is that unlike with typical servers that are aged beyond their usefulness, instead of having to get rid of the old server and upgrade to a new platform you can just slot in a blade with newer hardware! The VRTX is also pretty neat in that it's fairly quiet compared to a typical server, and you can use it in a typical tower format. I had been thinking about using it as like an under-desk server, but it's kind of too big for that role given my PC is already taking up my leg space beneath my desk. Also nice is that it runs off of standard 120V AC. Most blade servers run off of 3-phase 240V AC. I'm also just far more endeared to rackmount stuff than I was before. Now that I've got this, I'm kind of thinking of expanding my ambitions quite a bit more and building out a rack with dual redundant UPS's, and other things.

Anyways, it makes my room quite warm after a while and it's certainly not cheap to run continuously so it's more of a hobby thing than anything.

 No.1285

File:[SubsPlease] Megami no Caf….jpg (345.86 KB,1920x1080)

Yeah, I'm curious too. Seems like a lot of work, or is the work itself your hobby?
Oh, right as I'm typing this your post appears. Weird.

 No.1286

>>1285
Best screenshot in years

 No.1287

File:f_004247.gif (18.08 KB,48x48)

Got in a Brocade ICX6610 so that I now have a POE gigabit switch that's also capable of 10Gb/s SFP+ and 40Gb/s QSFP+. It is quite loud on startup. Need to use my Kill A Watt later and see how much power it draws. People online suggested ~110W even without any POE devices connected, which isn't great. Contemplating getting another smaller QNAP switch that I can put near my consoles and PC and connecting that with a 10Gb/s fiber connection between it and the Brocade.

 No.1288

File:b0999cb9817cc2721c16f05813….png (1.09 MB,1600x1200)


 No.1289

File:413AUZvNWfL._AC_SL1320_.jpg (18.39 KB,760x276)

>>1287
>People online suggested ~110W even without any POE devices connected
Decided to take a measurement. It's more like ~80W, which is more reasonable. With two PSUs it probably is closer to ~110W though. I managed to score one of the QNAP switches I mentioned. A QSW-M408-4C. It's a little funky because it combines SFP+ and RJ45 ports, meaning you can only use one or the other at a time per pair, but that's fine by me. The power adapter is a little weird also.

In other news, I decided to get a replacement for the VRTX I already have. Unfortunately, the one I have is just too banged up for me to use. What put me over the edge was that I finally got in hard drives, but the hard drive backplane itself isn't even being detected. Now, I could have gotten a replacement backplane, but the damage to the drive bays meant that a large number of the drive bays weren't usable anyways because the drives are misaligned and won't slot into the SAS connectors on the backplane. Not sure whether I'll keep it as a spare parts machine, or if I should put up a listing for it and let someone else have a crack at it. Anyways, good news is that the replacement one I got, I was able to get for the exact same price as the previous one.

In theory, this new one should also have the Enterprise CMC instead of an Express CMC as well. That means a few more features will be unlocked, like being able to set a chassis power limit and the ability to allocate more than 2 PCIe devices to a single blade, and also FlexAddress (I think this is something for setting IP routes? I can't quite remember). I'm hoping this one will be fine. I was able to get one from a seller that's in the same state and who had a much more extensive selling record, with many more listings in server-related hardware, whereas the other seemed more like an upstart liquidation company.

Perhaps I should make a network diagram later so that people can better understand what my setup looks like

 No.1290

File:New Chassis-1.jpg (3.6 MB,4032x3024)

New chassis arrived! You can see the old, busted one is beneath the new one. Slightly concerning is that none of the drives are showing up... I think that may be happening because the drives I got are Netapp drives and they may be formatted for 520 byte sectors instead of the standard 512 byte sectors. Although... In theory, they should at least show up as a physical disk, just an errored one, I think? Additionally, the CMC is reporting that the SAS cables are disconnected even though they're not. Not sure what's up with that. I have some replacement SAS cables coming if they're in fact bad. We shall see. That said, no drives were being detected on one of my blades either. I may need a SAS adapter so I can reformat the sector size on my desktop one by one... That would suck since I would need to do that for 36 drives, potentially... At any rate, this one is practically immaculate by comparison! The packaging was absolutely stellar. So long as I don't run into any hardware-related issues within the next week or so, I'm definitely going to leave a positive review for the seller. The chassis was wrapped in about an inch and a half of bubble wrap, and then further surrounded by about 2 inches worth of packing peanuts. They even took out some of the components and put them in a second package to reduce the weight of the main package. Just a perfect job. Couldn't have hoped for anything better.

 No.1291

>>1290
holy shit that's a BIG server thank god this one came in well
is this all wired up or not yet? if it's the latter, i'd like to see how the finished thing looks

 No.1292

File:[SubsPlease] Shiro Seijo t….jpg (241.27 KB,1920x1080)

>>1290
Wow. Yeah, that looks beastly. I have no idea what all of that stuff is, but it looks pretty cool.

 No.1293

File:ac8wu4.jpg (4.88 MB,4032x3024)

>>1291
This is pretty much the way it's probably going to stay for a long while unless I decide to migrate everything to a rack, but that'd be rather expensive. The only other change is that I can now put the security bezel on the front which I think looks pretty neat, but it mostly hides the hard drive indicator lights.

Today was pretty eventful. I got some networking stuff in today and the SAS cables I needed.

Good news first: the SAS cables worked perfectly and every single drive was detected, unlike with the included SAS cables. I also got 4x 10/40/56GbE Mellanox ConnectX-3 NICs and the QNAP switch I had mentioned. Currently, the Mellanox NICs are all connected at 10Gb/s because networking gear faster than that increasingly approaches very expensive territory. I was also able to connect my Brocade switch (the one on top of the server in the previous image) to my QNAP switch with a fiber optic cable for 10Gb/s speed between them, and now my PC also can take advantage of its 10Gb/s RJ45 port that's on the motherboard.

Bad news: although all 25x hard drives show up, they all show up as unusable! I believe this is because they're Netapp drives and formatted with 520 byte sectors instead of 512 byte sectors. In theory, I should be able to use them once they're reformatted?... I'll probably need to buy an HBA and manually reformat them to 512 byte sectors one by one, which will probably take quite a while. There's also an issue with the networking... For some reason, the fiber optic cable between the QNAP switch and the Brocade switch constantly flips back and forth between link up and link down. I tried quite a few settings that I think may have been the culprit, but I couldn't figure it out... Essentially, I'm able to connect between the two switches, but because the connection continually drops out, it's very, very slow to load anything. Interesting part is that neither the Brocade switch nor the QNAP switch reports any errors on the port. Slightly worried that that means that the fiber optic cable is bad. I tried to make sure I handled it carefully and made sure not to bend it too harshly, since fiber optic cables use glass fibers that can break if you bend them too much. I suppose it could have been broken before I received it though... Very sad about that. Fortunately, new cables aren't too expensive, but I think I'll mess around with my switches some more before deciding to get another. I tried out an RJ45 SFP+ transceiver to see if it was a transceiver issue and that was a total dud. Showed up fine on the Brocade and noted 10g full duplex, but the QNAP switch saw it as auto negotiating at 1g, and there was no connection between the switches, so I'm not really sure what that means. Tried using one of the SFP+ ports on the QNAP switch and the RJ45 transceiver worked fine when used with my PC. I feel like there's probably some obscure networking setting that's making it not work. Very confusing.

 No.1294

File:Chassis Overview.png (57.19 KB,444x220)

Something neat that I like about the VRTX is that you get a nice little visualization of all of the hardware installed in the chassis, and can hover over things for the details at a glance. So, you can see that there's 3 server blades installed, a bunch of fans, the PSUs, a whole bunch of drives, and so on. If there's any errors or warnings, the device will show up with red box or a yellow box respectively. You can also tell when a device is powered on because a small green dot will be on top of it. In the image in question, the chassis is powered down, so the only things that are on are a single CMC, a fan to cool it, and the PSUs.

>>1293
Did a bit more investigating into my networking woes... It would seem that one of the 10Gb RJ45 connectors is literally just broken, so that's kinda sad. That being said, I was able to use my RJ45 SFP+ transceiver and ran some Cat6A to my switch instead and that worked flawlessly. I did some testing with iperf3 and was able to confirm that I am indeed getting 10Gb/s speeds with everything. Well... Sort of. The servers themselves can talk to each other at around 9.4Gb/s, but between my PC and the servers it's limited to more like 8.2Gb/s. The fact that it's not exactly 10Gb/s likely has to do with the fact that the servers are connected via a copper DAC cable instead of fiber optic, and instead of having a 10G fiber cable between the switches like I wanted to use, I'm instead using a ~50ft Cat 7 (probably in name only) cable, and there's an additional 50 ft Cat 7 cable between my PC and the QNAP switch.
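(For anyone wanting to reproduce the test, it's just iperf3 in server mode on one end and client mode on the other -- something like the following, where the IP is a placeholder:)
iperf3 -s
iperf3 -c 192.168.1.50 -P 4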

In other news, I ordered an HBA that will come eventually so that I can format all of the hard drives. I have a whole bunch of hard drives to format, 36x in total, but they're 10K 2.5" drives and they're only 900GB (30x) and 600GB (6x), so in theory formatting them all should be pretty quick to do. Rough guesstimate is ~30 minutes per drive, maybe 45 minutes worst case scenario. The HBA supports up to 8 drives, so in theory I should be able to format 8 drives at a time. If all goes to plan, the formatting operation should only take an afternoon.
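From what I've read, the reformatting itself is done with sg_format from sg3_utils, one drive at a time (a sketch; /dev/sg2 is a placeholder, and this wipes the drive):
sg_format --format --size=512 /dev/sg2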

Also, I've been thinking about the sort of real-world performance I'll be able to expect out of the RAID array. The RAID controllers within the VRTX are PCIe 2.0 x8, which means a total bandwidth of 4GB/s. The NICs, however, are connected at 10Gb/s, but it's more like 8Gb/s by the time it reaches my PC. So, 8Gb/s, or around 1GB/s is probably the best sort of transfer speed I can expect out the array. If I were to upgrade to 40Gb/s networking (Maybe possible since the Mellanox SX6036 can potentially be used as an ethernet gateway, and then do 10/40/56Gb/s ethernet, and can be found for ~$200), that could be achievable -- eventually. The theoretical maximum with all of the drives is 5.5GB/s (22 * 250MB/s), but the RAID controller being PCIe 2.0 x8, will limit that to 4GB/s. So, on the server itself, I may be able to achieve ~4GB/s locally, but there's not really much application for that speed since it's mostly going to be for serving media.

 No.1295

>>1290
I can hear the tinnitus from here.

 No.1296

File:edzv09.mp4 (25.4 MB,1440x1440)

>>1295
It's only bad during start up, and most of the noise is from the switch. Once it idles, it's okay but still kind of annoying to listen to if you're right next to it or not wearing noise cancelling headphones.

 No.1297

File:Screenshot 2023-07-26 1821….png (14.44 KB,226x118)

Got all of the hard drives formatted. Set up TrueNAS Scale (Core did not like my RAID controller) on one of the blades and have ~14TB of usable capacity. The drives seem to be a mix of 58K hours, and 4K hour drives. Presumably, the 4K hour drives are ones that were swapped in to replace failed drives.

Not really much else to say. Slowly copying over files to it, but am limited by the speed of the external drives I'm copying stuff off of. Considering getting more RAM. Apparently OpenZFS limits the ARC cache to half of your total RAM capacity, whereas ZFS uses as much as possible.
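(On Linux-based OpenZFS that limit is the zfs_arc_max module parameter; checking it and bumping it at runtime looks something like this, with the 64GiB value purely as an example:)
cat /sys/module/zfs/parameters/zfs_arc_max
echo 68719476736 > /sys/module/zfs/parameters/zfs_arc_max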

 No.1298

File:Utawarerumono.S02E05.False….jpg (168.4 KB,1920x1080)

How many Kuon images do you think it can hold? You want to make sure not to lose them, so you'll need at least 5 copies on different drives

 No.1299

File:2022-11-06-133732.png (788.65 KB,960x544)

>>1298
Assuming a standard Kuon image size of 1MiB, at least 13,725,696.

 No.1300

Haven't used the server in quite some time, mostly because it uses a lot of energy and creates a lot of heat. Now that the weather is getting cooler, I decided to take a look at it again. A while ago I had migrated my Ubuntu server VM from my laptop to the blade server. I had a bunch of services already set up in containers, but they were using the old IP of the laptop instead of the blade server, so I went in and changed the DNS records in Pihole and updated them to the blade server IP, as well as set up port redirects with Nginx Proxy Manager so that I can go to "cytube.home.arpa" instead of having to type "cytube.home.arpa:8080" or "192.168.1.2:8080". Unbound DNS is nice, but the uncached response time is ~100ms, which is ~3x slower than most DNS servers. That said, cached response times are 0ms, which is effectively faster than any other DNS server: they may also answer cached queries in "0ms", but that gets blunted by the network round-trip time to reach them. In effect, for websites you visit regularly, the response time becomes instant.

Anyways, now I'm looking at setting up some more services.
The ones I'm looking at setting up are:
Postal as an email server, so that I can collect status emails and stuff from various services
Roundcube as an email client
LocalAI which is for hosting AI stuff with an OpenAI compatible API
TavernAI or maybe SillyTavern -- I'm not really sure what the differences are and whether there's particular benefit to one over the other.
Stable Diffusion WebUI
Vaultwarden (FOSS password manager that is compatible with Bitwarden or something)
Home Assistant -- I have a few smart LEDs but I'd like to probably replace them with a brand that I don't need to use a stupid proprietary app for...

From what I remember, none of these have docker containers set up, which was why I never set them up when I had done the rest of them. Making docker containers isn't too difficult, but it's kind of a pain in the ass setting them up and fixing random things compared to being able to use something that just works.

 No.1301

>>1300
>uses a lot of energy and creates a lot of heat
Hmmmm, would it be possible to downscale it by turning off some parts or is it an all or nothing kinda deal?
>cached response times are 0ms
Nice.

 No.1302

File:1477250187862.jpg (851.78 KB,3537x3212)

>>1301
>Hmmmm, would it be possible to downscale it by turning off some parts or is it an all or nothing kinda deal?
The total power draw is only ~660W, which isn't much worse than a single high-end gaming PC under a heavy workload. The issue is just that my room has absolutely awful air circulation so heat builds up very easily; without anything else on, just playing games on my PC tends to heat up my room quite badly.

 No.1303

Forgot to mention: I got Postal kind of working, but it was a pain. Had to do quite a bit of digging around trying to get it to work. Now I'm a bit stumped when it comes to the DNS records it wants me to set up.

I tried creating a file under dnsmasq.d with my Pihole container containing the TXT records, MX record, and CNAME record it needs, but when I go into the web portal it shows that there's some error and it's not detecting the DNS records... If I do a "dig TXT email.home.arpa", I'll see the TXT record, so I'm not really sure what its problem is... I can't really tell if I'm a moron, or if it's unhappy because it's expecting everything to be on the open internet with SSL certificates and I'm too lazy to set up SSL certificates for stuff that's literally only ever going to be accessed within my own internal network.
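(For reference, the dnsmasq syntax for those record types in a Pihole dnsmasq.d file looks roughly like this -- the names and values here are made-up placeholders, not what Postal actually asked for:)
mx-host=home.arpa,email.home.arpa,10
txt-record=email.home.arpa,"v=spf1 a mx ~all"
cname=click.home.arpa,email.home.arpa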

 No.1304

Rereading this with a bit more server knowledge and I don't understand why you would want to virtualize a GPU across VMs. It's not like you'd ever want one VM to have half the GPU and one VM the other. All or nothing

 No.1305

Not to say there's no niche value or the desire to open it up as a possibility is invalid... I just don't see where I would encourage the use of this

 No.1306

>First, why might you want to do this? Well, the most obvious reason is that virtual machines are slooow. So, by passing through a GPU you can improve its speed considerably.
nvm, for getting the screen to be displayed faster. I get it's worth a bit. But this isn't a server specific problem since you'd do it on a CLI anyways

 No.1307

>>1304
In general, I would agree. It's not necessarily something that makes sense outside of use-cases where you need to provision a large number of virtual machines with a GPU. The most directly applicable situation would be in a datacenter environment where you are provisioning GPUs to multiple customers; most often you are dealing with GPUs that have a larger amount of VRAM than the individual needs of any given customer, and so you can use a vGPU profile to then split the GPU into multiple vGPUs. For example, if you have a 48GB GPU, you can split that into 6x 8GB vGPUs, 3x 16GB vGPUs, 2x 24GB vGPUs, and so on.

In my case, I obviously don't need this, so it's largely just for academic purposes and for fun, but it is valuable in a few areas. Namely, being able to split the GPU between virtual machines allows me to -- for example -- run a Windows VM with GPU acceleration, and run a bunch of containers on another VM and pass the GPU through to those containers as necessary, both at the same time. Using vGPU profiles also allows for platform agnostacy (I'm not sure that's a word...), so that any given VM can be migrated to a completely different GPU without having any fundamental changes in GPU drivers. This is the same reason why hypervisors try to hide the host CPU info from the VM (although in certain circumstances it can be beneficial to expose it). By hiding as much information as possible about what hardware the VM is running on, you ensure that the VM is completely blind to any underlying hardware changes that would otherwise necessitate driver changes, which may or may not cause instability.

 No.1308

Jesus, words words words lol. Setting up GPU passthrough is -not- that complicated, especially with integrated + dedicated. I'll have a better read through later but I'm assuming all of that headache is from using libvirt instead of straight qemu.

>>1304
(3D-accelerated) GPU virtualisation with a Windows guest is currently impossible

 No.1309

>>1308
>Setting up GPU passthrough is -not- that complicated, especially with integrated + dedicated.
With a PC, sure. It's fairly simple. Most of the instructions are settings that need to be performed on the host hypervisor, not settings to apply to a virtual machine, to prevent the host from loading any GPU drivers and taking control of the GPU. In Proxmox, it's really just a matter of selecting the PCIe device address of the GPU. Any additional headaches were due to this being on a laptop, which added additional complexity towards the end.

>I'll have a better read through later but I'm assuming all of that headache is from using libvirt instead of straight qemu.
I'm fairly certain Proxmox uses QEMU.

 No.1310

File:Screenshot 2024-02-21 2008….png (60.88 KB,851x540)

Hard drives are painful as primary storage... I wish used SSDs were as cheap and as plentiful as used hard drives...

 No.1311

File:[SubsPlease] Isekai de Mof….jpg (223.38 KB,1920x1080)

>>1310
I think it depends on what you mean by "primary storage". All my anime and game ROMs/ISOs (when not being played) are on HDD since it doesn't need "performance" and 99.9999% of the time it sits there doing nothing. I have AI models on SSD since they get loaded and unloaded and stuff, and I read that moving that much data around regularly can be bad for HDDs. SSD and NVMe prices seem a bit better lately, but I think they're expected to stagnate or increase unfortunately.

 No.1312

File:Screenshot 2024-02-21 2159….png (53.72 KB,972x329)

>>1311
Well... I'm making a backup of a virtual machine's filesystem because the virtual machine was acting erratically and not starting properly. Its virtual drive is only 80GB, but because it's on a rather slow hard drive and filesystems are often filled with tons of tiny files, in the image there it slowed down to 7.4MB/s. If the remaining ~59% go at that speed, it'll take about 2 more hours from when that screenshot was taken. The task info stopped updating though, and I noticed earlier a few messages saying "Bad block medium error"... I'm hoping some hard drive errors aren't the cause of all this... My server seems to suggest that the hard drive is alright though and not likely to fail.

 No.1313

sigh... I think the drive really did corrupt the VM... I tried copying the .raw file that stored the VM's filesystem manually, but it stalled after having only moved 26GB over ~8 hours. I hope I can recover some of the files at least, but it seems like after ~2 minutes the VM freezes up and gets itself into such an unresponsive state that the VM process cannot be killed, requiring a full restart of the host machine. It must be corrupted in a very peculiar way, because it boots fine, but then after some time nothing can be launched and any attempt to do so results in the process hanging with no ability to force quit via key macro... Probably the worst part of all is that I cannot even SSH into the VM; if I could, I could at least try to retrieve some of the files via SFTP.

Uggggghhhhh

 No.1412

>>1313
on topic sager



