There's a lot more to this stuff than I thought there would be when starting out. I spent the day familiarizing myself with how to take apart my PC and swap GPUs, trying to piece everything together.
Apparently, in order for a PC to start up properly it needs some kind of display adapter. I thought the existence of an HDMI port on the motherboard implied the existence of onboard graphics, but apparently only CPUs with integrated graphics (APUs, in AMD's case) can drive it, and my Ryzen 5 2600 can't. The Tesla P100 has no display outputs at all. So I've hit a snag where the PC won't start up because it can't find any graphics output.
I'm going to try running multiple GPUs together over PCIe. Hopefully I can mix the AMD RX 580 and the NVIDIA Tesla on the same board. Fingers crossed, please work.
My motherboard thankfully supports 4x4x4x4 bifurcation of the PCIe x16 slot, which is a very lucky break I didn't know about going into this.
Strangely, other configurations for splitting the x16 lanes, like x8/x8 or x8/x4/x4, aren't in my BIOS for some reason. So I'm planning to get a 4-way bifurcation board, plug both cards in, and hope the AMD one is recognized!
According to one source, the performance loss from running GPUs on x4 lanes for the kind of compute I'm doing is 10-15%, which is surprisingly tolerable.
I never really had to think about how PCIe lanes work or how to allocate them before.
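Here's the quick back-of-the-envelope that made me okay with x4 (both the P100 and the RX 580 are PCIe 3.0 cards; the 10-15% figure is from that source, not something I've measured):

```python
# Rough PCIe 3.0 bandwidth math: 8 GT/s per lane with 128b/130b encoding,
# so usable throughput is about 8 * (128/130) / 8 ~= 0.985 GB/s per lane, per direction.
GBPS_PER_LANE = 8 * (128 / 130) / 8  # bits -> bytes after encoding overhead

for lanes in (16, 8, 4):
    print(f"x{lanes}: ~{lanes * GBPS_PER_LANE:.1f} GB/s per direction")

# x16: ~15.8 GB/s, x8: ~7.9 GB/s, x4: ~3.9 GB/s
# For inference the weights stay resident in VRAM, so the link mostly matters
# for loading the model and shuttling comparatively small activations, which is
# why dropping to x4 is supposed to cost only ~10-15% rather than 4x.
```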
For now I'm using two power supplies: the one already in the desktop and the new Corsair 850e PSU. I chose it because it should handle 2-3 GPUs while staying in my price range.
Also, the new 12V-2x6 connector supports up to 600 W, which is plenty for the Tesla, and it comes with a dual PCIe splitter, which is exactly what the Tesla's power cable adapter required. So it all worked out nicely for a clean wiring solution.
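Rough power math for the 850 W unit, just to sanity-check the 2-3 GPU claim (these are rated board-power figures, not measured draw, and the CPU and motherboard stay on the other PSU):

```python
# Rated board power for the cards the new PSU would feed (approximate; real
# draw varies with workload and any power limits you set).
gpu_board_power_w = {
    "Tesla P100 (PCIe)": 250,
    "RX 580": 185,
}

psu_w = 850
total = sum(gpu_board_power_w.values())
print(f"GPUs: ~{total} W of {psu_w} W ({total / psu_w:.0%} load)")
# ~435 W, roughly half the unit's rating -- enough headroom that a third
# mid-range card should still keep sustained load comfortably under ~80%.
```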
Sadly, I fucked up a little. The plastic PCIe release latch on the motherboard was brittle, and I fat-thumbed it too hard while struggling to remove the GPU, so it snapped off. I don't know if that's fixable. Fortunately it doesn't seem to affect how securely the card sits too badly. I intend to get a PCIe riser extension cable so there won't be much force on the now slightly loosened PCIe connection. I'll have the GPU and bifurcation board laid out on the homelab table while testing, then mount them somewhere properly once I get it all working.
I need to figure out an external GPU mounting system. I see people use server racks or nut-and-bolt metal chassis. Maybe I could get a thin copper plate the size of the desktop's glass side panel as a base/heatsink?
Yep. Vulkan is the usual recommendation for cross-vendor setups, most commonly when integrated graphics are in the mix.
I actually had Ti and XTX variants, so VRAM was 12 + 24 GB = 36 GB. Vulkan is implemented across vendors, and running the Vulkan build of llama.cpp yielded similar (though slightly worse) performance than CUDA on the 3080 Ti as a point of reference.
I don't have this well documented, but from memory, Llama 3.1 8B_k4 could reliably get around 110 tok/s on CUDA and 100 on Vulkan on the same machine.
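If anyone wants to reproduce those numbers, this is roughly how I'd time it with the llama-cpp-python bindings (just one convenient way to drive llama.cpp; the bindings have to be compiled against a Vulkan- or CUDA-enabled build, and the model path and prompt below are placeholders):

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python, built against a Vulkan or CUDA llama.cpp

# Placeholder path: point this at whatever GGUF quant you're testing.
llm = Llama(
    model_path="models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=2048,
    verbose=False,
)

start = time.perf_counter()
out = llm("Explain PCIe bifurcation in one paragraph.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.2f} s -> {generated / elapsed:.1f} tok/s")
```

llama.cpp's bundled llama-bench tool gives cleaner numbers, but this is close enough for a sanity check.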
I used this setup specifically to take advantage of the vastly increased VRAM from having two cards. I was able to run 32B_k4 models that wouldn't fit in either card's VRAM alone, and I tracked power and RAM usage with LACT. Performance seemed pretty great compared to a friend running the same models on a 4x 4060 Ti setup using just CUDA.
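The VRAM pooling itself is just a weighted split across the devices; a minimal sketch with the same bindings, where the tensor_split ratios simply mirror the 12 GB and 24 GB cards and the model path is again a placeholder:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/some-32B-instruct-Q4_K_M.gguf",  # placeholder 32B Q4 quant
    n_gpu_layers=-1,        # offload all layers; llama.cpp divides them between the cards
    tensor_split=[12, 24],  # relative share per device, here proportional to each card's VRAM
    n_ctx=4096,
    verbose=False,
)

print(llm("Why does pooling VRAM across two GPUs help with 32B models?",
          max_tokens=128)["choices"][0]["text"])
```

As I understand it, llama.cpp splits by whole layers by default, so traffic between the cards stays small and the PCIe link width matters less.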
If this is interesting to a lot of people, I could put this setup back together to answer more questions / do a separate post. I took the setup apart because it physically took up more space than my case could accommodate, and I had the 3080 Ti literally hanging off a riser.
Wow, I had no idea! Nor did I know that Vulkan performs so well. I'll have to read more, because this could really simplify my planned build.
Count me as someone who would be interested in a post!