In my case it’s performance and the sheer amount of RAM needed.
GLM 4.5 needs around 112GB of RAM plus absolutely every megabyte of VRAM on the GPU, at least unless I quantize it so heavily that it’s too degraded to use. I’m already swapping a tiny bit and simply cannot afford the overhead.
I think containers may slow down CPU<->GPU transfers slightly, but don’t quote me on that.
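
For anyone who wants to run the numbers on their own box, here is the back-of-the-envelope math as a rough sketch. The function names and every figure in the usage lines are made up for illustration (the 24GB card, the 128GB of system RAM, and the overhead values are assumptions, not my measurements), and it ignores KV cache and context size entirely.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB.

    params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB,
    which simplifies to params_billion * bits_per_weight / 8.
    """
    return params_billion * bits_per_weight / 8


def headroom_gb(model_gb: float, ram_gb: float, vram_gb: float,
                overhead_gb: float = 0.0) -> float:
    """Memory left over after loading the model; negative means swapping."""
    return ram_gb + vram_gb - overhead_gb - model_gb


# Hypothetical 100B-parameter model at 4 bits per weight.
print(weights_gb(100, 4))

# Hypothetical machine: 128GB RAM, 24GB VRAM, model taking 112GB + 24GB.
print(headroom_gb(136, ram_gb=128, vram_gb=24, overhead_gb=8))
print(headroom_gb(136, ram_gb=128, vram_gb=24, overhead_gb=18))
```

The point is just that when headroom is already in the single digits, any extra overhead from another layer pushes it negative, and that is where the swapping starts.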