NVIDIA GRID abnormal behavior in FusionCompute (driver installation fails in virtual machine)

Publication Date: 2017-03-29
Issue Description

During deployment of the vGPU feature in FusionCompute (V100R005C10U1; other versions appear to be affected as well) with an NVIDIA GRID K2 card (or any other NVIDIA GRID card) installed in an RH2288H (or any other) server, a vGPU can be attached to a VM in the FusionCompute portal. However, installing the driver in the VM fails, and the symptoms depend on the vGPU mode.

1. If the vGPU mode is “passthrough”, the OS detects the GPU, but the GPU still does not work after driver installation, and unknown devices appear in Device Manager:

2. If the vGPU mode is not “passthrough” (e.g. “K200” or “K240Q”), the VM console goes black after the vGPU is attached to the VM:

If we connect to the VM via RDP, we see that no graphics adapter is installed at all:

In this case the GPU driver cannot even be installed: the installer fails because no compatible hardware is detected.

Handling Process

The correct procedure for deploying the vGPU feature is as follows.

1. Obtain the latest version of the driver. At the time of writing, the link is: http://international.download.nvidia.com/Windows/Quadro_Certified/GRID/361.45/NVIDIA-GRID-vGPU-kepler-UVP-361.45.09-362.56.zip

In passthrough mode, download the driver from the NVIDIA website.

2. Connect via SSH to the host where the K2 card is installed and install the .rpm package (upload the package to the host with WinSCP first).
Connect to the host management port via SSH (with PuTTY, for example), log in with the system account gandalf, switch to the root account (su - root), and turn off the inactivity timeout (TMOUT=0):

 

Check whether a driver of another version is already present with the nvidia-smi command (here, no driver is installed):

Install the appropriate .rpm package with the rpm -ivh command (the package must already have been uploaded to the host with WinSCP):

3. Reboot the host with the reboot command.
4. Attach a vGPU to a VM and install the driver in it. This is where we got stuck.
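The host-side part of steps 1 and 2 can be sketched in Python. This is a minimal, hypothetical helper, not a Huawei or NVIDIA tool. The function names are illustrative. It rests on three assumptions:

- The downloaded .zip bundles the host .rpm package.
- The script runs on the host after the package has been uploaded.
- nvidia-smi is only on the PATH, and exits with status 0, once a host driver is installed and loaded.

```python
from __future__ import annotations

import shutil
import subprocess
import zipfile


def rpm_members(zip_path: str) -> list[str]:
    """Return the .rpm entries inside the downloaded driver archive (assumed to contain one)."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if name.lower().endswith(".rpm")]


def host_driver_present(smi: str = "nvidia-smi") -> bool:
    """Return True if nvidia-smi is installed and exits with status 0."""
    if shutil.which(smi) is None:
        return False  # binary missing: no host driver package installed yet
    result = subprocess.run([smi], capture_output=True, text=True)
    return result.returncode == 0  # non-zero usually means the driver is not loaded


def rpm_install_command(rpm_path: str) -> list[str]:
    """Build the rpm -ivh command from step 2 (run it as root, then reboot in step 3)."""
    return ["rpm", "-ivh", rpm_path]
```

For example, if `host_driver_present()` is False, `subprocess.run(rpm_install_command("/tmp/<package>.rpm"), check=True)` would perform the installation from step 2. Here `<package>` is a placeholder, not the real file name. Keeping the command as an argument list avoids shell quoting problems with the long NVIDIA file names.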

The first three steps were completed successfully; we can tell because the video card is detected by FusionCompute.

So let’s check the video card status.
Again, connect to the host management port via SSH, as in step 2 above.

Now check the video card status with the nvidia-smi -L command (lists all video cards):

Now we see the reason: “not all required external power cables are attached”.
A similar message appears in the host OS log (/var/log/messages): “GPU does not have the necessary power cables connected”:
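When many hosts need checking, both messages can be searched for programmatically. The sketch below is a hypothetical helper. It assumes that the output of nvidia-smi -L and the lines of /var/log/messages contain the warning text as quoted above, matched case-insensitively. It makes no assumption about the rest of the log line format.

```python
from __future__ import annotations

import re
from typing import Iterable

# Key phrases taken from the two messages quoted in this case.
POWER_WARNINGS = (
    "not all required external power cables are attached",
    "does not have the necessary power cables connected",
)
_PATTERN = re.compile("|".join(re.escape(p) for p in POWER_WARNINGS), re.IGNORECASE)


def power_cable_warnings(lines: Iterable[str]) -> list[str]:
    """Return the lines that report missing external GPU power cables."""
    return [line.rstrip("\n") for line in lines if _PATTERN.search(line)]


def scan_file(path: str = "/var/log/messages") -> list[str]:
    """Scan a host log file (reading it normally requires root) for the power warnings."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return power_cable_warnings(log)
```

A non-empty result from `scan_file()` or from `power_cable_warnings(subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout.splitlines())` points to the same cause as in this case. It means the cabling should be checked before debugging the driver in the VM.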

Installing an NVIDIA GRID K2 (like many other video cards) requires connecting external power cables to it. This is stated, for example, in the server manual:

When we checked the physical installation, we found that the power cables were missing. After connecting the cables, we could successfully install the drivers in the VM and confirm that the vGPU works:


Root Cause

Insufficient power supplied to the video card. The NVIDIA GRID K2 needs two power cables connected. Without them, the card is detected by the system but cannot work properly.

Solution

Connect the necessary power cables to the video card.

Suggestions

You are advised to check that the hardware is installed properly, in particular that all power cables are connected.

END