Problems with the Nvidia graphics driver and CUDA after apt-get upgrade

I previously installed CUDA 7.5 on Ubuntu 14.04 using Nvidia's "deb (network)" installer. It worked for several months, until I ran sudo apt-get upgrade today. Since then, I get the following:

 $ nvidia-smi
 modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_352'
 modprobe: ERROR: could not insert 'nvidia_352': Function not implemented
 NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Running sudo nvidia-smi does not help either. I cannot log in in GUI mode (after I enter my password it just returns to the login screen), but I can access a terminal.
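The modprobe error suggests that no nvidia_352 module exists for the running kernel. Below is a minimal sketch of one way to check that, assuming the driver is managed by DKMS and that `dkms status` prints one line per module and kernel, ending in `installed` once the module is in place. The helper name `nvidia_dkms_installed_for` is hypothetical.

```shell
#!/bin/sh
# Reads `dkms status` output on stdin and succeeds only if an nvidia
# module is listed as installed for the given kernel release.
# Assumption: each status line names the module, the kernel release and a state.
nvidia_dkms_installed_for() {
    kernel="$1"
    grep -i 'nvidia' | grep -F -- "$kernel" | grep -q 'installed'
}

# Usage: only runs when the dkms tool is available.
if command -v dkms >/dev/null 2>&1; then
    if dkms status | nvidia_dkms_installed_for "$(uname -r)"; then
        echo "an nvidia module is installed for $(uname -r)"
    else
        echo "no nvidia module is installed for $(uname -r)"
    fi
fi
```

If the second message appears, the module was probably never built for the kernel that is currently booted, which would match the "could not find module" error.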

I have been able to restore graphics functionality, but since then I have had trouble reinstalling CUDA. Can you help me?

Restoring graphics

I found that I could get graphics working again by running

 $ sudo apt-get remove --purge nvidia*
 $ sudo apt-get autoremove

then editing /etc/apt/sources.list.d/cuda.list to delete all of its lines, and then running

 $ sudo apt-get install nvidia-352 

and rebooting the system. After this, nvidia-smi works again. However, I still need to reinstall CUDA.
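As a variation on deleting the lines in cuda.list, the entries could be commented out instead, which makes them easy to restore later. This is only a sketch, not what was done above. The helper names `disable_apt_source` and `enable_apt_source` are hypothetical, and the functions need to run as root when pointed at files under /etc/apt.

```shell
#!/bin/sh
# Comments out every active "deb" / "deb-src" line in an APT sources file,
# keeping a .bak copy next to it so the original can be restored.
disable_apt_source() {
    file="$1"
    sed -i.bak 's/^[[:space:]]*deb/# &/' "$file"
}

# Restores the copy saved by disable_apt_source.
enable_apt_source() {
    file="$1"
    mv "$file.bak" "$file"
}
```

From a root shell this would be used as `disable_apt_source /etc/apt/sources.list.d/cuda.list` followed by `apt-get update`, and undone with `enable_apt_source` on the same path.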

Attempting to reinstall CUDA

I tried restoring the contents of /etc/apt/sources.list.d/cuda.list and then running sudo apt-get install cuda. I noticed this error message:

 Loading new nvidia-352-352.93 DKMS files...
 Building only for 3.13.0-68-generic
 Building for architecture x86_64
 Building initial module for 3.13.0-68-generic
 ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-352.0.crash'
 Error! Bad return status for module build on kernel: 3.13.0-68-generic (x86_64)
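The "Bad return status for module build" line only says that the build failed, not why. Here is a hedged sketch for finding the cause, assuming DKMS keeps its build log at the conventional /var/lib/dkms/MODULE/VERSION/build/make.log location and that the kernel headers are reachable through /lib/modules/KERNEL/build. The helper name `dkms_make_log` is hypothetical.

```shell
#!/bin/sh
# Prints the conventional DKMS build-log path for a module and version.
dkms_make_log() {
    module="$1"
    version="$2"
    printf '/var/lib/dkms/%s/%s/build/make.log\n' "$module" "$version"
}

log=$(dkms_make_log nvidia-352 352.93)
if [ -r "$log" ]; then
    # The end of the log is where a failed build reports its error.
    tail -n 40 "$log"
else
    echo "no readable build log at $log"
fi

# A missing headers tree for the running kernel is one common reason
# for a failed module build, so check that it exists.
if [ -d "/lib/modules/$(uname -r)/build" ]; then
    echo "kernel headers found for $(uname -r)"
else
    echo "kernel headers missing for $(uname -r)"
fi
```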

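After doing this, the system goes back to behaving as it did at the start. For example, nvidia-smi prints the error message above, and after building and running deviceQuery I get a similar error: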

 ./deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
 modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_352'
 modprobe: ERROR: could not insert 'nvidia_352': Function not implemented
 cudaGetDeviceCount returned 38
 -> no CUDA-capable device is detected
 Result = FAIL

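I seem to remember that when I first installed CUDA, it only worked if I did not update the nvidia-352 package from Nvidia's repository. Now, however, I do not seem to have that option, because running sudo apt-get install cuda automatically upgrades the nvidia-352 package: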

 Unpacking nvidia-352 (352.93-0ubuntu1) over (352.63-0ubuntu0.14.04.1) ... 

If I try to set the version explicitly, I get

 $ sudo apt-get install cuda-drivers nvidia-352=352.63-0ubuntu0.14.04.1 nvidia-352-dev=352.63-0ubuntu0.14.04.1
 Some packages could not be installed. This may mean that you have
 requested an impossible situation or if you are using the unstable
 distribution that some required packages have not yet been created
 or been moved out of Incoming.
 The following information may help to resolve the situation:
 The following packages have unmet dependencies.
  cuda-drivers : Depends: nvidia-352 (>= 352.93) but 352.63-0ubuntu0.14.04.1 is to be installed
                 Depends: nvidia-352-dev (>= 352.93) but 352.63-0ubuntu0.14.04.1 is to be installed
 E: Unable to correct problems, you have held broken packages.

In fact, if I try to use version 352.63-0ubuntu1 instead of 352.63-0ubuntu0.14.04.1

 $ sudo apt-get install nvidia-352=352.63-0ubuntu1 

then that alone is enough to break graphical login and make nvidia-smi show the error message above.
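The unmet-dependency error from apt-get follows from how Debian versions compare: cuda-drivers requires nvidia-352 (>= 352.93), and 352.63-0ubuntu0.14.04.1 sorts below that. A small sketch that reproduces the check with dpkg's own comparison is shown here. The helper name `satisfies_minimum` is hypothetical, and the script assumes dpkg is installed.

```shell
#!/bin/sh
# Succeeds if the candidate version satisfies ">= required",
# using dpkg's Debian version comparison.
satisfies_minimum() {
    candidate="$1"
    required="$2"
    dpkg --compare-versions "$candidate" ge "$required"
}

if command -v dpkg >/dev/null 2>&1; then
    for v in 352.63-0ubuntu0.14.04.1 352.93-0ubuntu1; do
        if satisfies_minimum "$v" 352.93; then
            echo "$v satisfies >= 352.93"
        else
            echo "$v does not satisfy >= 352.93"
        fi
    done
fi
```

So as long as the cuda-drivers package from the Nvidia repository is part of the install, the older driver version cannot be kept alongside it.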

Diagnostics

 $ lspci | grep -i vga
 01:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1)
 $ dpkg -l | grep -i nvidia
 ii  bbswitch-dkms          0.7-2ubuntu1     amd64  Interface for toggling the power on nVidia Optimus video cards
 ii  libcuda1-352           352.93-0ubuntu1  amd64  NVIDIA CUDA runtime library
 ii  nvidia-352             352.93-0ubuntu1  amd64  NVIDIA binary driver - version 352.93
 ii  nvidia-352-dev         352.93-0ubuntu1  amd64  NVIDIA binary Xorg driver development files
 ii  nvidia-352-uvm         352.93-0ubuntu1  amd64  Transitional package for nvidia-352
 ii  nvidia-modprobe        352.93-0ubuntu1  amd64  Load the NVIDIA kernel driver and create device files
 ii  nvidia-opencl-icd-352  352.93-0ubuntu1  amd64  NVIDIA OpenCL ICD
 ii  nvidia-prime           0.6.2            amd64  Tools to enable NVIDIA's Prime
 ii  nvidia-settings        352.93-0ubuntu1  amd64  Tool for configuring the NVIDIA graphics driver
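To spot a mix of driver versions quickly, a listing like the one above can be reduced to package names and versions. This is a minimal sketch. The helper name `installed_nvidia_packages` is hypothetical, and it assumes the usual `dpkg -l` column order of state, name, version.

```shell
#!/bin/sh
# Reads `dpkg -l` output on stdin and prints "package version" for every
# fully installed (state "ii") package whose name mentions nvidia or libcuda.
installed_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /nvidia|libcuda/ { print $2, $3 }'
}

# Usage: list the distinct versions currently installed.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | installed_nvidia_packages | awk '{ print $2 }' | sort -u
fi
```

More than one 352.x version in the result would point to a partially upgraded driver stack.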

I had a similar problem. I was able to solve it by installing the recommended version of the nvidia driver.

 sudo apt-get install ubuntu-drivers-common
 sudo ubuntu-drivers devices
 sudo apt-get install
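The recommended driver can be picked out of the `ubuntu-drivers devices` output with a short filter. This sketch assumes each candidate appears on a line of the form `driver : PACKAGE - ... recommended`; that format and the helper name `recommended_driver` are assumptions, not taken from the answer above.

```shell
#!/bin/sh
# Reads `ubuntu-drivers devices` output on stdin and prints the package
# name from each "driver" line that is flagged as recommended.
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3 }'
}

# Usage: only runs when the ubuntu-drivers tool is available.
if command -v ubuntu-drivers >/dev/null 2>&1; then
    ubuntu-drivers devices | recommended_driver
fi
```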

A friend was able to solve it for me!

The solution he showed me was (having removed all nvidia packages beforehand)

 $ sudo add-apt-repository ppa:graphics-drivers/ppa
 $ sudo apt-get install nvidia-364

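Then download the .run CUDA installer from Nvidia (for me it was cuda_7.5.18_linux.run), and when it asks whether you want to install the driver packaged with CUDA, be careful to choose "no".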
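Unlike the deb packages, a .run install typically leaves the shell environment untouched, so the toolkit may not be found afterwards. Here is a hedged sketch of the usual follow-up, assuming the installer's default prefix /usr/local/cuda-7.5 was accepted. The helper name `prepend_once` is hypothetical.

```shell
#!/bin/sh
# Prints the colon-separated list "$2" with directory "$1" prepended,
# unless that directory is already listed.
prepend_once() {
    dir="$1"
    list="$2"
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;
        *) printf '%s\n' "${dir}${list:+:$list}" ;;
    esac
}

CUDA_HOME=/usr/local/cuda-7.5   # assumption: default install prefix
PATH=$(prepend_once "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH

# Confirm that the compiler is now reachable.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found on PATH"
fi
```

Putting the two assignments and the export line in ~/.bashrc would make them persistent. After that, nvidia-smi and deviceQuery can be rerun to confirm that the driver from the PPA and the toolkit work together.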