Installing the NVIDIA driver + CUDA + cuDNN on CentOS 7:
(1) Download the Tesla T4 driver from the NVIDIA website.
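(2) Run ./[driver-name].run
Installing without DKMS fails: the installer reports that it cannot find a kernel source tree under /lib/modules/3.10.0-514.el7.x86_64/. Building the driver requires the source of the currently running kernel.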
(3) Specify the kernel source tree location explicitly:
./[driver-name].run --kernel-source-path /usr/src/kernels/3.10.0-1062.4.1.el7.x86_64/
Again without DKMS, this builds nvidia.ko, but the module cannot be loaded. The error message is:
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the
wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target
kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of
the NVIDIA GPU(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.
Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file
'/var/log/nvidia-installer.log' for more information.
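Cause: nvidia.ko must be compiled against kernel source that matches the running kernel (3.10.0-514.el7.x86_64), but the only kernel source installed was for a different version (3.10.0-1062.4.1.el7.x86_64).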
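Both causes named in the error message can be checked before rerunning the installer. The following is a minimal bash sketch, assuming the CentOS 7 layout in which kernel-devel installs its tree under /usr/src/kernels/<release>. The function names kernel_src_matches and module_loaded are hypothetical helpers, not part of the NVIDIA installer; the second one looks up a module (for example nouveau) in /proc/modules.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of nvidia-installer).

# Succeeds if a kernel source tree for RELEASE exists under KERNELS_DIR.
# Defaults: the running kernel and the CentOS 7 kernel-devel location.
kernel_src_matches() {
    local release="${1:-$(uname -r)}"
    local kernels_dir="${2:-/usr/src/kernels}"
    if [ -d "${kernels_dir}/${release}" ]; then
        echo "OK: ${kernels_dir}/${release}"
        return 0
    fi
    echo "MISSING: no tree for ${release} under ${kernels_dir}; found:" >&2
    ls -1 "${kernels_dir}" >&2      # show which trees are actually installed
    return 1
}

# Succeeds if kernel module NAME is currently loaded (first field of /proc/modules).
module_loaded() {
    local name="$1"
    local modules_file="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}
```

For example, `kernel_src_matches || echo "install kernel-devel-$(uname -r) first"` reproduces the diagnosis above, and `module_loaded nouveau && echo "nouveau is loaded"` rules the other cause in or out.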
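(4) Download kernel-devel-3.10.0-514.el7.x86_64.rpm (matching the running kernel) and install it:
rpm -ivh kernel-devel-3.10.0-514.el7.x86_64.rpm
If rpm reports that a newer version is already installed, remove the newer one first:
rpm -e --nodeps kernel-devel-3.10.0-1062.4.1.el7.x86_64
Then rerun the driver installer, pointing it at the matching tree:
./[driver-name].run --kernel-source-path /usr/src/kernels/3.10.0-514.el7.x86_64/
This time the installation succeeds. Run nvidia-smi: if it displays the GPU normally, the driver is installed correctly.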
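Steps (2) to (4) can be collapsed into a short script that derives the kernel release from uname -r instead of hard-coding it, so the kernel-devel package and the kernel source path always match the running kernel. This is a sketch under stated assumptions: the kernel-devel RPM has already been downloaded into the current directory under its standard file name, the function names are hypothetical, and the driver file is passed as an argument because the original only names it [driver-name].run.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching steps (2)-(4); run as root.

# File name of the kernel-devel RPM matching RELEASE (default: running kernel),
# e.g. 3.10.0-514.el7.x86_64 -> kernel-devel-3.10.0-514.el7.x86_64.rpm
kernel_devel_rpm() {
    echo "kernel-devel-${1:-$(uname -r)}.rpm"
}

# Install the matching kernel-devel, build the driver against it, then verify.
install_nvidia_driver() {
    local driver_run="$1"            # path to the downloaded [driver-name].run
    local release
    release="$(uname -r)"
    # Skip the RPM if this exact kernel-devel is already installed. As noted in
    # step (4), rpm refuses the install if a newer kernel-devel is present.
    rpm -q "kernel-devel-${release}" >/dev/null \
        || rpm -ivh "$(kernel_devel_rpm "$release")" \
        || return 1
    sh "$driver_run" --kernel-source-path="/usr/src/kernels/${release}" || return 1
    nvidia-smi                       # should list the GPU if nvidia.ko loaded
}
```

Usage: `install_nvidia_driver ./[driver-name].run`. Deriving the path from uname -r is the design choice here: it removes the exact mismatch that caused the failure in step (3).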
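(5) Run cuda_10.0.130_410.48_linux.run. Decline the bundled driver (the driver from step (4) is already installed) and the OpenGL libraries, and install the samples. This creates /usr/local/cuda-10.0/, places the samples under /root, and creates the symlink /usr/local/cuda/ pointing to /usr/local/cuda-10.0/.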
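One way to confirm the layout from step (5) is to resolve the /usr/local/cuda symlink and ask the toolkit's nvcc for its version. A minimal bash sketch; check_cuda_layout is a hypothetical helper, and its prefix argument exists only so the check can be pointed at another directory. nvcc is called by its full path because the runfile installer does not add /usr/local/cuda/bin to PATH by itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the CUDA toolkit layout created in step (5).
check_cuda_layout() {
    local prefix="${1:-/usr/local}"
    local link="${prefix}/cuda"
    if [ ! -L "$link" ]; then
        echo "no symlink at ${link}" >&2
        return 1
    fi
    echo "${link} -> $(readlink "$link")"     # expected target: .../cuda-10.0
    if [ ! -x "${link}/bin/nvcc" ]; then
        echo "nvcc not found under ${link}/bin" >&2
        return 1
    fi
    "${link}/bin/nvcc" --version              # should report "release 10.0"
}
```

To call nvcc without the full path, add /usr/local/cuda/bin to PATH (and /usr/local/cuda/lib64 to LD_LIBRARY_PATH), for example in ~/.bashrc.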
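(6) Register for an NVIDIA Developer account on the NVIDIA website (required for cuDNN downloads; I registered with a QQ email address), then
download cudnn-10.0-linux-x64-v7.6.5.32.tgz.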
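Extract it (tar -xzvf cudnn-10.0-linux-x64-v7.6.5.32.tgz creates a cuda/ directory), then copy the header and libraries into the CUDA tree:
cp cuda/include/cudnn.h /usr/local/cuda/include/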
cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
Make the copied files readable by all users:
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
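To confirm that the header copied in step (6) is the expected cuDNN release, the version macros in cudnn.h can be read back. A minimal bash sketch, assuming the cuDNN 7.x header, which defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL directly in cudnn.h; cudnn_version is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cuDNN 7.x header.
cudnn_version() {
    local header="${1:-/usr/local/cuda/include/cudnn.h}"
    # A non-root user failing here suggests the chmod a+r step was skipped.
    [ -r "$header" ] || { echo "cannot read ${header}" >&2; return 1; }
    local major minor patch
    # Match lines of the form "#define CUDNN_MAJOR 7" and take the value.
    major=$(awk '$1 == "#define" && $2 == "CUDNN_MAJOR" { print $3; exit }' "$header")
    minor=$(awk '$1 == "#define" && $2 == "CUDNN_MINOR" { print $3; exit }' "$header")
    patch=$(awk '$1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { print $3; exit }' "$header")
    if [ -z "$major" ]; then
        echo "CUDNN_MAJOR not found in ${header}" >&2
        return 1
    fi
    echo "${major}.${minor}.${patch}"
}
```

For the cudnn-10.0-linux-x64-v7.6.5.32 package, `cudnn_version` should print 7.6.5.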