Re.Vi
Published 2024-04-03

Installing a deep learning environment on CentOS 7

Install the GPU driver

Check the GPU model

lspci | grep -i vga
>>> 64:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
>>> 81:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1) # GPU model
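
If you would rather pull the model name out of that output programmatically, here is a minimal sketch. It assumes the model name is the text in square brackets on the NVIDIA line, as in the output above; `find_nvidia_models` and `SAMPLE` are illustrative names, not part of any tool.

```python
import re

# Sample text copied from the lspci output shown above.
SAMPLE = """\
64:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
81:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
"""


def find_nvidia_models(lspci_text):
    """Return the bracketed model names of NVIDIA VGA devices in lspci output."""
    models = []
    for line in lspci_text.splitlines():
        # Only look at VGA controllers made by NVIDIA.
        if "VGA" in line and "NVIDIA" in line:
            match = re.search(r"\[(.+?)\]", line)
            if match:
                models.append(match.group(1))
    return models


print(find_nvidia_models(SAMPLE))
```

The returned name is what you search for on the driver download page.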

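Download the driver

Choose the driver that matches your GPU model

https://www.nvidia.cn/Download/index.aspx?lang=cn

Copy the download link and download the file from the terminal

Copied link: https://cn.download.nvidia.com/XFree86/Linux-x86_64/550.67/NVIDIA-Linux-x86_64-550.67.run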

wget @download_link
wget https://cn.download.nvidia.com/XFree86/Linux-x86_64/550.67/NVIDIA-Linux-x86_64-550.67.run

You can cd into the directory where you want to save the file first
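
The link above appears to follow a fixed pattern with the version number in two places. The sketch below rebuilds it from a version string. The pattern is inferred from that single link only, so treat it as an assumption and confirm the real link on the download page; `driver_url` is a hypothetical helper.

```python
def driver_url(version, host="cn.download.nvidia.com"):
    """Build a driver download URL following the pattern of the link above.

    Assumption: other versions use the same path layout as 550.67.
    """
    filename = f"NVIDIA-Linux-x86_64-{version}.run"
    return f"https://{host}/XFree86/Linux-x86_64/{version}/{filename}"


print(driver_url("550.67"))
```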

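Install the driver

Stop the X server

The X server provides the graphical desktop on CentOS and must be stopped before the GPU driver is installed

Find the X server processes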

ps aux|grep X

Kill the process by its PID

kill @X_service_pid # usually /usr/bin/X or /usr/bin/Xvnc; they start again after a reboot
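
Because grep X also matches the grep command itself and any other line containing an X, picking the right PID by eye is error-prone. Here is a small sketch that filters `ps aux` output down to the X server processes. The sample text is made up for illustration, and `Xorg` is added to the name list as an assumption (the article only mentions /usr/bin/X and /usr/bin/Xvnc).

```python
import os

# Executable names treated as X servers; "Xorg" is an assumption.
X_SERVER_NAMES = {"X", "Xorg", "Xvnc"}

# Hypothetical `ps aux` output, for illustration only.
SAMPLE_PS = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1234  0.5  0.1 312000 45000 tty1     Ssl+ 10:00   0:03 /usr/bin/X :0 -background none
user      2345  0.0  0.0 112812   980 pts/0    S+   10:05   0:00 grep --color=auto X
user      3456  0.1  0.2 250000 60000 ?        S    10:01   0:01 /usr/bin/Xvnc :1 -desktop host:1
"""


def x_server_pids(ps_text):
    """Return the PIDs of X server processes found in `ps aux` output."""
    pids = []
    for line in ps_text.splitlines():
        # `ps aux` prints 10 fixed columns followed by the command line.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        executable = fields[10].split()[0]
        if os.path.basename(executable) in X_SERVER_NAMES:
            pids.append(int(fields[1]))
    return pids


print(x_server_pids(SAMPLE_PS))
```

Each PID it returns is a candidate for the kill command above.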

Install

Go to the download directory and locate the downloaded driver file: NVIDIA-Linux-x86_64-550.67.run

cd Downloads

Run the installer

sudo sh NVIDIA-Linux-x86_64-550.67.run
Prompts you will encounter
  • Install NVIDIA's 32-bit compatibility libraries?

This asks whether to install the 32-bit compatibility libraries; either answer is fine

  • Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up.


**Apart from the prompts above, accept the default for everything else**

After the installation finishes, reboot the machine

sudo reboot

Verify the installation and check the CUDA version

nvidia-smi

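Also note the CUDA version shown in the output (this is the highest CUDA version the driver supports); it is needed later when installing torch.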
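
To capture both version numbers in a script, you can parse the header line that nvidia-smi prints. This is a minimal sketch: the sample header is illustrative (real spacing may differ) and `parse_versions` is a hypothetical helper name.

```python
import re

# Illustrative header line; the spacing in real nvidia-smi output may differ.
SAMPLE_HEADER = (
    "| NVIDIA-SMI 550.67                 Driver Version: 550.67"
    "         CUDA Version: 12.4     |"
)


def parse_versions(smi_text):
    """Return (driver_version, cuda_version) from nvidia-smi output.

    Either element is None when the corresponding field is not found.
    """
    driver = re.search(r"Driver Version:\s*([\d.]+)", smi_text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", smi_text)
    return (
        driver.group(1) if driver else None,
        cuda.group(1) if cuda else None,
    )


print(parse_versions(SAMPLE_HEADER))
```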

Possible problems

If the driver was installed and working but later stops working, check it with nvidia-smi.

If you see output like this:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

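then the driver needs to be uninstalled and reinstalled.

  1. Press Ctrl+Alt+F2 (or F1) to switch to a text console

  2. Enter your username and password

  3. Locate the .run installer that was used for the installation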

    sudo sh NVIDIA-Linux-x86_64-535.98.run --uninstall
    

    If the X configuration was backed up earlier, answer Yes; otherwise answer No. During installation, answering No to the nvidia-xconfig prompt is recommended.

    If you plan to no longer use the NVIDIA driver, you should make sure that no X screens are configured to use the NVIDIA X driver in your X configuration file. If you used nvidia-xconfig to configure X, it may have created a backup of your original configuration. Would you like to run 'nvidia-xconfig --restore-original-backup' to attempt restoration of the original X configuration file?
    
  4. Other ways to uninstall

    sudo yum remove 'nvidia-*'
    

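    To remove everything cleanly, you can also remove the related components

    rpm -qa | grep -i nvid | sort   # list the installed NVIDIA-related packages
    sudo yum remove 'kmod-nvidia-*'
    
  5. After uninstalling, reboot the server: sudo reboot

  6. Repeat the installation steps above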
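
The failure described above can also be detected from a script, for example in a periodic health check. The sketch below looks for the key phrase from the error message quoted earlier. `run_nvidia_smi` and `driver_is_unreachable` are hypothetical helper names, and matching on this one phrase is an assumption about how the failure shows up.

```python
import subprocess

# Key phrase from the error message quoted above.
FAILURE_MARKER = "couldn't communicate with the NVIDIA driver"


def run_nvidia_smi():
    """Run nvidia-smi and return its combined output, or None if it is not installed."""
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return result.stdout + result.stderr


def driver_is_unreachable(smi_output):
    """Return True when the output contains the driver communication error."""
    return FAILURE_MARKER in smi_output


message = (
    "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. "
    "Make sure that the latest NVIDIA driver is installed and running."
)
print(driver_is_unreachable(message))
```

On a real machine you would pass the result of `run_nvidia_smi()` to `driver_is_unreachable` after checking that it is not None.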

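Install conda

Use miniconda3 to manage Python environments. Follow the installation method on the official site and, once the installation completes, run conda init for bash

Reference: https://docs.anaconda.com/free/miniconda/

Download and install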

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh

Initialize

~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh

Check that the installation succeeded

conda --version # should print a version number
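
The same check can be scripted. This is a minimal sketch: `conda_on_path` and `parse_conda_version` are hypothetical helpers, and the sample string "conda 24.1.2" is made up to show the expected shape of the output, not a version the article prescribes.

```python
import re
import shutil


def conda_on_path():
    """Return True if a conda executable can be found on PATH."""
    return shutil.which("conda") is not None


def parse_conda_version(text):
    """Turn text such as 'conda 24.1.2' into (24, 1, 2), or None if it does not match."""
    match = re.search(r"conda\s+(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


print(conda_on_path())
print(parse_conda_version("conda 24.1.2"))  # sample string, for illustration
```

If `conda_on_path()` is False right after installation, the shell has probably not picked up the changes made by conda init yet.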

Activate the environment

Reopen the terminal; the prompt should now start with (base)

Install torch

Create and activate a conda environment

conda create -n @env_name python=@python_version -y

For example, to create an environment named dl with Python 3.9:

conda create -n dl python=3.9 -y

Activate the environment

conda activate @env_name

Install torch

Reference: https://pytorch.org

First activate the environment in which you want to install PyTorch; dl is used here as the example

conda activate dl

Install PyTorch with pip according to your CUDA version

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu@cuda_version

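Taking CUDA 12.4 as an example, cuda_version would be 124 (the version with the decimal point removed). PyTorch may not publish wheels for that exact version yet, though. In that case, step down to the nearest lower CUDA version that pip can install (for example cu121 or cu118). An exact match is not required: the version reported by nvidia-smi is the maximum the driver supports, the driver stays compatible with older CUDA runtimes, and the PyTorch wheels bundle their own CUDA runtime.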

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
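
The selection rule described above (take the newest published build that does not exceed the driver's CUDA version) can be sketched as follows. The `PUBLISHED` list is an assumed snapshot for illustration only; check https://pytorch.org for the builds that actually exist. Both function names are hypothetical.

```python
# Assumed snapshot of CUDA versions with published wheels; verify on pytorch.org.
PUBLISHED = ["11.8", "12.1"]


def cuda_tag(version):
    """Turn '12.4' into 'cu124' by dropping the decimal point."""
    return "cu" + "".join(version.split(".")[:2])


def as_tuple(version):
    """Turn '12.4' into (12, 4) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_wheel_tag(driver_cuda, published_versions):
    """Pick the newest published tag that does not exceed the driver's CUDA version."""
    candidates = [
        v for v in published_versions if as_tuple(v) <= as_tuple(driver_cuda)
    ]
    if not candidates:
        return None  # the driver is older than every published build
    return cuda_tag(max(candidates, key=as_tuple))


tag = pick_wheel_tag("12.4", PUBLISHED)
print(f"pip3 install torch torchvision torchaudio "
      f"--index-url https://download.pytorch.org/whl/{tag}")
```

With the assumed list, a driver reporting CUDA 12.4 resolves to the cu121 index used in the command above.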

After the installation finishes, start python and check whether CUDA is available

python
import torch

torch.cuda.is_available()
>>> True
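
For a slightly fuller check than a single True, the sketch below collects a few more details in one go. `cuda_report` is a hypothetical helper; it reports that torch is missing instead of raising, so it is safe to run before the installation as well.

```python
def cuda_report():
    """Collect basic facts about the installed torch build and the GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Name of the first visible GPU.
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


print(cuda_report())
```

If cuda_available is False while built_for_cuda shows a version, the driver is the first thing to check with nvidia-smi.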
