R730+P100+Docker+GPU+Nvidia

2023-12-04
Reference links:
1. https://www.cnblogs.com/klvchen/p/17295624.html
   https://zhuanlan.zhihu.com/p/664599034
2. https://www.nvidia.com/download/
3. https://www.cnblogs.com/devilmaycry812839668/p/17269217.html

NVIDIA official documentation: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

1. Environment

Server: R730

GPUs: P100, 2080TIS

OS: Ubuntu 22.04

Driver installation and environment deployment for Docker + PyTorch

2. Configuration process

Reference link: https://www.cnblogs.com/klvchen/p/17295624.html

2.1 Install the NVIDIA GPU driver

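# Before installing, identify the GPU model in the machine
sudo lspci | grep -i nvidia

# Download the matching driver from the official site
https://www.nvidia.cn/Download/index.aspx?lang=cn

# Disable the nouveau driver
# Ubuntu ships with nouveau, a third-party open-source driver for NVIDIA GPUs. It must be blacklisted before the official NVIDIA driver is installed.
sudo vim /etc/modprobe.d/blacklist.conf
# Append the following lines at the end of the file
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist rivatv
blacklist nvidiafb

# Rebuild the initramfs so the blacklist takes effect at boot
sudo update-initramfs -u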
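
The manual edit above can also be scripted so that it is safe to run more than once. The following is a minimal sketch, not part of the original procedure: the function name add_blacklist_entries is hypothetical, and it assumes the same five blacklist entries and the same file path as above.

```shell
# Hypothetical helper: append each blacklist entry to the given file
# only if that exact line is not already present, so re-running it
# does not create duplicates.
add_blacklist_entries() {
    local conf="$1"
    local module
    touch "$conf"
    for module in vga16fb nouveau rivafb rivatv nvidiafb; do
        grep -qxF "blacklist $module" "$conf" || echo "blacklist $module" >> "$conf"
    done
}

# Usage on the server (as root), followed by the initramfs rebuild:
#   add_blacklist_entries /etc/modprobe.d/blacklist.conf
#   update-initramfs -u
```

Checking for an exact line match with grep -qxF keeps existing entries in blacklist.conf untouched.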

# Reboot the machine, then run the lsmod command below. No output means nouveau was disabled successfully.
sudo reboot
sudo lsmod | grep nouveau
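
Because an empty grep result is easy to misread, the check can be wrapped so that it prints an explicit message. This is only a sketch; the function name nouveau_status is hypothetical, and it assumes the usual lsmod layout with the module name in the first column.

```shell
# Hypothetical helper: read `lsmod`-style output on stdin and report
# whether the nouveau kernel module is still loaded.
nouveau_status() {
    # awk exits with status 0 only if a line's first column is "nouveau".
    if awk '$1 == "nouveau" { found = 1 } END { exit !found }'; then
        echo "nouveau is still loaded"
    else
        echo "nouveau is disabled"
    fi
}

# Usage on the server after the reboot:
#   lsmod | nouveau_status
```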

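# Install the build tools needed to compile the kernel module
sudo apt install gcc make -y

# Run the driver installer downloaded earlier
cd /data/software
sudo chmod +x NVIDIA-Linux-x86_64-525.105.17.run
sudo ./NVIDIA-Linux-x86_64-525.105.17.run -no-x-check -no-nouveau-check -no-opengl-files

# Verify the installation
nvidia-smi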
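
For a machine with several cards, the CSV query mode of nvidia-smi gives a more compact check than the full table. The sketch below assumes the comma-plus-space separator that nvidia-smi uses for CSV output; the function name summarize_gpus is hypothetical.

```shell
# Hypothetical helper: read CSV lines of the form "index, name, driver_version"
# on stdin and print one summary line per GPU.
summarize_gpus() {
    awk -F', ' '{ printf "GPU %s: %s (driver %s)\n", $1, $2, $3 }'
}

# Usage on the server:
#   nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader | summarize_gpus
```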

2.2 Install nvidia-docker

sudo apt -y install docker.io

# Setup the package repository and the GPG key:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Update the package index and install nvidia-container-toolkit
sudo apt-get -y update
sudo apt-get install -y nvidia-container-toolkit

# Configure the Docker daemon to recognize the NVIDIA container runtime
sudo nvidia-ctk runtime configure --runtime=docker

# Restart Docker
sudo systemctl restart docker
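
Before running a GPU container, it can help to confirm that the configure step actually registered the runtime. The sketch below assumes the runtime entry is written to /etc/docker/daemon.json and refers to nvidia-container-runtime; the function name has_nvidia_runtime is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) if the given Docker daemon
# config file mentions the NVIDIA container runtime.
has_nvidia_runtime() {
    local daemon_json="${1:-/etc/docker/daemon.json}"   # assumed default path
    [ -f "$daemon_json" ] && grep -q 'nvidia-container-runtime' "$daemon_json"
}

# Usage on the server:
#   has_nvidia_runtime && echo "nvidia runtime configured" || echo "nvidia runtime missing"
```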

# Test: the container should print the same GPU table as the host
sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
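
Since the goal of this setup is Docker + PyTorch, a similar one-line check can be run from a PyTorch image. This is a sketch under assumptions: the image name pytorch/pytorch:latest and the function name torch_cuda_check are not from the original post, and the DOCKER variable exists only so the command can be printed instead of executed.

```shell
# Hypothetical helper: run a one-line PyTorch CUDA availability check
# inside a container. Set DOCKER=echo to print the command as a dry run,
# or DOCKER="sudo docker" if Docker requires root.
torch_cuda_check() {
    local image="${1:-pytorch/pytorch:latest}"   # assumed image name
    ${DOCKER:-docker} run --rm --gpus all "$image" \
        python -c 'import torch; print(torch.cuda.is_available())'
}

# Usage on the server:
#   DOCKER="sudo docker" torch_cuda_check
```

If the driver and container toolkit are working, the Python one-liner is expected to report that CUDA is available.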

3. Common problems

1. After some time, nvidia-smi fails with: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Solution: see the CSDN blog post "解决NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver".
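
One commonly reported cause of this error is a kernel update that leaves the NVIDIA module unbuilt for the new kernel. Whether that applies here is an assumption, since the post only links to the fix. Under that assumption, and assuming the driver sources are present at /usr/src/nvidia-<version>, a DKMS rebuild can be sketched as follows; the function name nvidia_src_version is hypothetical.

```shell
# Hypothetical helper: print the newest NVIDIA driver version that has a
# source directory named nvidia-<version> under the given root.
nvidia_src_version() {
    local src_root="${1:-/usr/src}"   # assumed location of driver sources
    ls "$src_root" 2>/dev/null \
        | sed -n 's/^nvidia-\([0-9][0-9.]*\)$/\1/p' \
        | sort -V | tail -n 1
}

# Usage on the server (rebuilds the module for the running kernel):
#   sudo apt install -y dkms
#   sudo dkms install -m nvidia -v "$(nvidia_src_version)"
```

If nvidia_src_version prints nothing, the sources are not where this sketch expects them, and rerunning the .run installer is the fallback.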

2. Driver installation errors

On Ubuntu 22.04, choose CUDA 12, that is, a newer CUDA release. Versions that are too old cause a variety of errors.
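
To see which CUDA release the installed driver supports, the "CUDA Version" field in the nvidia-smi header can be extracted. This is a small sketch; the function name driver_cuda_version is hypothetical, and it assumes the header contains the text "CUDA Version: X.Y".

```shell
# Hypothetical helper: read `nvidia-smi` output on stdin and print the
# CUDA version shown in its header (the newest CUDA release the
# installed driver supports).
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage on the server:
#   nvidia-smi | driver_cuda_version
```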

Last updated: 2024-04-02

代号山岳
