Intro
According to https://pytorch.org/get-started/locally/ and https://github.com/NVIDIA/nvidia-docker the following has to be installed:
- python: >= 3.6
- python package managers: Anaconda or pip
- docker: >= 19.03
- NVIDIA software (see the GPU drivers, CUDA Toolkit and cuDNN, and NVIDIA Container Toolkit sections below)
Please also note:
- CUDA 10.2 requires GCC <= 8
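Before installing anything, it can help to see what is already on the machine. The following is only a minimal sketch of such a check: the helper names (python_ok, find_tools) and the list of tools it looks for are my own choices for illustration, not part of any official installer.

```python
import shutil
import sys

# Minimum Python version from the requirements list above.
MIN_PYTHON = (3, 6)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def find_tools(names):
    """Map each executable name to its location on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    print("python >= 3.6:", python_ok())
    # The tool names below are illustrative; adjust them to your setup.
    for name, path in find_tools(["pip", "conda", "docker", "gcc"]).items():
        print(f"{name}: {path or 'not found'}")
```

It only reports what is found on PATH; it does not check versions of docker or gcc, which are covered further down.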
Install compatible version of gcc
Ubuntu 20.04 LTS comes with gcc 9, so the first thing we need to do is install a compatible version of gcc, i.e. gcc 8.
Install both gcc versions, but keep gcc 9 selected for now, as it will be used to install the GPU drivers.
> sudo apt -y install build-essential
> sudo apt -y install gcc-8 g++-8 gcc-9 g++-9
> sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 8 --slave /usr/bin/g++ g++ /usr/bin/g++-8
> sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 9 --slave /usr/bin/g++ g++ /usr/bin/g++-9
> sudo update-alternatives --config gcc
There are 2 choices for the alternative gcc (providing /usr/bin/gcc).

  Selection    Path            Priority   Status
------------------------------------------------------------
* 0            /usr/bin/gcc-9   9         auto mode
  1            /usr/bin/gcc-8   8         manual mode
  2            /usr/bin/gcc-9   9         manual mode

Press <enter> to keep the current choice[*], or type selection number: 0
update-alternatives: using /usr/bin/gcc-9 to provide /usr/bin/gcc (gcc) in auto mode
Install python and python package managers
There are multiple ways to manage python versions and environments.
I've selected pyenv + pyenv-virtualenv:
> sudo apt-get install -y zlib1g-dev libbz2-dev libreadline-dev libssl-dev libsqlite3-dev libffi-dev
> pyenv install 3.8.2
> pyenv virtualenv 3.8.2 torch
> pyenv global torch
> python -V
Python 3.8.2
NVIDIA GPU drivers
Download the driver installer and run it:
> sudo bash NVIDIA-Linux-x86_64-440.82.run
ERROR: The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding. Please consult the NVIDIA
driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
Follow the instructions to blacklist the Nouveau kernel driver, then run:
> sudo update-initramfs -u
> sudo reboot
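After the reboot you may want to confirm that Nouveau is really gone before retrying the installer. Here is a small sketch that reads /proc/modules, where the first field of each line is a loaded module's name. The function name module_loaded is made up for this example.

```python
from pathlib import Path


def module_loaded(name, modules_text):
    """Return True if `name` is the first field of any /proc/modules line."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():
        print("nouveau loaded:", module_loaded("nouveau", proc_modules.read_text()))
    else:
        print("/proc/modules not found (not a Linux system?)")
```

If it still prints True after blacklisting, the initramfs was probably not regenerated or the machine was not rebooted.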
Retry the installation. You can ignore the Xorg warnings and skip the Xorg configuration if you are working from the command line only.
> sudo bash NVIDIA-Linux-x86_64-440.82.run
Note that you still need gcc-9 selected to install the drivers. At the end you should get:
Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 440.82) is now complete.
Verify installation:
> nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 107... Off | 00000000:09:00.0 Off | N/A |
| 0% 59C P0 38W / 180W | 0MiB / 8118MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
CUDA Toolkit and cuDNN
Switch gcc to gcc-8:
> sudo update-alternatives --config gcc
There are 2 choices for the alternative gcc (providing /usr/bin/gcc).

  Selection    Path            Priority   Status
------------------------------------------------------------
* 0            /usr/bin/gcc-9   9         auto mode
  1            /usr/bin/gcc-8   8         manual mode
  2            /usr/bin/gcc-9   9         manual mode

Press <enter> to keep the current choice[*], or type selection number: 1
update-alternatives: using /usr/bin/gcc-8 to provide /usr/bin/gcc (gcc) in manual mode
> gcc --version
gcc (Ubuntu 8.4.0-3ubuntu2) 8.4.0
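Since it is easy to forget which gcc is currently selected, the check can be scripted. This is a rough sketch that parses the first line of the gcc --version output shown above; the regular expression assumes the line ends with a three-part version number, which holds for the Ubuntu builds here but is not guaranteed for every distribution.

```python
import re

# CUDA 10.2 requires GCC <= 8 (see the note in the intro).
MAX_GCC_MAJOR_FOR_CUDA_10_2 = 8


def gcc_major(version_output):
    """Extract the major version from the first line of `gcc --version`."""
    first_line = version_output.splitlines()[0]
    # Assumption: the line ends with "MAJOR.MINOR.PATCH".
    match = re.search(r"(\d+)\.\d+\.\d+\s*$", first_line)
    if match is None:
        raise ValueError(f"cannot parse gcc version from: {first_line!r}")
    return int(match.group(1))


def ok_for_cuda_10_2(version_output):
    """Return True when the selected gcc is old enough for CUDA 10.2."""
    return gcc_major(version_output) <= MAX_GCC_MAJOR_FOR_CUDA_10_2


sample = "gcc (Ubuntu 8.4.0-3ubuntu2) 8.4.0"
print(gcc_major(sample), ok_for_cuda_10_2(sample))
```

To use it on a live system, feed it the output of subprocess.run(["gcc", "--version"], capture_output=True, text=True).stdout instead of the sample string.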
Run the installer, accept the license, and deselect Driver (the driver is already installed):
> sudo bash cuda_10.2.89_440.33.01_linux.run
x CUDA Installer x
x - [ ] Driver x
x [ ] 440.33.01 x
x + [X] CUDA Toolkit 10.2 x
x [X] CUDA Samples 10.2 x
x [X] CUDA Demo Suite 10.2 x
x [X] CUDA Documentation 10.2 x
x Options x
x Install x
Please make sure that
- PATH includes /usr/local/cuda-10.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-10.2/lib64, or, add /usr/local/cuda-10.2/lib64 to /etc/ld.so.conf and run ldconfig as root
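The installer only prints this reminder; it does not change your environment. A quick way to see whether the two variables already contain the CUDA directories is sketched below. The paths are the ones from the installer message above; the helper path_includes is a name I made up.

```python
import os

# Install prefix taken from the installer message above.
CUDA_HOME = "/usr/local/cuda-10.2"


def path_includes(value, directory):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    wanted = os.path.normpath(directory)
    entries = [entry for entry in value.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


if __name__ == "__main__":
    checks = {
        "PATH": os.path.join(CUDA_HOME, "bin"),
        "LD_LIBRARY_PATH": os.path.join(CUDA_HOME, "lib64"),
    }
    for var, directory in checks.items():
        found = path_includes(os.environ.get(var, ""), directory)
        print(f"{var} includes {directory}: {found}")
```

Note that this does not cover the /etc/ld.so.conf alternative mentioned in the message; if you went that route, LD_LIBRARY_PATH may legitimately be empty.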
Unzip the cuDNN package
> tar -xzvf cudnn-10.2-linux-x64-v7.6.5.32.tgz
Copy the following files into the CUDA Toolkit directory, and change the file permissions.
> sudo cp cuda/include/cudnn.h /usr/local/cuda/include
> sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
> sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
I had to make the following changes:
> sudo rm /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7 /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so
> sudo ln -s libcudnn.so.7.6.5 /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7
> sudo ln -s libcudnn.so.7 /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so
> sudo ldconfig
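To double-check that the links end up as libcudnn.so -> libcudnn.so.7 -> libcudnn.so.7.6.5, you can walk the symlink chain. The sketch below does that with os.readlink; resolve_chain and demo are hypothetical helper names, and the demo builds the same chain in a temporary directory so it can be run anywhere.

```python
import os
import tempfile


def resolve_chain(directory, name, max_hops=10):
    """Follow symlinks starting at directory/name; return the file names visited."""
    chain = [name]
    path = os.path.join(directory, name)
    for _ in range(max_hops):
        if not os.path.islink(path):
            break
        target = os.readlink(path)
        # Relative targets are resolved against the directory holding the link.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(os.path.basename(path))
    return chain


def demo():
    """Recreate the expected cuDNN link layout in a temp dir and print it."""
    with tempfile.TemporaryDirectory() as lib_dir:
        open(os.path.join(lib_dir, "libcudnn.so.7.6.5"), "w").close()
        os.symlink("libcudnn.so.7.6.5", os.path.join(lib_dir, "libcudnn.so.7"))
        os.symlink("libcudnn.so.7", os.path.join(lib_dir, "libcudnn.so"))
        print(" -> ".join(resolve_chain(lib_dir, "libcudnn.so")))


if __name__ == "__main__":
    demo()
```

On the real machine, call resolve_chain("/usr/local/cuda-10.2/targets/x86_64-linux/lib", "libcudnn.so") instead of the demo.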
PyTorch
> pip install --upgrade pip
> pip install torch torchvision
Check that GPU support is enabled and that you can access your GPU:
> python -c "from __future__ import print_function; import torch; print(torch.cuda.is_available())"
True
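If you want a bit more detail than a single True, something like the following can be used. It is only a sketch: gpu_report is a name I made up, and the script deliberately falls back to a short answer when torch is not installed or CUDA is not available.

```python
def gpu_report():
    """Collect basic CUDA facts from PyTorch; degrade gracefully without it."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
        report["cudnn_version"] = torch.backends.cudnn.version()
        # Tiny matrix product to confirm that a kernel actually runs on the GPU.
        x = torch.rand(256, 256, device="cuda")
        report["matmul_ok"] = bool(torch.isfinite(x @ x).all().item())
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

The matrix product is there because is_available() only says the driver and runtime can be reached; actually running a kernel is a slightly stronger check.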
Also, to benchmark your GPU you can do the following:
> git clone https://github.com/ryujaehun/pytorch-gpu-benchmark
> pip install psutil cufflinks plotly pandas matplotlib
> cd pytorch-gpu-benchmark/
> python benchmark_models.py
This lets you compare your results with some well-known numbers.
You can also run nvidia-smi in parallel to check the GPU load:
> watch nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 107... Off | 00000000:09:00.0 Off | N/A |
| 39% 63C P2 179W / 180W | 7999MiB / 8118MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 84404 C ...ubuntu/.pyenv/versions/torch/bin/python 7989MiB |
+-----------------------------------------------------------------------------+
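If you would rather log the load than watch the table, nvidia-smi can also print selected fields as CSV via --query-gpu and --format=csv,noheader,nounits. Below is a minimal sketch that parses such a line. The function names are hypothetical, and the parser assumes all three fields are plain integers (a GPU that reports [N/A] for a field would need extra handling).

```python
import shutil
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_line(line):
    """Parse one CSV line such as '99, 7999, 8118' into a dict of ints."""
    util, used, total = (int(field.strip()) for field in line.split(","))
    return {"util_percent": util, "used_mib": used, "total_mib": total}


def query_gpus():
    """Run nvidia-smi once; return one dict per GPU (empty list if unavailable)."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


# Values taken from the nvidia-smi table above.
print(parse_gpu_line("99, 7999, 8118"))
```

Calling query_gpus() in a loop with a short sleep gives a simple load log while the benchmark runs.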
Docker support on Ubuntu 20.04 LTS
There are a number of docker packages available:
- docker-ce package from docker.com
- docker.io package provided by Canonical
- docker package provided by Red Hat
Docker versions >= 19.03 are supported by NVIDIA Container Toolkit.
At the time of writing, Ubuntu's official docker.io package is the best option to use. Just run the following (on Ubuntu, installing docker-compose should bring in docker.io with it):
> sudo apt install docker-compose
Check the docker version:
> docker version
Client:
Version: 19.03.8
API version: 1.40
...
Server:
Engine:
Version: 19.03.8
API version: 1.40 (minimum version 1.12)
...
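The version requirement can be checked mechanically as well. Here is a small sketch that compares a version string such as the 19.03.8 above against the 19.03 minimum; version_tuple and docker_supports_gpus are illustrative names, not part of any docker tooling.

```python
import re

# Minimum docker version for NVIDIA Container Toolkit, as (major, minor).
MIN_DOCKER = (19, 3)


def version_tuple(text):
    """Turn '19.03.8' into (19, 3, 8); any suffix after the numbers is ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def docker_supports_gpus(version_text, minimum=MIN_DOCKER):
    """Return True when the version is at least `minimum`."""
    return version_tuple(version_text) >= minimum


print(docker_supports_gpus("19.03.8"))
```

The version string itself can be obtained with docker version --format '{{.Server.Version}}'.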
NVIDIA Container Toolkit
Add the package repositories
> distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
> curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
> curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
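The distribution variable above is just the ID and VERSION_ID fields of /etc/os-release glued together. To see what value ends up in the repository URL, the same thing can be computed in Python. This is a sketch; distribution_id is a made-up name and the sample text is a shortened os-release, not a full one.

```python
def distribution_id(os_release_text):
    """Return ID + VERSION_ID from os-release content, e.g. 'ubuntu20.04'."""
    values = {}
    for line in os_release_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be quoted with single or double quotes.
        values[key] = value.strip().strip('"').strip("'")
    return values["ID"] + values["VERSION_ID"]


# Shortened sample of /etc/os-release on Ubuntu 20.04.
sample = 'NAME="Ubuntu"\nVERSION_ID="20.04"\nID=ubuntu\n'
print(distribution_id(sample))
```

On a real system, pass it the contents of /etc/os-release.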
At this point I had to clean up the /etc/apt/sources.list.d/ folder:
> sudo rm /etc/apt/sources.list.d/docker.list*
Continue the installation:
> sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
> sudo systemctl restart docker
Validating installation
Run nvidia-smi in a docker container:
> docker run --gpus all --rm nvidia/cuda nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 107... Off | 00000000:09:00.0 Off | N/A |
| 0% 50C P0 35W / 180W | 0MiB / 8118MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Run PyTorch in a docker container:
> docker run -it --rm --gpus all pytorch/pytorch python -c "from __future__ import print_function; import torch; print(torch.cuda.is_available())"
True
Run benchmark in a docker
> docker run --gpus all --shm-size=512M -it --rm -v /home/ubuntu/pytorch-gpu-benchmark:/pytorch-gpu-benchmark pytorch/pytorch
(in a docker) > pip install psutil cufflinks plotly pandas matplotlib
(in a docker) > python /pytorch-gpu-benchmark/benchmark_models.py
Replace the path /home/ubuntu/pytorch-gpu-benchmark above with the location of your local clone.
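To avoid editing the path by hand, the docker command can be assembled from wherever the repository was cloned. This is a sketch only: benchmark_command is a hypothetical helper, and the default location ~/pytorch-gpu-benchmark is an assumption about where you ran git clone.

```python
import shlex
from pathlib import Path


def benchmark_command(local_repo, image="pytorch/pytorch", shm_size="512M"):
    """Build the `docker run` argument list for a given local clone."""
    host_path = Path(local_repo).expanduser().resolve()
    return [
        "docker", "run", "--gpus", "all", f"--shm-size={shm_size}",
        "-it", "--rm",
        # Mount the local clone at the same container path used above.
        "-v", f"{host_path}:/pytorch-gpu-benchmark",
        image,
    ]


# Assumed clone location; change it to match your machine.
cmd = benchmark_command("~/pytorch-gpu-benchmark")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run(cmd) directly.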