Intro
According to https://pytorch.org/get-started/locally/ and https://github.com/NVIDIA/nvidia-docker the following has to be installed:
- python: >= 3.6
- python package managers: Anaconda or pip
- docker: >= 19.03
NVIDIA software:
- NVIDIA GPU drivers
- CUDA Toolkit and cuDNN
- NVIDIA Container Toolkit
Please also note:
- CUDA 10.2 requires GCC <= 8
Install compatible version of gcc
Ubuntu 20.04 LTS comes with:
- gcc: 9
So the first step is to install a compatible version of gcc, i.e. gcc 8.
Install both gcc versions, but keep gcc 9 selected for now, because it is needed to install the GPU drivers.
> sudo apt -y install build-essential
> sudo apt -y install gcc-8 g++-8 gcc-9 g++-9
> sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 8 --slave /usr/bin/g++ g++ /usr/bin/g++-8
> sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 9 --slave /usr/bin/g++ g++ /usr/bin/g++-9
> sudo update-alternatives --config gcc
There are 2 choices for the alternative gcc (providing /usr/bin/gcc).
  Selection    Path            Priority   Status
------------------------------------------------------------
* 0            /usr/bin/gcc-9   9         auto mode
  1            /usr/bin/gcc-8   8         manual mode
  2            /usr/bin/gcc-9   9         manual mode
Press <enter> to keep the current choice[*], or type selection number: 0
update-alternatives: using /usr/bin/gcc-9 to provide /usr/bin/gcc (gcc) in auto mode
Install python and python package managers
There are multiple ways to manage Python versions and environments.
I've chosen pyenv + pyenv-virtualenv.
> sudo apt-get install -y zlib1g-dev libbz2-dev libreadline-dev libssl-dev libsqlite3-dev libffi-dev
> pyenv install 3.8.2
> pyenv virtualenv 3.8.2 torch
> pyenv global torch
> python -V
Python 3.8.2
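The interpreter that pyenv selected can also be checked against the minimum from the intro (python >= 3.6) from inside Python itself. This is only a small sketch; the names MIN_PYTHON and python_ok are made up for illustration.

```python
import sys

MIN_PYTHON = (3, 6)  # minimum version listed in the intro


def python_ok(version_info=sys.version_info) -> bool:
    """True if the (major, minor) pair meets the minimum requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON


# Report the running interpreter, e.g. the one from the "torch" virtualenv.
print(sys.version.split()[0], python_ok())
```

Run it with the same `python` that `python -V` reported, so the check covers the environment you will install PyTorch into.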
NVIDIA GPU drivers
Get and run installer.
> sudo bash NVIDIA-Linux-x86_64-440.82.run
ERROR: The Nouveau kernel driver is currently in use by your system.  This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding.  Please consult the NVIDIA
         driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
… follow the instructions to blacklist Nouveau kernel driver. Then run:
> sudo update-initramfs -u
> sudo reboot
Retry installation, ignore Xorg warnings and configuration if you are working from command line only.
> sudo bash NVIDIA-Linux-x86_64-440.82.run
Note that you still need gcc-9 to install the drivers. At the end you should get:
Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 440.82) is now complete.
Verify installation:
> nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  Off  | 00000000:09:00.0 Off |                  N/A |
|  0%   59C    P0    38W / 180W |      0MiB /  8118MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
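If you want to verify the driver from a script rather than by eye, the header line of the nvidia-smi output carries the two values that matter here. A minimal sketch, assuming the header keeps the "Driver Version: ... CUDA Version: ..." layout shown above; parse_smi_header is a hypothetical helper name.

```python
import re

# Matches the banner line of the nvidia-smi table shown above.
HEADER_RE = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_smi_header(smi_output: str) -> dict:
    """Return the driver and CUDA versions reported by nvidia-smi."""
    match = HEADER_RE.search(smi_output)
    if match is None:
        raise ValueError("no 'Driver Version' / 'CUDA Version' header found")
    return {"driver": match.group("driver"), "cuda": match.group("cuda")}


# Header line copied from the output above.
sample = "| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |"
print(parse_smi_header(sample))  # {'driver': '440.82', 'cuda': '10.2'}
```

On a real machine the input would be the captured output of `nvidia-smi`, for example via `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.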
CUDA Toolkit and cuDNN
Switch gcc to gcc-8
> sudo update-alternatives --config gcc
There are 2 choices for the alternative gcc (providing /usr/bin/gcc).
  Selection    Path            Priority   Status
------------------------------------------------------------
* 0            /usr/bin/gcc-9   9         auto mode
  1            /usr/bin/gcc-8   8         manual mode
  2            /usr/bin/gcc-9   9         manual mode
Press <enter> to keep the current choice[*], or type selection number: 1
update-alternatives: using /usr/bin/gcc-8 to provide /usr/bin/gcc (gcc) in manual mode
> gcc --version
gcc (Ubuntu 8.4.0-3ubuntu2) 8.4.0
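Since picking the wrong compiler is an easy mistake here, the rule from the intro (CUDA 10.2 requires GCC <= 8) can be checked before starting the installer. A minimal sketch, assuming the first line of `gcc --version` ends with the version number as in the output above; the helper names are made up for illustration.

```python
import re

MAX_GCC_MAJOR_FOR_CUDA_10_2 = 8  # from the intro: CUDA 10.2 requires GCC <= 8


def gcc_major(version_output: str) -> int:
    """Extract the major version from the first line of `gcc --version`."""
    first_line = version_output.splitlines()[0]
    match = re.search(r"(\d+)\.\d+\.\d+\s*$", first_line)
    if match is None:
        raise ValueError(f"cannot find a version number in: {first_line!r}")
    return int(match.group(1))


def gcc_ok_for_cuda_10_2(version_output: str) -> bool:
    """True if the selected gcc is old enough for the CUDA 10.2 installer."""
    return gcc_major(version_output) <= MAX_GCC_MAJOR_FOR_CUDA_10_2


# Sample line taken from the output above.
sample = "gcc (Ubuntu 8.4.0-3ubuntu2) 8.4.0"
print(gcc_major(sample), gcc_ok_for_cuda_10_2(sample))  # 8 True
```

To check the live system, pass in `subprocess.run(["gcc", "--version"], capture_output=True, text=True).stdout` instead of the sample string.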
Run installer, accept license, deselect Driver
> sudo bash cuda_10.2.89_440.33.01_linux.run
x CUDA Installer                                                               x
x - [ ] Driver                                                                 x
x      [ ] 440.33.01                                                           x
x + [X] CUDA Toolkit 10.2                                                      x
x   [X] CUDA Samples 10.2                                                      x
x   [X] CUDA Demo Suite 10.2                                                   x
x   [X] CUDA Documentation 10.2                                                x
x   Options                                                                    x
x   Install                                                                    x
Please make sure that
 -   PATH includes /usr/local/cuda-10.2/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.2/lib64, or, add /usr/local/cuda-10.2/lib64 to /etc/ld.so.conf and run ldconfig as root
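The installer's reminder about PATH and LD_LIBRARY_PATH is easy to forget, so here is a rough check of the current environment. It is a sketch only: it looks at the two environment variables and does not cover the alternative of adding the directory to /etc/ld.so.conf. The function names are hypothetical.

```python
import os

# Directories named in the installer message above.
CUDA_BIN = "/usr/local/cuda-10.2/bin"
CUDA_LIB = "/usr/local/cuda-10.2/lib64"


def path_contains(path_value: str, directory: str) -> bool:
    """True if `directory` is one of the colon-separated entries."""
    wanted = directory.rstrip("/")
    entries = (entry.rstrip("/") for entry in path_value.split(":") if entry)
    return any(entry == wanted for entry in entries)


def missing_cuda_paths(environ=os.environ) -> list:
    """List the installer's path requirements that the environment misses."""
    missing = []
    if not path_contains(environ.get("PATH", ""), CUDA_BIN):
        missing.append(f"PATH lacks {CUDA_BIN}")
    if not path_contains(environ.get("LD_LIBRARY_PATH", ""), CUDA_LIB):
        missing.append(f"LD_LIBRARY_PATH lacks {CUDA_LIB}")
    return missing


# Prints nothing when both variables are set up as the installer asks.
for problem in missing_cuda_paths():
    print(problem)
```

Run it in a fresh shell after editing your shell profile, since variables exported in another session will not be visible.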
Unzip the cuDNN package
> tar -xzvf cudnn-10.2-linux-x64-v7.6.5.32.tgz
Copy the following files into the CUDA Toolkit directory, and change the file permissions.
> sudo cp cuda/include/cudnn.h /usr/local/cuda/include
> sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
> sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
I had to make the following changes:
> sudo rm /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7 /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so
> sudo ln -s libcudnn.so.7.6.5 /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7
> sudo ln -s libcudnn.so.7 /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so
> sudo ldconfig
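To confirm which cuDNN release ended up in the CUDA directory, you can read the version macros from the copied header. A minimal sketch, assuming cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL the way the cuDNN 7 headers do; the sample text below is an illustrative excerpt, not a copy of the real file.

```python
import re


def cudnn_version(header_text: str) -> str:
    """Build 'major.minor.patch' from the #define lines in cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            raise ValueError(f"{name} not found in header")
        parts.append(match.group(1))
    return ".".join(parts)


# Illustrative excerpt with the values expected for cuDNN v7.6.5.
sample_header = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
"""
print(cudnn_version(sample_header))  # 7.6.5
```

For the installed copy, read the file first, e.g. `cudnn_version(open("/usr/local/cuda/include/cudnn.h").read())`.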
PyTorch
> pip install --upgrade pip
> pip install torch torchvision
Check that GPU support is enabled and that you can access your GPU:
> python -c "from __future__ import print_function; import torch; print(torch.cuda.is_available())"
True
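If the one-liner prints False, a slightly longer report helps to narrow down why. The sketch below only uses standard PyTorch attributes (torch.__version__, torch.version.cuda, torch.cuda.is_available, torch.cuda.device_count, torch.cuda.get_device_name); the function name cuda_report and the dictionary keys are my own.

```python
def cuda_report() -> dict:
    """Collect basic CUDA facts from PyTorch, or note that it is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```

A build that reports a CUDA version but cuda_available False usually points at the driver or library setup rather than at the pip package.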
Also to benchmark your GPU you can do the following:
> git clone https://github.com/ryujaehun/pytorch-gpu-benchmark
> pip install psutil cufflinks plotly pandas matplotlib
> cd pytorch-gpu-benchmark/
> python benchmark_models.py
So you can compare your results with some well-known numbers.
You can also run nvidia-smi in parallel to check the GPU load.
> watch nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  Off  | 00000000:09:00.0 Off |                  N/A |
| 39%   63C    P2   179W / 180W |   7999MiB /  8118MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     84404      C   ...ubuntu/.pyenv/versions/torch/bin/python  7989MiB |
+-----------------------------------------------------------------------------+
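For logging the load during a benchmark run, the table format is awkward to parse; nvidia-smi also has a CSV query mode that is easier to consume. A minimal sketch, assuming the three queried fields come back as plain integers; the sample line is built from the numbers in the table above, not captured from a real query, and parse_gpu_query is a hypothetical name.

```python
# Command that asks nvidia-smi for utilization and memory as bare CSV values.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_query(csv_output: str) -> list:
    """Parse one 'util, used, total' line per GPU into dictionaries."""
    rows = []
    for line in csv_output.strip().splitlines():
        util, used, total = (int(field) for field in line.split(","))
        rows.append(
            {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}
        )
    return rows


# Illustrative line using the values from the table above.
sample = "99, 7999, 8118"
print(parse_gpu_query(sample))
# [{'util_percent': 99, 'mem_used_mib': 7999, 'mem_total_mib': 8118}]
```

To sample the real GPU, feed it `subprocess.run(QUERY_CMD, capture_output=True, text=True).stdout` in a loop with a short sleep between calls.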
Docker support on Ubuntu 20.04 LTS
There are a number of Docker packages available:
- docker-ce package from docker.com
- docker.io package provided by Canonical
- docker package provided by Red Hat
Docker versions >= 19.03 are supported by NVIDIA Container Toolkit.
At the time of writing, the docker.io package from the official Ubuntu repositories is the best option. Just run
> sudo apt install docker-compose
Check docker version
> docker version
Client:
 Version:           19.03.8
 API version:       1.40
...
Server:
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
...  
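The 19.03 threshold matters because that is the Docker release that introduced the `--gpus` flag used further down. A small sketch for comparing a version string against it; the helper names are made up, and the parsing assumes a dotted numeric version such as the one printed above.

```python
MIN_DOCKER = (19, 3)  # Docker 19.03, the first release with --gpus


def docker_version_tuple(version: str) -> tuple:
    """Turn '19.03.8' into (19, 3, 8); drops suffixes such as '-ce'."""
    numeric = version.split("-")[0].split("+")[0]
    return tuple(int(part) for part in numeric.split("."))


def supports_gpus_flag(version: str) -> bool:
    """True if this Docker version is 19.03 or newer."""
    return docker_version_tuple(version)[:2] >= MIN_DOCKER


print(supports_gpus_flag("19.03.8"))  # True
print(supports_gpus_flag("18.09.7"))  # False
```

The server version alone can be printed with `docker version --format '{{.Server.Version}}'`, which gives a string suitable for this function.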
NVIDIA Container Toolkit
Add the package repositories
> distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
> curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
> curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
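The `distribution` variable is just the ID and VERSION_ID fields of /etc/os-release glued together, which decides which repository list gets downloaded. The sketch below reproduces that in Python so you can see the value before running curl; the sample text is a shortened, illustrative os-release, and distribution_id is a hypothetical name.

```python
def distribution_id(os_release_text: str) -> str:
    """Concatenate ID and VERSION_ID like the shell one-liner above."""
    fields = {}
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            # Values may be quoted, e.g. VERSION_ID="20.04".
            fields[key.strip()] = value.strip().strip('"')
    return fields["ID"] + fields["VERSION_ID"]


# Shortened example of what Ubuntu 20.04 provides.
sample = 'NAME="Ubuntu"\nVERSION_ID="20.04"\nID=ubuntu\n'
print(distribution_id(sample))  # ubuntu20.04
```

On the real system, pass in `open("/etc/os-release").read()`.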
At this point I had to clean up /etc/apt/sources.list.d/ folder.
> sudo rm /etc/apt/sources.list.d/docker.list*
Continue installation
> sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
> sudo systemctl restart docker
Validating installation
Run nvidia-smi in a Docker container
> docker run --gpus all --rm nvidia/cuda nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  Off  | 00000000:09:00.0 Off |                  N/A |
|  0%   50C    P0    35W / 180W |      0MiB /  8118MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Run PyTorch in a Docker container
> docker run -it --rm --gpus all pytorch/pytorch python -c "from __future__ import print_function; import torch; print(torch.cuda.is_available())"
True
Run the benchmark in a Docker container
> docker run --gpus all --shm-size=512M -it --rm -v /home/ubuntu/pytorch-gpu-benchmark:/pytorch-gpu-benchmark pytorch/pytorch
(in a docker) > pip install psutil cufflinks plotly pandas matplotlib
(in a docker) > python /pytorch-gpu-benchmark/benchmark_models.py
Replace the path /home/ubuntu/pytorch-gpu-benchmark above with the path to your local folder.
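To avoid editing the path by hand each time, the docker command can be assembled from wherever the benchmark was cloned. A rough sketch that mirrors the flags of the command above; benchmark_command is a made-up helper, and it only builds the argument list without starting Docker.

```python
import shlex
from pathlib import Path


def benchmark_command(local_repo: str, image: str = "pytorch/pytorch") -> list:
    """Build the docker run arguments with the local clone mounted."""
    host_dir = Path(local_repo).expanduser().resolve()
    return [
        "docker", "run", "--gpus", "all", "--shm-size=512M", "-it", "--rm",
        "-v", f"{host_dir}:/pytorch-gpu-benchmark",
        image,
    ]


# Print a copy-pasteable command for a clone in the home directory.
cmd = benchmark_command("~/pytorch-gpu-benchmark")
print(shlex.join(cmd))
```

shlex.join needs Python 3.8 or newer, which matches the 3.8.2 interpreter installed earlier.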