Recently I’ve been tasked with building a machine learning environment using ROCm on AMD hardware.

I’ve never done this before… to the lab!

The Intro

I was able to purchase a second-hand AMD MI25 card for $100 USD (as of this writing). This card has plenty of horsepower for ML/AI lab work. It was a pretty interesting setup and a bit of a journey to test and configure. Hopefully this helps someone. 🙂

This MI25 was installed in a Dell R720, and before you comment: no, this is not supported at ALL. I took BOTH 8-pin PCIe GPU power outputs from the riser boards and plugged them into the one card. I do NOT recommend doing this, but it worked for me during this testing.

My specs of the Dell R720 I used were:

2x E5-2640v2 | 128GB RAM | 2x 1100W PSUs | 2x 146GB R1 15k SAS HDD + 1.6TB SAS SSD | H710

The Card

What is the MI25 from AMD?

The AMD Radeon Instinct MI25 is a professional graphics card designed for machine learning and artificial intelligence applications. It is based on the Vega architecture and delivers roughly 25 teraflops of half-precision floating point performance, which makes it well-suited for tasks such as training deep neural networks. The MI25 also has 16GB of HBM2 memory, which provides fast access to the data needed by machine learning algorithms. It is intended for servers and other high-performance computing systems and can be used in a variety of applications, including image recognition, natural language processing, and video analytics. At $100 USD on the used market, it can be a great entry point for a HomeLab that wants to add a high-memory card to its compute stack.

The Driver

First, I installed the ROCM drivers as specified in the documentation for my distribution:

(Ubuntu 20.04 LTS HWE)

Drivers & Source Material:

Instinct™ MI25 Drivers & Support | AMD

#Get the Driver
wget https://repo.radeon.com/amdgpu-install/22.20/ubuntu/focal/amdgpu-install_22.20.50200-1_all.deb

#Install the driver
sudo apt-get install ./amdgpu-install_22.20.50200-1_all.deb

#Bootstrap the driver with the required profile
amdgpu-install --accept-eula --usecase=workstation -y --vulkan=pro --opencl=rocr,legacy

#Reboot, yes you MUST reboot for this to apply
sudo reboot now
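After the reboot, it’s worth confirming the driver actually sees the card before moving on. A quick sanity check (assuming the ROCm tools installed to the default /opt/rocm path):

```shell
# Confirm the amdgpu kernel module loaded
lsmod | grep amdgpu

# List ROCm-visible agents; the MI25 should appear as a gfx900 (Vega 10) GPU
/opt/rocm/bin/rocminfo | grep -E 'Name|Marketing'

# Check temperatures, clocks, and VRAM usage
/opt/rocm/bin/rocm-smi
```

If rocminfo doesn’t list a GPU agent here, fix that before touching Docker — the container steps below depend on the host driver working.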

The Software Stack

Next, I installed Docker. The AMD ROCm team publishes a precompiled ROCm Ubuntu container image that comes pre-configured to run ROCm workloads, which worked well for my use case.

Software & Source Material:

Enhance your ML research with AMD ROCm™ 5.1 and Py… – AMD Community

First, install docker:

sudo apt install docker.io -y

Obtain a base Docker image with the correct user-space ROCm version installed from https://hub.docker.com/r/rocm/dev-ubuntu-20.04, or download a base OS image and install ROCm following the installation directions. In this example, ROCm 5.1.1 is installed, as supported by the installation matrix on the pytorch.org website.

docker pull rocm/dev-ubuntu-20.04:5.1.1

Start the Docker container.

docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/dev-ubuntu-20.04:5.1.1
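A quick note on those flags: the --device options pass the ROCm kernel interfaces through to the container (/dev/kfd is the compute interface, /dev/dri holds the render nodes), and --group-add video grants the container user permission to use them. Anything created inside this container is lost when it exits, so for lab work I’d suggest mounting a host directory as well (the container name and mount path below are just examples):

```shell
# Same as above, but named, and with a host directory mounted at
# /workspace so models and scripts survive container restarts
docker run -it --name rocm-lab \
  --device=/dev/kfd --device=/dev/dri --group-add video \
  -v "$HOME/rocm-workspace:/workspace" \
  rocm/dev-ubuntu-20.04:5.1.1
```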

Install any dependencies needed for the wheels inside the Docker container (python3-pip is included here since the next step uses pip3):

apt update -y && apt install -y libjpeg-dev python3-dev python3-pip nano git curl wget htop

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.1.1/

Lastly, open a python3 terminal and verify ROCm / PyTorch GPU acceleration. (PyTorch exposes ROCm devices through the torch.cuda API, so these calls work unchanged on AMD hardware.)

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.current_device()
0
>>> torch.cuda.device(0)
<torch.cuda.device at 0x7efce0b03be0>
>>> torch.cuda.get_device_name(0)
'AMD MI25 Instinct'
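With the device visible, a tiny workload confirms that work actually lands on the GPU. This is just a sketch — it falls back to CPU when no ROCm device is present, and the matrix size is arbitrary:

```python
import torch

# Use the MI25 if ROCm sees it, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Multiply two 4096x4096 matrices on the selected device
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b

print(device, c.shape)  # c.shape is torch.Size([4096, 4096])
```

If you watch rocm-smi in another terminal while this runs, you should see GPU utilization and VRAM usage tick up.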
