Greetings!

As my work supporting PyTorch-equipped researchers continues, I have started building a benchmark to compare hardware for our researchers. The benchmark is intended to help researchers make informed hardware decisions for development purposes, and I hope it will also be useful to anyone else weighing up GPUs for machine learning work.

This code was not used in production; it is an external project developed on non-corporate hardware. 🙂

Tesla K80 nvidia-smi output during the benchmark

What is the Tesla K80 (24GB VRAM)?

The NVIDIA Tesla K80 is a graphics processing unit (GPU) that was specifically designed for data centers and other high-performance computing (HPC) environments. It was released in 2014 and is no longer actively sold by NVIDIA. At the time of writing, it can be found on eBay for as little as $70 USD. With a total of 24GB of VRAM, it remains attractive for workloads that need a lot of GPU memory.

The Tesla K80 is based on the NVIDIA Kepler architecture and is in fact two GK210 GPUs on a single board, each with 13 streaming multiprocessors (SMX) for 26 in total. Combined, the card has 4,992 CUDA cores and can deliver up to roughly 2.91 trillion floating point operations per second (TFLOPS) in double precision and 8.73 TFLOPS in single precision with GPU Boost.
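As a sanity check, the single-precision figure can be estimated from the core count and clock speed. The clocks below are NVIDIA's published 560 MHz base / 875 MHz boost values for the K80, and each CUDA core can retire one fused multiply-add (two FLOPs) per cycle:

```python
# Back-of-the-envelope peak FP32 throughput for the Tesla K80.
cuda_cores = 4992
base_clock_ghz = 0.560    # 560 MHz base clock
boost_clock_ghz = 0.875   # 875 MHz GPU Boost clock
flops_per_core_cycle = 2  # one fused multiply-add = 2 FLOPs per cycle

base_tflops = cuda_cores * flops_per_core_cycle * base_clock_ghz / 1000
boost_tflops = cuda_cores * flops_per_core_cycle * boost_clock_ghz / 1000
print(f"FP32 peak: {base_tflops:.2f} TFLOPS (base), {boost_tflops:.2f} TFLOPS (boost)")
```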

The Tesla K80 is well-suited to tasks such as scientific simulations, data analytics, and machine learning, as it is designed to deliver high performance and energy efficiency in these workloads. For a homelab it’s more than enough power to play with, but I don’t think it’s a fantastic choice for huge production workloads in 2022.

The Script

zveroboy152/zbc-pytorch-benchmarker: This repo contains the code to benchmark and compare PyTorch calculation runs. (github.com)

The Explanation

In this script, the benchmark is performed by running the model on some input data multiple times and averaging the elapsed time. The elapsed time is calculated by measuring the time before and after running the model on the input data and taking the difference between these two times.

The number of times the model is run is determined by the num_runs variable, which is set to 100000 in this script. The elapsed time for each run is accumulated in the total_time variable. At the end, the average elapsed time is calculated by dividing the total elapsed time by the number of runs.

To measure the elapsed time, the script uses the time module, which provides functions for measuring time in Python. The time.time function returns the current time in seconds since the epoch (a fixed reference point in time, conventionally midnight UTC on January 1, 1970).
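A minimal sketch of this timing pattern (the workload here is just a stand-in):

```python
import time

start = time.time()                        # seconds since the epoch
_ = sum(i * i for i in range(1_000_000))   # stand-in workload to time
elapsed = time.time() - start              # difference = elapsed wall-clock time
print(f"elapsed: {elapsed:.4f} s")
```

For very short intervals, time.perf_counter() offers higher resolution than time.time(), though averaging over many runs, as the script does, also smooths out measurement noise.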

To run the model on the input data, the script calls the forward method of the model, passing it the input data as an argument. The forward method defines the forward pass of the model, which takes the input data and passes it through the convolutional layers and activation functions before returning the output.
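A minimal stand-in model illustrating this pattern (the benchmark repo's actual architecture may differ; the layer sizes here are assumptions for demonstration):

```python
import torch
import torch.nn as nn

# Illustrative stand-in; not the benchmark repo's actual model.
class TinyConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)

    def forward(self, x):
        # The input passes through the convolutional layers and
        # activation functions before the output is returned.
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        return x

model = TinyConvNet()
# Calling the model instance invokes forward() under the hood.
out = model(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 32, 32, 32])
```

Note that in PyTorch, calling the model instance directly (model(x)) is the idiomatic way to run the forward pass, since it also triggers registered hooks.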

To wrap the range object, the script uses the tqdm module, which provides a convenient way to display a progress bar for loops. The tqdm function takes an iterable object (such as a list or range object) as an argument and returns an iterator that displays a progress bar as the loop iterates over the object. This can be helpful for keeping track of the progress of the benchmark and providing some indication of how long it will take to complete.
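Putting the pieces together, the loop described above looks roughly like this (a sketch, not the repo's exact code; the model is a stand-in and num_runs is reduced so the demo finishes quickly):

```python
import time

import torch
import torch.nn as nn
from tqdm import tqdm

# Stand-in model and input; the benchmark repo's versions may differ.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
data = torch.randn(1, 3, 32, 32)

num_runs = 100    # the script uses 100000
total_time = 0.0
for _ in tqdm(range(num_runs)):   # tqdm wraps range() to show a progress bar
    start = time.time()
    _ = model(data)               # one forward pass
    total_time += time.time() - start

print(f"Average elapsed time: {total_time / num_runs}")
```

One caveat when benchmarking on a GPU: CUDA kernel launches are asynchronous, so torch.cuda.synchronize() should be called before reading the clock to get accurate per-run timings.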

The $70 Tesla K80 vs the RTX 3070 Ti

Lastly, the numbers. What did we get with the Tesla K80?

On our K80, we got the following result:*

100000/100000 [ 2:34 648.19it/s] | Average elapsed time: 0.0015376823139190675

*Note that this uses only one of the K80’s two GPU dies.

My RTX 3070 Ti (EVGA FTW), however, produced the following result:

100000/100000 [ 1:09 1445.76it/s] | Average elapsed time: 0.0006857663035392762
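Dividing the two reported averages gives the per-run speedup:

```python
k80_avg = 0.0015376823139190675   # Tesla K80 (single die), seconds per run
rtx_avg = 0.0006857663035392762   # RTX 3070 Ti, seconds per run
speedup = k80_avg / rtx_avg
print(f"The 3070 Ti was about {speedup:.2f}x faster per run")  # ~2.24x
```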

The Results

In 2022, the Tesla K80 remains a capable graphics processing unit (GPU) with a large amount of video RAM (VRAM) for its price. If your PyTorch model is designed to take advantage of multiple GPUs and scales well across them, the K80 can provide performance comparable to more expensive cards. Overall, it is a good value for anyone looking to speed up machine learning model training on a budget.

