
[ Back to index ]

Tutorial: running the MLPerf inference benchmark and preparing the submission


Introduction

This tutorial briefly explains how to run a modular version of the MLPerf inference benchmark with a simple GUI using the cross-platform automation meta-framework (MLCommons CM, aka CK2), and how to prepare your submission.

Please follow this CM tutorial from the Student Cluster Competition for more details.

If you have questions, encounter issues or have feature requests, please submit them here, and feel free to join our open taskforce on automation and reproducibility and the Discord discussions.

System preparation

Minimal system requirements

  • Device: CPU (x86-64 or Arm64) or GPU (Nvidia)
  • OS: we have tested CM automations on Ubuntu 20.04, Ubuntu 22.04, Debian 10, Red Hat 9 and macOS 13
  • Disk space:
    • test runs: minimal preprocessed datasets (< ~5 GB)
    • full runs: depend on the task and dataset; some require 0.3 .. 3 TB
  • Python: 3.8+
  • All other dependencies (artifacts and tools) will be installed by the CM meta-framework

CM installation

Follow this guide to install the MLCommons CM framework (CK2) on your system.

After the installation, you should be able to access the CM command line as follows:

$ cm

cm {action} {automation} {artifact(s)} {--flags} @input.yaml @input.json
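For example, the following two commands, both used later in this tutorial, share this unified interface: one pulls a CM repository, the other runs a CM script selected by human-readable tags:

cm pull repo mlcommons@ck
cm run script "get python" --version_min=3.8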

Pull CM repository with cross-platform MLOps and DevOps scripts

Pull the stable MLCommons CM repository with cross-platform CM scripts for modular ML systems:

cm pull repo mlcommons@ck

CM pulls all such repositories into the $HOME/CM directory to search for CM automations and artifacts. You can find the location of a pulled repository as follows:

cm find repo mlcommons@ck

You can also pull a stable version of this CM repository using a specific checkout:

cm pull repo mlcommons@ck --checkout=...

You can now use the unified CM CLI/API of reusable and cross-platform CM scripts to detect or install all artifacts (tools, models, datasets, libraries, etc.) required for a given software project (the MLPerf inference benchmark in our case).

Conceptually, these scripts take some environment variables and files as input, perform a cross-platform action (detect an artifact, download files, install tools), prepare new environment variables and cache the output if needed.
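For example, you can list what CM has already detected, installed and cached on your system. This is a minimal illustration; the optional --tags filter in the second command assumes you only want to see Python-related cache entries:

cm show cache
cm show cache --tags=python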

Note that CM can automatically detect or install all dependencies for a given benchmark and run it on a given platform in just one command using a simple JSON or YAML description of dependencies on all required CM scripts.

However, since the goal of this tutorial is to explain how we modularize MLPerf and any other benchmark, we will show you all the individual CM commands needed to prepare and run the MLPerf inference benchmark. You can reuse these commands in your own projects, thus providing a common interface for research projects.

In the end, we will also show you how to run the MLPerf benchmark in one command from scratch.
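As a rough preview, such a single command could be based on the "run mlperf inference generate-run-cmds" CM script used later in this tutorial. The model, backend, device and scenario flags below are illustrative assumptions; the GUI described later will generate the exact command for your configuration:

cm run script "run mlperf inference generate-run-cmds" \
     --model=resnet50 --backend=onnxruntime --device=cpu \
     --scenario=Offline --quiet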

Optional: update CM and repository to the latest version

Note that if you already have CM and the mlcommons@ck repository installed on your system, you can update them to the latest version at any time and clean the CM cache as follows:

python3 -m pip install cmind -U
cm pull repo mlcommons@ck --checkout=master
cm rm cache -f

Install system dependencies for your platform

We suggest you install the system dependencies required by the MLPerf inference benchmark using CM (requires SUDO access).

For this purpose, we have created a cross-platform CM script that will automatically install such dependencies based on your OS (Ubuntu, Debian, Red Hat, MacOS ...).

In this case, the CM script simply serves as a wrapper with a unified and cross-platform interface for native scripts, which you can find and extend here if some dependencies are missing on your machine. This is a collaborative way to make CM scripts portable and interoperable.

You can run this CM script as follows (note that you may be asked for a SUDO password on your platform):

cm run script "get sys-utils-cm" --quiet

If you believe you already have all system dependencies installed, you can run this script with the --skip flag:

cm run script "get sys-utils-cm" --skip

Use CM to detect or install Python 3.8+

Since we use the Python reference implementation of the MLPerf inference benchmark (unoptimized), we need to detect or install Python 3.8+ (an MLPerf requirement).

You can detect it using the following CM script:

cm run script "get python" --version_min=3.8

Install a Python virtual environment using the above Python

cm run script "install python-venv" --name=mlperf --version_min=3.8

You can change the name of your virtual Python environment using the --name flag.

Customize and run the MLPerf inference benchmark

You can use this online GUI to generate CM commands to customize and run the MLPerf inference benchmark. You can select different implementations, models, datasets, frameworks and parameters, then copy/paste the final commands into your shell to run MLPerf.

Alternatively, you can use your own local GUI to run this benchmark as follows:

cm run script --tags=gui \
     --script="app generic mlperf inference" \
     --prefix="gnome-terminal --"

You may just need to substitute gnome-terminal -- with a command that opens a new shell/terminal on your OS.
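For example, if gnome-terminal is not available, an xterm-based prefix might work instead (this assumes xterm is installed; any command that opens a new terminal window should do):

cm run script --tags=gui \
     --script="app generic mlperf inference" \
     --prefix="xterm -e"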

CM will attempt to automatically detect or download and install the default versions of all required ML components.

Debug the MLPerf benchmark

You can add the --debug flag to a CM command to make CM stop just before running a given MLPerf benchmark, open a shell, and let you run or customize the benchmark manually from the command line while reusing the environment variables and tools prepared by CM.
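For example, appending --debug to a CM command that runs the benchmark (here the hedged one-command sketch from earlier in this tutorial; all flag values remain illustrative assumptions) would look as follows:

cm run script "run mlperf inference generate-run-cmds" \
     --model=resnet50 --backend=onnxruntime --device=cpu \
     --scenario=Offline --quiet --debug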

Customize MLPerf benchmark

Implementations

The community provided a unified CM API for the following implementations of the MLPerf inference benchmark:

We are also working on a lightweight universal script to benchmark the performance of any ML model with MLPerf loadgen (without accuracy checks).

If you want to add your own implementation or backend, the simplest solution is to create a fork of the MLPerf inference GitHub repo, specify this fork in the above GUI in the fields "Git URL for MLPerf inference sources to build LoadGen" and "Git URL for MLPerf inference sources to run benchmarks", and update the CM meta description of our MLPerf wrapper.

Don't hesitate to get in touch with this taskforce to get free help from the community to add your implementation and prepare the submission.

Device

CPU

We have tested out-of-the-box CM automation for the MLPerf inference benchmark across diverse x86-64-based platforms (Intel and AMD) as well as Arm64-based machines from RPi4 to AWS Graviton.

CUDA

As a minimum requirement, you should have CUDA installed. It can be detected using CM as follows:

cm run script "get cuda"

We suggest you also install cuDNN and TensorRT.

If they are not installed, you can use CM scripts to install them as follows:

cm run script --tags=get,cudnn --tar_file=<PATH_TO_CUDNN_TAR_FILE>
cm run script --tags=get,tensorrt --tar_file=<PATH_TO_TENSORRT_TAR_FILE>

Backend (ML framework)

You can install specific versions of various backends using CM as follows (optional):

Deepsparse

See this PR prepared by the open taskforce during the public hackathon to add Neural Magic's Deepsparse BERT backend for MLPerf to the CM automation.

We currently support the BERT-Large int8 model targeting CPU only. CUDA support may come soon...

ONNX runtime CPU

cm run script "get generic-python-lib _onnxruntime" (--version=...)

ONNX runtime CUDA

cm run script "get generic-python-lib _onnxruntime_gpu" (--version=...)

PyTorch CPU

cm run script "get generic-python-lib _torch" (--version=...)

PyTorch CUDA

cm run script "get generic-python-lib _torch_cuda" (--version=...)

TensorFlow (Python)

cm run script "get generic-python-lib _tensorflow" (--version=...)

TensorFlow from source

cm run script "get tensorflow from-src" (--version=...)

TensorFlow Lite

cm run script "get tensorflow from-src _tflite" (--version=...)

TensorRT

cm run script --tags=get,tensorrt (--tar_file=<PATH_TO_DOWNLOADED_TENSORRT_PACKAGE_FILE>)

TVM ONNX (Python)

cm run script "get generic-python-lib _apache-tvm" (--version=...)

Datasets

Power measurements

Please follow this tutorial to run MLPerf with power measurements using CM.

Prepare submission

You can use this online GUI to generate CM commands to run the MLPerf inference benchmark, generate your submission, and add your results to a temporary W&B dashboard.

Alternatively, you can use your own local GUI to run this benchmark as follows:

cm run script --tags=gui \
     --script="run mlperf inference generate-run-cmds" \
     --prefix="gnome-terminal --"
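If you prefer the command line to the GUI, a submission run might look roughly like the sketch below. The _submission variation and all flag values are assumptions based on the same "run mlperf inference generate-run-cmds" script, so prefer the commands generated by the GUI above:

cm run script --tags=run,mlperf,inference,generate-run-cmds,_submission \
     --model=resnet50 --backend=onnxruntime --device=cpu \
     --scenario=Offline --quiet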

The next steps

You are welcome to join the open MLCommons taskforce on automation and reproducibility to contribute to this project, continue optimizing this benchmark, and prepare an official submission to the MLPerf inference benchmarks with free help from the community.

See the development roadmap here.

Authors

Acknowledgments

We thank Hai Ah Nam, Steve Leak, Vijay Janappa Reddi, Tom Jablin, Ramesh N Chukka, Peter Mattson, David Kanter, Pablo Gonzalez Mesa, Thomas Zhu, Thomas Schmid and Gaurav Verma for their suggestions and contributions.