ReLU, which stands for Rectified Linear Unit, is an activation function that is widely used in neural networks, particularly in deep learning models. It is defined mathematically as:

$ReLU(x) = \max(0, x)$

This function takes a single number as input and outputs the maximum of zero and that number. Essentially, it passes positive values through unchanged and clamps all negative values to zero. Please refer to the conv2d_fused_relu design for an example of fusing ReLU with a convolution. Key properties of ReLU include:
- Non-linear: Although ReLU is piecewise linear, it introduces non-linearity into the model, which is essential for learning complex patterns in data.
- Computational Efficiency: One of ReLU's biggest advantages is its computational simplicity. Unlike activation functions such as sigmoid or tanh, ReLU involves no expensive operations (e.g., exponentials), which makes it cheap to evaluate and speeds up both training and inference.
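To make the definition concrete, below is a minimal scalar reference implementation in C++; the function name and test values are illustrative only and are not part of this design's sources:

```cpp
#include <algorithm>
#include <cstdio>

// Scalar reference ReLU: ReLU(x) = max(0, x).
// One comparison per element -- no exponentials, unlike sigmoid or tanh.
float relu_scalar(float x) { return std::max(0.0f, x); }

int main() {
  const float inputs[] = {-2.5f, -0.1f, 0.0f, 0.1f, 3.7f};
  for (float x : inputs)
    std::printf("ReLU(%+.1f) = %.1f\n", x, relu_scalar(x));
  return 0;
}
```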
This design implements a `bfloat16`-based ReLU on a vector, performed in parallel on two cores in a single column. Because of its low compute intensity, the kernel ends up being I/O bound; in a practical ML implementation, it is an example of the type of kernel that is likely best fused onto another, more compute-dense kernel (e.g., a convolution or GEMM).
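A rough back-of-the-envelope calculation shows why: each `bfloat16` element requires one max operation but 4 bytes of data movement (2 bytes in, 2 bytes out), for an arithmetic intensity of roughly $1 / 4 = 0.25$ ops/byte, far too low to keep the vector units busy relative to the available memory bandwidth.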
- `aie2.py`: A Python script that defines the AIE array structural design using MLIR-AIE operations. This generates MLIR that is then compiled using `aiecc.py` to produce design binaries (i.e., XCLBIN and inst.txt for the NPU in Ryzen™ AI).
- `relu.cc`: A C++ implementation of a vectorized ReLU operation for AIE cores, a 1:1 implementation of the inherent function using low-level intrinsics. AIE2 supports an element-wise max of 32 `bfloat16` numbers against a second vector register containing all zeros, implementing the $ReLU(x) = \max(0, x)$ function directly (see the sketch after this list). The source can be found here.
- `test.cpp`: This C++ code is a testbench for the design example. The code is responsible for loading the compiled XCLBIN file, configuring the AIE module, providing input data, and executing the AIE design on the NPU. After execution, it verifies the ReLU results and optionally outputs trace data.
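The inner loop of the kernel follows the pattern below. This is a sketch rather than the verbatim contents of `relu.cc`: the intrinsic names (`v32bfloat16`, `broadcast_zero_bfloat16`, and the vector `max` overload) are assumed from the AIE2 low-level intrinsics headers, and tile-size handling is simplified:

```cpp
// Sketch of a vectorized ReLU for an AIE2 core (assumed intrinsic names).
// Processes 32 bfloat16 lanes per iteration: out = max(in, 0).
void relu_vectorized(bfloat16 *restrict in, bfloat16 *restrict out,
                     const int tile_size) {
  constexpr int vec_factor = 32;                   // lanes per vector op
  v32bfloat16 zeroes = broadcast_zero_bfloat16();  // all-zero vector register
  for (int i = 0; i < tile_size; i += vec_factor) {
    v32bfloat16 x = *(v32bfloat16 *)(in + i);      // load 32 elements
    *(v32bfloat16 *)(out + i) = max(x, zeroes);    // element-wise max vs 0
  }
}
```

Broadcasting the zero register once outside the loop keeps the loop body to a single load, max, and store per 32 elements, which is exactly why the kernel is memory bound rather than compute bound.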
To compile the design and C++ testbench:

`make`

To run the design:

`make run`

To generate a trace file:

`make trace`