The goal of this assignment is to get a first grasp of performance-oriented programming. Furthermore, you will be using the LCC3 cluster, cachegrind
, and perf
for the first time in this lab.
Maybe you have already heard of Fractals or Mandelbrot. Fractals are geometric patterns that appear to be infinitely detailed. There are many functions that can generate such patterns. The more often you evaluate these functions, the more detailed the pattern becomes.
This task will focus on the Mandelbrot set, a set of complex numbers that is generated by the function zn+1=zn2+c, where z is a complex number and c is a constant.
Short reminder: a complex number z is the sum of a real part x and an imaginary part y that includes the imaginary unit i with the property (i2=-1) → z=x+yi.
Since the exact math behind this is not that important to us, we will provide you with a pseudocode:
function calc_mandelbrot(image):
for each (X,Y) do:
x = 0.0
y = 0.0
cx = mapped x_pixel_idx to Mandelbrot x-axis [-2.5, 1]
cy = mapped y_pixel_idx to Mandelbrot y-axis [-1, 1]
iteration = 0
while (x*x + y*y <= 2*2 AND iteration < MAX_ITER) do:
x_tmp = x*x - y*y + cx
y = 2*x*y + cy
x = x_tmp
iteration = iteration + 1
norm_iteration = mapped iteration to pixel range [0, 255]
image[y_pixel][x_pixel] = norm_iteration
- Open the
mandelbrot.c
file and implement the sequentialcalc_mandelbrot
function with the help of the provided pseudocode. - Check out the generated image
mandelbrot.png
to see if you implemented the algorithm correctly. - Benchmark your program on the LCC3 cluster, document your results and add them to the comparison spreadsheet linked on Discord. How would you improve program performance?
- Can you think of a way to parallelize this algorithm?
The Hadamard product is defined as the element-wise product of two matrices with the same dimensions. The following two snippets give an implementation of the Hadamard product for n x n square matrices.
for (size_t j = 0; j < n; ++j) {
for (size_t i = 0; i < n; ++i) {
c[i][j] = a[i][j] * b[i][j];
}
}
for (size_t i = 0; i < n; ++i) {
for (size_t j = 0; j < n; ++j) {
c[i][j] = a[i][j] * b[i][j];
}
}
- For both implementations, give a function f to calculate the number of data cache read misses for the matrix size n and the cache line size s in an 8-way set-associative cache. Assume that all variables are initialized appropriately (the matrix elements are of type
int32_t
) and that the matrices are stored in contiguous memory in row-major order. Additionally, the matrices are too large to store them entirely in the cache (n >> s). - Use the two snippets to implement two versions of the Hadamard product.
- Log into the LCC3 cluster and analyze the cache behavior of the implementations using
cachegrind
(e.g.valgrind --tool=cachegrind <your program>
) andperf
(e.g.perf stat -e LLC-load-misses -e LLC-store-misses <your program>
) Can you validate your theoretical findings? Compare the results of both tools.
All the material required by the tasks above (e.g., code, figures, text, etc...) must be part of the solution that is handed in. Your experiments should be reproducible and comparable to your measurements using the solution materials that you hand in.
Every member of your group must be able to explain the given problem, your solution, and possible findings. You may also need to answer detailed questions about any of these aspects.