Skip to content

Latest commit

 

History

History
87 lines (61 loc) · 4.78 KB

README.md

File metadata and controls

87 lines (61 loc) · 4.78 KB

Assignment 2 for March 29th, 2022

The goal of this assignment is to get a first grasp of performance-oriented programming. Furthermore, you will be using the LCC2 cluster, cachegrind, and perf for the first time in this lab.

Exercise 1 (1 Point)

Description

Maybe you have already heard of Fractals or Mandelbrot. Fractals are geometric patterns that appear to be infinitely detailed. There are many functions that can generate such patterns. The more often you evaluate these functions, the more detailed the pattern becomes.

This task will focus on the Mandelbrot set, a set of complex numbers that is generated by the function zn+1=zn2+c, where z is a complex number and c is a constant.

Short reminder: a complex number z is the sum of a real part x and an imaginary part y that includes the imaginary unit i with the property (i2=-1) → z=x+yi.

Since the exact math behind this is not that important to us, we will provide you with a pseudocode:

function calc_mandelbrot(image): 
    for each (X,Y) do:
        x = 0.0
        y = 0.0
        cx = mapped x_pixel_idx to Mandelbrot x-axis [-2.5, 1]
        cy = mapped y_pixel_idx to Mandelbrot y-axis [-1, 1]
        iteration = 0
        while (x*x + y*y <= 2*2 AND iteration < MAX_ITER) do:
            x_tmp = x*x - y*y + cx
            y = 2*x*y + cy
            x = x_tmp
            iteration = iteration + 1

        norm_iteration = mapped iteration to pixel range [0, 255]
        image[y_pixel][x_pixel] = norm_iteration

Tasks

  • Open the mandelbrot.c file and implement the sequential calc_mandelbrot function with the help of the provided pseudocode.
  • Check out the generated image mandelbrot.png to see if you implemented the algorithm correctly.
  • Benchmark your program on the LCC2 cluster, document your results and add them to the comparison spreadsheet linked on Discord. How would you improve program performance?
  • Can you think of a way to parallelize this algorithm?

Exercise 2 (1 Point)

Description

The Hadamard product is defined as the element-wise product of two matrices with the same dimensions. The following two snippets give an implementation of the Hadamard product for n x n square matrices.

for (size_t i = 0; i < n; ++i) {
    for (size_t j = 0; j < n; ++j) {
        c[i][j] = a[i][j] * b[i][j];
    }
}
for (size_t j = 0; j < n; ++j) {
    for (size_t i = 0; i < n; ++i) {
        c[i][j] = a[i][j] * b[i][j];
    }
}

Tasks

  • For both implementations, give a function f to calculate the number of data cache read misses for the matrix size n and the cache line size s in an 8-way set-associative cache. Assume that all variables are initialized appropriately (the matrix elements are of type int32_t) and that the matrices are stored in contiguous memory in row-major order. Additionally, the matrices are too large to store them entirely in the cache (n >> s).
  • Use the two snippets to implement two versions of the Hadamard product.
  • Log into the LCC2 cluster and analyze the cache behavior of the implementations using cachegrind and perf. Can you validate your theoretical findings? Compare the results of both tools.

Exercise 3 (1 Point)

Description

There are several methods to approximate Pi numerically. In this exercise, you are asked to approximate Pi using the Monte Carlo method. Therefore, n (n >> 1) random 2D points equally distributed in the interval [0, 1] x [0, 1] are generated, and it is checked whether they are inside the first quadrant of a unit circle. Let i be the number of points that lie inside the quarter of the unit circle. Then Pi can be approximated by 4 * i / n.

Tasks

  • Implement a serial version of Monte Carlo Pi approximation.
  • Implement a parallel version of Monte Carlo Pi approximation using POSIX Threads Library (pthread).
  • Benchmark the sequential version and the parallel version for 1 to 8 threads on LCC2 using n = 500,000,000. Document your results and visualize them in appropriate figures. How could you improve program performance?
  • Add your measured execution times for 1 and 8 threads on LCC2 to the comparison spreadsheet linked on Discord.

General Notes

All the material required by the tasks above (e.g., code, figures, text, etc...) must be part of the solution that is handed in. Your experiments should be reproducible and comparable to your measurements using the solution materials that you hand in.

Every member of your group must be able to explain the given problem, your solution, and possible findings. You may also need to answer detailed questions about any of these aspects.