Week 1: Introduction

Lecture: link
Seminar + bonus home assignment: link

Further reading

CUDA Programming Guide and CUDA C++ Best Practices Guide
How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
PyTorch Performance Tuning Guide
Earlier version of the PyTorch Performance Tuning Guide, from NVIDIA
Docs for the caching memory allocator in PyTorch
Overview of timeit for microbenchmarking
PyTorch Benchmark tutorial
Links on floating-point precision in different libraries and environments: 1 2
On threading in PyTorch