🎉 Modern CUDA Learning Notes with PyTorch: CUDA Cores, Tensor Cores, FP32/TF32, FP16/BF16, FP8/INT8, flash_attn, RoPE, sgemm, hgemm, sgemv, warp/block reduce, elementwise, softmax, layernorm, rmsnorm.
Updated Nov 5, 2024 - CUDA