Hello fellow gemm optimizer enthusiast,
It would be extremely useful to provide benchmark utilities, ideally reporting GFlop/s or TFlop/s, to compare against other frameworks, against the CPU's theoretical peak throughput, and also against LINPACK.
The formula for an MxK matrix multiplied by a KxN matrix is:

total required operations: M*K*N*2 (2 for 1 mul and 1 add per multiply-accumulate)

divided by the time taken.

Additionally, you might want to check the required data to derive the arithmetic intensity for the roofline model:

M*K + K*N
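As a minimal sketch, both formulas in Nim (the proc names and the 1920³ problem size are mine, purely for illustration; the timing harness assumes a `gemm` call goes where the placeholder comment is):

```nim
import std/[monotimes, times]

proc gemmGFlops(M, N, K: int; seconds: float): float =
  ## Total required operations M*K*N*2 (1 mul + 1 add per
  ## multiply-accumulate), divided by the time taken, in GFlop/s.
  2.0 * float(M) * float(N) * float(K) / seconds / 1e9

proc arithmeticIntensity(M, N, K: int): float =
  ## FLOP per input element moved (M*K + K*N), for the roofline model.
  (2.0 * float(M) * float(N) * float(K)) / float(M*K + K*N)

when isMainModule:
  let start = getMonoTime()
  # ... run your gemm with M = N = K = 1920 here ...
  let seconds = (getMonoTime() - start).inNanoseconds.float / 1e9
  echo "GFlop/s: ", gemmGFlops(1920, 1920, 1920, seconds)
  echo "Arithmetic intensity: ", arithmeticIntensity(1920, 1920, 1920)
```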
And finally, you might also want to check your theoretical peak, computed like this: https://github.com/mratsim/weave/blob/b6255af/benchmarks/matmul_gemm_blas/gemm_bench_config.nim#L5-L18
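The permalinked config derives the peak from clock speed, SIMD width, and FMA throughput; it boils down to (values for an i9-9980XE):

```nim
const
  CpuGhz = 3.5      # i9-9980XE OC All turbo 4.1GHz (AVX2 4.0GHz, AVX512 3.5GHz)
  NumCpuCores = 18
  VectorWidth = 16  # 8 float32 for AVX2, 16 for AVX512
  InstrCycle = 2    # How many instructions per cycle (2x FMA or 1x FMA for example)
  FlopInstr = 2     # How many FLOP per instr (FMA = 1 add + 1 mul)

  TheoSerialPeak* = CpuGhz * VectorWidth * InstrCycle * FlopInstr
  TheoThreadedPeak* = TheoSerialPeak * NumCpuCores
```

With those numbers, that is 3.5 * 16 * 2 * 2 = 224 GFlop/s per core and 4032 GFlop/s across the 18 cores.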
FYI, you might be interested in my own research on cache-utilization tuning, though skimming a bit I see that you tuned at the cache-associativity level while I used some heuristics:
Benchmarks of my own implementation + OpenMP against OpenBLAS/MKL and MKL-DNN (the latest oneDNN was too entangled to extract the relevant GEMM primitives):
https://github.com/mratsim/laser/blob/d310294/benchmarks/gemm/gemm_bench_float32.nim#L374
Nim must be installed, plus OpenBLAS or MKL (the git submodule will download MKL-DNN).
Benchmarks with my own multithreading runtime (instead of OpenMP):
https://github.com/mratsim/weave/blob/b6255af/benchmarks/matmul_gemm_blas/all_gemm.nim
Again, Nim must be installed, plus OpenBLAS or MKL (the git submodule will download MKL-DNN).