Estimate the absolute performance of a piece of Julia code

triscale-innov/GFlops.jl

GFlops.jl

When code performance is an issue, it is sometimes useful to get absolute performance measurements in order to objectivise what is "slow" or "fast". GFlops.jl leverages the power of Cassette.jl to automatically count the number of floating-point operations in a piece of code. When combined with the accuracy of BenchmarkTools, this allows for easy and absolute performance measurements.

Installation

This package is registered and can therefore simply be installed with

pkg> add GFlops

Example use

This simple example shows how to track the number of operations in a vector summation:

julia> using GFlops

julia> x = rand(1000);

julia> @count_ops sum($x)
Flop Counter: 999 flop
┌─────┬─────────┐
│     │ Float64 │
├─────┼─────────┤
│ add │     999 │
└─────┴─────────┘

julia> @gflops sum($x);
  8.86 GFlops,  12.76% peak  (9.99e+02 flop, 1.13e-07 s, 0 alloc: 0 bytes)
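
The figures on the `@gflops` line are related by simple arithmetic: the rate is the flop count divided by the elapsed time. The sketch below recomputes it from the numbers printed above. The helper name `gflops_rate` is hypothetical and not part of GFlops.jl, and because the printed time is rounded, the result only matches the reported 8.86 GFlops approximately.

```julia
# Hypothetical helper (not part of GFlops.jl):
# rate in GFlops from a flop count and an elapsed time in seconds.
gflops_rate(flop, seconds) = flop / seconds / 1e9

flop    = 999       # total reported by `@count_ops sum($x)`
seconds = 1.13e-7   # elapsed time printed by `@gflops` (rounded)

rate = gflops_rate(flop, seconds)
println(round(rate, digits = 2), " GFlops")
```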

GFlops.jl internally tracks several types of floating-point operations, for both 32-bit and 64-bit operands. Pretty-printing a Flop Counter only shows non-zero entries, but any individual counter can be accessed as a field:

julia> function mixed_dot(x, y)
           acc = 0.0
           @inbounds @simd for i in eachindex(x, y)
               acc += x[i] * y[i]
           end
           acc
       end
mixed_dot (generic function with 1 method)

julia> x = rand(Float32, 1000); y = rand(Float32, 1000);

julia> cnt = @count_ops mixed_dot($x, $y)
Flop Counter: 2000 flop
┌─────┬─────────┬─────────┐
│     │ Float32 │ Float64 │
├─────┼─────────┼─────────┤
│ add │       0 │    1000 │
│ mul │    1000 │       0 │
└─────┴─────────┴─────────┘

julia> fieldnames(GFlops.Counter)
(:fma32, :fma64, :muladd32, :muladd64, :add32, :add64, :sub32, ...)

julia> cnt.add64
1000

julia> @gflops mixed_dot($x, $y);
  9.91 GFlops,  13.36% peak  (2.00e+03 flop, 2.02e-07 s, 0 alloc: 0 bytes)

Caveats

Fused Multiplication and Addition: FMA & MulAdd

On systems which support them, FMAs and MulAdds compute two operations (an addition and a multiplication) in one instruction. @count_ops counts each individual FMA/MulAdd as one operation, which makes the counters easier to interpret. However, @gflops counts two floating-point operations for each FMA or MulAdd, in accordance with the way high-performance benchmarks usually behave:

julia> x = 0.5; coeffs = rand(10);

# 9 MulAdds but 18 flop
julia> cnt = @count_ops evalpoly($x, $coeffs)
Flop Counter: 18 flop
┌────────┬─────────┐
│        │ Float64 │
├────────┼─────────┤
│ muladd │       9 │
└────────┴─────────┘

julia> @gflops evalpoly($x, $coeffs);
  0.87 GFlops,  1.63% peak  (1.80e+01 flop, 2.06e-08 s, 0 alloc: 0 bytes)
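
To see where the 9 MulAdds come from: a polynomial with 10 coefficients, evaluated with Horner's rule, needs one multiply-add per coefficient after the leading one. The sketch below is a hand-written Horner loop, not the actual implementation of `evalpoly`. `horner_count` is a hypothetical helper that also tallies the multiply-adds and converts them to flop using the two-per-MulAdd convention described above.

```julia
# Hypothetical illustration, not Base's evalpoly implementation.
# Evaluates a polynomial (coefficients in ascending order) with Horner's rule
# and counts how many muladd calls were needed.
function horner_count(x, coeffs)
    acc = coeffs[end]
    nmuladd = 0
    for i in lastindex(coeffs)-1:-1:firstindex(coeffs)
        acc = muladd(acc, x, coeffs[i])   # acc * x + coeffs[i]
        nmuladd += 1
    end
    return acc, nmuladd
end

x = 0.5
coeffs = rand(10)
val, nmuladd = horner_count(x, coeffs)

flop = 2 * nmuladd   # two flop per multiply-add
println("muladds: ", nmuladd, ", flop: ", flop)
println(val ≈ evalpoly(x, coeffs))   # cross-check against Base.evalpoly
```

With 10 coefficients the loop runs 9 times, which gives 18 flop and agrees with the totals shown above.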

Non-Julia code

GFlops.jl does not see what happens outside the realm of Julia code. In particular, it does not see operations performed in external libraries, such as BLAS calls:

julia> using LinearAlgebra

julia> x = rand(1000); y = rand(1000);

julia> @count_ops dot($x, $y)
Flop Counter: 0 flop

This is a known issue; we'll try and find a way to circumvent the problem.
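
In the meantime, one possible workaround is to count operations on a pure-Julia version of the kernel, whose operations are ordinary Julia calls. The sketch below is an assumption about how one might do this, not a recommendation from the package authors. `julia_dot` is a hypothetical name, and its timing will generally differ from the BLAS routine, so it is mainly useful for estimating the operation count.

```julia
using LinearAlgebra   # only used to cross-check against `dot`

# Hypothetical pure-Julia dot product:
# one multiplication and one addition per element.
function julia_dot(x, y)
    acc = zero(promote_type(eltype(x), eltype(y)))
    for i in eachindex(x, y)
        acc += x[i] * y[i]
    end
    return acc
end

x = rand(1000); y = rand(1000)
println(julia_dot(x, y) ≈ dot(x, y))   # same value as the library routine
```

Passing `julia_dot($x, $y)` to `@count_ops` would then be expected to report the additions and multiplications, in the same way as for `mixed_dot` above.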
