The recent update to `generic_tracer_min_max` in #615 resolved several issues with this function: it improved accuracy, eliminated "fuzz" factors, and improved the reproducibility of extrema locations when dimensional scaling and rotational debugging are enabled. It also improved overall performance by reducing the number of global min/max reductions across PEs.
Some of these changes were implemented by replacing the optimized `minloc` and `maxloc` intrinsics with explicit loops, which may perform worse. Further issues could arise from the `ijk_loc` function calls inside conditional if-blocks. Even if the calls are infrequent, the conditions must be evaluated on every iteration and could further impede optimization.
Some of these issues could be addressed by (conditionally) pre-rotating the block, using `max/minloc()`, and computing the global index of the final result. Precomputing `valid_PEs` and premasking some operations may also allow for additional speedup.
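As a rough illustration of the premasking and global-index idea, here is a minimal standalone sketch. It is not the code in `generic_tracer_min_max`: the block sizes, the offsets `i_offset` and `j_offset`, the array names, and the mask are all hypothetical, and the (conditional) pre-rotation step is not shown.

```fortran
program masked_extrema_sketch
  implicit none

  ! Hypothetical local block size and global index offsets for one PE.
  ! In a real model these would come from the domain decomposition.
  integer, parameter :: ni = 4, nj = 3, nk = 2
  integer, parameter :: i_offset = 8, j_offset = 3

  real    :: tracer(ni, nj, nk)
  logical :: mask(ni, nj, nk)
  integer :: loc_min(3), loc_max(3)
  integer :: i, j, k

  ! Fill the block with distinct values so that the extrema are unique.
  do k = 1, nk ; do j = 1, nj ; do i = 1, ni
    tracer(i,j,k) = real(i + ni*(j-1) + ni*nj*(k-1))
  enddo ; enddo ; enddo

  ! Premask once, outside of any search loop.  Here the first column is
  ! excluded, standing in for points that should not be considered.
  mask(:,:,:) = .true.
  mask(1,:,:) = .false.

  if (any(mask)) then
    ! The intrinsics return 1-based positions within the local block.
    loc_min = minloc(tracer, mask=mask)
    loc_max = maxloc(tracer, mask=mask)

    ! Convert to global indices only once, for the final result.
    print '(a,f6.1,a,3i4)', 'min = ', minval(tracer, mask=mask), ' at global (i,j,k) = ', &
        loc_min(1) + i_offset, loc_min(2) + j_offset, loc_min(3)
    print '(a,f6.1,a,3i4)', 'max = ', maxval(tracer, mask=mask), ' at global (i,j,k) = ', &
        loc_max(1) + i_offset, loc_max(2) + j_offset, loc_max(3)
  else
    ! With an all-false mask, minloc/maxloc return zeros, so skip the lookup.
    print '(a)', 'no valid points on this PE'
  endif
end program masked_extrema_sketch
```

The point of the sketch is that the mask and the index translation sit outside the search itself, so the compiler sees only the intrinsic calls. Whether this actually outperforms the explicit loops would need to be measured.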
This is not a priority, since this function is not currently used in any production runs. But it is possible that external projects could someday depend on it, so we may want to come back around to this.