Performance considerations of generic_tracer_min_max #681

Open
marshallward opened this issue Jul 9, 2024 · 0 comments

The recent update to generic_tracer_min_max in #615 improved this function in several ways: better accuracy, elimination of the "fuzz" factors, and reproducible extrema locations when dimensional scaling and rotational debugging are enabled. It also improved overall performance by reducing the number of global min/max reductions across PEs.

However, some of these changes replaced the optimized minloc and maxloc intrinsics with explicit loops, which may perform worse. The ijk_loc function calls inside conditional if-blocks could cause further problems: even if the calls themselves are infrequent, the conditions must be evaluated on every iteration and could impede optimization of the loop.

Some of these issues could be addressed by (conditionally) pre-rotating the block, using minloc()/maxloc() on it, and computing the global index only for the final result. Precomputing valid_PEs and premasking some operations may also allow for additional speedup.
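To make the "global index of the final result" idea concrete, here is a rough sketch in plain Python rather than Fortran. It is not the MOM6 implementation: the function names (`local_minloc`, `to_global`, `global_min`), the `block[k][j][i]` layout, and the per-PE offsets are all hypothetical, and the list of blocks only simulates what would be a reduction across PEs. Pre-rotation and masking are left out. The point is that the scan over each block carries no index bookkeeping, and the local-to-global conversion happens once per block.

```python
# Illustrative sketch only: all names and the data layout are hypothetical.

def local_minloc(block):
    """Return (min_value, (i, j, k)) for a nested list indexed block[k][j][i].

    Mimics a minloc-style scan: find the flat position of the first minimum,
    then unravel it into local indices once, after the scan.
    """
    nk, nj, ni = len(block), len(block[0]), len(block[0][0])
    flat = [block[k][j][i]
            for k in range(nk) for j in range(nj) for i in range(ni)]
    # min() returns the first minimal element, so ties resolve to the
    # earliest position in element order.
    n = min(range(len(flat)), key=flat.__getitem__)
    k, rem = divmod(n, nj * ni)
    j, i = divmod(rem, ni)
    return flat[n], (i, j, k)


def to_global(loc, i_offset, j_offset):
    """Shift a local (i, j, k) location by this block's horizontal offsets."""
    i, j, k = loc
    return (i + i_offset, j + j_offset, k)


def global_min(blocks):
    """Simulate the cross-PE reduction.

    blocks is a list of (block, i_offset, j_offset), one entry per PE.
    Each PE contributes a single (value, global_location) pair; comparing
    the pairs as tuples breaks ties on the smallest location.
    """
    candidates = []
    for block, i_off, j_off in blocks:
        value, loc = local_minloc(block)
        candidates.append((value, to_global(loc, i_off, j_off)))
    return min(candidates)
```

A maximum search would follow the same pattern with `max` in place of `min`. Whether this beats the explicit loops in the real Fortran code would need to be measured.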

This is not a priority, since this function is not currently used in any production runs. However, external projects could someday depend on it, so we may want to revisit this.
