Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add View of Views debugging tool #267

Draft
wants to merge 9 commits into
base: develop
Choose a base branch
from

Conversation

dalg24
Copy link
Member

@dalg24 dalg24 commented Aug 6, 2024

No description provided.

Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see some short-term use for a specific View-of-Views debugging tool but I think that extending the scope and detecting any kind of fences in parallel constructs or nested (non-Team) parallel constructs would be a good idea.

tests/vov-bug-finder/test_view_of_views_bug_finder.cpp Outdated Show resolved Hide resolved
Copy link

@diehlpk diehlpk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Please fix the typo.

@masterleinad
Copy link
Contributor

CI is failing with

/__w/kokkos-tools/kokkos-tools/tests/vov-bug-finder/test_view_of_views_bug_finder.cpp(67): error: The enclosing parent function ("TestBody") for an extended __host__ __device__ lambda cannot have private or protected access within its class
     __attribute__((host)) __attribute__((device))(int) { new (&vov(0, 0)) V(a); new (&vov(0, 1)) V(a); new (&vov(1, 0)) V(b); }); })
                                                  ^

/__w/kokkos-tools/kokkos-tools/tests/vov-bug-finder/test_view_of_views_bug_finder.cpp(82): error: The enclosing parent function ("TestBody") for an extended __host__ __device__ lambda cannot have private or protected access within its class
     __attribute__((host)) __attribute__((device))(int) { V b("b", 5); new (&vov(1, 0)) V(b); }); })
                                                  ^

@dalg24
Copy link
Member Author

dalg24 commented Aug 6, 2024

I see some short-term use for a specific View-of-Views debugging tool but I think that extending the scope and detecting any kind of fences in parallel constructs or nested (non-Team) parallel constructs would be a good idea.

Like a "sanitizer" tool with a view of views check?

@masterleinad
Copy link
Contributor

Like a "sanitizer" tool with a view of views check?

I was thinking about not just covering kernels that have Kokkos::View::initialization in their label but always diagnose if the stack isn't empty in begin_parallel_*. I don't think the error message would be less helpful for the special case of Views of VIews. Instead of

constructing view "my_view" within a parallel region "my_region"

we would print

calling parallel_for "Kokkos::View::initialization [my_view]" within a parallel region "my_region"

@dalg24
Copy link
Member Author

dalg24 commented Aug 6, 2024

That would produce lower quality error messages

@masterleinad
Copy link
Contributor

That would produce lower quality error messages

You could still special case for those particular kernels but my point is that we can diagnose more problems fairly easily.

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind have a tool which does not too many things at the same time, so I do not thing we need to expand on the functionality necessarily, however I do believe we need to fix the issue with the tool getting hung up on internal allocations such as Team Scratch stuff.

E.g.:

#include<Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    Kokkos::parallel_for(Kokkos::TeamPolicy<>(1,Kokkos::AUTO).set_scratch_size(0, Kokkos::PerTeam(1000)),
      KOKKOS_LAMBDA(const typename Kokkos::TeamPolicy<>::member_type& team) {
    });
  }
  Kokkos::finalize();
}

will abort in the tool with stuff like this:

allocating "Kokkos::Serial::scratch_mem" within parallel region "Z4mainE3$_0"
Abort trap: 6

or:

deallocating "Kokkos::thread_scratch" within parallel region "Z4mainE3$_0"
Abort trap: 6

Another question is support for the monolithic tools library. This tool does not do that right now, but we I think it probably should.

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not work with Threads backend and I didn't check the other ones. I would just filter all "Kokkos::"

debugging/vov-bug-finder/kp_view_of_views_bug_finder.cpp Outdated Show resolved Hide resolved
}));
}

// TODO initialize in main and split unit tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to do this, but it seems not so easy. I'll open an issue for it, so we can move the discussion there.

"AllocatesInParallel]For",
Kokkos::RangePolicy<Kokkos::DefaultHostExecutionSpace>(0, 1),
KOKKOS_LAMBDA(int) {
V b("b", 5);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a Cuda build, the compiler is complaining about this line that:

warning #20011-D: calling a __host__  function ... is not allowed.

The reason seems to be that we end up in a constructor that isn't marked with Kokkos markup. One option would be to desactivate this part of the test in builds with a device backend. I'm not sure whether there's a better solution to this problem.

Comment on lines +107 to +118
ASSERT_DEATH(({
using V = Kokkos::View<int *>;
Kokkos::View<V **, Kokkos::HostSpace> vov("vo]v", 2, 3);
// ^ included a closing square bracket in the label to try
// to trip the substring extraction
V a("a", 4);
V b("b", 5);
vov(0, 0) = a;
vov(0, 1) = a;
vov(1, 0) = b;
}),
"view of views \"vo]v\" not properly cleared");
Copy link
Contributor

@maartenarnst maartenarnst Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that in my OpenMP build, I see this death test hang when I run with more than 1 omp thread. (Christian had noted that team tests sometimes hang, but I'm not sure if it's related.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants