Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to be more threadsafe with EOSPAC #451

Open
jhp-lanl opened this issue Jan 2, 2025 · 7 comments
Open

Try to be more threadsafe with EOSPAC #451

jhp-lanl opened this issue Jan 2, 2025 · 7 comments

Comments

@jhp-lanl
Copy link
Collaborator

jhp-lanl commented Jan 2, 2025

I think we've all been trying to develop around the fact that EOSPAC is not thread-safe, but I think maybe we should try to codify some development principles to try to make our EOSPAC classes approach something more like thread safety.

The main issue (as I see it) is that EOSPAC table options can be applied after the class has been initialized. I would propose forcing all EOSPAC table options to be applied only at initialization.

The biggest consequence of this would be that the Transform struct would not be able to be passed to the member functions. If a different transformation was desired, we'd need to create a new EOSPAC class with those transformations applied.

There might be memory usage implications, but I think the EOSPAC memory usage model might try to not load data twice if it can. This hypothesis is worth testing though.

Otherwise, I think this would help with reasoning about how EOSPAC class copying works. It might also reduce the hoops needed for the transformations to be applied.

@rbberger since you wrote this code initially, do you have thoughts? @jonahm-LANL your thoughts are also valuable here

@Yurlungur
Copy link
Collaborator

I think this would prevent us from using modifiers of eospac the way we currently do, full stop. Right now some of the modifiers for eospac go through eospac options. This design would force us to instead use scratch memory for them always.

@Yurlungur
Copy link
Collaborator

However, as I understand it, the calls that change eospac table options are applied before and after the call. This is a race condition in the case when multiple EOSPAC calls are made on CPU in multiple threads. Is that the main concern here? Or are you concerned about class copying?

@jhp-lanl
Copy link
Collaborator Author

jhp-lanl commented Jan 2, 2025

Really I was concerned about class copying, but the race condition is also a problem.

And yes, I didn't think about the modifiers at first but I'm thinking about them now. Is there a way to re-initialize the EOSPAC class when modifiers are applied? That would allow us to bake the transform into the modifier class and have that modifier class be tied to a specific EOSPAC table handle.

@Yurlungur
Copy link
Collaborator

Really I was concerned about class copying, but the race condition is also a problem.

Is there a way to re-initialize the EOSPAC class when modifiers are applied? That would allow us to bake the transform into the modifier class and have that modifier class be tied to a specific EOSPAC table handle.

I don't know... That might be a question for David. Can one create new table handles? One issue with that might be that it would need to be done before any shared memory or device-copying is done, as that is a one-and-done in the EOSPAC model.

Of course, we could also get around this by simply not short-circuiting singularity when doing modifiers. If all modifiers used scratch memory for vector calls, the problem would also go away. Though I don't know if we want to do that.

I think this may merit a longer discussion, as the issue with modifiers may not be a concern if we only care about copying, as the options applied to EOSPAC are applied before and then undone after each call. So it's safe if the calls are made in serial from the host.

On the other hand, if you're imagining us calling EOSPAC objects from multiple POSIX threads (as opposed to GPU threads) at once, then we're in trouble.

@jhp-lanl
Copy link
Collaborator Author

jhp-lanl commented Jan 2, 2025

On the other hand, if you're imagining us calling EOSPAC objects from multiple POSIX threads (as opposed to GPU threads) at once, then we're in trouble.

Why aren't we already in trouble for GPU threads?

@Yurlungur
Copy link
Collaborator

Ah, we are. But if we ask EOSPAC to manage the threads rather than be called from inside one (which is they're model) then we're safe.

@jhp-lanl
Copy link
Collaborator Author

jhp-lanl commented Jan 2, 2025

But yeah, I think a broader conversation is warranted. I just wanted to make an issue to think about it more. This came up when I was thinking about how we manage our classes and what happens with shared memory and such.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants