-
-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for pinning actors to a dedicated scheduler thread #4547
Conversation
We should discuss if this should have anything added to the tutorial to cover this (and other scheduler basics). |
Hi @dipinhora, The changelog - added label was added to this pull request; all PRs with a changelog label need to have release notes included as part of the PR. If you haven't added release notes already, please do. Release notes are added by creating a uniquely named file in the The basic format of the release notes (using markdown) should be:
Thanks. |
yeah, that's probably a good idea... also, not sure if this should be |
what would "experimental" mean in practical purposes? behind a runtime flag? |
erm... not necessarily behind a runtime flag since it's effectively a no-op if no actors ever get pinned.. |
not sure why
and nothing in this PR should changed how ASIO shutdown coordination occurs to possibly trigger that assertion.. |
I think there's a race condition in shutdown with kqueue and other asio systems and that a recent change made it more likely to happen in our CI. UPDATE: I found an issue and opened a PR: #4548 |
release notes added |
wow.. it happened for 3 more CI builds.. 8*/ maybe i should rebase onto |
No guarantee it fixes, but yes, sounds like a good idea. |
@dipinhora thoughts on main thread vs starting a dedicated thread for pinned actors and leaving the main thread "simpler"? |
i have no preference one way or the other but my limited understanding is that some libraries/OSes require that the |
It would be nice to be able to pin the "pinned actor" thread like we can with the asio thread and the I'm not sure how that would interact with having the main startup thread be the "pinned actor" thread. |
bddf568
to
d8a07e0
Compare
force pushed after the rebase.. hopefully it helps.. |
sure would! 8*P
me neither.. i'm going to punt on pinning the |
Looks like there is still an issue. Null pointer boom in the same place with a release build. Huh. |
that's annoying..
any idea what change made the ASIO backend null assertion trigger more often in CI? maybe that'll provide a clue that might help identify the root cause.. |
it is possible on startup for the backend to be set to NULL and for us to continue running so that is also a possibility. |
actually i cant see how we would keep running if it is null. |
i think i know what's going on with the ASIO ponyc/src/libponyrt/sched/scheduler.c Line 477 in f0d7afd
|
I've been looking for a way for the shutdown to happen early. That is probably another source of bug. If it shuts down early, the backend would be set to NULL while we are still trying to use it. What's the proper logic? Can you open a PR? |
src/libponyrt/sched/scheduler.c
Outdated
@@ -470,7 +575,8 @@ static bool quiescent(scheduler_t* sched, uint64_t tsc, uint64_t tsc2) | |||
|
|||
// only scheduler 0 can initiate shutdown (it is the ony that gets all the | |||
// ACK messages as part of the CNF/ACK coordination for shutdown) | |||
if(0 == sched->index) | |||
// only if there are no pinned actors in the system... | |||
if(0 == sched->index && 0 == get_pinned_actor_count()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the check against the pinned actor count for quiescence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so an app with only pinned actors doesn't shut down too early..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pinning doesn't prevent garbage collection of the actor so, i don't think this is needed.
i think what is needed is the usual "there's a noisy" or similiar.
we discussed this a good deal on the sync call. let's find some time to have a synchronous conversation, you and i about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pinning doesn't prevent garbage collection of the actor so, i don't think this is needed.
i don't see how garbage collection of pinned actors is related to this..
i think what is needed is the usual "there's a noisy" or similiar.
sure, that's another implementation option.. the current option falls into or similar
in my opinion because i expect the atomic variable used would have little to no contention so that's what i implemented..
we discussed this a good deal on the sync call. let's find some time to have a synchronous conversation, you and i about this.
yep.. sounds like a plan..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would an app with only pinned actors "shutdown early"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no.. my test was without any pinned actors.. your statement indicated that noblock would prevent quiescent if there are any cycles even with no pinned actors (or i misunderstood your statement to imply this) and i wanted to confirm that behavior...
I was incorrect on that front.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no.. my test was without any pinned actors..
So this change, without any actors pinned causes the ring example to not exit if noblock is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no.. my test was without any pinned actors..
So this change, without any actors pinned causes the ring example to not exit if noblock is used?
no, this change doesn't impact pre-existing behavior is no actors are pinned.. the quiescence
logic is impacted only if actors are pinned...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what part do you think is problematic? the existing behavior or the change in behavior with this PR and pinned actors or both?
I think the ideal would be for you to be able to pin actors and have everything "continue to work as before". Whether this is doable with the various trade-offs is what I want to look at.
yes, we discussed this already in comments yesterday and landed on experimental
for now while we iterate on the quiescence
implementation details in future PRs as opposed to trying to be perfect with this PR..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what part do you think is problematic? the existing behavior or the change in behavior with this PR and pinned actors or both?
I think the ideal would be for you to be able to pin actors and have everything "continue to work as before". Whether this is doable with the various trade-offs is what I want to look at.
yes, we discussed this already in comments yesterday and landed on
experimental
for now while we iterate on thequiescence
implementation details in future PRs as opposed to trying to be perfect with this PR..
yup. i'm ok with that approach. i am trying to understand better to craft what i think would be a more clear "caveat" message.
just opened #4550 |
@dipinhora once #4553 is merged, please rebase this against main. |
d5f6321
to
a21209d
Compare
.release-notes/4547.md
Outdated
|
||
The way it works is that an actor can request that it be pinned (which may or may not happen immediately) and then it must wait and check to confirm that the pinning was successfully applied (prior to running any workload that required the actor to be pinned) after which all subsequent behaviors on that actor will run on the same scheduler thread until the actor is destroyed or the actor requests to be unpinned. | ||
|
||
Additional details can be found in the `actor_pinning` package in the standard library. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like "robust" release notes. Can you add an example program in here that demonstrates the concepts?
I also think the release notes should include a notice that this is an experimental feature and likely to change. That the implementation will evolve and that might change the feature in general. That sort of thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the release notes have been fattened up..
I've been I did both extremes:
In all cases, it moved the pinned actors 100% of the time on the first behaviour call. I have yet to break it. |
This is a function of the program design.. the following takes use "actor_pinning"
actor Main
new create(env: Env) =>
let p = Pactor(env)
// the following all happen before the actor is pinned
p.check_pinned()
p.check_pinned()
p.check_pinned()
actor Pactor
let _env: Env
let _auth: PinUnpinActorAuth
var _count: I32 = 1
new create(env: Env) =>
_env = env
_auth = PinUnpinActorAuth(env.root)
ActorPinning.request_pin(_auth)
// the following will happen after the actor is pinned
check_pinned()
be check_pinned() =>
if ActorPinning.is_successfully_pinned(_auth) then
_env.out.print("Attempts until successfully pinned: " + _count.string())
else
_count = _count + 1
end while the following is successful on the first attempt: use "actor_pinning"
actor Main
new create(env: Env) =>
let p = Pactor(env)
actor Pactor
let _env: Env
let _auth: PinUnpinActorAuth
var _count: I32 = 1
new create(env: Env) =>
_env = env
_auth = PinUnpinActorAuth(env.root)
ActorPinning.request_pin(_auth)
// the following will happen after the actor is pinned
check_pinned()
be check_pinned() =>
if ActorPinning.is_successfully_pinned(_auth) then
_env.out.print("Attempts until successfully pinned: " + _count.string())
else
_count = _count + 1
end |
This isn't a meaningful statement. As Dipin demonstrates, it has nothing to do with the feature and everything to do with when you are checking. If you put a check in a loop immediately after requesting a pin then you'd never see success based on the current runtime implementation. The important bit is "it will happen eventually" where "eventually" is implementation specific. |
#4556 has been opened for more shutdown/termination logic changes to try and get that edge case that kept cropping up in the it would probably make sense to wait for it to be merged and then to rebase this PR onto |
@dipinhora we are good with going forward with the "experimental" strategy we discussed. |
@dipinhora you can rebase this against main. the quiescence fix is in. |
The overall design goal and approach was to make it possible to have pinned actors while minimizing impact to pre-existing non-pinned actor workloads. This meant there could be no impact on message sends (i.e. can't check to see if the receiving actor is a pinned actor or not to decide what to do with it if it is unscheduled). These goals were chosen because it is expected that `pinned` actors will be a niche/small part of any pony application's overall workload. The approach taken has negligible performance impact to existing scheduler logic. It adds a couple of extra checks to see if an actor that is ready to run is a pinned actor or not and if not, there is no other overhead involved. The scheduler quiescence logic has an extra check for an atomic counter of pinned actors but that is also negligible if no pinned actors are ever used. The overall logic for pinned actors works as follows: * The `main` thread is dedicated to running pinned actors (and only pinned actors). This thread previously initialized the runtime and then sat around waiting for all schedulers to reach quiescence so now it runs pinned actors in the meantime if there are any. * The `pinned actor thread` (`main`) runs a custom run loop for pinned actors that does not participate in work stealing or any other normal scheduler messaging except for unmuting messages and the termination message. It also will only ever run `pinned` actors and any non-`pinned` actors will get pushed onto the `inject` queue. * Normal schedulers will only ever run non-`pinned` actors and any `pinned` actors will get pushed onto the `pinned actor thread`'s queue. * From an api perspective, there is now an `actor_pinning` package in the stdlib. An actor can request to be pinned, check that it has successfully been pinned (so that it can safely do whatever it needs to do while pinned), and request to be unpinned. While the above is not necessarily the most efficient way to run `pinned` actors, it meets the original design goals of making it possible while minimizing impact of pre-existing non-pinned actor workloads.
and also update the package documentation to clarify that `pinning` is not an immediate action
Co-authored-by: Sean T Allen <[email protected]>
4c5032f
to
2dfa7d7
Compare
rebased..
ummm.. about that... the
also for the same reason, the pinned actor thread can be pinned now.. |
@@ -43,6 +43,10 @@ | |||
" --ponypinasio Pin the ASIO thread to a CPU the way scheduler\n" \ | |||
" threads are pinned to CPUs. Requires `--ponypin` to\n" \ | |||
" be set to have any effect.\n" \ | |||
" --ponypinpinnedactorthread\n" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sweet addition
The overall design goal and approach was to make it possible to have pinned actors while minimizing impact to pre-existing non-pinned actor workloads. This meant there could be no impact on message sends (i.e. can't check to see if the receiving actor is a pinned actor or not to decide what to do with it if it is unscheduled). These goals were chosen because it is expected that
pinned
actors will be a niche/small part of any pony application's overall workload.The approach taken has negligible performance impact to existing scheduler logic. It adds a couple of extra checks to see if an actor that is ready to run is a pinned actor or not and if not, there is no other overhead involved. The scheduler quiescence logic has an extra check for an atomic counter of pinned actors but that is also negligible if no pinned actors are ever used.
The overall logic for pinned actors works as follows:
main
thread is dedicated to running pinned actors (and only pinned actors). This thread previously initialized the runtime and then sat around waiting for all schedulers to reach quiescence so now it runs pinned actors in the meantime if there are any.pinned actor thread
(main
) runs a custom run loop for pinned actors that does not participate in work stealing or any other normal scheduler messaging except for unmuting messages and the termination message. It also will only ever runpinned
actors and any non-pinned
actors will get pushed onto theinject
queue.pinned
actors and anypinned
actors will get pushed onto thepinned actor thread
's queue.actor_pinning
package in the stdlib. An actor can request to be pinned, check that it has successfully been pinned (so that it can safely do whatever it needs to do while pinned), and request to be unpinned.While the above is not necessarily the most efficient way to run
pinned
actors, it meets the original design goals of making it possible while minimizing impact of pre-existing non-pinned actor workloads.