Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust): improve ockam_node #8718

Conversation

SanjoDeundiak
Copy link
Member

@SanjoDeundiak SanjoDeundiak commented Dec 22, 2024

  1. Using Oneshot channels where possible - faster, better semantics, avoiding async when sending a signal
  2. Make start_worker and stop_worker sync, which allowed to make a lot of things sync, like start a secure channel listener, outlet, create TcpTransport, etc.
  3. Removing code that was leading to implicit Clones (like &str -> Address conversion). Cloning causes a lot of small allocation and memory fragmentation => therefore should be avoided => therefore should explicit for starters.
  4. HashMaps and HashSet instead of BTreeMap and BTreeSet for better performance
  5. More granular locks (lock individual fields instead of the whole struct) inside Router.
  6. Renaming to improve vocabulary consistency:
    • ctx.stop_worker(), ctx.stop_processor() -> ctx.stop_address()
    • ctx.stop() -> ctx.shutdown_node()
    • node.stop() -> node.shutdown()
    • terminate was used together with stop for the same things -> removed
  7. Using Weak<Router> for clearer ownership over Router, which helps to drop it earlier and in more predictable way. E.g., detached Context will no longer keep Router from running => we can shutdown the environment successfully and then detached Context instances that are left can be dropped safely considering they no longer has access to the Router.
  8. Use tokio::Handle instead of tokio::Runtime where possible, to achieve much clearer ownership over tokio's Runtime. Owning the Runtime instance keeps runtime alive + exclusive right to terminate it, while tokio::Handle only gives access to the Runtime, but doesn't keep it alive itself.
  9. Got read of AsyncTryClone. Cloning Context type became sync, so we no longer need AsyncTryClone. Still needed failable clone, so replaced with TryClone
  10. Made Worker and Processor treatment by the Router/Node almost identical, which will ease future merge of this abstractions.
  11. Replaced clusters with Priority shutdown. Each worker can specify its shutdown priority (or get a default one). During node shutdown workers will be stopped in batches starting with the highest priority to the lowest. Why it's useful: workers rely on other workers, e.g. secure channel worker needs tcp connection worker to deliver messages. Stopping secure channel worker first, can possible allow it to still use tcp connection worker to deliver some final message(s) to the other side (e.g., "close secure channel message")

Possible future improvements:

  1. More tests
  2. Merge Worker and Processor into one trait
  3. Try lockless structures (HashMaps and HashSets at least)
  4. Shutdown tokio Environment in cases where something hangs and can't shutdown gracefully

@SanjoDeundiak SanjoDeundiak force-pushed the sanjo/ockam_node_impl branch 7 times, most recently from d623ed3 to 483a9e1 Compare December 25, 2024 16:29
@SanjoDeundiak SanjoDeundiak force-pushed the sanjo/ockam_node_impl branch 2 times, most recently from 3b8a913 to 0572200 Compare December 30, 2024 21:31
@@ -4,5 +4,3 @@ pub mod common;
pub mod session;
#[allow(dead_code)]
pub mod test_spans;
#[allow(dead_code)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etorreborre is it ok to delete it? It looks unused

@SanjoDeundiak SanjoDeundiak force-pushed the sanjo/ockam_node_impl branch 12 times, most recently from 6953d64 to af1e201 Compare January 10, 2025 11:22
Copy link
Member

@davide-baldo davide-baldo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
Everything can be addressed in a follow-up

}
}
},
result = ctrl_rx.recv() => {
if result.is_some() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this a bug?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed if result.is_some() {, and result comes from the previous select!{} condition result = self.recv_message() => {, was this preventing a relay shutdown?

implementations/rust/ockam/ockam_node/src/router/router.rs Outdated Show resolved Hide resolved
implementations/rust/ockam/ockam_node/src/shutdown.rs Outdated Show resolved Hide resolved
@SanjoDeundiak SanjoDeundiak force-pushed the sanjo/ockam_node_impl branch 2 times, most recently from 15bab15 to ff4e120 Compare January 10, 2025 22:22
@SanjoDeundiak SanjoDeundiak marked this pull request as ready for review January 10, 2025 22:49
@SanjoDeundiak SanjoDeundiak requested a review from a team as a code owner January 10, 2025 22:49
@SanjoDeundiak SanjoDeundiak deleted the sanjo/ockam_node_impl branch January 14, 2025 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants