
Add Network Peer actor #231

Open

hmzakhalid wants to merge 27 commits into main

Conversation

hmzakhalid (Member) commented Dec 30, 2024

Summary by CodeRabbit

Based on the comprehensive summary, here are the updated release notes:

  • Infrastructure Updates
    • Updated the GitHub Actions cache action from v2 to v4.
  • Deployment Configuration
    • Updated peer network configuration for nodes and the aggregator.
    • Removed redundant peer entries from deployment configurations.
  • Network Management
    • Introduced a new DialerActor for improved peer connection management.
    • Enhanced event bus and message routing for network communications.
    • Migrated network operations from the tokio runtime to actix.
  • Code Refactoring
    • Simplified import statements and removed duplicate code.
    • Improved variable naming for clarity.
    • Centralized event type handling across components.

These changes primarily focus on improving the network infrastructure, event handling, and overall code organization.

coderabbitai bot (Contributor) commented Dec 30, 2024

Warning

Rate limit exceeded

@hmzakhalid has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 26 minutes and 26 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 05dee3b and 0d42941.

📒 Files selected for processing (2)
  • packages/ciphernode/net/src/bin/p2p_test.rs (5 hunks)
  • packages/ciphernode/net/src/dialer.rs (1 hunks)

Walkthrough

This pull request introduces significant architectural changes to the project's network communication and event handling. The modifications span multiple files, primarily transitioning from a broadcast-channel-based event system to an actor-based event bus model. Key changes include updating GitHub Actions workflow configurations, modifying the network peer and network manager implementations, and refactoring event handling strategies. The changes aim to improve modularity, event routing, and overall network communication robustness.

Changes

| File | Change Summary |
| --- | --- |
| .github/workflows/* | Updated actions/cache from v2 to v4 in CI/CD workflows |
| deploy/agg.yaml, deploy/cn*.yaml | Removed and updated peer entries for network configuration |
| packages/ciphernode/enclave_core/src/* | Renamed join_handle to handle and cleaned up import statements |
| packages/ciphernode/events/src/enclave_event/mod.rs | Added event_type method to EnclaveEvent and updated trait implementation |
| packages/ciphernode/net/ | Major refactoring of network communication: introduced DialerActor, updated event handling, and modularized network peer logic |
| packages/ciphernode/net/src/lib.rs | Adjusted module visibility, making dialer public and removing retry from the public interface |
| packages/ciphernode/net/src/network_manager.rs | Updated event handling to use net_bus and refactored event processing logic |
| packages/ciphernode/net/src/network_peer.rs | Enhanced NetworkPeer to use net_bus for event transmission instead of broadcast |
| packages/ciphernode/net/src/retry.rs | Removed retry mechanism implementation |
| packages/ciphernode/tests/tests/test_aggregation_and_decryption.rs | Updated tests to use net_bus for event handling |

Sequence Diagram

sequenceDiagram
    participant NetworkManager
    participant DialerActor
    participant NetworkPeer
    participant EventBus

    NetworkManager->>EventBus: Initialize net_bus
    NetworkManager->>DialerActor: Create dialer with net_bus
    DialerActor->>NetworkPeer: Initiate peer connection
    NetworkPeer->>EventBus: Send network events
    EventBus->>NetworkManager: Propagate events

Poem

🐰 Hops of Network Change
Actors dance, events take flight,
Buses carry messages bright,
From broadcast's old and weary way,
To modular paths that smoothly sway,
A rabbit's code, now swift and light! 🚀



@hmzakhalid hmzakhalid marked this pull request as draft December 30, 2024 22:55
Resolved review threads: deploy/agg.yaml; packages/ciphernode/net/src/network_manager.rs; packages/ciphernode/net/src/network_peer.rs (8 threads, some outdated)
ryardley (Contributor) commented Jan 2, 2025

After our discussion, we found that it makes a lot of sense to look into replacing Actix before moving forward with this.

hmzakhalid changed the base branch from tech-debt to main January 3, 2025 19:40
hmzakhalid marked this pull request as ready for review January 3, 2025 20:44
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

🧹 Nitpick comments (18)
packages/ciphernode/net/src/lib.rs (1)

11-11: Consider limiting what the public interface exposes.
Exporting entire modules (dialer, network_manager, network_peer, repo) can create a large API footprint. Review whether to selectively re-export only the types or functions needed at the public interface boundary.
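
For example, a minimal sketch of a narrower surface (assuming DialerActor, NetworkManager, and NetworkPeer are the items callers actually need):

// lib.rs: keep modules private and re-export only the public-facing types.
mod dialer;
mod network_manager;
mod network_peer;
mod repo;

pub use dialer::DialerActor;
pub use network_manager::NetworkManager;
pub use network_peer::NetworkPeer;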

deploy/cn1.yaml (1)

5-7: Ensure consistent peer addressing pattern

The peer URLs follow a consistent pattern but consider the following:

  1. The sequential port numbering (9092-9094) suggests a convention - document this if it's intentional
  2. Consider using environment variables for node addresses to make configuration more maintainable

Example refactor using environment variables:

peers:
-  - "/dns4/cn2/udp/9092/quic-v1"
-  - "/dns4/cn3/udp/9093/quic-v1"
-  - "/dns4/aggregator/udp/9094/quic-v1"
+  - "/dns4/${CN2_HOST}/udp/${CN2_PORT}/quic-v1"
+  - "/dns4/${CN3_HOST}/udp/${CN3_PORT}/quic-v1"
+  - "/dns4/${AGGREGATOR_HOST}/udp/${AGGREGATOR_PORT}/quic-v1"
.github/workflows/ci.yml (1)

38-38: Maintain consistent cache action versions

The upgrade to actions/cache@v4 aligns with the change in rust-ci.yml, which is good for maintaining consistency across workflows.

However, this infrastructure update appears unrelated to the PR's main objective of implementing a Network Peer actor. Consider:

  1. Moving these CI updates to a separate PR
  2. Updating the PR description to include these infrastructure changes
packages/ciphernode/net/src/bin/p2p_test.rs (2)

74-76: DialerActor usage
The DialerActor::dial_peer() pattern effectively decouples the dial logic from the main function. If you anticipate dynamic peer lists, consider scheduling repeated or conditional dials for new peers in case the environment changes at runtime.


103-115: Polling event history in a loop
Retrieving all past events in every loop iteration might cause performance degradation for large histories. A sliding window approach or a more targeted query (e.g., retrieving only events after the last known index) could reduce overhead if you expect a large event volume.
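
One possible shape for a cursor-based poll, sketched against a GetHistory-style request (the exact EventBus API is assumed here, not confirmed):

// Track the last index processed so each iteration only handles
// events appended since the previous poll.
let mut last_seen: usize = 0;
loop {
    // Assumed message returning the full event history as a Vec.
    let history = bus.send(GetHistory).await?;
    for event in &history[last_seen..] {
        handle_event(event); // application-specific handler (placeholder)
    }
    last_seen = history.len();
    tokio::time::sleep(std::time::Duration::from_millis(100)).await;
}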

packages/ciphernode/net/src/network_manager.rs (2)

65-67: Subscribing to all event types
Using String::from("*") to subscribe to all events is convenient for prototyping but might introduce overhead if you only need a subset of events. If performance or clarity becomes an issue, consider narrower subscriptions or event filtering.
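
As an illustration, the wildcard could be narrowed to the specific types this actor handles; the Subscribe message shape below is assumed from the wildcard call, and the event-type string is a placeholder:

// Wildcard: every event type is routed to this actor.
bus.do_send(Subscribe::new("*", addr.clone().recipient()));

// Narrower: subscribe only to the variants actually handled here.
// "SomeEventType" is a placeholder for a real EnclaveEvent type name.
bus.do_send(Subscribe::new("SomeEventType", addr.recipient()));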


83-87: EventBus instantiation
Constructing a dedicated EventBus<NetworkPeerEvent> at runtime here is useful. If your application grows, you may need to centralize bus instantiation in a higher-level module to share or coordinate multiple bus instances more systematically.

packages/ciphernode/net/src/dialer.rs (4)

14-16: Backoff constants
Defining BACKOFF_DELAY and BACKOFF_MAX_RETRIES at the top is a neat approach. Expose them as configuration parameters or environment variables if you need to tweak connection retry strategies without recompiling.
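
A minimal sketch of sourcing them from the environment with compiled-in fallbacks (the variable names are illustrative, not part of the codebase):

use std::time::Duration;

// Fall back to the built-in defaults when the variables are unset or unparsable.
fn backoff_delay() -> Duration {
    let ms = std::env::var("DIALER_BACKOFF_DELAY_MS")
        .ok()
        .and_then(|v| v.parse::<u64>().ok())
        .unwrap_or(500); // assumed default, mirroring BACKOFF_DELAY
    Duration::from_millis(ms)
}

fn backoff_max_retries() -> u32 {
    std::env::var("DIALER_BACKOFF_MAX_RETRIES")
        .ok()
        .and_then(|v| v.parse::<u32>().ok())
        .unwrap_or(5) // assumed default, mirroring BACKOFF_MAX_RETRIES
}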


57-65: DialPeer message handling
The pattern of creating a new DialerActor on-the-fly and then immediately sending a DialPeer message is flexible. However, spinning up multiple DialerActor instances for each dial might become resource-intensive. Consider a single DialerActor instance that handles all dials if frequent or batching of dial requests is expected.


66-100: attempt_dial method
Useful usage of self.pending_connection to track dial attempts. If you need advanced scheduling (e.g., exponential or jittered backoff beyond a static doubling), factor this logic into a separate backoff manager for code cleanliness.

Would you like me to open an issue or provide a snippet with an advanced jittered backoff algorithm?
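
As a sketch, an exponential backoff with full jitter could be factored into a small helper (uses the rand crate; the cap and defaults are illustrative):

use rand::Rng;
use std::time::Duration;

/// Full-jitter backoff: a random duration in [0, base * 2^attempt], capped at `max`.
fn jittered_backoff(attempt: u32, base: Duration, max: Duration) -> Duration {
    let ceiling = base
        .saturating_mul(2u32.saturating_pow(attempt))
        .min(max);
    let millis = rand::thread_rng().gen_range(0..=ceiling.as_millis() as u64);
    Duration::from_millis(millis)
}

// Usage inside a retry loop:
// tokio::time::sleep(jittered_backoff(attempt, BACKOFF_DELAY, Duration::from_secs(30))).await;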


176-245: Event handling - connection errors
The logic for retrying or permanently failing connections is generally robust. Consider implementing separate states or a typed mechanism for different error categories (e.g., ephemeral networking issues vs. unreachable addresses). This might simplify the logic if new error types appear.

packages/ciphernode/net/src/events.rs (2)

55-60: event_type method
This approach of extracting the event name via a string format is acceptable for debugging or logging. If you foresee a big performance overhead from string manipulation, consider a more direct approach (e.g., match arms returning a static &str).
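
A match-based alternative might look like this; the variant names are taken from events mentioned in this review (GossipData, DialError) and the remaining arms are assumed:

impl NetworkPeerEvent {
    // &'static str avoids allocating a String on every call.
    pub fn event_type(&self) -> &'static str {
        match self {
            NetworkPeerEvent::GossipData(_) => "GossipData",
            NetworkPeerEvent::DialError { .. } => "DialError",
            _ => "Other", // replace with one arm per remaining variant
        }
    }
}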


70-73: event_id returning constant
Returning a hardcoded string as the event ID is a suitable short-term approach. If you add more specialized event categories within NetworkPeerEvent, consider returning distinct IDs for each variant to facilitate event filtering.

packages/ciphernode/events/src/enclave_event/mod.rs (1)

142-146: Consider removing duplicate method implementations.
It appears that event_type is defined both here and within the impl Event for EnclaveEvent. If the logic is the same, you can rely on the trait’s implementation to avoid redundancy and reduce maintenance overhead.

packages/ciphernode/net/src/network_peer.rs (2)

59-62: Listening on UDP QUIC.
Check if you need graceful error handling here. Currently, a ? will bubble up any error. That may be sufficient if you intend to fail fast.


94-107: Process Dial command logic.
Dial success/failure is handled properly. Consider whether you need exponential backoff or retry logic beyond a single attempt.

packages/ciphernode/enclave_core/src/aggregator_start.rs (2)

2-2: Consider architectural implications of Actix dependency

Given the discussion about potentially replacing Actix, it's worth noting that this file shows deep integration with Actix's actor system. Any future migration would require significant refactoring of the event bus and network components.

Consider:

  1. Abstracting actor-specific code behind interfaces
  2. Minimizing direct Actix dependencies in new code
  3. Planning a gradual migration strategy if replacement is desired

Line range hint 1-105: Consider documenting the new event bus architecture

The changes successfully integrate the new Network Peer actor and event bus system. However, given the complexity of the initialization process and the architectural significance of these changes, consider:

  1. Adding documentation about the event flow between components
  2. Creating architecture diagrams showing the actor relationships
  3. Documenting the rationale behind the actor-based design to help with future maintenance and potential Actix replacement

Would you like me to help create a template for the architectural documentation?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fba85d and 9cdf56c.

📒 Files selected for processing (19)
  • .github/workflows/ci.yml (1 hunks)
  • .github/workflows/integration.yml (1 hunks)
  • .github/workflows/rust-ci.yml (1 hunks)
  • deploy/agg.yaml (0 hunks)
  • deploy/cn1.yaml (1 hunks)
  • deploy/cn2.yaml (0 hunks)
  • deploy/cn3.yaml (0 hunks)
  • packages/ciphernode/enclave_core/src/aggregator_start.rs (3 hunks)
  • packages/ciphernode/enclave_core/src/start.rs (3 hunks)
  • packages/ciphernode/events/src/enclave_event/mod.rs (1 hunks)
  • packages/ciphernode/net/Cargo.toml (1 hunks)
  • packages/ciphernode/net/src/bin/p2p_test.rs (5 hunks)
  • packages/ciphernode/net/src/dialer.rs (1 hunks)
  • packages/ciphernode/net/src/events.rs (4 hunks)
  • packages/ciphernode/net/src/lib.rs (1 hunks)
  • packages/ciphernode/net/src/network_manager.rs (5 hunks)
  • packages/ciphernode/net/src/network_peer.rs (6 hunks)
  • packages/ciphernode/net/src/retry.rs (0 hunks)
  • packages/ciphernode/tests/tests/test_aggregation_and_decryption.rs (4 hunks)
💤 Files with no reviewable changes (4)
  • deploy/cn3.yaml
  • deploy/cn2.yaml
  • deploy/agg.yaml
  • packages/ciphernode/net/src/retry.rs
🧰 Additional context used
📓 Learnings (1)
packages/ciphernode/net/Cargo.toml (1)
Learnt from: ryardley
PR: gnosisguild/enclave#145
File: packages/ciphernode/data/Cargo.toml:0-0
Timestamp: 2024-11-10T16:42:46.226Z
Learning: In `packages/ciphernode/data/Cargo.toml`, `actix-rt` is only used in tests and should remain in `[dev-dependencies]`.
🔇 Additional comments (33)
deploy/cn1.yaml (1)

5-7: Review security implications of UDP/QUIC configuration

While QUIC-v1 provides security features, consider the following security aspects:

  1. Ensure UDP ports (9092-9094) are properly secured in the network configuration
  2. Verify that DNS resolution is secure and cannot be hijacked
  3. Consider implementing additional network security measures (e.g., firewall rules, access controls)

Run this script to check for security-related configurations:

✅ Verification successful

Let me gather more information about the security configurations and network setup.


QUIC configuration appears secure with additional network security measures in place

Based on the verification results, the QUIC/UDP configuration appears to be properly secured:

  1. The system uses QUIC-v1 protocol which inherently provides encryption and security features
  2. Network isolation is implemented through Docker networks (global-network in docker-compose.yml)
  3. Each node has a unique network private key for authentication
  4. Ports are properly configured with environment variables (QUIC_PORT) rather than hardcoded values
  5. mDNS is explicitly disabled (enable_mdns: false) which reduces network discovery attack surface

Additional security measures observed:

  • Usage of secure WebSocket (wss://) for RPC connections
  • Configuration management through environment variables
  • Containerized deployment with network isolation
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for security-related configurations
# Expected: Find security-related settings across configuration files

# Search for security-related configurations
echo "Checking for security configurations..."
rg -i "security|firewall|tls|certificate|encrypt" deploy/

# Check for port definitions
echo "Checking port configurations..."
rg "udp/909[0-9]" deploy/

Length of output: 948


Script:

#!/bin/bash
# Check for TLS/security configurations in deployment files and documentation
echo "Checking for QUIC and TLS configurations..."
rg -A 5 -B 5 "quic-v1" deploy/

# Check for network security configurations
echo "Checking for network configurations..."
rg -i "network|port|listen|bind" deploy/

# Look for any security documentation
echo "Checking for security documentation..."
fd README -e md | xargs rg -i "security|network|configuration"

Length of output: 5453

.github/workflows/integration.yml (1)

Line range hint 1-140: Verify completeness of the PR

There appears to be a significant discrepancy between:

  1. The PR's stated objective of "Add Network Peer actor"
  2. The actual changes shown (GitHub Actions workflow updates)
  3. The AI summary mentioning architectural changes to network communication

Please verify:

  1. Are there additional files missing from this review?
  2. If not, should this PR be split into separate concerns (infrastructure updates vs. Network Peer implementation)?
🧰 Tools
🪛 yamllint (1.35.1)

[error] 50-50: trailing spaces

(trailing-spaces)

packages/ciphernode/net/src/bin/p2p_test.rs (3)

22-22: Switching from Tokio to Actix runtime
Using #[actix::main] instead of #[tokio::main] is a significant architectural shift. Ensure that all async tasks and dependencies, especially those previously tailored to Tokio, have been updated to avoid compatibility issues, particularly if they rely on Tokio-specific APIs or features.

Would you like a script to search for any Tokio-dependent snippets that might need to be refactored?


48-60: Event bus & peer channel initialization
Initializing both the mpsc channel and the EventBus in close succession is good practice. However, verify that the buffer size of 100 for the channel is appropriate based on expected traffic volume. If message bursts exceed this capacity, you may encounter dropped messages or deadlocks.

Please confirm using load or stress tests to ensure that a buffer of 100 is sufficient for peak event throughput.


141-142: Swarm handle early completion
Logging a warning if the swarm handle finishes prematurely is good. However, consider expanding the warning or capturing additional context about why it might have exited early, if you suspect ephemeral errors or race conditions.

packages/ciphernode/net/src/network_manager.rs (4)

24-24: New net_bus field
Storing the net_bus in NetworkManager is a clean way to unify event handling. Ensure all inbound and outbound events are thoroughly tested to confirm there are no missing routes for critical messages.


38-45: Extended constructor
Adding the net_bus parameter to the NetworkManager::new constructor is consistent with your architecture shift to an event bus. Double-check all callers of new to ensure they now pass the correct Addr<EventBus<NetworkPeerEvent>>.


102-110: Creating NetworkPeer
The new NetworkPeer::new call in setup_with_peer effectively ties together the bus, the receiver, and peer logic. Consider adding structured error logging for warn!("Failed to create NetworkPeer: {:?}", e); to facilitate diagnosing which part of the peer setup failed, especially if future logic expands.


126-139: Handler<NetworkPeerEvent> implementation
Forwarding GossipData to the EnclaveEvent bus is a clear separation of concerns. Confirm that no extraneous events inadvertently get forwarded if you introduce additional NetworkPeerEvent variants that should remain internal.

packages/ciphernode/net/src/events.rs (1)

42-45: Extended DialError usage
Adding connection_id to DialError events enhances debuggability by correlating errors to specific connection attempts. Make sure to handle potential collisions if your system reuses connection IDs.

packages/ciphernode/enclave_core/src/start.rs (2)

82-82: Renaming join_handle to handle
The rename is stylistically neutral, though it may improve consistency. Confirm that all references and logs reflect the updated naming convention to avoid confusion when debugging.


95-95: Returning updated tuple
Simultaneously returning (bus, handle, peer_id) rather than (bus, join_handle, peer_id) is consistent with the variable rename. Ensure any doc comments, if present, are updated to reflect this new naming.

packages/ciphernode/net/src/network_peer.rs (13)

1-3: Imports look good.
These imports improve readability by clarifying the usage of Actix and EventBus.


18-18: Validate that tokio::sync::mpsc meets concurrency needs.
Ensure that the chosen channel type adequately handles potential surges of commands without risking channel overload or blocking.


26-30: Good practice: making behaviour fields public.
Exposing these fields supports flexible extension and testing. Confirm that no sensitive fields are unintentionally exposed.


35-35: Introduction of the event bus field.
Replacing broadcast channels with a centralized event bus can simplify communication across multiple actors. This change appears beneficial for decoupling.


43-44: Accepting net_bus and cmd_rx in constructor.
Constructing the actor with the event bus can improve testability and modularity.


54-54: Initialize event bus in struct.
No issues seen here. The new approach aligns well with the final usage.


65-67: Subscription method.
The subscription approach with gossipsub is straightforward. If future logic is needed for subscription side-effects, consider capturing errors or returning them explicitly.


70-93: Process GossipPublish commands.
The handling of publish success/error and sending corresponding events via net_bus is clear and robust. Logging is also helpful for troubleshooting.


109-110: Concise closure of command processing.
No further concerns here; keep it simple and consistent.


112-121: Main loop structure with select!.
This approach cleanly merges incoming commands and swarm events. Double-check that all panics or critical errors are appropriately handled to avoid silent failures.


Line range hint 181-201: ConnectionEstablished event.
Adding addresses to the Kademlia DHT is correct, and updating gossipsub with the new peer is good.


210-213: OutgoingConnectionError handling.
Emitting an event with the failing connection’s ID is helpful for debugging.


247-247: Gossip data forwarding.
Sending the raw message.data over NetworkPeerEvent is straightforward. Confirm that large messages won’t cause memory issues.

packages/ciphernode/tests/tests/test_aggregation_and_decryption.rs (5)

482-486: Creation of the additional EventBus for NetworkPeerEvent.
This approach nicely differentiates internal enclave events from network-level events.


488-488: NetworkManager setup.
Invoking setup with both event buses is consistent with the new architecture.


505-505: Simulated gossip publish in test.
This logic nicely mimics a realistic network scenario by capturing the command in cmd_rx and then sending it to the network bus.


567-572: Second test NetworkManager setup.
Duplicating the pattern ensures consistent coverage. Tests appear thorough for verifying the new bus-based approach.


584-584: Receive event from the net bus in the test.
Successfully bridging NetworkPeerEvent::GossipData back into EnclaveEvent is crucial for integration tests.

packages/ciphernode/net/Cargo.toml (1)

11-12: Verify that these dependencies are required at runtime.
If anyhow and actix usage is limited to development or testing, consider placing them under [dev-dependencies]. Otherwise, this is acceptable if needed in production code.

packages/ciphernode/enclave_core/src/aggregator_start.rs (2)

105-105: Clean return statement maintains API contract

The variable renaming from join_handle to handle improves conciseness while maintaining the same functionality and type safety.


Line range hint 85-92: Network setup aligns with new event bus architecture

The network setup properly integrates with the new actor-based event bus while maintaining security through cipher and proper key management.

Let's verify the network peer implementation:

✅ Verification successful

Network setup properly integrates event bus with secure peer-to-peer communication

The verification confirms that the network setup in aggregator_start.rs correctly implements:

  • A dual event bus architecture (EnclaveEvent and NetworkPeerEvent) for proper message routing
  • Secure peer management with encrypted keypair handling
  • Proper async handling with tokio for network operations
  • Clean separation between network management and peer communication

Key implementation details:

  • NetworkManager acts as a bridge between EventBus events and Libp2p events
  • Proper error handling and connection management through DialerActor
  • Race condition prevention through proper async/await patterns and message passing
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify NetworkManager and NetworkPeer actor implementations
# Check for proper event handling patterns and potential race conditions

# Search for NetworkPeer actor implementation
ast-grep --pattern 'impl Actor for NetworkPeer {
  type Context = $_;
  fn started($_) {
    $$$
  }
}'

# Check for proper event bus subscription patterns
rg -A 5 'impl Handler<.*Event>' packages/ciphernode/net/src/network_peer.rs

Length of output: 183


Script:

#!/bin/bash
# Let's try a different approach to verify the network implementation

# Check NetworkManager setup and peer handling
rg -A 10 "setup_with_peer" packages/ciphernode/net/src/

# Look for event bus integration patterns
rg -A 5 "EventBus|bus\.send|bus\.subscribe" packages/ciphernode/net/src/

# Check network peer related structs and implementations
ast-grep --pattern 'struct NetworkPeer {
  $$$
}'

# Look for potential race conditions in async code
rg -A 5 "async fn|tokio::spawn|\.await" packages/ciphernode/net/src/

Length of output: 19583

Resolved review threads: deploy/cn1.yaml; .github/workflows/rust-ci.yml; .github/workflows/integration.yml
hmzakhalid requested a review from ryardley January 6, 2025 06:37
hmzakhalid (Member, Author):

@ryardley This is ready for review now!

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (2)
packages/ciphernode/net/src/dialer.rs (2)

16-16: Remove unused constant

The CONNECTION_TIMEOUT constant is defined but never used in the code.

Either remove the unused constant or implement the timeout mechanism.
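
If the constant is meant to bound dial attempts, one option is wrapping the dial future in tokio's timeout; attempt_dial here stands in for whatever async dial path the actor actually uses:

use tokio::time::timeout;

// Fail a dial attempt that exceeds CONNECTION_TIMEOUT instead of
// leaving the constant unused.
match timeout(CONNECTION_TIMEOUT, attempt_dial(addr)).await {
    Ok(Ok(())) => { /* connected */ }
    Ok(Err(e)) => warn!("dial failed: {e:?}"),
    Err(_) => warn!("dial timed out after {CONNECTION_TIMEOUT:?}"),
}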


119-126: Add DNS resolution caching

The DNS resolution logic performs lookups on every attempt without caching. This could be inefficient for repeated connection attempts.

Consider implementing a TTL-based DNS cache:

use std::collections::HashMap;
use std::time::{Duration, Instant};

struct DnsCache {
    ip: String,
    expires: Instant,
}

impl Dialer {
    // Assumes a `dns_cache: HashMap<String, DnsCache>` field on Dialer.
    // Takes &mut self because a cache miss inserts a fresh entry.
    fn get_resolved_multiaddr(&mut self, value: &Multiaddr) -> Result<Multiaddr> {
        if let Some(domain) = self.extract_dns_host(value) {
            if let Some(cached) = self.dns_cache.get(&domain) {
                if cached.expires > Instant::now() {
                    return self.dns_to_ip_addr(value, &cached.ip);
                }
            }
            let ip = self.resolve_ipv4(&domain)?;
            self.dns_cache.insert(
                domain,
                DnsCache {
                    ip: ip.clone(),
                    expires: Instant::now() + Duration::from_secs(300), // 5 minutes TTL
                },
            );
            self.dns_to_ip_addr(value, &ip)
        } else {
            Ok(value.clone())
        }
    }
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ceda49 and 05dee3b.

📒 Files selected for processing (4)
  • packages/ciphernode/events/src/enclave_event/mod.rs (1 hunks)
  • packages/ciphernode/net/src/bin/p2p_test.rs (5 hunks)
  • packages/ciphernode/net/src/dialer.rs (1 hunks)
  • packages/ciphernode/net/src/network_manager.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/ciphernode/events/src/enclave_event/mod.rs
  • packages/ciphernode/net/src/network_manager.rs
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: test-net
  • GitHub Check: prebuild
  • GitHub Check: ci
  • GitHub Check: Build Image
🔇 Additional comments (2)
packages/ciphernode/net/src/bin/p2p_test.rs (1)

50-54: Consider memory implications of EventBus configuration

The EventBus is configured to capture history without deduplication. While this might be intentional for testing, it could lead to memory growth in long-running tests.

Consider adding a cleanup mechanism or limiting the history size. Would you like me to propose an implementation?
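
A bounded-history sketch, assuming the bus keeps its history in an internal Vec (field name and cap are illustrative):

const MAX_HISTORY: usize = 10_000; // illustrative cap

// Inside the EventBus when recording an event (field name `history` assumed):
fn record(&mut self, event: E) {
    self.history.push(event);
    if self.history.len() > MAX_HISTORY {
        // Drop the oldest entries once the cap is exceeded.
        let excess = self.history.len() - MAX_HISTORY;
        self.history.drain(0..excess);
    }
}

A VecDeque with pop_front would avoid the O(n) shift that drain incurs on a Vec.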

packages/ciphernode/net/src/dialer.rs (1)

30-33: Consider using Actor messages instead of tx channel

As suggested in previous reviews, the tx channel could be replaced with Actor messages for a more consistent actor-based approach.

Consider implementing a unified message type:

enum NetworkMessage {
    Event(NetworkPeerEvent),
    Command(NetworkPeerCommand)
}

And updating the Dialer struct:

pub struct Dialer {
    net_bus: Addr<EventBus<NetworkMessage>>,
    pending_connection: HashMap<ConnectionId, PendingConnection>,
}

Review threads: packages/ciphernode/net/src/bin/p2p_test.rs (3, one outdated); packages/ciphernode/net/src/dialer.rs (outdated)
ryardley (Contributor) left a comment

  1. I think we should reconsider splitting up NetworkPeer into "Actor" and "Inner", and use the "Actor" to handle correlations on synchronous things like dialing and errors. We can then use the retry function we already have and call it within the Dialer actor to handle retries in a simple, reusable way.
  2. We should only use channels to communicate between the libp2p struct and the wrapper actor, and should favor a unified event type on the event bus that covers both commands and events. (Splitting them up is as confusing as using separate channels.)

hmzakhalid (Member, Author) replied:

There's an issue with the NetworkPeer and NetworkPeerInner split: NetworkPeerInner cannot be stored directly in the NetworkPeer actor because Swarm cannot be cloned.

This has some important implications:

  1. We can't directly call methods on NetworkPeerInner from NetworkPeer because:

    • The Inner must live in its own task/thread
    • We can't share or clone the Swarm between tasks
  2. This forces us to use channels for all communication:

    • mpsc channel for sending commands to the Inner
    • broadcast channel for receiving SwarmEvents from the Inner

On the broadcast channel suggestion: even with a broadcast channel, we still need an mpsc channel for commands because:

  • Any interaction with the Swarm (like dialing) must go through its own task
  • We can't directly call methods on the Swarm from the NetworkPeer actor

The architecture would look like:
NetworkPeer (Actor) -> mpsc -> NetworkPeerInner (SwarmOperations (Dial, Publish))
NetworkPeerInner -> broadcast -> NetworkPeer (SwarmEvents)
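
As a sketch, that wiring could look like the following; the command/event enums and the Inner constructor are simplified stand-ins for the real types:

use tokio::sync::{broadcast, mpsc};

// The actor keeps cmd_tx and evt_rx; the Inner owns the Swarm and the
// other halves, running in its own task because Swarm cannot be cloned.
let (cmd_tx, cmd_rx) = mpsc::channel::<NetworkPeerCommand>(100);
let (evt_tx, mut evt_rx) = broadcast::channel::<NetworkPeerEvent>(100);

tokio::spawn(async move {
    // Assumed constructor; internally select!s over cmd_rx and swarm events.
    let mut inner = NetworkPeerInner::new(cmd_rx, evt_tx);
    inner.run().await;
});

// The actor dials by sending a command through the mpsc side...
cmd_tx.send(NetworkPeerCommand::Dial(addr)).await?;
// ...and observes SwarmEvents via the broadcast side.
while let Ok(event) = evt_rx.recv().await {
    // correlate dial results, errors, gossip data, etc.
}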
