Add Network Peer actor #231
base: main
Conversation
Walkthrough

This pull request introduces significant architectural changes to the network communication and event handling mechanisms in the project. The modifications span multiple files, primarily focusing on transitioning from a broadcast channel-based event system to an actor-based event bus model. Key changes include updating GitHub Actions workflow configurations, modifying network peer and network manager implementations, and refactoring event handling strategies. The changes aim to improve modularity, event routing, and overall network communication robustness.
Sequence Diagram

```mermaid
sequenceDiagram
    participant NetworkManager
    participant DialerActor
    participant NetworkPeer
    participant EventBus
    NetworkManager->>EventBus: Initialize net_bus
    NetworkManager->>DialerActor: Create dialer with net_bus
    DialerActor->>NetworkPeer: Initiate peer connection
    NetworkPeer->>EventBus: Send network events
    EventBus->>NetworkManager: Propagate events
```
After our discussion, we found that in order to move forward with this, it makes sense to look into replacing Actix first.
Actionable comments posted: 3
🧹 Nitpick comments (18)
packages/ciphernode/net/src/lib.rs (1)

11-11: Consider limiting exposures in your public interface.
Exporting entire modules (`dialer`, `network_manager`, `network_peer`, `repo`) may yield a large footprint. Review whether to selectively export only the types or functions needed at the public interface boundary. A sketch of a narrower facade follows.
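As a minimal sketch of the idea, assuming the modules expose items with these names (the exact types to re-export are illustrative, not taken from the crate):

```rust
// lib.rs -- keep the modules private and re-export only the intended surface.
// Item names below are placeholders; substitute the types this crate
// actually needs to expose.
mod dialer;
mod network_manager;
mod network_peer;
mod repo;

pub use dialer::DialerActor;
pub use network_manager::NetworkManager;
pub use network_peer::{NetworkPeer, NetworkPeerCommand};
```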
deploy/cn1.yaml (1)

5-7: Ensure consistent peer addressing pattern
The peer URLs follow a consistent pattern, but consider the following:

- The sequential port numbering (9092-9094) suggests a convention; document it if it's intentional
- Consider using environment variables for node addresses to make configuration more maintainable

Example refactor using environment variables:

```diff
 peers:
-  - "/dns4/cn2/udp/9092/quic-v1"
-  - "/dns4/cn3/udp/9093/quic-v1"
-  - "/dns4/aggregator/udp/9094/quic-v1"
+  - "/dns4/${CN2_HOST}/udp/${CN2_PORT}/quic-v1"
+  - "/dns4/${CN3_HOST}/udp/${CN3_PORT}/quic-v1"
+  - "/dns4/${AGGREGATOR_HOST}/udp/${AGGREGATOR_PORT}/quic-v1"
```
.github/workflows/ci.yml (1)

38-38: Maintain consistent cache action versions
The upgrade to actions/cache@v4 aligns with the change in rust-ci.yml, which is good for maintaining consistency across workflows.
However, this infrastructure update appears unrelated to the PR's main objective of implementing a Network Peer actor. Consider:

- Moving these CI updates to a separate PR
- Updating the PR description to include these infrastructure changes
packages/ciphernode/net/src/bin/p2p_test.rs (2)

74-76: DialerActor usage
The `DialerActor::dial_peer()` pattern effectively decouples the dial logic from the main function. If you anticipate dynamic peer lists, consider scheduling repeated or conditional dials for new peers in case the environment changes at runtime.
103-115: Polling event history in a loop
Retrieving all past events in every loop iteration might cause performance degradation for large histories. A sliding window approach or a more targeted query (e.g., retrieving only events after the last known index) could reduce overhead if you expect a large event volume. A self-contained sketch of the cursor idea follows.
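This sketch illustrates only the cursor pattern; the actual EventBus API for retrieving history in this crate is not shown here, so a shared `Vec` stands in for it:

```rust
use std::sync::{Arc, Mutex};

// Instead of re-reading the whole history each tick, remember an index
// and process only the tail that appeared since the last pass.
fn drain_new<T: Clone>(history: &Arc<Mutex<Vec<T>>>, cursor: &mut usize) -> Vec<T> {
    let guard = history.lock().unwrap();
    let new_items = guard[*cursor..].to_vec();
    *cursor = guard.len();
    new_items
}

fn main() {
    let history = Arc::new(Mutex::new(vec!["a", "b"]));
    let mut cursor = 0;

    assert_eq!(drain_new(&history, &mut cursor), vec!["a", "b"]);
    history.lock().unwrap().push("c");
    // The second pass only sees the new event.
    assert_eq!(drain_new(&history, &mut cursor), vec!["c"]);
}
```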
packages/ciphernode/net/src/network_manager.rs (2)

65-67: Subscribing to all event types
Using `String::from("*")` to subscribe to all events is convenient for prototyping but might introduce overhead if you only need a subset of events. If performance or clarity becomes an issue, consider narrower subscriptions or event filtering.
83-87: `EventBus` instantiation
Constructing a dedicated `EventBus<NetworkPeerEvent>` at runtime here is useful. If your application grows, you may need to centralize bus instantiation in a higher-level module to share or coordinate multiple bus instances more systematically.
packages/ciphernode/net/src/dialer.rs (4)

14-16: Backoff constants
Defining `BACKOFF_DELAY` and `BACKOFF_MAX_RETRIES` at the top is a neat approach. Expose them as configuration parameters or environment variables if you need to tweak connection retry strategies without recompiling; a sketch of an environment-variable fallback follows.
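A minimal sketch of reading overrides from the environment, assuming illustrative variable names and default values (the repo's actual constants are not reproduced here):

```rust
use std::time::Duration;

// Defaults standing in for the compile-time constants; adjust to match
// the real BACKOFF_DELAY and BACKOFF_MAX_RETRIES values.
const DEFAULT_BACKOFF_DELAY_MS: u64 = 500;
const DEFAULT_BACKOFF_MAX_RETRIES: u32 = 10;

fn backoff_delay() -> Duration {
    let ms = std::env::var("BACKOFF_DELAY_MS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(DEFAULT_BACKOFF_DELAY_MS);
    Duration::from_millis(ms)
}

fn backoff_max_retries() -> u32 {
    std::env::var("BACKOFF_MAX_RETRIES")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(DEFAULT_BACKOFF_MAX_RETRIES)
}

fn main() {
    println!("delay: {:?}, retries: {}", backoff_delay(), backoff_max_retries());
}
```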
57-65: `DialPeer` message handling
The pattern of creating a new `DialerActor` on the fly and then immediately sending a `DialPeer` message is flexible. However, spinning up multiple `DialerActor` instances for each dial might become resource-intensive. Consider a single `DialerActor` instance that handles all dials if frequent dialing or batching of dial requests is expected.
66-100: `attempt_dial` method
Useful usage of `self.pending_connection` to track dial attempts. If you need advanced scheduling (e.g., exponential or jittered backoff beyond a static doubling), factor this logic into a separate backoff manager for code cleanliness; one possible jittered variant is sketched below.
Would you like me to open an issue or provide a snippet with an advanced jittered backoff algorithm?
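As a hedged sketch of such a backoff manager, here is exponential backoff with full jitter; the jitter source is deliberately simplistic to stay dependency-free, and a real implementation would likely use the `rand` crate:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

// Exponential backoff with full jitter: sleep a random duration in
// [0, base * 2^attempt], capped at `max`.
fn jittered_backoff(attempt: u32, base: Duration, max: Duration) -> Duration {
    let exp = base.saturating_mul(2u32.saturating_pow(attempt)).min(max);
    // Cheap jitter source for the sketch only.
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .subsec_nanos() as u64;
    let jitter_ms = nanos % (exp.as_millis().max(1) as u64);
    Duration::from_millis(jitter_ms)
}

fn main() {
    for attempt in 0..5 {
        let delay = jittered_backoff(attempt, Duration::from_millis(500), Duration::from_secs(30));
        println!("attempt {attempt}: sleeping {delay:?}");
    }
}
```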
176-245: Event handling - connection errors
The logic for retrying or permanently failing connections is generally robust. Consider implementing separate states or a typed mechanism for different error categories (e.g., ephemeral networking issues vs. unreachable addresses). This might simplify the logic if new error types appear; a sketch of such a typed classification follows.
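A minimal sketch of the typed-category idea; the string matching is a stand-in, since a real implementation would match on libp2p's typed error enums rather than messages:

```rust
// Classify dial failures so retry logic can branch on a type
// instead of ad-hoc checks scattered through the event handler.
#[derive(Debug)]
enum DialFailure {
    /// Transient problems worth retrying (timeouts, resets).
    Transient,
    /// Permanent problems where retrying is pointless (bad address, self-dial).
    Fatal,
}

fn classify(error_message: &str) -> DialFailure {
    if error_message.contains("timed out") || error_message.contains("reset") {
        DialFailure::Transient
    } else {
        DialFailure::Fatal
    }
}

fn main() {
    assert!(matches!(classify("connection timed out"), DialFailure::Transient));
    assert!(matches!(classify("invalid multiaddr"), DialFailure::Fatal));
}
```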
packages/ciphernode/net/src/events.rs (2)

55-60: `event_type` method
This approach of extracting the event name via a string format is acceptable for debugging or logging. If you foresee a big performance overhead from string manipulation, consider a more direct approach (e.g., match arms returning a static `&str`), as sketched below.
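A minimal sketch of the allocation-free variant; the enum variants shown are the ones mentioned elsewhere in this review and may not match the real `NetworkPeerEvent` exactly:

```rust
enum NetworkPeerEvent {
    GossipData(Vec<u8>),
    DialError { connection_id: u64 }, // id type is a stand-in
}

impl NetworkPeerEvent {
    // Returns a static name per variant, avoiding any string formatting.
    fn event_type(&self) -> &'static str {
        match self {
            NetworkPeerEvent::GossipData(_) => "GossipData",
            NetworkPeerEvent::DialError { .. } => "DialError",
        }
    }
}

fn main() {
    let event = NetworkPeerEvent::GossipData(vec![1, 2, 3]);
    assert_eq!(event.event_type(), "GossipData");
}
```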
70-73: `event_id` returning constant
Returning a hardcoded string as the event ID is a suitable short-term approach. If you add more specialized event categories within `NetworkPeerEvent`, consider returning distinct IDs for each variant to facilitate event filtering.
packages/ciphernode/events/src/enclave_event/mod.rs (1)

142-146: Consider removing duplicate method implementations.
It appears that `event_type` is defined both here and within the `impl Event for EnclaveEvent`. If the logic is the same, you can rely on the trait's implementation to avoid redundancy and reduce maintenance overhead.
packages/ciphernode/net/src/network_peer.rs (2)

59-62: Listening on UDP QUIC.
Check if you need graceful error handling here. Currently, a `?` will bubble up any error. That may be sufficient if you intend to fail fast.
94-107: Process Dial command logic.
Dial success/failure is handled properly. Consider whether you need exponential backoff or retry logic beyond a single attempt.
packages/ciphernode/enclave_core/src/aggregator_start.rs (2)

2-2: Consider architectural implications of Actix dependency
Given the discussion about potentially replacing Actix, it's worth noting that this file shows deep integration with Actix's actor system. Any future migration would require significant refactoring of the event bus and network components.
Consider:

- Abstracting actor-specific code behind interfaces (see the sketch after this list)
- Minimizing direct Actix dependencies in new code
- Planning a gradual migration strategy if replacement is desired
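As a hedged sketch of the interface idea, call sites could depend on a small runtime-agnostic trait instead of an Actix `Addr` directly; all names here are illustrative, not existing items in the codebase:

```rust
// A minimal publishing abstraction. Production code would hold a
// Box<dyn EventSink<E>> so the actor runtime behind it can be swapped.
trait EventSink<E>: Send {
    fn publish(&self, event: E);
}

// In-memory implementation standing in for an Actix-backed bus.
struct VecSink<E>(std::sync::Mutex<Vec<E>>);

impl<E: Send> EventSink<E> for VecSink<E> {
    fn publish(&self, event: E) {
        self.0.lock().unwrap().push(event);
    }
}

fn main() {
    let sink = VecSink(std::sync::Mutex::new(Vec::new()));
    sink.publish("network event");
    assert_eq!(sink.0.lock().unwrap().len(), 1);
}
```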
Line range hint 1-105: Consider documenting the new event bus architecture
The changes successfully integrate the new Network Peer actor and event bus system. However, given the complexity of the initialization process and the architectural significance of these changes, consider:
- Adding documentation about the event flow between components
- Creating architecture diagrams showing the actor relationships
- Documenting the rationale behind the actor-based design to help with future maintenance and potential Actix replacement
Would you like me to help create a template for the architectural documentation?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- .github/workflows/ci.yml (1 hunks)
- .github/workflows/integration.yml (1 hunks)
- .github/workflows/rust-ci.yml (1 hunks)
- deploy/agg.yaml (0 hunks)
- deploy/cn1.yaml (1 hunks)
- deploy/cn2.yaml (0 hunks)
- deploy/cn3.yaml (0 hunks)
- packages/ciphernode/enclave_core/src/aggregator_start.rs (3 hunks)
- packages/ciphernode/enclave_core/src/start.rs (3 hunks)
- packages/ciphernode/events/src/enclave_event/mod.rs (1 hunks)
- packages/ciphernode/net/Cargo.toml (1 hunks)
- packages/ciphernode/net/src/bin/p2p_test.rs (5 hunks)
- packages/ciphernode/net/src/dialer.rs (1 hunks)
- packages/ciphernode/net/src/events.rs (4 hunks)
- packages/ciphernode/net/src/lib.rs (1 hunks)
- packages/ciphernode/net/src/network_manager.rs (5 hunks)
- packages/ciphernode/net/src/network_peer.rs (6 hunks)
- packages/ciphernode/net/src/retry.rs (0 hunks)
- packages/ciphernode/tests/tests/test_aggregation_and_decryption.rs (4 hunks)
💤 Files with no reviewable changes (4)
- deploy/cn3.yaml
- deploy/cn2.yaml
- deploy/agg.yaml
- packages/ciphernode/net/src/retry.rs
🧰 Additional context used
📓 Learnings (1)
packages/ciphernode/net/Cargo.toml (1)
Learnt from: ryardley
PR: gnosisguild/enclave#145
File: packages/ciphernode/data/Cargo.toml:0-0
Timestamp: 2024-11-10T16:42:46.226Z
Learning: In `packages/ciphernode/data/Cargo.toml`, `actix-rt` is only used in tests and should remain in `[dev-dependencies]`.
🔇 Additional comments (33)
deploy/cn1.yaml (1)
5-7: Review security implications of UDP/QUIC configuration
While QUIC-v1 provides security features, consider the following security aspects:
- Ensure UDP ports (9092-9094) are properly secured in the network configuration
- Verify that DNS resolution is secure and cannot be hijacked
- Consider implementing additional network security measures (e.g., firewall rules, access controls)
Run this script to check for security-related configurations:
✅ Verification successful
Let me gather more information about the security configurations and network setup.
QUIC configuration appears secure with additional network security measures in place
Based on the verification results, the QUIC/UDP configuration appears to be properly secured:
- The system uses the QUIC-v1 protocol, which inherently provides encryption and security features
- Network isolation is implemented through Docker networks (`global-network` in docker-compose.yml)
- Each node has a unique network private key for authentication
- Ports are properly configured with environment variables (`QUIC_PORT`) rather than hardcoded values
- mDNS is explicitly disabled (`enable_mdns: false`), which reduces the network discovery attack surface

Additional security measures observed:
- Usage of secure WebSocket (wss://) for RPC connections
- Configuration management through environment variables
- Containerized deployment with network isolation
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Check for security-related configurations
# Expected: Find security-related settings across configuration files

# Search for security-related configurations
echo "Checking for security configurations..."
rg -i "security|firewall|tls|certificate|encrypt" deploy/

# Check for port definitions
echo "Checking port configurations..."
rg "udp/909[0-9]" deploy/
```

Length of output: 948
Script:
```bash
#!/bin/bash
# Check for TLS/security configurations in deployment files and documentation
echo "Checking for QUIC and TLS configurations..."
rg -A 5 -B 5 "quic-v1" deploy/

# Check for network security configurations
echo "Checking for network configurations..."
rg -i "network|port|listen|bind" deploy/

# Look for any security documentation
echo "Checking for security documentation..."
fd README -e md | xargs rg -i "security|network|configuration"
```

Length of output: 5453
.github/workflows/integration.yml (1)
Line range hint 1-140: Verify completeness of the PR
There appears to be a significant discrepancy between:
- The PR's stated objective of "Add Network Peer actor"
- The actual changes shown (GitHub Actions workflow updates)
- The AI summary mentioning architectural changes to network communication
Please verify:
- Are there additional files missing from this review?
- If not, should this PR be split into separate concerns (infrastructure updates vs. Network Peer implementation)?
🧰 Tools
🪛 yamllint (1.35.1)
[error] 50-50: trailing spaces (trailing-spaces)
packages/ciphernode/net/src/bin/p2p_test.rs (3)
22-22: Switching from Tokio to Actix runtime
Using `#[actix::main]` instead of `#[tokio::main]` is a significant architectural shift. Ensure that all async tasks and dependencies, especially those previously tailored to Tokio, have been updated to avoid compatibility issues, particularly if they rely on Tokio-specific APIs or features.
Would you like a script to search for any Tokio-dependent snippets that might need to be refactored?
48-60: Event bus & peer channel initialization
Initializing both the mpsc channel and the `EventBus` in close succession is good practice. However, verify that the buffer size of 100 for the channel is appropriate based on expected traffic volume. If message bursts exceed this capacity, you may encounter dropped messages or deadlocks.
Please confirm using load or stress tests that a buffer of 100 is sufficient for peak event throughput.
141-142: Swarm handle early completion
Logging a warning if the swarm handle finishes prematurely is good. However, consider expanding the warning or capturing additional context about why it might have exited early, if you suspect ephemeral errors or race conditions.
Logging a warning if the swarm handle finishes prematurely is good. However, consider expanding the warning or capturing additional context about why it might have exited early, if you suspect ephemeral errors or race conditions.packages/ciphernode/net/src/network_manager.rs (4)
24-24
: Newnet_bus
field
Storing thenet_bus
inNetworkManager
is a clean way to unify event handling. Ensure all inbound and outbound events are thoroughly tested to confirm there are no missing routes for critical messages.
38-45
: Extended constructor
Adding thenet_bus
parameter to theNetworkManager::new
constructor is consistent with your architecture shift to an event bus. Double-check all callers ofnew
to ensure they now pass the correctAddr<EventBus<NetworkPeerEvent>>
.
102-110
: CreatingNetworkPeer
The newNetworkPeer::new
call insetup_with_peer
effectively ties together the bus, the receiver, and peer logic. Consider adding structured error logging forwarn!("Failed to create NetworkPeer: {:?}", e);
to facilitate diagnosing which part of the peer setup failed, especially if future logic expands.
126-139
:Handler<NetworkPeerEvent>
implementation
ForwardingGossipData
to theEnclaveEvent
bus is a clear separation of concerns. Confirm that no extraneous events inadvertently get forwarded if you introduce additionalNetworkPeerEvent
variants that should remain internal.packages/ciphernode/net/src/events.rs (1)
42-45
: ExtendedDialError
usage
Addingconnection_id
toDialError
events enhances debuggability by correlating errors to specific connection attempts. Make sure to handle potential collisions if your system reuses connection IDs.packages/ciphernode/enclave_core/src/start.rs (2)
82-82
: Renamingjoin_handle
tohandle
The rename is stylistically neutral, though it may improve consistency. Confirm that all references and logs reflect the updated naming convention to avoid confusion when debugging.
95-95
: Returning updated tuple
Simultaneously returning(bus, handle, peer_id)
rather than(bus, join_handle, peer_id)
is consistent with the variable rename. Ensure any doc comments, if present, are updated to reflect this new naming.packages/ciphernode/net/src/network_peer.rs (13)
packages/ciphernode/net/src/network_peer.rs (13)

1-3: Imports look good.
These imports improve readability by clarifying the usage of Actix and EventBus.
18-18: Validate that `tokio::sync::mpsc` meets concurrency needs.
Ensure that the chosen channel type adequately handles potential surges of commands without risking channel overload or blocking.
26-30: Good practice: making behaviour fields public.
Exposing these fields supports flexible extension and testing. Confirm that no sensitive fields are unintentionally exposed.
35-35: Introduction of the event bus field.
Replacing broadcast channels with a centralized event bus can simplify communication across multiple actors. This change appears beneficial for decoupling.
43-44: Accepting `net_bus` and `cmd_rx` in constructor.
Constructing the actor with the event bus can improve testability and modularity.
54-54: Initialize event bus in struct.
No issues seen here. The new approach aligns well with the final usage.
65-67: Subscription method.
The subscription approach with gossipsub is straightforward. If future logic is needed for subscription side effects, consider capturing errors or returning them explicitly.
70-93: Process GossipPublish commands.
The handling of publish success/error and sending corresponding events via `net_bus` is clear and robust. Logging is also helpful for troubleshooting.
109-110: Concise closure of command processing.
No further concerns here; keep it simple and consistent.
112-121: Main loop structure with `select!`.
This approach cleanly merges incoming commands and swarm events. Double-check that all panics or critical errors are appropriately handled to avoid silent failures. A runnable sketch of this merge pattern follows.
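A self-contained sketch of the pattern, with an interval standing in for the libp2p swarm's event stream and a toy `Command` type standing in for `NetworkPeerCommand`:

```rust
use tokio::select;
use tokio::sync::mpsc;

#[derive(Debug)]
enum Command {
    Dial(String),
}

#[tokio::main]
async fn main() {
    let (tx, mut cmd_rx) = mpsc::channel::<Command>(16);
    let mut tick = tokio::time::interval(std::time::Duration::from_millis(50));

    tx.send(Command::Dial("/dns4/example/udp/9092/quic-v1".into()))
        .await
        .unwrap();
    drop(tx); // closing the sender lets the loop terminate

    loop {
        select! {
            // Commands arriving from the actor side of the wrapper.
            maybe_cmd = cmd_rx.recv() => match maybe_cmd {
                Some(cmd) => println!("handling {cmd:?}"),
                None => break, // all senders dropped; shut down cleanly
            },
            // Stand-in for polling the swarm's next event.
            _ = tick.tick() => { /* handle swarm events here */ }
        }
    }
}
```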
Line range hint 181-201: ConnectionEstablished event.
Adding addresses to the Kademlia DHT is correct, and updating gossipsub with the new peer is good.
210-213: OutgoingConnectionError handling.
Emitting an event with the failing connection's ID is helpful for debugging.
247-247: Gossip data forwarding.
Sending the raw `message.data` over `NetworkPeerEvent` is straightforward. Confirm that large messages won't cause memory issues.

packages/ciphernode/tests/tests/test_aggregation_and_decryption.rs (5)
482-486: Creation of the additional `EventBus` for `NetworkPeerEvent`.
This approach nicely differentiates internal enclave events from network-level events.
488-488: NetworkManager setup.
Invoking `setup` with both event buses is consistent with the new architecture.
505-505: Simulated gossip publish in test.
This logic nicely mimics a realistic network scenario by capturing the command in `cmd_rx` and then sending it to the network bus.
567-572: Second test `NetworkManager` setup.
Duplicating the pattern ensures consistent coverage. Tests appear thorough for verifying the new bus-based approach.
584-584: Receive event from the net bus in the test.
Successfully bridging `NetworkPeerEvent::GossipData` back into `EnclaveEvent` is crucial for integration tests.

packages/ciphernode/net/Cargo.toml (1)
11-12: Verify that these dependencies are required at runtime.
If `anyhow` and `actix` usage is limited to development or testing, consider placing them under `[dev-dependencies]`. Otherwise, this is acceptable if needed in production code.

packages/ciphernode/enclave_core/src/aggregator_start.rs (2)
105-105: Clean return statement maintains API contract
The variable renaming from `join_handle` to `handle` improves conciseness while maintaining the same functionality and type safety.
Line range hint 85-92: Network setup aligns with new event bus architecture
The network setup properly integrates with the new actor-based event bus while maintaining security through cipher and proper key management.
Let's verify the network peer implementation:
✅ Verification successful
Network setup properly integrates event bus with secure peer-to-peer communication
The verification confirms that the network setup in `aggregator_start.rs` correctly implements:
- A dual event bus architecture (EnclaveEvent and NetworkPeerEvent) for proper message routing
- Secure peer management with encrypted keypair handling
- Proper async handling with tokio for network operations
- Clean separation between network management and peer communication
Key implementation details:
- NetworkManager acts as a bridge between EventBus events and Libp2p events
- Proper error handling and connection management through DialerActor
- Race condition prevention through proper async/await patterns and message passing
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify NetworkManager and NetworkPeer actor implementations
# Check for proper event handling patterns and potential race conditions

# Search for NetworkPeer actor implementation
ast-grep --pattern 'impl Actor for NetworkPeer { type Context = $_; fn started($_) { $$$ } }'

# Check for proper event bus subscription patterns
rg -A 5 'impl Handler<.*Event>' packages/ciphernode/net/src/network_peer.rs
```

Length of output: 183
Script:
```bash
#!/bin/bash
# Let's try a different approach to verify the network implementation

# Check NetworkManager setup and peer handling
rg -A 10 "setup_with_peer" packages/ciphernode/net/src/

# Look for event bus integration patterns
rg -A 5 "EventBus|bus\.send|bus\.subscribe" packages/ciphernode/net/src/

# Check network peer related structs and implementations
ast-grep --pattern 'struct NetworkPeer { $$$ }'

# Look for potential race conditions in async code
rg -A 5 "async fn|tokio::spawn|\.await" packages/ciphernode/net/src/
```

Length of output: 19583
@ryardley This is ready for review now!
Actionable comments posted: 4
🧹 Nitpick comments (2)
packages/ciphernode/net/src/dialer.rs (2)
16-16: Remove unused constant
The `CONNECTION_TIMEOUT` constant is defined but never used in the code. Either remove the unused constant or implement the timeout mechanism.
119-126: Add DNS resolution caching
The DNS resolution logic performs lookups on every attempt without caching. This could be inefficient for repeated connection attempts.
Consider implementing a TTL-based DNS cache (note: the method needs `&mut self`, or interior mutability, so the cache can be updated):

```rust
use std::time::{Duration, Instant};

struct DnsCache {
    ip: String,
    expires: Instant,
}

impl Dialer {
    fn get_resolved_multiaddr(&mut self, value: &Multiaddr) -> Result<Multiaddr> {
        if let Some(domain) = self.extract_dns_host(value) {
            // Serve from cache while the entry is still fresh.
            if let Some(cached) = self.dns_cache.get(&domain) {
                if cached.expires > Instant::now() {
                    return self.dns_to_ip_addr(value, &cached.ip);
                }
            }
            let ip = self.resolve_ipv4(&domain)?;
            self.dns_cache.insert(domain, DnsCache {
                ip: ip.clone(),
                expires: Instant::now() + Duration::from_secs(300), // 5 minute TTL
            });
            self.dns_to_ip_addr(value, &ip)
        } else {
            Ok(value.clone())
        }
    }
}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)

- packages/ciphernode/events/src/enclave_event/mod.rs (1 hunks)
- packages/ciphernode/net/src/bin/p2p_test.rs (5 hunks)
- packages/ciphernode/net/src/dialer.rs (1 hunks)
- packages/ciphernode/net/src/network_manager.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/ciphernode/events/src/enclave_event/mod.rs
- packages/ciphernode/net/src/network_manager.rs
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: test-net
- GitHub Check: prebuild
- GitHub Check: ci
- GitHub Check: Build Image
🔇 Additional comments (2)
packages/ciphernode/net/src/bin/p2p_test.rs (1)
50-54: Consider memory implications of EventBus configuration
The EventBus is configured to capture history without deduplication. While this might be intentional for testing, it could lead to memory growth in long-running tests.
Consider adding a cleanup mechanism or limiting the history size, as sketched below. Would you like me to propose an implementation?
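A minimal sketch of a bounded history, assuming eviction of the oldest entries is acceptable; the real `EventBus` API in this repo may differ:

```rust
use std::collections::VecDeque;

// Once the cap is reached, the oldest events are evicted.
struct BoundedHistory<E> {
    cap: usize,
    events: VecDeque<E>,
}

impl<E> BoundedHistory<E> {
    fn new(cap: usize) -> Self {
        Self { cap, events: VecDeque::with_capacity(cap) }
    }

    fn push(&mut self, event: E) {
        if self.events.len() == self.cap {
            self.events.pop_front(); // evict oldest
        }
        self.events.push_back(event);
    }
}

fn main() {
    let mut history = BoundedHistory::new(2);
    history.push("a");
    history.push("b");
    history.push("c");
    assert_eq!(history.events.iter().copied().collect::<Vec<_>>(), vec!["b", "c"]);
}
```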
packages/ciphernode/net/src/dialer.rs (1)
30-33: Consider using Actor messages instead of tx channel
As suggested in previous reviews, the `tx` channel could be replaced with Actor messages for a more consistent actor-based approach.
Consider implementing a unified message type:

```rust
enum NetworkMessage {
    Event(NetworkPeerEvent),
    Command(NetworkPeerCommand),
}
```

And updating the `Dialer` struct:

```rust
pub struct Dialer {
    net_bus: Addr<EventBus<NetworkMessage>>,
    pending_connection: HashMap<ConnectionId, PendingConnection>,
}
```
- I think we should reconsider splitting `NetworkPeer` into an "Actor" and an "Inner", and use the "Actor" to do things like correlating synchronous operations such as dialing and errors. We can then use the retry function we already have and call it within the `Dialer` actor to handle the retry functionality in a simple and reusable way.
- We should only use channels to communicate between the libp2p struct and the wrapper actor, and should favor a unified event type for the event bus that covers both commands and events. (Splitting them up is as confusing as using separate channels.)
There's an issue with Which has some important implications:
And on the broadcast channel suggestion: even with a broadcast channel, we still need an mpsc channel for commands because:
The architecture would look like:
Summary by CodeRabbit
Based on the comprehensive summary, here are the updated release notes:
Infrastructure Updates

Deployment Configuration

Network Management

- Introduced `DialerActor` for improved peer connection management.
- Switched from the `tokio` to the `actix` runtime for network operations.

Code Refactoring
These changes primarily focus on improving the network infrastructure, event handling, and overall code organization.