Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove p2pservice networkcodec trait #2388

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

acerone85
Copy link
Contributor

@acerone85 acerone85 commented Oct 23, 2024

Linked Issues/PRs

Closes #2368

Description

In the P2P service, we couple together GossipSub and RequestResponse codecs in a NetworkCodec trait. In practice, this coupling is not necessary as the two codecs use different Messages and different logics for encoding/deconding messages. The only thing that they share in common is using Postcard as the data format to serialize and deserialize data, but this is not made clear in the network trait.

This PR is a rework of Codecs in the libp2p service, which deletes the NetworkCodec trait and therefore the coupling between GossipSubCodec and RequestResponseCodec. In particular:

  • A new DataFormat trait is defined for serializing deserializing any SerDe datatype, and the helper functions for postcard serialization and deserializationare moved to a structPostcardthat implements theDataFormat` trait,
  • A new BoundedCodec struct that implements the RequestResponse::Codec trait for any DataFormat supersedes the old PostcardCodec.
  • A new UnboundedCodec struct that implements the GossipsubCodec trait for any DataFormatsupersedes the oldPostcardCodec`,
  • [ ]
  • Parts specific to the Request/Response protocol have been moved to a new request_response::protocols module`,
  • The FuelBehaviour and LibP2P service have been refactored to use different codecs for Gossipsub and RequestResponse.

TODO:

  • Better Documentation
  • Better names for modules
  • Maybe move DataFormats to a different part of the codebase?

Checklist

  • Breaking changes are clearly marked as such in the PR description and changelog
  • New behavior is reflected in tests
  • The specification matches the implemented behavior (link update PR if changes are needed)

Before requesting review

  • I have reviewed the code myself
  • I have created follow-up issues caused by this PR and linked them here

After merging, notify other teams

[Add or remove entries as needed]

@acerone85 acerone85 self-assigned this Oct 23, 2024
@acerone85 acerone85 linked an issue Oct 23, 2024 that may be closed by this pull request
@acerone85 acerone85 marked this pull request as draft October 23, 2024 17:01
@acerone85 acerone85 changed the title 2368 remove p2pservice networkcodec trait Remove p2pservice networkcodec trait Oct 23, 2024
@acerone85 acerone85 changed the base branch from release/v0.40.0 to master October 23, 2024 17:01
Copy link
Contributor

@netrome netrome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I like the decoupling you're achieving here. I think we could do some simplifications by reducing the number of traits, and for example just having one GossipSubCodec trait and oneRequestResponseReadWrite trait. Both of these could be generic over the data format if needed.

Comment on lines 13 to 20
trait DataFormatCodec {
type Error;
fn deserialize<'a, R: Deserialize<'a>>(
encoded_data: &'a [u8],
) -> Result<R, Self::Error>;

fn serialize<D: Serialize>(data: &D) -> Result<Vec<u8>, Self::Error>;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure there are no data formats that take parameters? I'd err on the cautious side and accept a self parameter to these methods. This would for example allow a dynamically configurable formalt such as:

enum MyFormat {
    Postcard,
    Json
}

While I don't know by heart when or where we'd like to use this, it feels overly restrictive to explicitly rule it out. This also eliminates the need for `PhantomData` markers in our structs, which is a bit of code smell to me.
Suggested change
trait DataFormatCodec {
type Error;
fn deserialize<'a, R: Deserialize<'a>>(
encoded_data: &'a [u8],
) -> Result<R, Self::Error>;
fn serialize<D: Serialize>(data: &D) -> Result<Vec<u8>, Self::Error>;
}
trait DataFormatCodec {
type Error;
fn deserialize<'a, R: Deserialize<'a>>(
&self,
encoded_data: &'a [u8],
) -> Result<R, Self::Error>;
fn serialize<D: Serialize>(&self, data: &D) -> Result<Vec<u8>, Self::Error>;
}

Copy link
Contributor Author

@acerone85 acerone85 Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I like the suggestion. I have made the change here: c564941 and 138716b. Note that the DataFormatCodec trait has been removed in favour of two Encode and Decode traits, following what has been done in the storage crate.


#[derive(Debug, Clone)]
pub struct BoundedCodec<Format> {
_format: PhantomData<Format>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following my other suggestion I'd suggest skipping the PhantomData here. If the Format is an empty struct, there's no cost to holding it directly.

Suggested change
_format: PhantomData<Format>,
_format: Format,

Copy link
Contributor Author

@acerone85 acerone85 Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, see c564941 and 138716b

/// If the substream was not properly closed when dropped, the sender would instead
/// run into a timeout waiting for the response.
#[async_trait]
impl<Format> request_response::Codec for BoundedCodec<Format>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming feels off here. A trait named Codec should have encode/decode or serialize/deserialize methods. This trait contains read/write methods and seem to be representing primarily I/O operations. I get that this comes from p2p, but perhaps we should find another name for BoundedCodec. Right now I'd expect some sort of similarity between the BoundedCodec and UnboundedCodec types, but they feel very different in what they do with the current implementations.

Perhaps I'm just not understanding this well enough right now. I'd love to hear your thoughts on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I had a lot of troubles trying to find the right terminology (I also did not like BoundedCodec and UnboundedCodec. In the last version I have made the following changes:

  • DataFormatCodec has been replaced by Encode and Decode traits,
  • BoundedCodec is now a RequestResponseMessageHandler. I still don't like the Handler part in the name as it might make people think that it's something akin to a task, but I could not find anything better. The data_format field has been renamed to codec
  • UnboundedCodec is now GossipsubMessageHandler, and the data_format field has been renamed to codec,
  • The Format type parameter has been renamed to codec.
  • I did not rename the GossipsubCodec trait, suggestions for better names are very welcome.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commits are ead761a and c5aa362

Comment on lines 216 to 217
gossipsub_codec: UnboundedCodec<PostcardDataFormat>,
request_response_codec: BoundedCodec<PostcardDataFormat>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that these are the only two instances of using these codecs, perhaps we could simplify the implementation with more specific traits. What if we just renamed UnboundedCodec -> GossipsubCodec (perhaps we only need one trait and not two traits here), and renamed BoundedCodec -> RequestResponseReadWrite (or something similar)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed the suggestion: c5aa362 and ead761a

use std::io;

trait DataFormatCodec {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just renaming this DataFormat?

trait DataFormatCodec {
type Error;
fn deserialize<'a, R: Deserialize<'a>>(
encoded_data: &'a [u8],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be generalized to a impl std::io::Read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow completely here, do you mean changing the type of econded_data? If we do so, we need to allocate a second buffer where to read the value into. Or do you have something else in mind?

Copy link
Contributor

@netrome netrome Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow completely here, do you mean changing the type of econded_data?

Yes, the suggestion was taking encoded_data: impl std::io::Read + 'a, instead of explicitly taking a slice here since it allows users to pass in any reader, which is more flexible. However, I see that we use AsyncRead and not Read for the network communication which would be a bit more messy so there's no immediate benefit in doing this right now anyway.

If we do so, we need to allocate a second buffer where to read the value into. Or do you have something else in mind?

That shouldn't be necessary since &[u8] also implements Read, so all current calls should still work. Or perhaps I'm misunderstanding you here.

encoded_data: &'a [u8],
) -> Result<R, Self::Error>;

fn serialize<D: Serialize>(data: &D) -> Result<Vec<u8>, Self::Error>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be generalized by taking a impl std::io::Write input which it writes to. Then a convenience function serialize_to_vec could be provided with this signature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been changed to reflect what's been done in the Storage crate. We now have an Encoder trait that is used as the type where the result is stored, and this is implemented for Cow<'_, [u8]>. I could in theory change this to have the input where the serialized data is written passed into the function, but this would create me trouble when I will try to merge the traits in the storage trait and this PR together (see new issue #2403)

@acerone85 acerone85 marked this pull request as ready for review October 29, 2024 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove P2PService NetworkCodec trait
2 participants