UCP/PROTO/RECONFIG: Fix copy header handling in reconfig. #10452

Open
wants to merge 17 commits into
base: master

@ofirfarjun7 ofirfarjun7 commented Jan 26, 2025

What?

  • Handle copy of AM user header in ucp_proto_reconfig_progress.
  • Refactor code
  • Handle request when status != UCS_OK correctly.

Why?

When a connection is established using sockaddr, requests go into the pending queue, but the user header is not copied even if UCP_AM_SEND_FLAG_COPY_HEADER is set. This breaks the API contract and can cause data corruption.

fix #10424

How?

Copy the user header to an internal UCX buffer if needed, as is done in other protocols.
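
A minimal sketch of the idea, not the actual UCX code: a toy deferred request either borrows the caller's header pointer or snapshots the header at enqueue time. All names here (`toy_req_t`, `toy_enqueue`, `toy_progress`, `TOY_FLAG_COPY_HEADER`) are hypothetical stand-ins; only the copy-before-deferral behavior is taken from the description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for UCP_AM_SEND_FLAG_COPY_HEADER. */
#define TOY_FLAG_COPY_HEADER 0x1u

/* Hypothetical deferred send request; not a UCX type. */
typedef struct {
    const char *hdr;      /* header that will actually be sent */
    char       *hdr_copy; /* owned snapshot, NULL if the header is borrowed */
    size_t      hdr_len;
} toy_req_t;

/* Queue a request. With the copy flag, snapshot the header so the caller
 * may reuse its buffer right away. Returns 0 on success, -1 on failure. */
static int toy_enqueue(toy_req_t *req, const char *user_hdr, size_t len,
                       unsigned flags)
{
    req->hdr_copy = NULL;
    req->hdr      = user_hdr;
    req->hdr_len  = len;

    if ((flags & TOY_FLAG_COPY_HEADER) && (len > 0)) {
        req->hdr_copy = malloc(len);
        if (req->hdr_copy == NULL) {
            return -1;
        }
        memcpy(req->hdr_copy, user_hdr, len);
        req->hdr = req->hdr_copy;
    }
    return 0;
}

/* "Send" the deferred request: write the header to out, release snapshot. */
static void toy_progress(toy_req_t *req, char *out)
{
    assert(req->hdr != NULL);
    memcpy(out, req->hdr, req->hdr_len);
    free(req->hdr_copy);
    req->hdr_copy = NULL;
    req->hdr      = NULL;
}

/* Enqueue, overwrite the caller's buffer while the request is queued,
 * then send. Returns 1 if the sent header equals the original contents. */
static int toy_header_survives(unsigned flags)
{
    char user_hdr[4] = {'x', 'x', 'x', 'x'};
    char sent[4];
    toy_req_t req;

    if (toy_enqueue(&req, user_hdr, sizeof(user_hdr), flags) != 0) {
        return 0;
    }
    memset(user_hdr, 'a', sizeof(user_hdr)); /* caller reuses its buffer */
    toy_progress(&req, sent);
    return memcmp(sent, "xxxx", sizeof(sent)) == 0;
}
```

In this toy model, `toy_header_survives(TOY_FLAG_COPY_HEADER)` returns 1, while `toy_header_survives(0)` returns 0 because the queued request still points at the overwritten buffer.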

@ofirfarjun7 ofirfarjun7 added the WIP-DNM Work in progress / Do not review label Jan 26, 2025
@ofirfarjun7 ofirfarjun7 removed the WIP-DNM Work in progress / Do not review label Jan 27, 2025
@ofirfarjun7 ofirfarjun7 requested a review from gleon99 January 27, 2025 09:38
@@ -51,6 +53,15 @@ static ucs_status_t ucp_proto_reconfig_progress(uct_pending_req_t *self)
return UCS_OK;
}

if (ucs_unlikely(ucp_proto_config_is_am(req->send.proto_config) &&
Contributor

Can there be a ucp_proto_request_restart() call (from wireup) on this request after ucp_proto_reconfig_progress() but before any other AM proto calls? If so, I suspect this could be an issue because of this assertion:

ucs_assert(!(request->flags & UCP_REQUEST_FLAG_USER_HEADER_COPIED));

Contributor Author

From what I see, ucp_proto_request_restart calls reset, then proto selection is done and ucp_request_send is called again.
If it chooses ZCOPY, then ucp_am_eager_zcopy_pack_user_header will be called and will set this flag to 0, but perhaps I'm wrong.

&sb[0], size, &param);
if (flags & UCP_AM_SEND_FLAG_COPY_HEADER) {
ucs::fill_random(shdr_cpy);
Contributor

Maybe fill with a specific value, and also free?


#include <ucp/core/ucp_worker.inl>
#include <ucp/am/ucp_am.inl>
Contributor

Order

@@ -11,8 +11,10 @@
#include "proto_debug.h"
#include "proto_select.h"
#include "proto_common.inl"
#include "proto_am.inl"
Contributor

Order

if (ucp_proto_config_is_am(req->send.proto_config)) {
ucp_am_release_user_header(req);
}
ucp_request_complete_send(req, status);
Contributor

Add blank line

static ucs_status_t ucp_proto_reconfig_progress(uct_pending_req_t *self)
{
ucp_request_t *req = ucs_container_of(self, ucp_request_t, send.uct);
ucp_ep_h ep = req->send.ep;
ucs_status_t status;
Contributor

Move down

@@ -2526,6 +2526,7 @@ class test_ucp_sockaddr_protocols : public test_ucp_sockaddr {
std::string sb(size, 'x');
std::string rb(size, 'y');
std::string shdr(hdr_size, 'x');
std::string shdr_cpy = shdr;
Contributor

cpy -> copy

&sb[0], size, &param);
if (flags & UCP_AM_SEND_FLAG_COPY_HEADER) {
shdr_cpy[0] = 'X';
Contributor

Please add an explanation (comment)

* To check UCP_AM_SEND_FLAG_COPY_HEADER we change AM header
* content while the request is still in pending queue.*/
if (flags & UCP_AM_SEND_FLAG_COPY_HEADER) {
shdr_copy[0] = 'X';
Contributor

  1. shdr_copy should keep the original value, and shdr should be the one that is "corrupted".
  2. I would fill the entire "corrupted" header with aaa..

UCP_AM_SEND_FLAG_COPY_HEADER))) {
status = ucp_proto_am_req_copy_header(req);
if (ucs_unlikely(status != UCS_OK)) {
return status;
Contributor

If the copy fails, we should abort the operation and return OK from progress. Please see how it's done in the other places where ucp_proto_am_req_copy_header is used.
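
One way to read this suggestion, as a toy sketch rather than the UCX implementation: the progress callback reports the failure to the user by completing the request with the error, then returns success so the queue drops the finished entry instead of retrying it. The contract assumed here (OK means "remove from the queue", anything else means "keep and retry") and every name in the block (`demo_req_t`, `demo_progress`, `DEMO_ERR_NO_MEMORY`, and so on) are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical status codes for this sketch; not the UCX enum. */
typedef enum { DEMO_OK = 0, DEMO_ERR_NO_MEMORY = -1 } demo_status_t;

/* Hypothetical request; not a UCX type. */
typedef struct {
    int           completed;  /* set once the user has been notified */
    demo_status_t result;     /* status reported to the user */
    int           copy_fails; /* test knob: force the header copy to fail */
} demo_req_t;

/* Stand-in for a header copy that may fail, e.g. on allocation. */
static demo_status_t demo_copy_header(const demo_req_t *req)
{
    return req->copy_fails ? DEMO_ERR_NO_MEMORY : DEMO_OK;
}

/* Report the final status to the user exactly once. */
static void demo_complete(demo_req_t *req, demo_status_t status)
{
    assert(!req->completed);
    req->completed = 1;
    req->result    = status;
}

/* Pending-queue callback. Assumed contract: DEMO_OK means "remove me from
 * the queue"; any other value means "keep me queued and retry later". */
static demo_status_t demo_progress(demo_req_t *req)
{
    demo_status_t status = demo_copy_header(req);

    if (status != DEMO_OK) {
        demo_complete(req, status); /* abort: the user sees the error */
        return DEMO_OK;             /* the queue drops the finished request */
    }

    /* ... the actual send would be issued here ... */
    demo_complete(req, DEMO_OK);
    return DEMO_OK;
}
```

Returning the raw error from `demo_progress` instead would, under the assumed contract, leave a request that can never succeed sitting in the queue without the user ever being notified.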

Contributor Author

After checking the code again, we noticed that the status != UCS_OK case is also not handled properly for other protocols in protov2. I will fix that and add it to this PR.

return UCS_OK;
}

if (ucs_unlikely(ucp_proto_config_is_am(req->send.proto_config) &&
Contributor

This is not a fast path, so ucs_unlikely is not needed here.
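
For context, branch hints of this kind are typically thin wrappers over a compiler builtin and only influence code layout, never the result of the condition. A minimal sketch, assuming GCC or Clang; `toy_unlikely` is a hypothetical macro written for illustration and is not the UCX definition:

```c
#include <assert.h>

/* Hypothetical hint macro; assumes the GCC/Clang __builtin_expect builtin
 * and falls back to the bare condition on other compilers. */
#if defined(__GNUC__)
#define toy_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define toy_unlikely(x) (!!(x))
#endif

/* Error check with the hint: tells the compiler the branch is rare. */
static int check_with_hint(int err)
{
    if (toy_unlikely(err != 0)) {
        return -1;
    }
    return 0;
}

/* The same check without the hint. */
static int check_plain(int err)
{
    if (err != 0) {
        return -1;
    }
    return 0;
}

/* Both variants must agree for every input; only code layout may differ. */
static void verify_hint_is_transparent(void)
{
    int inputs[] = {0, 1, -7};
    unsigned i;

    for (i = 0; i < sizeof(inputs) / sizeof(inputs[0]); ++i) {
        assert(check_with_hint(inputs[i]) == check_plain(inputs[i]));
    }
}
```

Since the two variants behave identically, dropping the hint on a slow path costs nothing functionally and keeps the condition easier to read.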

@ofirfarjun7 ofirfarjun7 added the WIP-DNM Work in progress / Do not review label Jan 29, 2025
@ofirfarjun7 ofirfarjun7 added WIP-DNM Work in progress / Do not review and removed WIP-DNM Work in progress / Do not review labels Jan 29, 2025
@ofirfarjun7 ofirfarjun7 removed the WIP-DNM Work in progress / Do not review label Jan 30, 2025
Development

Successfully merging this pull request may close these issues.

UCP_AM_SEND_FLAG_COPY_HEADER not working as expected with protov2
4 participants