-
Notifications
You must be signed in to change notification settings - Fork 395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prov/verbs: add param for setting RDMA CM ToS #10360
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general. Just some nitpicks.
Please add a description of the new variable to the verbs provider man page. You can also drop the "RFC" part from the title. |
80995b1
to
20fac51
Compare
Done. Also rebased and retested. |
The fi_tx_attr includes a tclass value that can be used to control the service level. It supports DSCP via helper functions. Can this be used instead? |
bot:aws:retest |
I do think this is a good idea. I have some questions regarding how to map these libfabric tclass values to an appropriate "TOS" value though. The RDMA CM API, like to the regular socket API, allows you to specify an 8 bit "TOS" value. However, the definition of this octet has changed over the years and seems to have converged on two incompatible encodings:
For either encoding, the value is set using the same API (DSCP values need to be left shifted by 2 first) and there's unfortunately no way to indicate the encoding. I assume that the RDMA CM TOS is semantically similar to the regular socket TOS in this regard. DSCP is the more modern encoding, but some places still assume that this value is an RFC 1349 TOS. For example, Linux assumes that the value specified in a With a regular IPv4 socket, this can be worked around by just setting the priority afterwards with I'm wondering if it makes sense to support a few options here:
WDYT? |
Hi @jakemoroni, I came across your PR while searching for a solution to set ToS/DSCP values to RoCE endpoints. For ToS/DSCP, can't we simply pass the output of |
@jakemoroni I have been working on the changes to implement ToS using With these changes, setting ToS/DSCP then becomes super easy as the application is just required to call Note 1: In my changes, the DSCP/ToS value is not left shifted by 2. This operation must be done by the application. |
Hey @sydidelot, Thanks for the input!
A few questions:
|
Hey @jakemoroni. Please accept my apologies for the late reply.
I guess that's fine if applications pay the price of
Right, that's a risk. @j-xiong what is your opinion on this point? Shall we enforce
For widows we could return
I have no idea how existing applications are using
Yes, I think we should focus on DSCP/ToS first and address other tclass values later. |
By default, if the RDMA CM application does not explicitly set a ToS, then the system default is used. This system default can be changed via the default_roce_tos configfs param, but this is global to the system and is somewhat cumbersome to deal with. This change introduces a new environment param: FI_VERBS_TOS This param can be used to explicitly set the ToS via the rdma_set_option API. If the ToS is set, then the system default is ignored and the new value is used instead. This allows for multiple concurrent workloads to use different ToS values. Valid range is -1 through 255. If unset or set to -1, then the call to rdma_set_option is omitted, thus preserving existing behavior. Not supported/tested on Windows. Signed-off-by: Jacob Moroni <[email protected]>
Opens:
rdma_set_option
so it should build and have no effect if no ToS is set. If a ToS is set, then it should result in a warning.verbs_domain_xrc.c
. They may not all be necessary.FI_VERBS_TOS
param takes a range of -1 through 255, with -1 being a special value to indicate "no ToS set" (default), which allows the call tordma_set_option
to be completely skipped. My original plan was to just have 0 be the default and always callrdma_set_option
but this would introduce a behavioral change on systems where the default is non-0 and would add extra overhead in scenarios where the ToS is a don't care.By default, if the RDMA CM application does not explicitly set a ToS, then the system default is used. This system default can be changed via the default_roce_tos configfs param, but this is global to the system and is somewhat cumbersome to deal with.
This change introduces a new environment param: FI_VERBS_TOS
This param can be used to explicitly set the ToS via the rdma_set_option API. If the ToS is set, then the system default is ignored and the new value is used instead.
This allows for multiple concurrent workloads to use different ToS values.
Valid range is -1 through 255. If unset or set to -1, then the call to rdma_set_option is omitted, thus preserving existing behavior.