Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve RPC Client Response RX pipeline #228

Closed
serges147 opened this issue Sep 13, 2024 · 1 comment · Fixed by #231
Closed

Improve RPC Client Response RX pipeline #228

serges147 opened this issue Sep 13, 2024 · 1 comment · Fixed by #231
Assignees

Comments

@serges147
Copy link
Contributor

Currently I observe RPC client response timeouts b/c of periodic drops of incoming RX frames. Here is PDF diagram build on candump analysis (attached as well).
Zubax_LibCyphal_CanTraffic.pdf
candump13-2024-09-11_163217.log
candump02-2024-09-11_163221.log

I'm not sure whether it is correct fix or not but the following patch makes such timeouts go away:
image

@pavel-kirienko
Copy link
Member

The !same_transport condition is redundant because a frame with a matching transfer-ID would be accepted if it was the same iface anyway. This leaves us with two preconditions: whether the transfer-ID matches the expectation and whether the reassembly state machine is not currently mid-transfer.

The removal of the constraint that a new transfer must arrive from the same interface as the previous one (the same can be achieved by removing the correct_iface constraint from rxSessionUpdate) is generally not sound, because it may cause transfer duplication if the non-current interface is lagging behind the current one by exactly 32 transfers, which may occur in certain failure modes.

A possible solution that does not introduce undesirable side effects is to add a new acceptance condition:

    const bool restartable = (same_transport && tid_new) ||      // unchanged
                             (same_transport && tid_timeout) ||  // unchanged
                             (tid_timeout && tid_new) ||         // unchanged
                             (tid_timeout && tid_match && idle);  // new

where:

const bool tid_match = rxs->transfer_id == frame->transfer_id;
const bool idle = 0U == rxs->total_payload_size;

The purpose of tid_match is to eliminate duplicates arriving from redundant interfaces and to make the condition more selective such that it only activates if the transfers arrive in strictly the correct ordering. Without this check, per the diagram, we would accept the transfer with TID=0 twice. The idle condition is needed to prohibit changing the active interface mid-transfer as it may cause both old and new transfers to be lost.

Can you please check this?

@serges147 serges147 self-assigned this Sep 24, 2024
pavel-kirienko added a commit that referenced this issue Sep 24, 2024
added one more case when it's allowed to "restart" transfer reassembly with different transport interface

see #228

---------

Co-authored-by: Pavel Kirienko <[email protected]>
github-merge-queue bot pushed a commit to OpenCyphal-Garage/libcyphal that referenced this issue Sep 25, 2024
… some todos. (#384)

- Addressed several TODOs:
  - // TODO: Switch to `transport::PayloadFragments`.
  - // TODO: Move `tryDeserialize` method to some common place
- Added new `TestCanSvcRxSessions::receive_multiple_tids_frames` -
reproduces exactly what is described at the
OpenCyphal/libcanard#228
- Added new `example_1_presentation_3_hb_getinfo_ping_linux_can.cpp`
example app - it's similar to
`example_1_presentation_1_ping_user_service_udp.cpp` but with CAN
transport.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants