
HPCC-32822 Fix MP protocol error logged indefinitely in loop #19207

Merged


@jakesmith (Member) commented Oct 18, 2024

The detection of an MP packet with an invalid header threw an exception that was logged, but the socket was not closed. When the client then closed the socket, the MP server was notified, but because it had nothing left to read of the [bad] header, it saw the invalid protocol error again and rethrew the same error. The epoll handler loop continuously re-notified the handler of the close event (because the handler did nothing with it), so the protocol error was output continuously.

The socket should be closed on this and any other exception.

In case these events are too frequent, add a timer to log the protocol errors less frequently, and log the header bytes received to help diagnose the source of these 'rogue' connections.

Signed-off-by: Jake Smith [email protected]
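The rate-limited logging described above can be sketched as follows. This is a minimal, self-contained C++ illustration, assuming a timer whose hasElapsed() returns true at most once per interval (similar in spirit to the periodicTimer.hasElapsed() call in the patch). The PeriodicTimer and RateLimitedLogger classes are hypothetical and are not the HPCC jlib types.

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a periodic timer: hasElapsed() returns true
// at most once per interval, and true on the very first call.
class PeriodicTimer
{
public:
    explicit PeriodicTimer(std::chrono::milliseconds interval)
        : interval(interval), last(std::chrono::steady_clock::now() - interval) {}

    bool hasElapsed()
    {
        auto now = std::chrono::steady_clock::now();
        if (now - last < interval)
            return false;
        last = now;
        return true;
    }

private:
    std::chrono::milliseconds interval;
    std::chrono::steady_clock::time_point last;
};

// Hypothetical rate-limited logger: emits the first error immediately,
// then at most one message per interval, reporting how many were suppressed.
class RateLimitedLogger
{
public:
    explicit RateLimitedLogger(std::chrono::milliseconds interval) : timer(interval) {}

    // Returns true if the message was actually written to stderr.
    bool log(const std::string &msg)
    {
        if (!timer.hasElapsed())
        {
            ++suppressed; // count it, but do not flood the log
            return false;
        }
        std::fprintf(stderr, "%s (suppressed %u similar messages)\n", msg.c_str(), suppressed);
        suppressed = 0;
        return true;
    }

    unsigned suppressedCount() const { return suppressed; }

private:
    PeriodicTimer timer;
    unsigned suppressed = 0;
};
```

With a one-hour interval, the first error is logged immediately and later errors within the hour are only counted; the next message that gets through reports how many were suppressed, so the frequency of rogue connections is still visible.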

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality).
  • This change fixes warnings (the fix does not alter the functionality or the generated code).
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled).

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:


Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32822

Jirabot Action Result:
Assigning user: [email protected]
Workflow Transition To: Merge Pending
Updated PR

@jakesmith requested a review from ghalliday October 18, 2024 08:37
if (periodicTimer.hasElapsed())
{
    StringBuffer packetHdrBytes("Packet Header bytes: ");
    hexdump2string((byte const *)&hdr, sizeof(hdr), packetHdrBytes);
Member

Should this be std::min(packetHdrBytes, sizeRead)?

Also, should the number of bytes be logged in the message?

Member Author

It is guaranteed to have read exactly sizeof(hdr) bytes at this point; if it had read less, remaining would be > 0 and it would not reach here.
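The guarantee described in this reply (validation is only reached once exactly sizeof(hdr) bytes have been read) can be modelled with a header accumulator like the one below. It is a hypothetical sketch: PacketHeader, kExpectedMagic and HeaderReader are illustrative names, and the real MP header layout and read path differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical fixed-size packet header; not the real MP header layout.
struct PacketHeader
{
    uint32_t size;
    uint32_t magic;
};

constexpr uint32_t kExpectedMagic = 0x4D50u; // hypothetical marker value

// Accumulates header bytes across partial reads. The validity check only
// runs once remaining == 0, i.e. exactly sizeof(PacketHeader) bytes arrived.
class HeaderReader
{
public:
    // Feed newly received bytes; sets consumed to the number of bytes used.
    // Returns true once a complete, valid header is available.
    // Throws std::runtime_error if a complete header fails validation.
    bool feed(const unsigned char *data, size_t len, size_t &consumed)
    {
        size_t take = std::min(len, remaining); // never read past the header
        if (take > 0)
            std::memcpy(reinterpret_cast<unsigned char *>(&hdr) + (sizeof(hdr) - remaining), data, take);
        remaining -= take;
        consumed = take;
        if (remaining > 0)
            return false; // partial header: wait for the next notification
        // Exactly sizeof(hdr) bytes have been read at this point.
        if (hdr.magic != kExpectedMagic)
            throw std::runtime_error("MP protocol error: invalid packet header");
        return true;
    }

    const PacketHeader &header() const { return hdr; }

private:
    PacketHeader hdr{};
    size_t remaining = sizeof(PacketHeader);
};
```

Note that calling feed() again after a complete but invalid header, for example on a 0-byte close notification, simply re-runs the check and throws again; that is the repeated-error pattern this PR addresses by closing the socket.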

Member Author

> Also, should the number of bytes be logged in the message?

Could make hexdump2string prefix the message with the number of bytes.

Member Author

Have changed hexdump2string to include the number of bytes.
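A helper that prefixes the dump with the byte count, as discussed in this thread, might look like the following. This is a hypothetical standalone function written against std::string; it is not the actual jlib hexdump2string, whose signature and output format may differ.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: appends "[N bytes]" followed by each byte as two
// uppercase hex digits to `out`, and returns `out` for chaining.
std::string &hexdumpWithCount(const unsigned char *data, size_t len, std::string &out)
{
    out += "[" + std::to_string(len) + " bytes]";
    char byteHex[4]; // " XX" plus the terminating NUL
    for (size_t i = 0; i < len; ++i)
    {
        std::snprintf(byteHex, sizeof(byteHex), " %02X", static_cast<unsigned>(data[i]));
        out += byteHex;
    }
    return out;
}
```

For example, dumping the three bytes 0x00, 0x1F, 0xAB after the prefix "Packet Header bytes: " yields "Packet Header bytes: [3 bytes] 00 1F AB".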

@jakesmith requested a review from ghalliday October 18, 2024 11:27
@jakesmith (Member Author)

In testing, there was another (likely the main) issue that caused the repeated tracing of this error.

After the exception, the socket was left open.
When the client closed the socket, notifySelected was hit but had no remaining bytes to read (so it read nothing). However, it still had an activemsg (holding the invalid hdr), so it repeated the check and threw another exception.
The epoll notify loop then repeated the notification of the close (0 bytes), and the process repeated indefinitely!
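The failure mode described in this comment can be reproduced with a toy model of a level-triggered poll loop. This sketch is a deliberate simplification, not the real MP server or epoll code: the loop keeps delivering the close notification for as long as the socket stays open, and closing the socket in the exception handler is what breaks the cycle. Connection, notifySelected and runLoop are hypothetical names (notifySelected only echoes the method mentioned above).

```cpp
#include <exception>
#include <stdexcept>

// Hypothetical connection state for the model.
struct Connection
{
    bool open = true;          // our side of the socket
    bool headerInvalid = true; // the active message still holds the bad header
};

// Invoked on every readiness notification. With nothing left to read, it
// re-validates the stale header and throws again.
void notifySelected(Connection &c)
{
    if (c.headerInvalid)
        throw std::runtime_error("MP protocol error");
}

// Simulated level-triggered loop: it keeps notifying while the socket is
// open, for at most maxIterations. Returns how many protocol errors were raised.
unsigned runLoop(Connection &c, bool closeOnError, unsigned maxIterations)
{
    unsigned errors = 0;
    for (unsigned i = 0; i < maxIterations && c.open; ++i)
    {
        try
        {
            notifySelected(c);
        }
        catch (const std::exception &)
        {
            ++errors; // in the real server, this is where the error was logged
            if (closeOnError)
                c.open = false; // the fix: close the socket on any exception
        }
    }
    return errors;
}
```

Without the close, the loop raises one error per iteration until the artificial cap is reached; with the close, exactly one error is raised and the loop stops.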

@jakesmith force-pushed the HPCC-32822-protocol-error branch from cea0719 to 285f454 October 18, 2024 11:54
@jakesmith changed the title from "HPCC-32822 Suppress frequent MP protocol error logging" to "HPCC-32822 Fix MP protocol error logged indefinitely in loop" Oct 18, 2024
@jakesmith force-pushed the HPCC-32822-protocol-error branch from 285f454 to 81b3e2e October 18, 2024 12:19
@ghalliday merged commit afff3f8 into hpcc-systems:candidate-9.8.x Oct 18, 2024
49 checks passed

Jirabot Action Result:
Added fix version: 9.8.32
Workflow Transition: 'Resolve issue'


2 participants