
Trace refactored #116

Merged
merged 27 commits into from
Feb 22, 2024

Conversation

VladoLavor
Collaborator

@VladoLavor VladoLavor commented Mar 17, 2023

The way the API trace is initialized has changed: the trace is no longer initialized (and immediately disabled) by the connection. Instead, the user creates a new trace object and binds it to a connection.

API changes:

  • trace is no longer available from the connection object via the Trace() method.
  • Enable(bool) was removed from the Trace API. Instead, the method NewTrace(connection, size) is added.
  • Added method Close() to release resources used by the tracer.

Workflow changes:

  • tracer has a maximum size
  • tracing happens after vppClient.SendMsg, and the succeeded/failed status of the message is recorded
  • instead of appending to a list, the connection sends records to a buffered channel
  • GetRecords waits until all incoming messages are processed before returning the list
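The workflow above (a tracer with a maximum size, records sent over a buffered channel, GetRecords waiting for pending records) can be sketched independently of the govpp codebase roughly as follows. The type and method names mirror the PR description, but the bodies are an illustrative reconstruction, not the actual implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// Record is a simplified stand-in for the tracer's record type.
type Record struct {
	Name      string
	Succeeded bool
}

// Trace buffers records sent by the connection on a channel with a
// fixed maximum size, mirroring the workflow described in the PR.
type Trace struct {
	records chan Record
	wg      sync.WaitGroup
	mu      sync.Mutex
	stored  []Record
}

// NewTrace creates a tracer with a maximum capacity of size records.
func NewTrace(size int) *Trace {
	t := &Trace{records: make(chan Record, size)}
	go func() {
		for r := range t.records {
			t.mu.Lock()
			t.stored = append(t.stored, r)
			t.mu.Unlock()
			t.wg.Done()
		}
	}()
	return t
}

// Send is what the connection would call after vppClient.SendMsg,
// recording the succeeded/failed status of the message.
func (t *Trace) Send(name string, ok bool) {
	t.wg.Add(1)
	t.records <- Record{Name: name, Succeeded: ok}
}

// GetRecords waits until all incoming messages are processed
// before returning the list.
func (t *Trace) GetRecords() []Record {
	t.wg.Wait()
	t.mu.Lock()
	defer t.mu.Unlock()
	out := make([]Record, len(t.stored))
	copy(out, t.stored)
	return out
}

// Close releases the resources used by the tracer.
func (t *Trace) Close() { close(t.records) }

func main() {
	tr := NewTrace(10)
	tr.Send("sw_interface_dump", true)
	tr.Send("control_ping", false)
	recs := tr.GetRecords()
	fmt.Println(len(recs), recs[0].Succeeded, recs[1].Succeeded)
	tr.Close()
}
```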

Signed-off-by: Vladimir Lavor [email protected]

Signed-off-by: Vladimir Lavor <[email protected]>
@ondrej-fabry
Member

ondrej-fabry commented Jun 8, 2023

I have tested this locally repeatedly (-count=10000) with race checker enabled (-race) and it passed OK, so I assume this will resolve the issue with intermittent unit test failures.

However I have a few comments:

  1. The Tracer usage makes me a bit uneasy: it is referenced from the trace field of Connection, which is accessed without any lock and could potentially run into races while trace records are being created.
  2. The API of Tracer seems a bit limiting. For example, what about overwriting records? It may be more important for the user to have the last N trace records when running into an issue. Otherwise the user must continually call Clear or re-create the tracer, while having no reliable way to know when the record storage is full.
  3. The Tracer always stores all of the requests occurring on the connection, which can easily pollute the trace with records uninteresting to the user (e.g. keepalive pings, other unimportant channels, ...), since the filtering of records for a specific channel happens when the user retrieves the records, not when they are stored.
  4. The Record is missing some info about the requests/replies which might be important to the user, to be specific:
    • request context - used for multiplexing multiple parallel requests on connection,
      it actually consists of these attributes: sequenceNumber + channelId + isMultiRequest
    • message ID - identifies the message (might be different on each VPP run)
    • data length - how much data is being sent/received
    • data - the actual data sent/received (might be useful for debugging a bug in encoding/decoding)

@sknat could you take a look and do a quick review as well?

@VladoLavor
Collaborator Author

@ondrej-fabry here are some of my thoughts:

  1. Do you mean the issue that should be fixed by this patch? The trace manages all locks by itself.
  2. I agree with this point. The initial tracer was not implemented with all the bells and whistles and its API could be much more versatile.
  3. Valid point. The per-channel filter was based on a specific scenario where all records were needed first and later sorted by channel. But the sorting can easily be done by the user if needed. Filtering should happen before messages are stored.
  4. +1
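Filtering before storage, as agreed in point 3, could look roughly like attaching a predicate to the tracer so that uninteresting messages (e.g. keepalive pings) never enter the buffer. The names here are hypothetical, for illustration only:

```go
package main

import "fmt"

// filteredTrace stores only records accepted by the filter,
// so uninteresting messages never enter the buffer.
type filteredTrace struct {
	filter  func(msgName string) bool
	records []string
}

func newFilteredTrace(filter func(string) bool) *filteredTrace {
	return &filteredTrace{filter: filter}
}

func (t *filteredTrace) add(msgName string) {
	if t.filter != nil && !t.filter(msgName) {
		return // dropped at store time, not at retrieval time
	}
	t.records = append(t.records, msgName)
}

func main() {
	// Drop keepalive pings before they are stored.
	tr := newFilteredTrace(func(name string) bool {
		return name != "control_ping"
	})
	for _, m := range []string{"sw_interface_dump", "control_ping", "create_loopback"} {
		tr.add(m)
	}
	fmt.Println(tr.records)
}
```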

Signed-off-by: Vladimir Lavor <[email protected]>
@ondrej-fabry ondrej-fabry mentioned this pull request Jul 10, 2023
@sknat
Contributor

sknat commented Dec 18, 2023

Following up on the discussion we had last week (sorry for the really long review cycle), and after having played with it quite a bit, my position so far aligns with the above:

  1. I can also see edge cases where you would register a tracer with API messages in flight, i.e. receive a message for which we didn't Add() to the WaitGroup.
  2. Also aligned here, although it is still unclear to me whether we should aim at having a tracer implementation with bells and whistles in this repository or if we should just export a channel and let users implement their own traces.
  3. Filtering might not be required in all cases, although I am under the impression the current implementation is rather specific (i.e. asserting a certain number of messages in advance).

That said, I think we can move ahead with this patch as is and evolve it when adding other use cases.
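The alternative mentioned in point 2, exporting a channel and letting users implement their own tracing, could be sketched as follows. This is an illustrative API shape, not govpp's actual interface:

```go
package main

import "fmt"

// traceEvents simulates a connection exporting its trace stream:
// the library only sends events and closes the channel when done;
// consumers decide what to keep and how to store it.
func traceEvents(out chan<- string, msgs []string) {
	for _, m := range msgs {
		out <- m
	}
	close(out)
}

func main() {
	events := make(chan string, 8)
	go traceEvents(events, []string{"sw_interface_dump", "control_ping"})

	// A user-supplied tracer: collect everything into a slice.
	var collected []string
	for e := range events {
		collected = append(collected, e)
	}
	fmt.Println(collected)
}
```

The trade-off is that the library stays small (no built-in storage, sizing, or filtering policy), at the cost of every user writing their own consumer.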

sknat
sknat previously approved these changes Feb 8, 2024
Contributor

@sknat sknat left a comment


lgtm

sknat
sknat previously approved these changes Feb 22, 2024
Contributor

@sknat sknat left a comment


lgtm, thanks

Signed-off-by: Vladimir Lavor <[email protected]>
@ondrej-fabry ondrej-fabry merged commit 553f5ca into master Feb 22, 2024
9 checks passed
@ondrej-fabry ondrej-fabry deleted the trace-fix branch February 22, 2024 15:32

Successfully merging this pull request may close these issues.

Investigate intermittent failure of TestTraceEnabled
3 participants