-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RUMM-1609 Invoke Kronos from a single thread #600
Closed
Closed
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we managed to reproduce this issue and are we sure this is the right fix 🤔💭?
This TSAN issue in Kronos is raised by two simultaneous operations accessing the same memory:
self.stableTime
write - coming fromClock.sync
completion (called multiple times while sync is pending!);self.stableTime
read - coming from accessingClock.now
.Digging deeper into the completion block execution in Kronos, it leads to this:
which looks like always running the completion callback on the main thread. This is in fact what I can observe when putting two basic
print()
statements in our code and runningExample
project:To add further, Kronos maintainers say they call both APIs from the main thread:
So, by using the
queue
, we only synchronize calls toClock.sync
andClock.now
(which is not the issue), but we do not force thecompletion
block to be called on the same thread as we later useClock.now
. The completion is always called on themain
thread.Before going further - does my observation make sense @maxep @buranmert ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point! This wasn't my understanding of this comment:
So technically right now we expect you to call .now and .sync from the same thread.
If actually the
.now
should be called from the same thread as the completion block of.sync
, then yes it should be called on the.main
. Seeing this, your observation makes total sense! Now, having to callNTPServerDateProvider.currentData
on the main thread will requireasync
for sure. I will also look at submitting a change to Kronos.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think these are valid choices.
IMO fixing the issue in Kronos seems more correct - right now it seems to enforce main thread access to
Clock.now
,Clock.annotatedNow
andClock.timestamp
to all users. As this is not stated in their public doc this might be a bug and we could contribute.However, if we want to fix it on our side, IMO
queue.async {}
is not an easy way to go. That might enforce quite refactoring to data collection system, as we readcurrentDate()
synchronously in many places.A better idea would be to use our
ValuePublisher
which is meant for building async / sync bridge in a performant way. It uses concurrent queue optimised for simultaneous reads and exclusive writes, which fits perfectly our usage ofClock.sync
(write) andClock.now
(read):Clock.sync
is used only once, inDatadog.initialize()
, to receive the real NTP time;Clock.now
is called from different threads for every event we create (log / span / rum).With above constraints, and
ValuePublisher
, we could try something this inNTPServerDateProvider
:This way, the real (NTP) time will be available after sync completes and all should be thread-safe. This would be regressive to what we have now, where
Clock.now
is available earlier than completion and it becomes more precise while NTP sync continues (to become stable when sync ends). To fill this gap further, we could have a look at Kronosfirst
completion API and even look closer at theirUserDefaults
capability. By using both to additionally feedserverTimePublisher
, we should achieve similar behaviour to what we have now, in a thread-safe manner and minimal change.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Thanks for pointing out the
ValuePublisher
, this would be a great approach indeed. I will first update this PR with a proposal and then look at modifying Kronos 👍