-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RUMM-1609 Invoke Kronos from a single thread #600
Conversation
@ncreated @buranmert we could also make |
/// Queue used for invoking Kronos clock. | ||
private let queue = DispatchQueue( | ||
label: "com.datadoghq.date-provider", | ||
target: .global(qos: .userInteractive) | ||
) | ||
|
||
func synchronize(with pool: String, completion: @escaping (Date?) -> Void) { | ||
queue.async { | ||
Clock.sync(from: pool, completion: { serverTime, _ in | ||
completion(serverTime) | ||
}) | ||
} | ||
} | ||
|
||
func currentDate() -> Date? { | ||
return Clock.now | ||
queue.sync { Clock.now } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we managed to reproduce this issue and are we sure this is the right fix 🤔💭?
This TSAN issue in Kronos is raised by two simultaneous operations accessing the same memory:
self.stableTime
write - coming fromClock.sync
completion (called multiple times while sync is pending!);self.stableTime
read - coming from accessingClock.now
.
Digging deeper into the completion block execution in Kronos, it leads to this:
CFHostSetClient(hostReference, callback, &clientContext)
CFHostScheduleWithRunLoop(hostReference, CFRunLoopGetMain(), CFRunLoopMode.commonModes.rawValue)
which looks like always running the completion callback on the main thread. This is in fact what I can observe when putting two basic print()
statements in our code and running Example
project:
To add further, Kronos maintainers say they call both APIs from the main thread:
In our app we only ever call this from the main thread. I think we'd be happy to support this if you wanted to submit a change for it though
So, by using the queue
, we only synchronize calls to Clock.sync
and Clock.now
(which is not the issue), but we do not force the completion
block to be called on the same thread as we later use Clock.now
. The completion is always called on the main
thread.
Before going further - does my observation make sense @maxep @buranmert ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point! This wasn't my understanding of this comment: So technically right now we expect you to call .now and .sync from the same thread.
If actually the .now
should be called from the same thread as the completion block of .sync
, then yes it should be called on the .main
. Seeing this, your observation makes total sense! Now, having to call NTPServerDateProvider.currentData
on the main thread will require async
for sure. I will also look at submitting a change to Kronos.
I'm not able to reproduce but I will do more investigation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, having to call NTPServerDateProvider.currentData on the main thread will require async for sure. I will also look at submitting a change to Kronos.
Yes, I think these are valid choices.
IMO fixing the issue in Kronos seems more correct - right now it seems to enforce main thread access to Clock.now
, Clock.annotatedNow
and Clock.timestamp
to all users. As this is not stated in their public doc this might be a bug and we could contribute.
However, if we want to fix it on our side, IMO queue.async {}
is not an easy way to go. That might enforce quite refactoring to data collection system, as we read currentDate()
synchronously in many places.
A better idea would be to use our ValuePublisher
which is meant for building async / sync bridge in a performant way. It uses concurrent queue optimised for simultaneous reads and exclusive writes, which fits perfectly our usage of Clock.sync
(write) and Clock.now
(read):
Clock.sync
is used only once, inDatadog.initialize()
, to receive the real NTP time;Clock.now
is called from different threads for every event we create (log / span / rum).
With above constraints, and ValuePublisher
, we could try something this in NTPServerDateProvider
:
internal class NTPServerDateProvider: ServerDateProvider {
private let serverTimePublisher: ValuePublisher<Date?> = ValuePublisher(initialValue: .empty)
func synchronize(with ntpPool: String, completion: @escaping (Date?) -> Void) {
Clock.sync(from: ntpPool, completion: { serverTime, _ in
self.serverTimePublisher.publishAsync(serverTime)
completion(serverTime)
})
}
func currentDate() -> Date? {
return serverTimePublisher.currentValue
}
}
This way, the real (NTP) time will be available after sync completes and all should be thread-safe. This would be regressive to what we have now, where Clock.now
is available earlier than completion and it becomes more precise while NTP sync continues (to become stable when sync ends). To fill this gap further, we could have a look at Kronos first
completion API and even look closer at their UserDefaults
capability. By using both to additionally feed serverTimePublisher
, we should achieve similar behaviour to what we have now, in a thread-safe manner and minimal change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Thanks for pointing out the ValuePublisher
, this would be a great approach indeed. I will first update this PR with a proposal and then look at modifying Kronos 👍
Closing in favor of #607 |
What and why?
As per this issue comment,
Clock.now
and.sync
from Kronos are expected to be called from the same thread.Fix #588
How?
Invoke Kronos from a dedicated
DispatchQueue
.Review checklist