-
Notifications
You must be signed in to change notification settings - Fork 463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BatchProcessor shutdown issue #2478
Comments
Good point, I observed this while doing unit-tests for dedicated thread span processor. The separate control channel for shutdown/flush looks to be good idea - reading control channel first. However, there may now be some log events in data-queue arrived before shutdown message. We need to somehow ensure to also drain those message for export, apart from one existing in internal vector. |
There is another issue with the current implementation of
I have created #2494 to show one possible approach to handle both the shutdown issue and the background thread's utilization issue. |
For missing shutdown/forceflush, probably we can just use a CondVar instead of a separate channel? As it should be fine to aggregate multiple flushes into one. Ideally, we could replace the channel here by our own circular buffer like C++/C# to avoid the potential excessive synchronization. |
There could also be another issue about BatchProcessor. As it spawns its own thread and the other threads are not blocked before it is ready, the shutdown or log will fail before that. I think we can add an extra synchronization when creating the background thread to notify the user thread that the BatchProcessor is ready. |
I don’t think this would really be an issue. In the current design, the background thread for the BatchLogProcessor is ready to handle requests as soon as the LoggerProvider (or OpenTelemetryTracingBridge) is fully initialized. It’s important for the application to make sure that shutdown or force_flush are only called after everything is properly set up. If those methods are called too early, they’d essentially run on Noop objects, which should be expected behavior. |
Is that something users have to worry about? They can call shutdown only on an instance of TracerProvider, and they won't have TracerProvider unless the processors (including Batch) are already setup, which would not be setup without the required threads being spun up. |
Yes, was thinking about |
BatchProcessor relies on a channel to pass messages (which includes Log/Span, and also control signals like Shutdown/Flush). If the channel is full, then Shutdown() signals will get lost. For example, the buffer got 500 items, and the thread is doing blocking export() call. During this time, 2000 items got into the channel. Any new items are dropped, which is by design. But if a shutdown is initiated during this time, that message is lost as well. When the thread is back to processing the queue, it'll drain the normal telemetry items, but shutdown signal is lost forever. Even when shutdown message is not lost, the message gets to the last position in the queue, so it'll be processed after all other export messages are processed (potential after multiple exports()).
May need to use a separate channel for control flow like shutdown/flush to mitigate the issue. This may not be easy to fully eliminate due to blocking call being made, and anything occurring during this time is hard to respond to.
PeriodicReader has a similar problem, but less severe as the Shutdown message will not be processed, but reacted to immediately after the blocking export is done.
The text was updated successfully, but these errors were encountered: