Missing values in traceQL metrics #4176
Hi, thanks for the detailed information. I agree it does look like metrics are missing, since the screenshot of search results shows spans within the same time range as the metrics query. You have already covered the basics, including enabling the local-blocks processor and flush_to_storage. The next thing to check is the in-memory queue between the distributors and the generators. If the queue is full, the distributors cannot send incoming spans to the generators, so those spans are never flushed to storage. That would show up as missing metrics while search still works, because search reads the data flushed by the ingesters. Please take a look at the following metrics and see if there are any discards:
Let's check these 2 metrics and see if any discards are happening.
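If it helps, the discard counters can be checked directly against a component's Prometheus /metrics endpoint. A minimal sketch (the distributor URL is a placeholder, and matching on "discard" in the metric name is a heuristic, not an exact list of Tempo's metric names):

```python
import urllib.request

def find_discards(metrics_text):
    """Return non-zero samples whose metric name mentions 'discard'."""
    hits = []
    for line in metrics_text.splitlines():
        if line.startswith("#") or "discard" not in line:
            continue  # skip HELP/TYPE comments and unrelated metrics
        _, _, value = line.rpartition(" ")
        try:
            if float(value) > 0:
                hits.append(line)
        except ValueError:
            pass  # not a plain 'name{labels} value' sample line
    return hits

# Usage against a running distributor (URL is a placeholder):
# text = urllib.request.urlopen("http://tempo-distributor:3100/metrics").read().decode()
# for sample in find_discards(text):
#     print(sample)
```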
I can try to check what's inside the blocks in S3 with Pandas or other tools, but I don't know how to find the blocks generated by the metrics-generator and distinguish them from ingester blocks.
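Tempo's object-store layout is `<tenant>/<block-id>/meta.json`, so the block metadata can at least be listed and inspected by hand. A rough sketch (bucket and tenant names are placeholders; whether meta.json carries a field identifying the producing component depends on the Tempo version, so this only surfaces the metadata for manual inspection):

```python
import json

# Tempo lays out its object store as <tenant>/<block-id>/meta.json;
# metrics-generator and ingester blocks live side by side under one tenant.

def block_ids(keys):
    """Extract block IDs from '<tenant>/<block-id>/meta.json' object keys."""
    return [k.split("/")[1] for k in keys
            if k.count("/") == 2 and k.endswith("/meta.json")]

# With boto3 (assumption: boto3 installed and credentials configured;
# bucket and tenant names below are placeholders):
# import boto3
# s3 = boto3.client("s3")
# resp = s3.list_objects_v2(Bucket="my-tempo-bucket", Prefix="single-tenant/")
# keys = [o["Key"] for o in resp.get("Contents", [])]
# for bid in block_ids(keys):
#     body = s3.get_object(Bucket="my-tempo-bucket",
#                          Key=f"single-tenant/{bid}/meta.json")["Body"].read()
#     meta = json.loads(body)
#     print(bid, meta)  # eyeball fields per block to tell the sources apart
```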
OK, it looks like the issue was with "filter_server_spans". I set it to false and now it works as expected. As I understand it now, the rare spikes in non-server spans I saw were parent spans of non-server kind. I hope I am the only one confused by the "only parent spans" part: I knew about the 'filter_server_spans' setting, but was sure it wasn't my problem because I saw non-server spans in the output. Eventually it turned out I had missed the part about "only parent spans or spans with the SpanKind of server".
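For anyone hitting the same symptom, the relevant fragment of tempo.yaml would look like this (a sketch based on the two options named in this thread; exact option paths may vary by Tempo version):

```yaml
metrics_generator:
  processor:
    local_blocks:
      # Keep non-server span kinds (client, consumer, ...) visible
      # to TraceQL metrics instead of filtering them out.
      filter_server_spans: false
      # Already enabled in this setup: persist generator blocks to the backend.
      flush_to_storage: true
```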
Describe the bug
We are currently testing TraceQL metrics, as it's one of the key features our developers need, and I noticed strange behavior.
We see missing values with TraceQL metrics. Sometimes there are no values from any metrics function, even when I know for certain there are matching spans (verified by running a search without metrics functions). It looks like this (last-hour timeframe):
Just a short line. It does not matter whether it's rate() or quantile_over_time.
If I remove 'span.db.system="mongodb"' from the condition, I see the full hour of metrics.
I am sure there is data where the metrics are empty: if I select a timeframe with no metrics and just run a search, I easily find 500 traces (each containing multiple spans matching the condition).
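For reference, the failing query shape is roughly this (reconstructed from the description: the selector is the one quoted above, and the metrics function is interchangeable per the note about rate() vs. quantile_over_time):

```
{ span.db.system = "mongodb" } | rate()
```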
To Reproduce
Hard to say. Usually I see broken metrics when trying to get metrics from spans of "client" or "consumer" kind. Not sure if that's related; maybe there are just fewer of them.
Expected behavior
If there are spans, I want to see metrics generated from them.
Environment:
Additional Context
I don't see anything suspicious in the logs. However, I have the "warn" level set; I'll try with "info".
It seems not to matter whether the data is still on the metrics-generator or already in the backend. When I run queries for data that is definitely in S3, it looks a bit better, but usually the beginning and the last 20 minutes of the hour are still empty.
Here are parts of my tempo.yaml: