Is your feature request related to a problem? Please describe.
We have been experiencing periodic slowness and spikes in resource usage on our search nodes, and we can't seem to figure out why. Several theories have been put forward, but we don't have a good way to see what the search nodes are doing in order to confirm or refute them. We are also unable to correlate specific operations with slowness or resource spikes well enough to rule any of the theories out.
Describe the solution you'd like
Histogram of the number of splits used during the execution of a search query - We are trying to gain better insight into the number of splits accessed per query. If we can establish that search queries generally target a high number of splits, which can slow down search, we will know that we need to adjust what we are doing to get more docs per split.
Counter tracking the number of keys evicted from the various caches, labeled by cache type. In our case, the split cache is (I think) the most important one. We routinely see fairly long periods of both high memory and high CPU usage, and the two curves for a given node are almost identical. Our split cache appears to be full, and we have a decent hit ratio, but we can't tell how frequently items are evicted from the cache or how many. If we could correlate slowness or high resource utilization with a high rate of cache evictions, rather than with just a spike in cache misses, that would tell us we may need to allocate more disk for caching.
Histogram observing search execution time, with buckets ranging upward of 30 seconds. We have resorted to using the monitoring infrastructure in our stack to figure out roughly how long a search query takes. However, not all Quickwit users can be expected to have that readily available or set up - Quickwit itself should be able to report how long a search request takes.
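To make the three requested metrics concrete, here is a minimal, stdlib-only Rust sketch of what could be recorded. Everything in it is an assumption for illustration: the `SearchMetrics`, `Histogram`, and `LabeledCounter` types, the metric names, the `split_cache` label, and the bucket bounds are hypothetical and are not Quickwit's actual metrics API. A real implementation would presumably plug into whatever metrics library Quickwit already uses.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Cumulative-bucket histogram, in the style of a Prometheus histogram.
/// Hypothetical type, for illustration only.
pub struct Histogram {
    /// Inclusive upper bounds of the buckets, sorted ascending.
    bounds: Vec<f64>,
    /// counts[i] = number of observations <= bounds[i];
    /// the extra last slot is the +Inf bucket (all observations).
    counts: Vec<u64>,
    sum: f64,
}

impl Histogram {
    pub fn new(mut bounds: Vec<f64>) -> Self {
        bounds.sort_by(|a, b| a.total_cmp(b));
        let counts = vec![0; bounds.len() + 1];
        Histogram { bounds, counts, sum: 0.0 }
    }

    pub fn observe(&mut self, value: f64) {
        // Index of the first bucket whose upper bound is >= value.
        let first = self.bounds.partition_point(|&b| b < value);
        // Buckets are cumulative, so every bucket from there on is incremented.
        for count in &mut self.counts[first..] {
            *count += 1;
        }
        self.sum += value;
    }

    /// Total number of observations (the +Inf bucket).
    pub fn count(&self) -> u64 {
        self.counts[self.bounds.len()]
    }

    /// Sum of all observed values.
    pub fn sum(&self) -> f64 {
        self.sum
    }

    /// Observations <= `upper_bound`, if that exact bound was configured.
    pub fn cumulative_count(&self, upper_bound: f64) -> Option<u64> {
        let idx = self.bounds.iter().position(|&b| b == upper_bound)?;
        Some(self.counts[idx])
    }
}

/// Monotonic counter keyed by a single label value (here: the cache type).
#[derive(Default)]
pub struct LabeledCounter {
    values: HashMap<String, u64>,
}

impl LabeledCounter {
    pub fn inc_by(&mut self, label: &str, n: u64) {
        *self.values.entry(label.to_string()).or_insert(0) += n;
    }

    pub fn get(&self, label: &str) -> u64 {
        self.values.get(label).copied().unwrap_or(0)
    }
}

/// The three metrics requested in this issue (names are hypothetical).
pub struct SearchMetrics {
    pub splits_per_query: Histogram,
    pub search_duration_seconds: Histogram,
    pub cache_evictions: LabeledCounter,
}

impl SearchMetrics {
    pub fn new() -> Self {
        SearchMetrics {
            // Assumed bounds; the right ones depend on the deployment.
            splits_per_query: Histogram::new(vec![1.0, 5.0, 10.0, 50.0, 100.0, 500.0, 1000.0]),
            // Buckets extend past 30 s so that slow queries stay distinguishable.
            search_duration_seconds: Histogram::new(vec![
                0.05, 0.25, 1.0, 5.0, 10.0, 30.0, 60.0, 120.0,
            ]),
            cache_evictions: LabeledCounter::default(),
        }
    }

    /// Would be called once per search request, after it completes.
    pub fn record_search(&mut self, num_splits: usize, elapsed: Duration) {
        self.splits_per_query.observe(num_splits as f64);
        self.search_duration_seconds.observe(elapsed.as_secs_f64());
    }
}
```

With something like this in place, a search handler would call `record_search(num_splits, elapsed)` once per request, and each cache would call `cache_evictions.inc_by("split_cache", n)` whenever it evicts `n` entries. The eviction rate could then be graphed next to memory and CPU usage.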
Describe alternatives you've considered
We rely on external monitoring tools to capture network timings, and we manually record the `took` value returned by the Elasticsearch API as an outward-facing indicator.