Aggregator is too costly #173
Comments
What is the cost of one thread?
We just suspect it causes too many context switches. On our server only the statsd server and carbon-c-relay are running, yet the csw rate is much higher than on comparable servers. The context switching is not currently causing any problem; it is just unexpected that one rule starts a thread. How about this thread model: …

Currently, thread 3 may become a bottleneck if there are too many rules, for example aggregate #168. We may solve it by this: …
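For what it's worth, one quick way to check whether the high csw is voluntary (threads blocking on locks and timers) or involuntary (scheduler preemption) is `getrusage(2)`. A minimal sketch, assuming Linux where the `ru_nvcsw`/`ru_nivcsw` fields are filled in:

```c
/* Report this process's voluntary and involuntary context switches,
 * the "csw" being discussed above. */
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        perror("getrusage");
        return 1;
    }
    /* voluntary: the thread blocked (lock, poll, sleep);
     * involuntary: the scheduler preempted it */
    printf("voluntary csw:   %ld\n", ru.ru_nvcsw);
    printf("involuntary csw: %ld\n", ru.ru_nivcsw);
    return 0;
}
```

Comparing the two counters narrows down whether the threads are mostly sleeping on timers or actually fighting for CPU.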
The aggregator will also cache …
Does csw include thread switches? A thread is not a process. Your thread model is close to how it is currently implemented with -w1. The aggregator thread is necessary to "expire" the metrics; this is unrelated to the input, hence a separate thread. Sharing the aggregator work is very hard, because the load from aggregations doesn't come from having multiple aggregation rules, but from the thousands or more expansions (computes) generated by a single aggregation rule.
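To illustrate the "expire" role described here, a minimal sketch of a dedicated expiry thread, with invented names (`bucket`, `flush_bucket`) that are not carbon-c-relay's actual internals:

```c
/* One thread periodically walks all aggregation buckets and flushes
 * the ones whose interval has closed, independent of input traffic. */
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define NBUCKETS 4

struct bucket {
    pthread_mutex_t lock;
    time_t deadline;   /* when this bucket's interval closes */
    double sum;        /* running aggregate (sum, for simplicity) */
    long count;
};

static struct bucket buckets[NBUCKETS];

static void flush_bucket(struct bucket *b) {
    /* a real relay would write the computed metric downstream */
    printf("flush: sum=%.2f count=%ld\n", b->sum, b->count);
    b->sum = 0;
    b->count = 0;
    b->deadline = time(NULL) + 10;   /* start the next interval */
}

static void *expiry_thread(void *arg) {
    (void)arg;
    for (;;) {
        time_t now = time(NULL);
        for (int i = 0; i < NBUCKETS; i++) {
            pthread_mutex_lock(&buckets[i].lock);
            if (buckets[i].deadline <= now)
                flush_bucket(&buckets[i]);
            pthread_mutex_unlock(&buckets[i].lock);
        }
        sleep(1);   /* wake once per second, unrelated to the input */
    }
    return NULL;
}

int main(void) {
    pthread_t tid;
    for (int i = 0; i < NBUCKETS; i++) {
        pthread_mutex_init(&buckets[i].lock, NULL);
        buckets[i].deadline = time(NULL) + 10;
    }
    pthread_create(&tid, NULL, expiry_thread, NULL);
    pthread_join(tid, NULL);   /* this sketch runs forever */
    return 0;
}
```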
If the regex matching result is cached, aggregator computation is lightweight; all that remains is …
Every rule starting a thread just seems scary. When there are many rules, the communication and synchronization become the bottleneck, because every metric has to be sent to all aggregators and the aggregators' results have to be sent back to the frontend. Currently the bottleneck is probably the regex matching, and I guess the reason for many aggregator threads is to spread the matching work across many CPUs. But that isn't needed: if we cache the target bucket of every metric name, the work is only done the first time a new metric name arrives (see the sketch below). Apart from carbon-c-relay matching the regex every time, I cannot think of any costly operation that needs more than one CPU. The remaining CPU time should go to input protocol parsing, which cannot be reduced anyway. All regex matching results can be cached, including the result of …
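A minimal sketch of the cache proposed here: a hash table mapping a metric name to the bucket it matched, so the expensive regex pass runs only on the first occurrence of a name. `lookup_rules` and `cached_match` are invented names standing in for the real matching code, and a real multi-threaded relay would need a lock or lock-free table around this:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 65536

struct bucket;                  /* aggregation bucket, opaque here */

struct cache_entry {
    char *name;                 /* owned copy of the metric name */
    struct bucket *target;      /* bucket the regex rules chose */
    struct cache_entry *next;   /* chaining on hash collision */
};

static struct cache_entry *cache[CACHE_SLOTS];

/* djb2 string hash */
static unsigned long djb2(const char *s) {
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* stub standing in for the expensive regex pass over all rules */
static struct bucket *lookup_rules(const char *name) {
    printf("regex matching ran for: %s\n", name);
    return NULL;   /* a real relay would return the matching bucket */
}

static struct bucket *cached_match(const char *name) {
    unsigned long slot = djb2(name) % CACHE_SLOTS;
    struct cache_entry *e;

    for (e = cache[slot]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e->target;          /* hit: no regex work at all */

    e = malloc(sizeof(*e));            /* miss: regexes run exactly once */
    e->name = strdup(name);
    e->target = lookup_rules(name);
    e->next = cache[slot];
    cache[slot] = e;
    return e->target;
}

int main(void) {
    cached_match("sys.dc1.host-1.cpu.user");   /* prints: regex ran */
    cached_match("sys.dc1.host-1.cpu.user");   /* silent: cache hit */
    return 0;
}
```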
I think there is a misunderstanding. The relay has a static number of threads: main + workers + servers + submission-server + aggregator. So no matter how many aggregations you have, there is only one expiry/aggregator thread. In my opinion this one should go and the workers should do the job, but for that, expiry must be doable in parallel. Due to the way aggregations currently work, and the perfectly logical imbalance that happens there, the workers very quickly suffer from lock contention when they deal with aggregations (because the most common case is one aggregation rule resulting in 10K+ individual aggregations).
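To make the contention point concrete: if the 10K+ buckets from one rule share a single mutex, every worker serializes on it. One common mitigation (illustrative only, not the relay's actual code) is striping locks over the buckets, so each worker contends on only a fraction of the traffic. `bucket_add` and `worker_put` are hypothetical names:

```c
#include <pthread.h>

#define NSTRIPES 64     /* far fewer locks than buckets, more than workers */

static pthread_mutex_t stripe[NSTRIPES];

struct bucket;          /* opaque aggregation bucket */

/* stub: fold one value into the bucket's running aggregate */
static void bucket_add(struct bucket *b, double value) {
    (void)b;
    (void)value;
}

static void stripes_init(void) {
    for (int i = 0; i < NSTRIPES; i++)
        pthread_mutex_init(&stripe[i], NULL);
}

/* called by every worker thread for every incoming metric; a given
 * metric contends only with the ~1/NSTRIPES of traffic on its stripe */
static void worker_put(struct bucket *b, unsigned long bucket_id, double v) {
    pthread_mutex_t *l = &stripe[bucket_id % NSTRIPES];
    pthread_mutex_lock(l);
    bucket_add(b, v);
    pthread_mutex_unlock(l);
}

int main(void) {
    stripes_init();
    worker_put((struct bucket *)0, 42, 1.0);   /* toy call */
    return 0;
}
```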
It seems one aggregate rule starts an aggregator thread. Should it be done more efficiently?
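For context, this is roughly what such a rule looks like in the relay's configuration; the syntax follows the carbon-c-relay README, but the regex and metric names are invented:

```
aggregate
        ^sys\.dc[0-9]+\.(server-[0-9]+)\.mysql\.replication_delay
    every 10 seconds
    expire after 35 seconds
    compute max write to
        mysql.cluster.max_replication_delay
    compute average write to
        mysql.cluster.avg_replication_delay
    ;
```

Note that a single rule like this can match thousands of metric names, each expanding into its own aggregation, which is the imbalance discussed in the comments above.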