aggregated-consistent-hashing relay rule #168
Comments
Hey Erin, here's my experience, for what it's worth:
Poking through the aggregator_putmetric code, you can see a number of pthread mutex calls to protect access to the shared data structure holding aggregated metrics: https://github.com/grobian/carbon-c-relay/blob/master/aggregator.c#L172
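As a rough illustration (a Python analogue, not the actual C code), the pattern those pthread calls protect looks roughly like this: every worker thread must take one global lock to check whether an aggregate bucket exists and to create or update it, so all input metrics serialize on a single critical section.

```python
import threading

# Hypothetical Python analogue of the aggregator's shared section: one
# table of running aggregates, guarded by a single lock that every
# worker thread must acquire for every incoming metric.
aggregates = {}                 # shared table: metric key -> running totals
aggregates_lock = threading.Lock()

def putmetric(key, value):
    with aggregates_lock:       # the contended critical section
        bucket = aggregates.setdefault(key, {"sum": 0.0, "count": 0})
        bucket["sum"] += value
        bucket["count"] += 1

putmetric("foo.count_ps", 2.0)
putmetric("foo.count_ps", 3.0)
```

Because the lock covers both the existence check and the update, throughput is bounded by how fast a single thread can walk that section, regardless of how many worker threads the relay runs.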
I think that a focus on aggregation throughput, and the addition of aggregated-consistent-hashing, would help make carbon-c-relay more scalable.
Indeed, the aggregator is severely hampered by shared data structures. I had plans to build separate sections for the worker threads so they can do their work in parallel, but never got to it so far. The real problem is that all threads need to go through a shared piece which determines whether a certain aggregate already exists, and creates it if not. To implement aggregated-consistent-hashing, I would need to know exactly what it splits on, and how it tries to do that. Currently, using something like fnv1a_ch would relay all input metrics for a given name to the same aggregator, but this fails when your inputs are complex matches that span multiple namespaces. That said, it seems that what is achieved here is a match rule identical to the aggregation rule, with a single target being the aggregation. So I understand your scalability problem. I think it is a waste that c-relay can't take advantage of multiple cores in this case, and I would like to solve that first. The hashing technique seems like a workaround that will help, but it cannot resolve the underlying problem (for instance, when you only have a single aggregation).
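A hedged sketch of the "separate sections for the worker threads" idea (in Python for brevity; the stripe count and hash choice are arbitrary): hash each aggregate key to one of N independent stripes, each with its own lock and table, so threads only contend when they touch the same stripe instead of on one global mutex.

```python
import threading
from zlib import crc32

# Striped-lock sketch: partition the aggregate table into NSTRIPES
# independent sections, each with its own lock, so concurrent workers
# touching different stripes never block each other.
NSTRIPES = 8
stripes = [{"lock": threading.Lock(), "table": {}} for _ in range(NSTRIPES)]

def putmetric_striped(key, value):
    # Pick the stripe by hashing the key; any stable hash works here.
    stripe = stripes[crc32(key.encode()) % NSTRIPES]
    with stripe["lock"]:
        bucket = stripe["table"].setdefault(key, {"sum": 0.0, "count": 0})
        bucket["sum"] += value
        bucket["count"] += 1

putmetric_striped("web01.requests.count", 1.0)
putmetric_striped("web01.requests.count", 4.0)
```

The trade-off is that any operation spanning all aggregates (such as the periodic expiry sweep) must visit every stripe, but per-metric updates scale with the number of stripes.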
Not sure if this helps, but here is what works for me when dealing with aggregations. We handle about 25M metrics per minute at this point, and almost all of them get summed to some degree. Our metrics are in the format of
This lets me have a front-end cluster of several nodes that catch and pre-aggregate all the metrics, hash them, and send the intermediately aggregated metrics to the back-end storage host, which re-sums the now smaller number of things to aggregate and then passes them on to the carbon caches. Obviously, it only works for some of the aggregation functions, but we mostly use sum, so it works for me. The benefit is that it lets me fan out the aggregations.
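For what it's worth, a two-tier setup along those lines could be sketched in carbon-c-relay configuration like the following (cluster names, addresses, and the metric pattern are all invented for illustration; as noted, this only works for functions like sum that can be computed in stages):

```
# front-end relays: pre-aggregate locally, then hash the partial sums
# over the back-end storage hosts
cluster storage fnv1a_ch
    store1.example.com:2003
    store2.example.com:2003
  ;
aggregate ^app\.([^.]+)\.requests$
  every 10 seconds
  expire after 30 seconds
  compute sum write to
    app.\1.requests._partial
  send to storage
  stop
  ;

# back-end host: re-sum the partial sums arriving from all front-ends
aggregate ^app\.([^.]+)\.requests\._partial$
  every 10 seconds
  expire after 30 seconds
  compute sum write to
    app.\1.requests.sum
  send to whisper_cache
  stop
  ;
```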
We are currently using the original Python Graphite stack in our existing production environment. We ingest metrics from hundreds of servers and balance them over multiple relays, which then balance to multiple aggregators. The Graphite relays have a flag for aggregated-consistent-hashing, which came out of graphite-project/carbon#32. Is there a way to implement something like this using carbon-c-relay?
I have started mirroring my production data over to a new carbon-c-relay cluster, currently with a single relay passing data to multiple aggregators. It seems that the aggregation portion of carbon-c-relay is single-threaded, so I need to run more than one instance of it to process all of my data. I have tried both hashing methods on the relay, carbon_ch and fnv1a_ch, but each sends similar types of data to different aggregators; each aggregator then aggregates only its portion of the data and writes it to the backend. The data written to the backend is therefore not the full aggregation, since no single aggregator collected all of the data.
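A small sketch of why per-metric hashing splits an aggregate's inputs (FNV-1a 32-bit reimplemented here for illustration; node selection is simplified to a plain modulo rather than the relay's actual ring placement, and the metric names are invented): the hash keys on the whole metric name, so per-server metrics that all feed the same aggregate scatter across the aggregator instances.

```python
def fnv1a32(data: bytes) -> int:
    """FNV-1a 32-bit hash (offset basis 0x811c9dc5, prime 0x01000193)."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h

NODES = 4  # hypothetical number of aggregator instances
metrics = ["server%02d.timers.api.pdx.web.requests.count_ps" % i
           for i in range(26)]
# Map each input metric to an aggregator instance by its full name.
nodes = {m: fnv1a32(m.encode()) % NODES for m in metrics}
# All of these feed the same aggregate, yet they do not all land on one
# node, so every aggregator ends up writing only a partial sum.
```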
An example:
aggregate ^([^.]+)\.timers\.(.*pdx.*)\.([^_.]+)\.count_ps$ every 10 seconds expire after 30 seconds compute sum write to \1.timers.\2._totals._pdx.count_ps._sum send to whisper_cache_b0 stop ;
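To make the rule concrete, here is what that regular expression captures on a hypothetical input metric (the sample metric name is invented for illustration):

```python
import re

rule = re.compile(r"^([^.]+)\.timers\.(.*pdx.*)\.([^_.]+)\.count_ps$")

# \1 is the first name component, \2 the pdx-containing middle section,
# \3 the last plain component before count_ps.
m = rule.match("serverA.timers.api.pdx.web.requests.count_ps")
result = m.expand(r"\1.timers.\2._totals._pdx.count_ps._sum")
# result: serverA.timers.api.pdx.web._totals._pdx.count_ps._sum
```

All metrics differing only in the third group (here, "requests") collapse into the same output name, so they must all reach the same aggregator instance for the sum to be complete.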
Is there a way for me to implement something similar to aggregated-consistent-hashing using carbon-c-relay?
Is the aggregation portion of carbon-c-relay really only single threaded? Is there a way for me to make use of the additional cpu resources on the server?
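Until the aggregator itself is parallelized, one possible workaround (not a true aggregated-consistent-hashing implementation; the cluster names, ports, and splitting regexes below are invented) is to run several aggregator instances and statically partition the metric namespace in front of them, so that every metric contributing to a given aggregate reaches the same instance:

```
# front relay: split the stream on a part of the name that the aggregate
# rule groups on, so each aggregator instance sees the complete input
# for the aggregates it owns
cluster agg_a forward 127.0.0.1:2013;
cluster agg_b forward 127.0.0.1:2014;

match ^[^.]+\.timers\.[a-m] send to agg_a stop;
match ^[^.]+\.timers\.[n-z] send to agg_b stop;
```

This works here because metrics that collapse into one aggregate share the same timers path prefix; the partition boundaries have to be chosen per rule set, which is exactly the manual step that aggregated-consistent-hashing automates.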
Thank you for any assistance you could provide.
-Erin Willingham