[Q] Why is one of the 8 go-carbon nodes in the cluster experiencing too much read load on CPU while the others are normal? #507
Comments
Hi @nadeem1701. Uneven load means that read or write traffic is skewed somehow, and that usually happens because of the read and write configuration (i.e. your relay and graphite-web), not go-carbon itself. Are you sure that node 7 is participating in the reads coming from graphite-web? Could you please share (anonymized) configs for both your relay and graphite-web?
Ah, I misread the graph. Node 7 is getting almost no traffic and node 2 is overloaded. Well, default graphite sharding is not really uniform; it is better to use jump hash for that. But please note that graphite-web does not support jump hash directly: you would need to connect graphite-web to the carbonserver (port 8080) on each go-carbon node via CLUSTER_SERVERS instead.
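For reference, the jump consistent hash mentioned above (Lamping and Veach's algorithm) is only a few lines. This is a minimal Python sketch of the algorithm itself, not go-carbon's or carbon-c-relay's implementation:

```python
def jump_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit key to a bucket in [0, num_buckets) with minimal
    reshuffling when num_buckets changes (jump consistent hashing)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b
```

Its defining property is that when the cluster grows from N to N+1 nodes, each key either stays on its current node or moves to the new one, so only about 1/(N+1) of the keyspace is remapped.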
Thank you @deniszh for your very quick response. The metric values in the legend are the last values at a given time, so we cannot say that Node#7 is getting the least/no traffic; it actually gets a relatively fair amount of traffic (the cyan-colored line). We do not use carbonserver to fetch metrics from the cluster. We have graphite-webapp running on all worker nodes, and graphite-webapp with relay configuration on the relay nodes. In other words, we use go-carbon to write metrics and graphite-webapp to read them. If the Python-based webapp were causing the read load on the CPU, that would have been understandable; in this case, it is go-carbon that is stressing the CPU with reads. We use fnv1a for hashing and did not expect this much imbalance. relay-configs: Graphite-web
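To get a feel for how fnv1a spreads metric names across 8 nodes, here is a small self-contained Python experiment. The metric names and the plain modulo placement are assumptions for illustration only; they are not carbon-c-relay's actual consistent-hash ring:

```python
from collections import Counter

def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: offset basis 0x811C9DC5, prime 0x01000193."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h

NODES = 8
# hypothetical metric names, just to exercise the hash
metrics = [f"servers.host{i:03d}.cpu.load" for i in range(10_000)]
counts = Counter(fnv1a_32(m.encode()) % NODES for m in metrics)
```

Printing `counts` for your own real metric names (e.g. dumped from the relay) is a quick way to check whether the hash itself, or the ring construction on top of it, is the source of the skew.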
Hi @nadeem1701
@nadeem1701: ah, got it. Does the main graphite-web have the same config as you posted above?
Yes, the graphite-web configs shared earlier are the relay graphite-web's. It queries the graphite-web instances running on all 8 worker nodes and returns the collected metrics.
If the local graphite-web instances share the same set of IPs, I think you need to set
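For completeness, the carbonserver route suggested earlier in the thread would look roughly like this in graphite-web's `local_settings.py`. The hostnames are placeholders; this is a sketch rather than a tested configuration:

```python
# graphite-web local_settings.py (fragment) -- hypothetical hostnames.
# Query go-carbon's built-in carbonserver on each worker node directly,
# instead of federating through per-node graphite-web instances.
CLUSTER_SERVERS = [
    "go-carbon-1:8080",
    "go-carbon-2:8080",
    # one entry per go-carbon node, pointing at the carbonserver listener
]
```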
We have a carbon-graphite cluster with 2 carbon-c-relays and 8 go-carbon nodes. Recently, we have been noticing alarms for high CPU load on one of the worker nodes. Upon investigation, we found that go-carbon is generating too much I/O read load: its read load is approximately equivalent to that of the other 7 nodes combined.
Note that we do not use go-carbon to fetch metrics from the cluster; we use graphite-webapp (the Python version) for that purpose. Per-process CPU analysis shows that it is not the cause of the I/O issue.
I need help identifying the root cause of this abnormality: one of the worker nodes, with the same hardware and software configuration as the others, behaves differently.
go-carbon version: 0.14.0
graphite-webapp: 1.2.0