kindling_tcp_connect_total does not accurately reflect whether TCP connections between containers have failed #548
Comments
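How did you determine that these data points are "false positives"? Do these calls not exist at all, or do the calls exist but no connection failure actually occurred?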
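The calls exist, but no connection failure occurred. Our service calls and logs show nothing abnormal, yet the data collected by kindling intermittently reports TCP connection failures.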
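Our use case is fairly simple. Whether the calls are between services inside the cluster or between cluster services and middleware outside the cluster, TCP connection failures are reported from time to time. We checked the application logs and found no error output at all, and more than one service is affected, so we suspect the collected data is wrong.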
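Please enable debug logging and post the logs so I can take a look. To do so, modify the configuration file. I suggest capturing about 5 minutes of logs, and a "false-positive connection failure" metric must appear during that window.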
In the collected data, kindling_tcp_connect_total{errno="-2",success="false"} has an errno value of -2. This error occurs on Unix domain sockets. Sockets of type AF_UNIX should be filtered out, because they are not TCP.
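To illustrate the errno="-2" observation: on Linux, errno 2 is ENOENT, which is what connect() returns for an AF_UNIX socket whose path does not exist. The sketch below reproduces that error and then drops non-TCP events by address family. The event dictionaries and the `is_tcp_connect` helper are hypothetical and only stand in for whatever the agent records; this is not kindling's actual data model or code.

```python
import errno
import os
import socket
import tempfile


def unix_connect_errno(path):
    """Try to connect to a Unix domain socket path.

    Returns 0 on success, otherwise the errno reported by connect().
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        try:
            s.connect(path)
        except OSError as e:
            return e.errno
    return 0


def is_tcp_connect(event):
    """Hypothetical filter: keep only IPv4/IPv6 connect events."""
    return event["family"] in (socket.AF_INET, socket.AF_INET6)


# A socket path that is guaranteed not to exist.
missing = os.path.join(tempfile.mkdtemp(), "no-such.sock")
err = unix_connect_errno(missing)

# Hypothetical connect events; the AF_UNIX failure carries the negated errno,
# mirroring the errno="-2" label seen in the metric.
events = [
    {"family": socket.AF_UNIX, "errno": -err, "success": False},
    {"family": socket.AF_INET, "errno": 0, "success": True},
]

# Only events on real TCP-capable address families are kept.
tcp_events = [e for e in events if is_tcp_connect(e)]
```

Filtering on the address family rather than on the errno value keeps genuine TCP failures visible while excluding every Unix domain socket event, whatever error it carries.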
Describe the bug
PromQL: increase(kindling_tcp_connect_total{success="false"}[2m])
Non-zero values always show up between services.
How to reproduce?
Deploy a Kubernetes cluster using Calico in IPIP overlay network mode, then deploy any Java programs that call each other. The issue reproduces.
What did you expect to see?
The metric increase(kindling_tcp_connect_total{success="false"}[2m]) should accurately reflect whether TCP connections between two pods have failed, with better data accuracy.
What did you see instead?
The data points inside the boxes are all false positives.
Screenshots
What config did you use?
kindlingproject/kindling-agent:latesttest
kindlingproject/kindling-grafana:latesttest
Logs
Environment (please complete the following information)
Additional context