MeshAccessLog - add support for timeout and retry for TCP backend #8348
Comments
Here is an example of the exception raised:
@jakubdyszkiewicz but OpenTelemetry is for metrics, and accessLogStreamer is for logs, isn't it?
OpenTelemetry can also be used for logs. We support this in MeshAccessLog: https://kuma.io/docs/2.5.x/policies/meshaccesslog/#opentelemetry And it seems that the Datadog agent now has built-in support for it.
@jakubdyszkiewicz thank you for the hint. I will check the proposed configuration and let you know if it works. I was using the Datadog TCP collector because I think it was recommended in the docs, but I will explore the solution proposed above.
Thanks @jakubdyszkiewicz for the suggestion, but I am not sure how OpenTelemetry would help here.
@jakubdyszkiewicz I see that if OTLP logs are specified
@jakubdyszkiewicz I was able to get OTLP logs working; however, it also has its own problems:
Could you please help with that?
Actually, https://github.com/kumahq/kuma/blob/master/pkg/plugins/policies/meshaccesslog/plugin/xds/configurer.go#L313 is missing interpolations.
@samm-git could you copy/paste an example of what you mean?
@michaelbeaumont sure.
openTelemetry:
  body: '{"test":"value"}'
would render as escaped, while
openTelemetry:
  body:
    test: "value"
gets ignored and the default is used (why?).
Actually, it gets rendered to the array, which again prevents Datadog from creating an index.
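One way to picture the "escaped" result described above: when a body is a string that merely contains JSON, serializing the surrounding record escapes the inner quotes, so a consumer sees a string rather than an object with fields it can index. The snippet below is only an analogy in plain Python, under the assumption that the record is serialized to JSON somewhere along the way; it is not Envoy, Kuma, or Datadog code.

```python
import json

# Case 1: the body is a string that happens to contain JSON text.
# Serializing the record escapes the inner quotes.
as_string = json.dumps({"body": '{"test":"value"}'})

# Case 2: the body is a structured object, so its keys survive as fields.
as_object = json.dumps({"body": {"test": "value"}})

print(as_string)
print(as_object)

# After parsing, case 1 yields a str and case 2 yields a dict.
print(type(json.loads(as_string)["body"]).__name__)
print(type(json.loads(as_object)["body"]).__name__)
```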
Also, I did an experiment: I killed the Datadog DaemonSet to see how errors are handled. From what I see, Envoy silently dropped all messages while the Datadog agent was down. No errors were reported, nothing. So it's even worse than the original behavior, where at least we had some indication that the data was incomplete.
From what I see in the Envoy rendered config, MeshAccessLog does not configure any retry settings:
{
  "log_name": ...,
  "grpc_service": {...},
  "transport_api_version": ...,
  "buffer_flush_interval": {...},
  "buffer_size_bytes": {...},
  "filter_state_objects_to_log": [],
  "grpc_stream_retry_policy": {...},
  "custom_tags": []
}
Most of the things related to retry and log flush are configured via
Hey, I did some investigation on the "body", and our docs, or rather their examples, are lacking. Our docs say
This body field is a field from OTEL and it's quite complex. If you want key-value pairs, you can do this
KUMA_MESH is then interpolated just fine. We should also do a better job of failing when the format is not right instead of falling back to a default. I agree that it's confusing. When it comes to retries, I think
@jakubdyszkiewicz thank you. May I kindly ask you to add interpolation to headers as well? In fact, we don't need a body at all; all of these could be injected as OTLP attributes. Plus, it is very confusing to have different rules for headers and for the body.
@jakubdyszkiewicz JFYI - example default:
backends:
  - type: OpenTelemetry
    openTelemetry:
      endpoint: otel-collector.observability.svc:4317
      body:
        kvlistValue:
          values:
            - key: "mesh"
              value:
                stringValue: "%KUMA_MESH%"
works as expected, and Datadog is happy with it and is indexing the
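Writing the nested kvlistValue structure by hand gets verbose once there are several keys. A minimal sketch of generating it from a flat mapping follows; the helper name `to_kvlist_body` is hypothetical (not part of Kuma), it handles string values only, and it simply mirrors the shape of the example above.

```python
import json


def to_kvlist_body(fields):
    """Wrap a flat {key: value} mapping in the nested kvlistValue shape.

    Hypothetical helper: every value is emitted as a stringValue.
    """
    return {
        "kvlistValue": {
            "values": [
                {"key": key, "value": {"stringValue": str(value)}}
                for key, value in fields.items()
            ]
        }
    }


# Same single-key body as in the example above.
body = to_kvlist_body({"mesh": "%KUMA_MESH%"})
print(json.dumps(body, indent=2))
```

The resulting dictionary can be dumped to YAML or JSON and placed under the `body` field of the backend.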
OK, so to test retries you can check this proxypatch
I agree on the docs. I created issue kumahq/kuma-website#1538. Let me know how the retry goes.
I added the above example to the OpenAPI schema of the MeshAccessLog policy: https://github.com/kumahq/kuma/pull/8533/files#diff-ed8e7e0fddb480be4d601ae35bb953cddd53659fb3b9d306b43996a8f322cb38R82
This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed.
Description
Hi,
When sending Kuma logs via the Datadog service in K8s, we sporadically get error messages from
kuma-sidecar
about failures to send logs. It would be nice to add support for configuring a timeout and the number of retries in case of a TCP connection failure.
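To make the requested behavior concrete, here is a minimal sketch of a TCP send with a configurable timeout and retry count. It is an illustration only, not how Kuma or Envoy implement the TCP backend; the function name `send_with_retry`, its parameters, and the exponential backoff are all assumptions made for this example.

```python
import socket
import time


def send_with_retry(payload, host, port, timeout=1.0, retries=3,
                    backoff=0.1, connect=socket.create_connection,
                    sleep=time.sleep):
    """Send payload over TCP, retrying on connection errors.

    Returns the number of attempts used; re-raises the last error
    once the initial attempt and all retries have failed.
    """
    last_error = None
    for attempt in range(1, retries + 2):  # first try plus `retries` retries
        try:
            conn = connect((host, port), timeout=timeout)
            try:
                conn.sendall(payload)
            finally:
                conn.close()
            return attempt
        except OSError as err:  # covers timeouts and refused connections
            last_error = err
            if attempt <= retries:
                # Exponential backoff: backoff, 2*backoff, 4*backoff, ...
                sleep(backoff * 2 ** (attempt - 1))
    raise last_error
```

The `connect` and `sleep` parameters exist only so the sketch can be exercised without a real network; by default they use `socket.create_connection` and `time.sleep`.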