the logs are not sent after "something" happened. #68

Open
cragia opened this issue Aug 4, 2021 · 4 comments

@cragia

cragia commented Aug 4, 2021

This is more an open question than an issue...

I have a couple of services that use winston-loki as a transport to send log data to Loki. It's configured to send logs in batches (the default).

I noticed that sometimes (at moments when one of the services is a little bit "stressed" and receives many messages) the transport simply stops sending logs to Loki, so I find no logs there. The service keeps doing its work: I can see that winston's Console transport is still working and printing things correctly to the console.
The problem is that it never sends a log again, so I've lost even days of logs...

So my question is: what could cause the batching to stop sending logs? And once it has stopped, how can I make it automatically resume and send the logs it has queued up to that point?

thank you,
Giacomo.

@JaniAnttonen
Owner

Hi! Sorry for not answering earlier. I have a hunch that the issue is probably "caused" by the latest patch, but has been there for a while, just in another form. This is probably fixable by introducing a new queue for serialized logs ready for sending.

@JaniAnttonen
Owner

A workaround that works for now is to switch from protobuf to JSON, or to disable batching for protobuf.
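
For illustration, the workaround could look roughly like the options below. This is only a sketch: `buildLokiOptions` is a hypothetical helper, the `service` label is a placeholder, and it sets both `json: true` and `batching: false` even though either one alone may be enough.

```javascript
// Hypothetical helper (not part of winston-loki): builds transport options.
function buildLokiOptions(env = process.env) {
  return {
    level: 'info',
    host: env.LOKI_URL || 'http://loki:3100',
    json: true,       // send JSON instead of protobuf
    batching: false,  // send each log as it arrives instead of queueing it
    labels: { service: 'my-service' }, // placeholder label
  };
}

// Usage (needs the winston and winston-loki packages installed):
// const LokiTransport = require('winston-loki');
// logger.add(new LokiTransport(buildLokiOptions()));
```

Disabling batching trades throughput for simplicity, since every log line becomes its own request.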

@cragia
Author

cragia commented Sep 10, 2021

I already use JSON... this is my configuration:
{
  level: 'info',
  json: true,
  host: process.env.LOKI_URL || 'http://loki:3100',
  labels: {
    service: '***',
    pod: process.env.POD_NAME || 'pod',
  },
}

What else could I do?

@jonim8or

Hi, I think I found the cause. For me, the "something" that breaks it is a Loki restart. The cause is that prepareJsonBatch makes its changes in the original batch (instead of creating a new prepared JSON object). If the message is not sent and clearOnError is not set, you end up with a batch object that holds a mix of entries: some are the input for prepareJsonBatch, some are the output of prepareJsonBatch.
I'll see if I can make a PR for this.
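
To make the failure mode concrete, here is a simplified model of it. This is not the actual winston-loki source: the entry shapes (`{ ts, line }` for raw entries, `[timestamp, line]` for prepared ones) and the names `prepareInPlace`, `prepareCopy` and `simulate` are assumptions made up for the example.

```javascript
// Simplified model, NOT the real winston-loki code.
// Raw entry: { ts: <number>, line: <string> }
// Prepared entry: [ <timestamp string>, <line> ]

// Buggy variant: overwrites each stream's raw entries with prepared ones.
function prepareInPlace(batch) {
  for (const stream of batch.streams) {
    stream.entries = stream.entries.map((e) => [String(e.ts * 1e6), e.line]);
  }
  return batch;
}

// Non-mutating variant: returns a new object and leaves `batch` untouched.
function prepareCopy(batch) {
  return {
    streams: batch.streams.map((stream) => ({
      labels: stream.labels,
      entries: stream.entries.map((e) => [String(e.ts * 1e6), e.line]),
    })),
  };
}

// Prepare, pretend the send failed (batch is kept), queue a new log, retry.
function simulate(prepare) {
  const batch = {
    streams: [{ labels: { service: 'demo' }, entries: [{ ts: 1, line: 'a' }] }],
  };
  prepare(batch); // first attempt; assume Loki is down and the send fails
  batch.streams[0].entries.push({ ts: 2, line: 'b' }); // new log arrives
  return prepare(batch).streams[0].entries; // second attempt
}

console.log(simulate(prepareInPlace)); // first entry is corrupted
console.log(simulate(prepareCopy));    // both entries are well-formed
```

With the in-place variant, the retry runs the preparation a second time over an entry that was already prepared, so in this model its timestamp becomes "NaN" and its line is lost. The copying variant always starts from the raw entries, so a retry after a failed send produces a valid payload.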
