Log lines with same timestamp are not in order after ingesting to Loki #13295

Open
duj4 opened this issue Jun 24, 2024 · 9 comments
Labels
component/fluent-bit-plugin type/bug Something is not working as expected

Comments


duj4 commented Jun 24, 2024

Describe the bug
I am ingesting log files into Loki via the Fluent Bit loki output plugin, and log lines with the same timestamp are not in the order they appear in the original logs.

Original log files:

[D 2024-06-19 17:25:02.417 26256 17660 - - ldap3:118] BASIC:instantiated Tls: <Tls(validate=<VerifyMode.CERT_NONE: 0>)>
[D 2024-06-19 17:25:02.417 26256 17660 - - ldap3:118] BASIC:instantiated Server: <Server(host='shgmwcdc01.cod.ms.com.cn', port=636, use_ssl=True, allowed_referral_hosts=[('*', True)], tls=Tls(validate=<VerifyMode.CERT_NONE: 0>), get_info='SCHEMA', mode='IP_V6_PREFERRED')>
[D 2024-06-19 17:25:02.417 26256 17660 - - ldap3:118] ERROR:unknown strategy for <ldaps://shgmwcdc01.cod.ms.com.cn:636 - ssl - user: itsm_api_qa - not lazy - unbound - closed - - tls not started - not listening - No strategy - internal decoder>
[D 2024-06-19 17:25:02.417 26256 17660 - - ldap3:118] BASIC:instantiated : <ldaps://shgmwcdc01.cod.ms.com.cn:636 - ssl - user: itsm_api_qa - not lazy - unbound - closed - - tls not started - not listening - No strategy - internal decoder - async - real DSA - not pooled - cannot stream output>
[D 2024-06-19 17:25:02.417 26256 17660 - - ldap3:118] BASIC:instantiated Connection: <Connection(server=Server(host='shgmwcdc01.cod.ms.com.cn', port=636, use_ssl=True, allowed_referral_hosts=[('*', True)], tls=Tls(validate=<VerifyMode.CERT_NONE: 0>), get_info='SCHEMA', mode='IP_V6_PREFERRED'), user='itsm_api_qa', auto_bind='NO_TLS', version=3, authentication='SIMPLE', client_strategy='RESTARTABLE_ASYNC', auto_referrals=True, check_names=True, read_only=False, lazy=False, raise_exceptions=False, fast_decoder=True, auto_range=True, receive_timeout=15, return_empty_attributes=True, auto_encode=True, auto_escape=True, use_referral_cache=False)>

Queried from Grafana:
[screenshot]

Output with Fluent-bit stdout plugin:
[screenshot]

Per the suggestion in "cannot order loki entries with same timestamp" (grafana/grafana#56071), I set increment_duplicate_timestamp to true, but still no luck.

To Reproduce
Steps to reproduce the behavior:

  1. Started Loki
  2. Ingested the log snippet above, extracting the timestamp via regex (a minimal Fluent Bit sketch of this setup follows below)
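
For reference, a minimal Fluent Bit sketch of this kind of setup, assuming a tail input and a regex parser that pulls the bracketed timestamp out of the sample lines above; the parser name, file path, and tag are illustrative assumptions, not the reporter's actual config:

# parsers.conf (referenced via Parsers_File in the [SERVICE] section)
[PARSER]
    Name        app_log
    Format      regex
    Regex       ^\[(?<level>[A-Z]) (?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?<message>.*)$
    Time_Key    time
    Time_Format %Y-%m-%d %H:%M:%S.%L

# fluent-bit.conf
[INPUT]
    Name    tail
    Path    /var/log/app/*.log
    Parser  app_log
    Tag     app.logs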

Expected behavior
Log lines with the same timestamp should be kept in the same order as in the original log files.

Environment:

  • Infrastructure: K8S
  • Deployment tool: helm
  • Loki Version: 2.9.8

Screenshots, Promtail config, or terminal output
Uploaded in the sections above.


duj4 commented Jun 24, 2024

If I toggle Newest first in Grafana, the log lines show up in the expected order, albeit reversed:
[screenshot]


duj4 commented Jun 26, 2024

Both logcli and the Loki API retrieve the log lines in the correct order; only Grafana does not, so now I suspect this is a Grafana issue.
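
(For reference, an ascending-order check of this kind with logcli; the stream selector is an illustrative assumption:)

logcli query --forward --limit=20 '{job="fluent-bit"}'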


duj4 commented Jun 26, 2024

It seems this is owing to the Grafana log visualization Order setting:

Order - Display results in descending or ascending time order. The default is Descending, showing the newest logs first. Set to Ascending to show the oldest log lines first.

Thanks @tonyswumac

duj4 closed this as completed Jun 26, 2024
duj4 reopened this Jun 26, 2024

duj4 commented Jun 26, 2024

I am afraid I have to reopen this issue, as increment_duplicate_timestamp does not appear to work at all.

JStickler added the component/fluent-bit-plugin and type/bug (Something is not working as expected) labels Jul 1, 2024

ravishankar15 commented Jul 30, 2024

@duj4 I think increment_duplicate_timestamp only increments the timestamp when the log line is identical as well. In this case the lines differ, so it is expected not to increment.


duj4 commented Jul 31, 2024

Thanks for the reply @ravishankar15. Per the description of this config:

Alter the log line timestamp during ingestion when the timestamp is the same as the previous entry for the same stream. When enabled, if a log line in a push request has the same timestamp as the previous line for the same stream, one nanosecond is added to the log line.

This should apply to log lines with the same timestamp regardless of whether the whole line is identical; otherwise the subsequent ones would be discarded.


ravishankar15 commented Jul 31, 2024

@duj4 I think that would cause breakage when we receive out-of-order events:

Alter the log line timestamp during ingestion when the timestamp is the same as the previous entry for the same stream. When enabled, if a log line in a push request has the same timestamp as the previous line for the same stream, one nanosecond is added to the log line. This will preserve the received order of log lines with the exact same timestamp when they are queried, by slightly altering their stored timestamp. NOTE: This is imperfect, because Loki accepts out of order writes, and another push request for the same stream could contain duplicate timestamps to existing entries and they will not be incremented.
CLI flag: -validation.increment-duplicate-timestamps
[increment_duplicate_timestamp: <boolean> | default = false]

To dedup out-of-order events, we consider the log line as well when incrementing the timestamp. Maybe a new setting that ignores duplicate out-of-order events could help here, but that is a hacky approach AFAIK.
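
For reference, per the docs excerpt above this limit is configured under Loki's limits_config; a minimal YAML sketch, assuming a recent Loki (2.9+/3.x) config file:

limits_config:
  # add one nanosecond to entries that repeat the previous timestamp in the same stream
  increment_duplicate_timestamp: true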


duj4 commented Aug 1, 2024

Thanks @ravishankar15, I tested that function in 3.1.0 but still no luck.

ravishankar15 commented:

I think the function is working as expected; I can see the timestamp incremented when verified via the API.

Push request:

curl --location 'http://localhost:3100/loki/api/v1/push' \
--header 'Content-Type: application/json' \
--data '{
    "streams": [
        {
            "stream": {
                "lokiissuetype": "sametimestamp05"
            },
            "values": [
                [
                    "1722556800000000000",
                    "{\"level\":\"DEBUG\",\"message\":\"test log levels 001\"}"
                ],
                [
                    "1722556800000000000",
                    "{\"level\":\"INFO\",\"message\":\"test log levels 002\"}"
                ],
                [
                    "1722556800000000000",
                    "{\"level\":\"INFO\", \"message\":\"test log levels 003\"}"
                ]
            ]
        }
    ]
}'

Query request:

curl --location --globoff 'localhost:3100/loki/api/v1/query_range?query={lokiissuetype%3D%22sametimestamp05%22}&start=1722470400000000000&end=1722816000000000000'

Query result:

{
    "status": "success",
    "data": {
        "resultType": "streams",
        "result": [
            {
                "stream": {
                    "detected_level": "info",
                    "lokiissuetype": "sametimestamp05",
                    "service_name": "unknown_service"
                },
                "values": [
                    [
                        "1722556800000000002", --> Timestamp incremented
                        "{\"level\":\"INFO\", \"message\":\"test log levels 003\"}"
                    ],
                    [
                        "1722556800000000001", --> Timestamp incremented
                        "{\"level\":\"INFO\",\"message\":\"test log levels 002\"}"
                    ]
                ]
            },
            {
                "stream": {
                    "detected_level": "debug",
                    "lokiissuetype": "sametimestamp05",
                    "service_name": "unknown_service"
                },
                "values": [
                    [
                        "1722556800000000000",
                        "{\"level\":\"DEBUG\",\"message\":\"test log levels 001\"}"
                    ]
                ]
            }
        ],
        "stats": {
            "summary": {
                "bytesProcessedPerSecond": 2521,
                "linesProcessedPerSecond": 25,
                "totalBytesProcessed": 296,
                "totalLinesProcessed": 3,
                "execTime": 0.117371,
                "queueTime": 0.003975,
                "subqueries": 0,
                "totalEntriesReturned": 3,
                "splits": 96,
                "shards": 96,
                "totalPostFilterLines": 3,
                "totalStructuredMetadataBytesProcessed": 90
            },
            "querier": {
                "store": {
                    "totalChunksRef": 1,
                    "totalChunksDownloaded": 1,
                    "chunksDownloadTime": 1656376,
                    "queryReferencedStructuredMetadata": false,
                    "chunk": {
                        "headChunkBytes": 0,
                        "headChunkLines": 0,
                        "decompressedBytes": 296,
                        "decompressedLines": 3,
                        "compressedBytes": 113,
                        "totalDuplicates": 0,
                        "postFilterLines": 3,
                        "headChunkStructuredMetadataBytes": 0,
                        "decompressedStructuredMetadataBytes": 90
                    },
                    "chunkRefsFetchTime": 3516082,
                    "congestionControlLatency": 0,
                    "pipelineWrapperFilteredLines": 0
                }
            },
            "ingester": {
                "totalReached": 12,
                "totalChunksMatched": 0,
                "totalBatches": 12,
                "totalLinesSent": 0,
                "store": {
                    "totalChunksRef": 0,
                    "totalChunksDownloaded": 0,
                    "chunksDownloadTime": 0,
                    "queryReferencedStructuredMetadata": false,
                    "chunk": {
                        "headChunkBytes": 0,
                        "headChunkLines": 0,
                        "decompressedBytes": 0,
                        "decompressedLines": 0,
                        "compressedBytes": 0,
                        "totalDuplicates": 0,
                        "postFilterLines": 0,
                        "headChunkStructuredMetadataBytes": 0,
                        "decompressedStructuredMetadataBytes": 0
                    },
                    "chunkRefsFetchTime": 319333,
                    "congestionControlLatency": 0,
                    "pipelineWrapperFilteredLines": 0
                }
            },
            "cache": {
                "chunk": {
                    "entriesFound": 0,
                    "entriesRequested": 1,
                    "entriesStored": 2,
                    "bytesReceived": 0,
                    "bytesSent": 239,
                    "requests": 2,
                    "downloadTime": 3375,
                    "queryLengthServed": 0
                },
                "index": {
                    "entriesFound": 0,
                    "entriesRequested": 0,
                    "entriesStored": 0,
                    "bytesReceived": 0,
                    "bytesSent": 0,
                    "requests": 0,
                    "downloadTime": 0,
                    "queryLengthServed": 0
                },
                "result": {
                    "entriesFound": 0,
                    "entriesRequested": 0,
                    "entriesStored": 0,
                    "bytesReceived": 0,
                    "bytesSent": 0,
                    "requests": 0,
                    "downloadTime": 0,
                    "queryLengthServed": 0
                },
                "statsResult": {
                    "entriesFound": 85,
                    "entriesRequested": 90,
                    "entriesStored": 90,
                    "bytesReceived": 18275,
                    "bytesSent": 0,
                    "requests": 180,
                    "downloadTime": 277089,
                    "queryLengthServed": 2460000000000
                },
                "volumeResult": {
                    "entriesFound": 0,
                    "entriesRequested": 0,
                    "entriesStored": 0,
                    "bytesReceived": 0,
                    "bytesSent": 0,
                    "requests": 0,
                    "downloadTime": 0,
                    "queryLengthServed": 0
                },
                "seriesResult": {
                    "entriesFound": 0,
                    "entriesRequested": 0,
                    "entriesStored": 0,
                    "bytesReceived": 0,
                    "bytesSent": 0,
                    "requests": 0,
                    "downloadTime": 0,
                    "queryLengthServed": 0
                },
                "labelResult": {
                    "entriesFound": 0,
                    "entriesRequested": 0,
                    "entriesStored": 0,
                    "bytesReceived": 0,
                    "bytesSent": 0,
                    "requests": 0,
                    "downloadTime": 0,
                    "queryLengthServed": 0
                },
                "instantMetricResult": {
                    "entriesFound": 0,
                    "entriesRequested": 0,
                    "entriesStored": 0,
                    "bytesReceived": 0,
                    "bytesSent": 0,
                    "requests": 0,
                    "downloadTime": 0,
                    "queryLengthServed": 0
                }
            },
            "index": {
                "totalChunks": 0,
                "postFilterChunks": 0,
                "shardsDuration": 0
            }
        }
    }
}

I also tried Fluent Bit with the above logs; the timestamp was taken as the current timestamp.

A couple of things you can check:

  1. Make sure you have configured increment_duplicate_timestamp on the Loki side
  2. If you have a customised fluent-bit.conf file, verify the output configs there (a minimal sketch follows this list)
  3. If you have custom methods to extract the timestamp, check those as well. Ref
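
For example, a minimal Fluent Bit loki output block of the kind item 2 refers to; the host, labels, and match values are illustrative assumptions, not the reporter's actual config:

[OUTPUT]
    Name        loki
    Match       *
    Host        loki-gateway
    Port        3100
    Labels      job=fluent-bit
    Line_Format json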
