
[pkg/stanza] Introduce batching logs in File consumer #36663

Open · wants to merge 19 commits into main

Conversation

@andrzej-stencel (Member) commented Dec 4, 2024

Description

Modifies the File consumer to emit logs in batches as opposed to sending each log individually through the Stanza pipeline and on to the Log Emitter.

Here are the changes introduced:

  • 6b4c9fe Changed the Reader::ReadToEnd method in File consumer to collect the tokens scanned from the file into batches. At this point, the Reader still emits each token individually, as the emit.Callback function only accepts a single token.
  • c206995 Changed emit.Callback function signature to accept a slice of tokens as opposed to a single token, and changed the Reader to emit a batch of tokens in one request. At this point, the batches are still split into individual tokens inside the emit function, because the Stanza operators can only process one entry at a time.
  • aedda3a Added ProcessBatch method to Stanza operators and used it in the emit function. At this point, the batch of tokens is translated to a batch of entries and passed to Log Emitter as a whole batch. The batch is still split in the Log Emitter, which calls consumeFunc for each entry in a loop.
  • 13d6054 Changed the LogEmitter to add the whole batch to its buffer, as opposed to adding entries one by one.
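For orientation, here is a rough sketch of the shapes involved after these changes; types and field names are paraphrased from the description above rather than copied from the source.

// Sketch only: shapes paraphrased from the description above, not copied
// from the repository. On the operator side, the new ProcessBatch method
// accepts a whole batch of []*entry.Entry, complementing the per-entry Process.
package emit

import "context"

// Token pairs one scanned log record with the attributes of its source file.
type Token struct {
	Body       []byte
	Attributes map[string]any
}

// Callback is now invoked by the Reader with a whole batch of tokens,
// where it previously received a single Token per call.
type Callback func(ctx context.Context, tokens []Token) error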

Slice of entries []entry.Entry vs. slice of pointers []*entry.Entry

I considered whether the ProcessBatch method in the Operator interface should accept a slice of structs []entry.Entry or a slice of pointers []*entry.Entry. I ran some tests (similar to #35454) and they showed a 7-10% performance loss when using a slice of structs vs. a slice of pointers. That's why I decided to use the slice of pointers []*entry.Entry.

Link to tracking issue

Testing

No changes to tests. The goal is for the functionality to remain unchanged and for performance not to regress.

Documentation

These are internal changes; no user documentation needs changing.

Commit messages:

  • But send each token of the batch individually via `emit`.
  • But still emit each Entry individually into the Stanza pipeline.
  • The File input's `emitBatch` function now calls `ProcessBatch` instead of `Process`. The added `ProcessBatch` method makes each Stanza operator capable of accepting a batch of entries. At this stage, all the implementations of `ProcessBatch` just call `Process` in a loop.
  • …f `[]entry.Entry`. According to testing, this results in a 7-10% performance improvement.
  • Instead of adding each entry in a batch separately to the LogEmitter's buffer, add the whole batch in one operation.
@djaglowski (Member) left a comment

What is the performance impact of these changes?

pkg/stanza/fileconsumer/internal/reader/reader.go (outdated)
err = r.emitFunc(ctx, emit.NewToken(token, r.FileAttributes))
if err != nil {
r.set.Logger.Error("failed to process token", zap.Error(err))
tokens = append(tokens, emit.NewToken(copyBody(token), copyAttributes(r.FileAttributes)))
Member

What specifically introduces a need to make copies of the body and attributes?

It seems like a lot of overhead - are we sure it's necessary?

Member Author

Without copying, for each token in the tokens collection, its Body would point to the same place in memory, specifically the Decoder's decodeBuffer.

This isn't a problem when the Reader emits each token before creating a new token, as the token's Body is copied in the emit function (specifically in the toBody function - the conversion from []byte to string copies the underlying array).

Similarly with the attributes: the reader::FileAttributes map would be reused by all tokens if not copied.
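As a standalone illustration of the []byte-to-string point (not code from the repository): converting a byte slice to a string copies the bytes, so reusing the original buffer afterwards does not affect the string.

package main

import "fmt"

func main() {
	buf := []byte("first token")
	s := string(buf) // the conversion copies the underlying bytes

	// Reusing the buffer, the way the Decoder's decodeBuffer is reused,
	// leaves the previously converted string untouched.
	copy(buf, "second tok.")
	fmt.Println(s) // prints "first token"
}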

Member

If I'm understanding correctly then, we're just making the copies in a different place, so this does not introduce new work into the hot path?

Member Author

I'm afraid it does introduce new work.

How I believe this worked before this change:

  • Every time a new token is read from the file, it's placed in the same area in memory - the Decoder's decodeBuffer.
  • A pointer to that place in memory is passed to the emit function
  • The emit function reads the bytes and creates a new Entry, copying the bytes into a new area in memory.

How this works after this change:

  • Every time a new token is read from the file, it's placed in the same area in memory - the Decoder's decodeBuffer.
  • In the Reader's readContent function, a batch of tokens is accumulated before calling the emit function, so the bytes from decodeBuffer need to be copied into a new area in memory for every new token
  • The accumulated slice of tokens is passed to the emit function, which creates a new Entry for every Token, copying the bytes a second time.
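For illustration, the copyBody and copyAttributes helpers referenced in the diff above could look roughly like this; the actual implementations may differ.

package reader

// copyBody copies the token bytes out of the Decoder's reused decodeBuffer,
// so a batched token stays valid after the next token is read.
func copyBody(body []byte) []byte {
	copied := make([]byte, len(body))
	copy(copied, body)
	return copied
}

// copyAttributes clones the reader's file attributes map, so each token in
// the batch carries its own attributes rather than sharing one map.
func copyAttributes(attrs map[string]any) map[string]any {
	if attrs == nil {
		return nil
	}
	copied := make(map[string]any, len(attrs))
	for k, v := range attrs {
		copied[k] = v
	}
	return copied
}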

pkg/stanza/fileconsumer/benchmark_test.go (outdated)
func (i *Input) emit(ctx context.Context, token emit.Token) error {
if len(token.Body) == 0 {
return nil
func (i *Input) emitBatch(ctx context.Context, tokens []emit.Token) error {
Member

Given that this is in the hot path, we might be better off flattening these functions. It seems like a lot of extra error checking and wrapping, where we really just need to put a loop around what was already there and switch from return to append err + continue.
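The suggested pattern, as a standalone sketch with hypothetical names (the real code operates on emit.Token and entry.Entry rather than raw byte slices):

package main

import (
	"context"
	"errors"
	"fmt"
)

// emitAll loops over the batch in one place, skipping empty bodies and
// collecting errors with append + continue instead of returning on the
// first failure.
func emitAll(ctx context.Context, bodies [][]byte, consume func(context.Context, string) error) error {
	var errs []error
	for _, body := range bodies {
		if len(body) == 0 {
			continue // the single-token emit also skipped empty bodies
		}
		if err := consume(ctx, string(body)); err != nil {
			errs = append(errs, err)
			continue
		}
	}
	return errors.Join(errs...)
}

func main() {
	err := emitAll(context.Background(),
		[][]byte{[]byte("log line 1"), nil, []byte("log line 2")},
		func(context.Context, string) error { return nil })
	fmt.Println(err) // <nil>
}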

Member Author

I have inlined the convertToken function in de66483. Should I inline the convertTokens function as well?

Comment on lines 71 to 77
func (p *Parser) ProcessBatch(ctx context.Context, entries []*entry.Entry) error {
	var errs []error
	for i := range entries {
		errs = append(errs, p.Process(ctx, entries[i]))
	}
	return errors.Join(errs...)
}
Member

Is this exactly the same for all parsers and transformers? If so, can we move it to helper/transformer.go, where one implementation can be inherited by all?

Member Author

Riiiiight, this probably makes a lot of sense 😅

Member Author

Oh but wait, the ProcessBatch method calls the Process method, and Go does not support method overriding, so it will always call the Process method of the helper.TransformerOperator struct, right? Is there a simple way around it?
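As a self-contained illustration of the concern (unrelated to the actual operator types): a method promoted from an embedded struct keeps calling the embedded struct's own methods, so a ProcessBatch inherited from a helper would never reach the outer operator's Process.

package main

import "fmt"

type Base struct{}

func (Base) Process() { fmt.Println("Base.Process") }

// ProcessBatch is defined on Base and calls Process; since Go has no method
// overriding, this always resolves to Base.Process.
func (b Base) ProcessBatch() { b.Process() }

type Parser struct{ Base }

func (Parser) Process() { fmt.Println("Parser.Process") }

func main() {
	p := Parser{}
	p.Process()      // Parser.Process
	p.ProcessBatch() // Base.Process, not Parser.Process
}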

Member

Good question. We might not be able to fully eliminate the need for a ProcessBatch implementation in each operator, but I think at least the implementation details can be consolidated. Same idea as ParseWith here.

// transformer.go
type ProcessFunc func(context.Context, *entry.Entry) error

func ProcessBatchWith(ctx context.Context, entries []*entry.Entry, f ProcessFunc) error {
	var errs []error
	for i := range entries {
		errs = append(errs, f(ctx, entries[i]))
	}
	return errors.Join(errs...)
}

// each implementation of Transformer
func (p *Parser) ProcessBatch(ctx context.Context, entries []*entry.Entry) error {
	return ProcessBatchWith(ctx, entries, p.Process)
}

Even though the implementation of ProcessBatch is pretty simple, I still think it's worth consolidating because any changes in the future could easily be made inconsistently.

Member Author

Implemented in d8704ab.

Member Author

Oh I just realized I only updated transformers and missed parsers in the above commit. Added parsers in 5734fcc.

@djaglowski (Member)

> What is the performance impact of these changes?

Do we have a benchmark or benchmarks that we can use to evaluate the net effect of these changes? A lot of the changes appear to have subtle implications so it's difficult to decide if they are worth it or not without some measurements.

@andrzej-stencel (Member Author) commented Dec 16, 2024

> What is the performance impact of these changes?

I ran the EndToEnd benchmark (not sure how relevant it is) and the "end to end" test collecting 1 GB of logs from a single file with the collector binary. See below.

BenchmarkEndToEnd

pkg/stanza/adapter/BenchmarkEndToEnd
$ go test -run=xxx -bench=EndToEnd
goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/adapter
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
BenchmarkEndToEnd/workerCount=1,maxBatchSize=1,flushInterval=10ms-20                   2         733116691 ns/op        460448128 B/op   8800782 allocs/op
BenchmarkEndToEnd/workerCount=1,maxBatchSize=1,flushInterval=100ms-20                  2         647384276 ns/op        460445164 B/op   8800702 allocs/op
BenchmarkEndToEnd/workerCount=1,maxBatchSize=10,flushInterval=10ms-20                  2         600991822 ns/op        382361976 B/op   7020680 allocs/op
BenchmarkEndToEnd/workerCount=1,maxBatchSize=10,flushInterval=100ms-20                 2         604421607 ns/op        382354476 B/op   7020603 allocs/op
BenchmarkEndToEnd/workerCount=1,maxBatchSize=100,flushInterval=10ms-20                 2         551094092 ns/op        347132024 B/op   6206621 allocs/op
BenchmarkEndToEnd/workerCount=1,maxBatchSize=100,flushInterval=100ms-20                2         558686398 ns/op        347128024 B/op   6206598 allocs/op
BenchmarkEndToEnd/workerCount=1,maxBatchSize=1000,flushInterval=10ms-20                2         541430882 ns/op        343303492 B/op   6112393 allocs/op
BenchmarkEndToEnd/workerCount=1,maxBatchSize=1000,flushInterval=100ms-20               2         522679270 ns/op        343306200 B/op   6112372 allocs/op
BenchmarkEndToEnd/workerCount=1,maxBatchSize=10000,flushInterval=10ms-20               2         543155004 ns/op        344406224 B/op   6101974 allocs/op
BenchmarkEndToEnd/workerCount=1,maxBatchSize=10000,flushInterval=100ms-20              2         557043344 ns/op        344405516 B/op   6101967 allocs/op
BenchmarkEndToEnd/workerCount=2,maxBatchSize=1,flushInterval=10ms-20                   2         695154474 ns/op        460446040 B/op   8800736 allocs/op
BenchmarkEndToEnd/workerCount=2,maxBatchSize=1,flushInterval=100ms-20                  2         654490082 ns/op        460444880 B/op   8800742 allocs/op
BenchmarkEndToEnd/workerCount=2,maxBatchSize=10,flushInterval=10ms-20                  2         700169062 ns/op        382358900 B/op   7020720 allocs/op
BenchmarkEndToEnd/workerCount=2,maxBatchSize=10,flushInterval=100ms-20                 2         568586705 ns/op        382354560 B/op   7020616 allocs/op
BenchmarkEndToEnd/workerCount=2,maxBatchSize=100,flushInterval=10ms-20                 2         585962324 ns/op        347131136 B/op   6206675 allocs/op
BenchmarkEndToEnd/workerCount=2,maxBatchSize=100,flushInterval=100ms-20                2         537941018 ns/op        347125700 B/op   6206561 allocs/op
BenchmarkEndToEnd/workerCount=2,maxBatchSize=1000,flushInterval=10ms-20                2         529003978 ns/op        343313384 B/op   6112433 allocs/op
BenchmarkEndToEnd/workerCount=2,maxBatchSize=1000,flushInterval=100ms-20               2         533420602 ns/op        343308764 B/op   6112397 allocs/op
BenchmarkEndToEnd/workerCount=2,maxBatchSize=10000,flushInterval=10ms-20               2         520374890 ns/op        344423928 B/op   6102022 allocs/op
BenchmarkEndToEnd/workerCount=2,maxBatchSize=10000,flushInterval=100ms-20              2         524670569 ns/op        344409080 B/op   6101958 allocs/op
BenchmarkEndToEnd/workerCount=4,maxBatchSize=1,flushInterval=10ms-20                   2         729801709 ns/op        460443916 B/op   8800771 allocs/op
BenchmarkEndToEnd/workerCount=4,maxBatchSize=1,flushInterval=100ms-20                  2         695331941 ns/op        460445152 B/op   8800753 allocs/op
BenchmarkEndToEnd/workerCount=4,maxBatchSize=10,flushInterval=10ms-20                  2         609127970 ns/op        382356496 B/op   7020632 allocs/op
BenchmarkEndToEnd/workerCount=4,maxBatchSize=10,flushInterval=100ms-20                 2         561177946 ns/op        382353248 B/op   7020594 allocs/op
BenchmarkEndToEnd/workerCount=4,maxBatchSize=100,flushInterval=10ms-20                 2         580258299 ns/op        347130192 B/op   6206621 allocs/op
BenchmarkEndToEnd/workerCount=4,maxBatchSize=100,flushInterval=100ms-20                2         540122238 ns/op        347130740 B/op   6206614 allocs/op
BenchmarkEndToEnd/workerCount=4,maxBatchSize=1000,flushInterval=10ms-20                3         490890373 ns/op        343297152 B/op   6112335 allocs/op
BenchmarkEndToEnd/workerCount=4,maxBatchSize=1000,flushInterval=100ms-20               2         516348068 ns/op        343304052 B/op   6112370 allocs/op
BenchmarkEndToEnd/workerCount=4,maxBatchSize=10000,flushInterval=10ms-20               2         521940200 ns/op        344406800 B/op   6101944 allocs/op
BenchmarkEndToEnd/workerCount=4,maxBatchSize=10000,flushInterval=100ms-20              2         540817798 ns/op        344405140 B/op   6101957 allocs/op
BenchmarkEndToEnd/workerCount=8,maxBatchSize=1,flushInterval=10ms-20                   2         684520518 ns/op        460444084 B/op   8800734 allocs/op
BenchmarkEndToEnd/workerCount=8,maxBatchSize=1,flushInterval=100ms-20                  2         657656100 ns/op        460443584 B/op   8800704 allocs/op
BenchmarkEndToEnd/workerCount=8,maxBatchSize=10,flushInterval=10ms-20                  2         589294887 ns/op        382353692 B/op   7020618 allocs/op
BenchmarkEndToEnd/workerCount=8,maxBatchSize=10,flushInterval=100ms-20                 2         590353082 ns/op        382354828 B/op   7020627 allocs/op
BenchmarkEndToEnd/workerCount=8,maxBatchSize=100,flushInterval=10ms-20                 2         520960143 ns/op        347128612 B/op   6206573 allocs/op
BenchmarkEndToEnd/workerCount=8,maxBatchSize=100,flushInterval=100ms-20                2         519195290 ns/op        347129496 B/op   6206587 allocs/op
BenchmarkEndToEnd/workerCount=8,maxBatchSize=1000,flushInterval=10ms-20                2         557338458 ns/op        343316852 B/op   6112457 allocs/op
BenchmarkEndToEnd/workerCount=8,maxBatchSize=1000,flushInterval=100ms-20               2         526733440 ns/op        343303436 B/op   6112382 allocs/op
BenchmarkEndToEnd/workerCount=8,maxBatchSize=10000,flushInterval=10ms-20               2         553058060 ns/op        344406392 B/op   6101964 allocs/op
BenchmarkEndToEnd/workerCount=8,maxBatchSize=10000,flushInterval=100ms-20              2         544634770 ns/op        344406668 B/op   6101979 allocs/op
BenchmarkEndToEnd/workerCount=16,maxBatchSize=1,flushInterval=10ms-20                  2         692173162 ns/op        460445540 B/op   8800773 allocs/op
BenchmarkEndToEnd/workerCount=16,maxBatchSize=1,flushInterval=100ms-20                 2         691081284 ns/op        460443676 B/op   8800737 allocs/op
BenchmarkEndToEnd/workerCount=16,maxBatchSize=10,flushInterval=10ms-20                 2         613109702 ns/op        382355700 B/op   7020641 allocs/op
BenchmarkEndToEnd/workerCount=16,maxBatchSize=10,flushInterval=100ms-20                2         590538412 ns/op        382359592 B/op   7020622 allocs/op
BenchmarkEndToEnd/workerCount=16,maxBatchSize=100,flushInterval=10ms-20                2         567399749 ns/op        347130360 B/op   6206644 allocs/op
BenchmarkEndToEnd/workerCount=16,maxBatchSize=100,flushInterval=100ms-20               2         530738212 ns/op        347125680 B/op   6206576 allocs/op
BenchmarkEndToEnd/workerCount=16,maxBatchSize=1000,flushInterval=10ms-20               2         537735602 ns/op        343303236 B/op   6112361 allocs/op
BenchmarkEndToEnd/workerCount=16,maxBatchSize=1000,flushInterval=100ms-20              2         513183656 ns/op        343302976 B/op   6112356 allocs/op
BenchmarkEndToEnd/workerCount=16,maxBatchSize=10000,flushInterval=10ms-20              2         520426649 ns/op        344404100 B/op   6101938 allocs/op
BenchmarkEndToEnd/workerCount=16,maxBatchSize=10000,flushInterval=100ms-20             2         501402028 ns/op        344402404 B/op   6101928 allocs/op

Read 1 GB log file

I ran the "end-to-end" test collecting 1 GB of logs from a single file, as described in #35454 (comment). This (somewhat surprisingly to me) seems to result in a significant performance improvement of about 15%. I ran the tests on my laptop interchangeably, and consistently the main branch collector took 18 seconds to collect all logs, while the batching one took 15 seconds.

main-1
=== Step 0: 2024-12-16T15:46:55,431446156+01:00
=== Step 1: 2024-12-16T15:46:57,776110136+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 580086
=== Step 2: 2024-12-16T15:46:58,795399607+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.51749e+06
=== Step 3: 2024-12-16T15:46:59,811430467+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 2.525834e+06
=== Step 4: 2024-12-16T15:47:00,826117615+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 3.51771e+06
=== Step 5: 2024-12-16T15:47:01,837294131+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 4.494632e+06
=== Step 6: 2024-12-16T15:47:02,851706243+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 5.503646e+06
=== Step 7: 2024-12-16T15:47:03,867007622+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 6.465843e+06
=== Step 8: 2024-12-16T15:47:04,880762477+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 7.484097e+06
=== Step 9: 2024-12-16T15:47:05,896817941+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 8.34754e+06
=== Step 10: 2024-12-16T15:47:06,918197128+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 9.190601e+06
=== Step 11: 2024-12-16T15:47:07,947497979+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.0064757e+07
=== Step 12: 2024-12-16T15:47:08,970509380+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.093098e+07
=== Step 13: 2024-12-16T15:47:10,006788214+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.1763265e+07
=== Step 14: 2024-12-16T15:47:11,031971467+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.2743549e+07
=== Step 15: 2024-12-16T15:47:12,048653620+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.3799651e+07
=== Step 16: 2024-12-16T15:47:13,061609270+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.4748804e+07
=== Step 17: 2024-12-16T15:47:14,078500681+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.5801047e+07
=== Step 18: 2024-12-16T15:47:15,101722837+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="19ae092a-e6ce-4e2e-8e41-b429a8baa599",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.6519105e+07
main-2
=== Step 0: 2024-12-16T15:49:59,937734237+01:00
=== Step 1: 2024-12-16T15:50:00,960695129+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 675923
=== Step 2: 2024-12-16T15:50:01,974959315+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.613065e+06
=== Step 3: 2024-12-16T15:50:02,988684935+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 2.594632e+06
=== Step 4: 2024-12-16T15:50:04,002396877+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 3.626522e+06
=== Step 5: 2024-12-16T15:50:05,015851940+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 4.649102e+06
=== Step 6: 2024-12-16T15:50:06,031987112+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 5.65826e+06
=== Step 7: 2024-12-16T15:50:07,050476545+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 6.652504e+06
=== Step 8: 2024-12-16T15:50:08,069014012+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 7.601671e+06
=== Step 9: 2024-12-16T15:50:09,084814115+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 8.559654e+06
=== Step 10: 2024-12-16T15:50:10,098834674+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 9.47875e+06
=== Step 11: 2024-12-16T15:50:11,115570101+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.0519824e+07
=== Step 12: 2024-12-16T15:50:12,128264343+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.1482479e+07
=== Step 13: 2024-12-16T15:50:13,146768132+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.2395424e+07
=== Step 14: 2024-12-16T15:50:14,163070435+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.341538e+07
=== Step 15: 2024-12-16T15:50:15,179260462+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.4367245e+07
=== Step 16: 2024-12-16T15:50:16,194354312+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.5339147e+07
=== Step 17: 2024-12-16T15:50:17,219358900+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.6343188e+07
=== Step 18: 2024-12-16T15:50:18,236216987+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="60b56197-0e4c-4db9-b9ed-7243520643c1",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.6519105e+07
main-3
=== Step 0: 2024-12-16T15:51:47,004466315+01:00
=== Step 1: 2024-12-16T15:51:48,028078945+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 718363
=== Step 2: 2024-12-16T15:51:49,049324584+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.566481e+06
=== Step 3: 2024-12-16T15:51:50,065647473+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 2.723456e+06
=== Step 4: 2024-12-16T15:51:51,089223968+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 3.659562e+06
=== Step 5: 2024-12-16T15:51:52,109708158+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 4.67665e+06
=== Step 6: 2024-12-16T15:51:53,127121621+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 5.788852e+06
=== Step 7: 2024-12-16T15:51:54,140868674+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 6.818451e+06
=== Step 8: 2024-12-16T15:51:55,157193263+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 7.739786e+06
=== Step 9: 2024-12-16T15:51:56,170706945+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 8.769256e+06
=== Step 10: 2024-12-16T15:51:57,190322380+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 9.753002e+06
=== Step 11: 2024-12-16T15:51:58,206357895+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.068764e+07
=== Step 12: 2024-12-16T15:51:59,234101493+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.167782e+07
=== Step 13: 2024-12-16T15:52:00,250850487+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.2714644e+07
=== Step 14: 2024-12-16T15:52:01,265582713+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.3643248e+07
=== Step 15: 2024-12-16T15:52:02,276497572+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.4622953e+07
=== Step 16: 2024-12-16T15:52:03,295299153+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.5473991e+07
=== Step 17: 2024-12-16T15:52:04,308069522+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.6489661e+07
=== Step 18: 2024-12-16T15:52:05,325366799+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="ccfbbd7f-23d8-4ed3-8e67-10121297c203",service_name="otelcontribcol",service_version="0.115.0-dev",transport=""} 1.6519105e+07
batching-1
=== Step 0: 2024-12-16T15:49:25,451767918+01:00
=== Step 1: 2024-12-16T15:49:26,473595098+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 757945
=== Step 2: 2024-12-16T15:49:27,489701267+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.744846e+06
=== Step 3: 2024-12-16T15:49:28,502907791+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 2.938626e+06
=== Step 4: 2024-12-16T15:49:29,518975110+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 4.047107e+06
=== Step 5: 2024-12-16T15:49:30,541134228+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 5.224338e+06
=== Step 6: 2024-12-16T15:49:31,563921043+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 6.338091e+06
=== Step 7: 2024-12-16T15:49:32,576716675+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 7.521036e+06
=== Step 8: 2024-12-16T15:49:33,596550216+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 8.649884e+06
=== Step 9: 2024-12-16T15:49:34,620456460+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 9.814609e+06
=== Step 10: 2024-12-16T15:49:35,633902307+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.0996732e+07
=== Step 11: 2024-12-16T15:49:36,647248179+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.2160201e+07
=== Step 12: 2024-12-16T15:49:37,663024249+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.3322686e+07
=== Step 13: 2024-12-16T15:49:38,682719257+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.447508e+07
=== Step 14: 2024-12-16T15:49:39,696938377+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.5622622e+07
=== Step 15: 2024-12-16T15:49:40,712562869+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="1018bef3-4b44-4f54-aa68-45fc159ee265",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.6519105e+07
batching-2
=== Step 0: 2024-12-16T15:50:48,180828422+01:00
=== Step 1: 2024-12-16T15:50:49,202082470+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 766985
=== Step 2: 2024-12-16T15:50:50,218864887+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.712576e+06
=== Step 3: 2024-12-16T15:50:51,238460470+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 2.846777e+06
=== Step 4: 2024-12-16T15:50:52,253083715+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 4.052889e+06
=== Step 5: 2024-12-16T15:50:53,267904710+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 5.220396e+06
=== Step 6: 2024-12-16T15:50:54,283451381+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 6.378984e+06
=== Step 7: 2024-12-16T15:50:55,304076543+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 7.475826e+06
=== Step 8: 2024-12-16T15:50:56,318229950+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 8.644039e+06
=== Step 9: 2024-12-16T15:50:57,332883110+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 9.767176e+06
=== Step 10: 2024-12-16T15:50:58,345458786+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.0924412e+07
=== Step 11: 2024-12-16T15:50:59,359149580+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.2103512e+07
=== Step 12: 2024-12-16T15:51:00,373009015+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.3215526e+07
=== Step 13: 2024-12-16T15:51:01,383947577+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.4357692e+07
=== Step 14: 2024-12-16T15:51:02,397871874+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.5498857e+07
=== Step 15: 2024-12-16T15:51:03,419258504+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="8714502b-68f4-495d-ac04-43daaac29e69",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.6519105e+07
batching-3
=== Step 0: 2024-12-16T15:52:21,557555315+01:00
=== Step 1: 2024-12-16T15:52:22,588161825+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 733360
=== Step 2: 2024-12-16T15:52:23,598105509+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.707804e+06
=== Step 3: 2024-12-16T15:52:24,609804063+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 2.809491e+06
=== Step 4: 2024-12-16T15:52:25,625337319+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 3.927862e+06
=== Step 5: 2024-12-16T15:52:26,636396041+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 5.110002e+06
=== Step 6: 2024-12-16T15:52:27,652329616+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 6.302457e+06
=== Step 7: 2024-12-16T15:52:28,662959666+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 7.43236e+06
=== Step 8: 2024-12-16T15:52:29,675925365+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 8.598355e+06
=== Step 9: 2024-12-16T15:52:30,689062153+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 9.797821e+06
=== Step 10: 2024-12-16T15:52:31,704479779+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.0911129e+07
=== Step 11: 2024-12-16T15:52:32,718040442+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.206504e+07
=== Step 12: 2024-12-16T15:52:33,728085856+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.3183748e+07
=== Step 13: 2024-12-16T15:52:34,740337198+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.4287112e+07
=== Step 14: 2024-12-16T15:52:35,754972220+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.5495808e+07
=== Step 15: 2024-12-16T15:52:36,772794534+01:00
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f4546346-b6dd-4d45-a3a7-40ddf4698e75",service_name="otelcontribcol",service_version="0.114.0-dev",transport=""} 1.6519105e+07

@andrzej-stencel (Member Author)

> What is the performance impact of these changes?
>
> Do we have a benchmark or benchmarks that we can use to evaluate the net effect of these changes? A lot of the changes appear to have subtle implications so it's difficult to decide if they are worth it or not without some measurements.

Great question. I'd love to have a synthetic benchmark that can mirror "real-world" performance. For now, I've been relying on the manual labor of building the binary and running it on an actual file.

@djaglowski (Member)

I ran the fileconsumer benchmarks on main and this branch and it does appear that there are some substantial memory & allocation impacts from the changes.

# on batch-logs-in-file-consumer
> (cd pkg/stanza/fileconsumer && go test -benchmem -run=^$ -bench ^BenchmarkFileInput$ github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer -race -v -count=10 | tee /tmp/batch.txt)
...
# on main
> (cd pkg/stanza/fileconsumer && go test -benchmem -run=^$ -bench ^BenchmarkFileInput$ github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer -race -v -count=10 | tee /tmp/main.txt) 

> benchstat /tmp/main.txt /tmp/batch.txt
                           │ /tmp/main.txt │            /tmp/batch.txt             │
                           │    sec/op     │    sec/op      vs base                │
FileInput/Single-10           97.81µ ±  1%   108.12µ ±  1%  +10.54% (p=0.000 n=10)
FileInput/Glob-10             106.2µ ±  1%    126.8µ ±  1%  +19.44% (p=0.000 n=10)
FileInput/MultiGlob-10        107.4µ ±  1%    128.5µ ±  3%  +19.67% (p=0.000 n=10)
FileInput/MaxConcurrent-10    423.0µ ± 26%    433.3µ ±  1%        ~ (p=0.739 n=10)
FileInput/FngrPrntLarge-10    98.09µ ±  6%   106.24µ ±  0%   +8.31% (p=0.002 n=10)
FileInput/FngrPrntSmall-10    98.01µ ±  2%   106.82µ ±  5%   +8.99% (p=0.000 n=10)
FileInput/NoFlush-10          99.09µ ±  4%   104.83µ ±  1%   +5.79% (p=0.001 n=10)
FileInput/Many-10             3.964m ± 24%    6.326m ± 45%  +59.58% (p=0.001 n=10)
geomean                       191.1µ          221.3µ        +15.76%

                           │ /tmp/main.txt  │               /tmp/batch.txt                │
                           │      B/op      │      B/op        vs base                    │
FileInput/Single-10            11.00 ±  36%    13610.00 ±  0%  +123627.27% (p=0.000 n=10)
FileInput/Glob-10              25.00 ± 116%    54451.00 ±  0%  +217704.00% (p=0.000 n=10)
FileInput/MultiGlob-10         24.50 ±  71%    54447.50 ±  0%  +222134.69% (p=0.000 n=10)
FileInput/MaxConcurrent-10     181.0 ±  90%     54526.0 ±  0%   +30024.86% (p=0.000 n=10)
FileInput/FngrPrntLarge-10     25.00 ±  56%    13635.00 ±  0%   +54440.00% (p=0.000 n=10)
FileInput/FngrPrntSmall-10     8.000 ±  88%   13611.000 ±  0%  +170037.50% (p=0.000 n=10)
FileInput/NoFlush-10           9.500 ±  58%   13612.500 ±  0%  +143189.47% (p=0.000 n=10)
FileInput/Many-10            43.20Ki ± 237%   1466.05Ki ± 13%    +3293.42% (p=0.000 n=10)
geomean                        56.41            40.26Ki         +72985.20%

                           │ /tmp/main.txt  │             /tmp/batch.txt             │
                           │   allocs/op    │  allocs/op    vs base                  │
FileInput/Single-10           0.00 ±   0%       30.00 ± 0%          ? (p=0.000 n=10)
FileInput/Glob-10              0.0 ±   0%       120.0 ± 0%          ? (p=0.000 n=10)
FileInput/MultiGlob-10         0.0 ±   0%       120.0 ± 0%          ? (p=0.000 n=10)
FileInput/MaxConcurrent-10     0.0 ±   0%       120.0 ± 0%          ? (p=0.000 n=10)
FileInput/FngrPrntLarge-10    0.00 ±   0%       30.00 ± 0%          ? (p=0.000 n=10)
FileInput/FngrPrntSmall-10    0.00 ±   0%       30.00 ± 0%          ? (p=0.000 n=10)
FileInput/NoFlush-10          0.00 ±   0%       30.00 ± 0%          ? (p=0.000 n=10)
FileInput/Many-10            43.50 ± 213%     3116.50 ± 6%  +7064.37% (p=0.000 n=10)
geomean                                   ¹     90.15       ?
¹ summaries must be >0 to compute geomean

This benchmark doesn't actually perform any work in the emitFunc but I think the work that is normally performed would be equivalent in either case. (For each token, copy the []byte into a string and create an entry.Entry.) Unless I'm misunderstanding something, I think these are the best benchmarks we have for these changes. Maybe we can find some more improvements that lessen the impact?

github-actions bot (Contributor) commented Jan 2, 2025

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@andrzej-stencel (Member Author) commented Jan 22, 2025

EDIT: The original benchstat results I posted in this comment were flawed; I have now updated them with correct results, which are not as bad.

I tried to improve the allocation stats by pre-allocating memory for the batched entries in the following commits:

Here are the stats comparing main and the current state. I'll look into using a pool of buffers to further improve performance.
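For reference, pre-allocation of this kind, shown as a minimal generic example rather than the code from the commits above: sizing the destination slice up front avoids the repeated growth that plain append would cause.

package main

import "fmt"

// toStrings converts a batch, pre-allocating the destination with the batch
// size known up front so append never has to grow the slice.
func toStrings(bodies [][]byte) []string {
	out := make([]string, 0, len(bodies))
	for _, b := range bodies {
		out = append(out, string(b))
	}
	return out
}

func main() {
	fmt.Println(toStrings([][]byte{[]byte("a"), []byte("b"), []byte("c")}))
}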

$ benchstat /tmp/main.txt /tmp/batch.txt
goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
                           │ /tmp/main.txt │            /tmp/batch.txt            │
                           │    sec/op     │    sec/op     vs base                │
FileInput/Single-20           146.7µ ±  2%   152.0µ ± 55%   +3.59% (p=0.004 n=10)
FileInput/Glob-20             161.2µ ±  3%   170.9µ ±  5%   +6.01% (p=0.002 n=10)
FileInput/MultiGlob-20        163.4µ ±  5%   181.5µ ± 10%  +11.09% (p=0.000 n=10)
FileInput/MaxConcurrent-20    589.5µ ±  7%   652.1µ ± 11%  +10.61% (p=0.000 n=10)
FileInput/FngrPrntLarge-20    147.0µ ±  5%   155.4µ ±  5%   +5.75% (p=0.005 n=10)
FileInput/FngrPrntSmall-20    146.6µ ±  1%   154.0µ ±  9%   +5.06% (p=0.000 n=10)
FileInput/NoFlush-20          148.2µ ±  2%   152.3µ ±  6%   +2.77% (p=0.029 n=10)
FileInput/Many-20             2.207m ± 11%   2.594m ± 62%  +17.56% (p=0.000 n=10)
geomean                       251.6µ         271.0µ         +7.71%

                           │ /tmp/main.txt  │              /tmp/batch.txt               │
                           │      B/op      │      B/op        vs base                  │
FileInput/Single-20            15.00 ±  33%     100.50 ±  50%   +570.00% (p=0.000 n=10)
FileInput/Glob-20              79.50 ±  64%     345.00 ± 238%   +333.96% (p=0.000 n=10)
FileInput/MultiGlob-20         69.00 ± 101%     383.50 ±  67%   +455.80% (p=0.000 n=10)
FileInput/MaxConcurrent-20     151.0 ± 160%     4758.0 ±  54%  +3050.99% (p=0.000 n=10)
FileInput/FngrPrntLarge-20     43.50 ±  45%     167.50 ±  78%   +285.06% (p=0.000 n=10)
FileInput/FngrPrntSmall-20     12.00 ± 100%      70.50 ± 191%   +487.50% (p=0.000 n=10)
FileInput/NoFlush-20           13.00 ±  46%     133.50 ±  35%   +926.92% (p=0.000 n=10)
FileInput/Many-20            28.93Ki ±  65%   370.52Ki ± 340%  +1180.93% (p=0.000 n=10)
geomean                        84.08             664.1          +689.88%

                           │ /tmp/main.txt │              /tmp/batch.txt              │
                           │   allocs/op   │   allocs/op    vs base                   │
FileInput/Single-20          0.000 ±  0%      0.000 ±   0%         ~ (p=1.000 n=10) ¹
FileInput/Glob-20            0.000 ±  0%      0.000 ±   0%         ~ (p=1.000 n=10)
FileInput/MultiGlob-20       0.000 ±  0%      0.000 ±   0%         ~ (p=1.000 n=10) ¹
FileInput/MaxConcurrent-20   0.000 ±  0%      3.000 ±  67%         ? (p=0.000 n=10)
FileInput/FngrPrntLarge-20   0.000 ±  0%      0.000 ±   0%         ~ (p=1.000 n=10) ¹
FileInput/FngrPrntSmall-20   0.000 ±  0%      0.000 ±   0%         ~ (p=1.000 n=10) ¹
FileInput/NoFlush-20         0.000 ±  0%      0.000 ±   0%         ~ (p=1.000 n=10) ¹
FileInput/Many-20            28.50 ± 58%     264.50 ± 336%  +828.07% (p=0.000 n=10)
geomean                                  ²                  ?                       ²
¹ all samples are equal
² summaries must be >0 to compute geomean

@andrzej-stencel (Member Author) commented Jan 23, 2025

I took a stab at implementing pooling of the buffers, as opposed to pre-allocating them. Comparing the benchmarks, it doesn't seem to give any performance gains though. Let me know if I missed something.
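For comparison, buffer pooling along these lines typically uses sync.Pool; a minimal sketch with hypothetical names (not the code from this branch):

package main

import (
	"fmt"
	"sync"
)

// batchPool reuses batch slices across reads instead of allocating a new
// slice for every batch.
var batchPool = sync.Pool{
	New: func() any {
		s := make([][]byte, 0, 100)
		return &s
	},
}

func main() {
	batchPtr := batchPool.Get().(*[][]byte)
	batch := (*batchPtr)[:0] // reset length, keep capacity

	batch = append(batch, []byte("log line"))
	fmt.Println(len(batch), cap(batch))

	*batchPtr = batch
	batchPool.Put(batchPtr) // return the slice for reuse once the batch is emitted
}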

benchstat /tmp/batch.txt /tmp/pool.txt 
goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
                           │ /tmp/batch.txt │              /tmp/pool.txt              │
                           │   sec/op    │    sec/op      vs base                │
FileInput/Single-20          84.73µ ± 1%   100.77µ ±  2%  +18.92% (p=0.000 n=10)
FileInput/Glob-20            93.76µ ± 2%   125.43µ ±  3%  +33.78% (p=0.000 n=10)
FileInput/MultiGlob-20       94.60µ ± 4%   126.93µ ±  2%  +34.17% (p=0.000 n=10)
FileInput/MaxConcurrent-20   341.7µ ± 1%    406.9µ ±  3%  +19.10% (p=0.000 n=10)
FileInput/FngrPrntLarge-20   85.16µ ± 1%   101.72µ ±  2%  +19.45% (p=0.000 n=10)
FileInput/FngrPrntSmall-20   85.09µ ± 1%   101.94µ ±  1%  +19.81% (p=0.000 n=10)
FileInput/NoFlush-20         84.26µ ± 1%   100.80µ ±  1%  +19.63% (p=0.000 n=10)
FileInput/Many-20            1.297m ± 8%    1.585m ± 18%  +22.24% (p=0.000 n=10)
geomean                      145.7µ         179.6µ        +23.24%

                           │  /tmp/batch.txt   │               /tmp/pool.txt                │
                           │      B/op      │     B/op       vs base                   │
FileInput/Single-20            73.50 ±  48%   10406.50 ± 1%  +14058.50% (p=0.000 n=10)
FileInput/Glob-20              317.0 ±  47%    41682.0 ± 0%  +13048.90% (p=0.000 n=10)
FileInput/MultiGlob-20         466.5 ±  66%    41686.0 ± 0%   +8835.91% (p=0.000 n=10)
FileInput/MaxConcurrent-20   1.494Ki ±  60%   40.941Ki ± 1%   +2640.10% (p=0.000 n=10)
FileInput/FngrPrntLarge-20     86.50 ± 108%   10393.50 ± 1%  +11915.61% (p=0.000 n=10)
FileInput/FngrPrntSmall-20     72.00 ±  47%   10359.50 ± 0%  +14288.19% (p=0.000 n=10)
FileInput/NoFlush-20           72.00 ±  47%   10366.00 ± 0%  +14297.22% (p=0.000 n=10)
FileInput/Many-20            143.7Ki ± 136%   1046.1Ki ± 8%    +627.93% (p=0.000 n=10)
geomean                        426.6           30.50Ki        +7221.78%

                           │   /tmp/batch.txt   │               /tmp/pool.txt                │
                           │    allocs/op    │   allocs/op     vs base                  │
FileInput/Single-20           0.000 ±   0%        5.000 ±  0%          ? (p=0.000 n=10)
FileInput/Glob-20              0.00 ±   0%        20.00 ±  5%          ? (p=0.000 n=10)
FileInput/MultiGlob-20         0.00 ±   0%        20.00 ±  5%          ? (p=0.000 n=10)
FileInput/MaxConcurrent-20   500.0m ±    ?     21000.0m ±  5%  +4100.00% (p=0.000 n=10)
FileInput/FngrPrntLarge-20    0.000 ±   0%        5.000 ±  0%          ? (p=0.000 n=10)
FileInput/FngrPrntSmall-20    0.000 ±   0%        5.000 ±  0%          ? (p=0.000 n=10)
FileInput/NoFlush-20          0.000 ±   0%        5.000 ±  0%          ? (p=0.000 n=10)
FileInput/Many-20             103.0 ± 133%        574.5 ± 61%   +457.77% (p=0.000 n=10)
geomean                                    ¹      15.31        ?
¹ summaries must be >0 to compute geomean

@djaglowski (Member) commented Jan 28, 2025

One thing I notice is that we're now making a copy of the attributes map for every token. This might not be necessary. Generally, the attributes describe the file, so they should be the same for every token emitted by a given reader. However, we somewhat recently added a record number attribute which breaks this assumption.

What if instead of func(ctx context.Context, tokens []Token) error, we had func(ctx context.Context, tokens [][]byte, attributes map[string]string, lastRecord int64) error? Then, when converting to entries, we can check the "include record number" flag and, if necessary, use lastRecord and len(tokens) to assign a record number to each entry.
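Spelled out, the record-number reconstruction described here might look roughly like this (a sketch with made-up names):

package main

import "fmt"

// assignRecordNumbers derives each token's record number from lastRecord and
// the batch length, so the shared attribute map never needs to be copied.
func assignRecordNumbers(batchLen int, lastRecord int64) []int64 {
	nums := make([]int64, batchLen)
	for i := range nums {
		nums[i] = lastRecord - int64(batchLen-1-i)
	}
	return nums
}

func main() {
	fmt.Println(assignRecordNumbers(3, 42)) // [40 41 42]
}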

Some nuances to this:

  • Since the attribute map would be shared, the emit func shouldn't modify it. I don't think this is a problem immediately but something probably worth noting in comments at least.
  • If you follow how the attribute map gets converted into an entry.Entry's attributes, it recreates its own model for keys, but assigns the value directly. For this reason, I think we can't just use any for the map's values, since this is a reference type. Since all the file attributes are string values, it seems like using map[string]string should take care of this. (Ultimately, we're still copying the attribute values, but we aren't copying the map and keys every time.)

This might be a lot of work to test but if it makes sense to you it might be worth it.

Otherwise, I'm not seeing a lot of room for optimization. I know you've put a lot into this but if we really think this change introduces memory usage by orders of magnitude, then I think we may need to look at other designs such as using callbacks to update offsets.

@andrzej-stencel (Member Author)

> One thing I notice is that we're now making a copy of the attributes map for every token. This might not be necessary. Generally, the attributes describe the file, so they should be the same for every token emitted by a given reader. However,

This is true: some of the attributes are the same for all tokens, and (currently) only the record number attribute changes for each token. A similar suggestion was offered by @swiatekm, in a different fashion: have a single instance of an attribute map for the common attributes (those describing the source file) and a second attribute map allocated for every record, which would currently only hold the record number attribute. The solution with a single int for the record number surely uses less memory, but the approach with separate attribute maps is probably easier to extend if a new attribute needs to be added in the future.
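For comparison, the two-map variant could be sketched like this (hypothetical field names, not actual code):

package emit

// Token with attributes split into a map shared by every token from the same
// file (allocated once per reader and not to be mutated by consumers) and a
// small per-token map, which currently would only hold the record number.
type Token struct {
	Body            []byte
	FileAttributes  map[string]any
	TokenAttributes map[string]any
}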


Successfully merging this pull request may close these issues.

[pkg/stanza/fileconsumer] Emit logs in batches