Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clone returned byte slice in MarshalValue #1913

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Ragnaroek
Copy link

No issue number yet, as for now unsure if this is intended behaviour or a bug.

Summary

This PR is not ready for merge. It's more to start a discussion around a behaviour I noticed while upgrading to v2 of the driver.

After the v2 driver upgrade unit-tests failed in our project. I took me a long time to find the issue, but I think the root cause is that 'MarshalValue' hands out byte slices that are potentially shared with the Buffers from the pool. This can lead to nasty race-conditions where the byte slices are modified in our test-setup.

I tried to reproduce the behaviour in a unit-test. But this somehow did not work. However the unit-test is a stripped down version of our unit-test in the project and should give you an idea how this leads to a problem with our tests. I can detail this further of course, if it helps to understand the issue.

The fix is easy in the end. If the byte slice is copied, the problem stops. I can do this on the usage side, but I think this shared behaviour of the returned bytes is very unintuitives and error prone in the edge-cases?

The PR proposes a copy of the bytes before returning them. Not sure that is something that is aligned with goals of the library function.

Background & Motivation

This PR fixes a very unintuitive and hard bug to find while "only" upgrading to v2 of the MongoDB driver. I assume other users of the library could also experience this and a proposal for a fix is up for a discussion.

@matthewdale matthewdale self-requested a review December 31, 2024 19:57
@mongodb-drivers-pr-bot mongodb-drivers-pr-bot bot added the priority-3-low Low Priority PR for Review label Dec 31, 2024
Copy link
Contributor

API Change Report

No changes found!

Copy link
Collaborator

@matthewdale matthewdale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ragnaroek thank you for the PR! That is not expected behavior and is indeed a bug. I've confirmed this is the root cause behind another intermittent test failure we've been troubleshooting (GODRIVER-3307).

bson/marshal.go Outdated
@@ -133,7 +134,7 @@ func MarshalValue(val interface{}) (Type, []byte, error) {
return 0, nil, err
}
typ := sw.Next(2)
return Type(typ[0]), sw.Bytes(), nil
return Type(typ[0]), slices.Clone(sw.Bytes()), nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go Driver currently supports a min Go version of 1.18, which does not have a "slices" package (it was added in Go 1.21). Instead, use the same clone logic from Marshal.

Suggested change
return Type(typ[0]), slices.Clone(sw.Bytes()), nil
buf := append([]byte(nil), sw.Bytes()...)
return Type(typ[0]), buf, nil

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, already using Go 1.23. I updated the PR to copy in a pre Go 1.21 way.

Comment on lines +334 to +337
func TestSharedUseOfMarshalledBytes(t *testing.T) {
type testValue struct {
value string
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unable to reproduce the buffer reuse bug with this test. The value field is not exported, so the test cases seem to always marshal an empty document. How is this test intended to work?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also failed to write a reproducible test for this. The test also does not fail if the Value field is made public.
To reproduced this a shared used of the byte buffer has to be provoked. As this is highly dependent on the concurrent behaviour of the byte pool, I'm not sure if this can be achieved in a stable way.

In our production code the test always failed. It was not shaky or something. Unfortunately the production code is proprietary and I cannot share it. The test in the PR tried to re-produce a minimal example of the failing test, but I failed doing so. The gist of it was that the reference to the (shared) byte buffer was stored in the 'sharedTestCase' struct and was modified by a second test. If the test were run in isolation they did not fail, only with two tests together the shared issue did surface.

If you also have no smart idea on how to reproduce this "shared buffer issue" I would remove the test and leave the PR with just the copy of the byte buffer?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority-3-low Low Priority PR for Review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants