Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using ByteSliceMode (unreleased new feature) can encode invalid CBOR data #545

Closed
fxamacker opened this issue May 29, 2024 · 3 comments · Fixed by #546
Closed

Using ByteSliceMode (unreleased new feature) can encode invalid CBOR data #545

fxamacker opened this issue May 29, 2024 · 3 comments · Fixed by #546
Labels
bug Something isn't working

Comments

@fxamacker
Copy link
Owner

fxamacker commented May 29, 2024

ByteSliceMode is a new feature planned for v2.7.0 (not yet released) and was added by:

Unfortunately, using the non-default encoding setting for ByteSliceMode can produce invalid CBOR data in some edge cases.

This problem was discovered by fuzzing.

Reproducer

Steps:

  • Create an encoding mode with ByteSliceMode set to any non-default option
  • Encode cbor.Tag with tag content of []byte type
func main() {
	data := []byte{
		0xc2, 0x49, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	} // 2(h'010000000000000000')

	var v1 cbor.Tag
	err := cbor.Unmarshal(data, &v1)
	if err != nil {
		fmt.Printf("failed to unmarshal %x cbor.Tag: %s\n", data, err)
		return
	}

	em, _ := cbor.EncOptions{
		ByteSlice: cbor.ByteSliceToByteStringWithExpectedConversionToBase64URL,
	}.EncMode()
	b, err := em.Marshal(v1)
	if err != nil {
		fmt.Printf("failed to marshal: %s\n", err)
		return
	}

	var v2 interface{}
	err = cbor.Unmarshal(b, &v2)
	if err != nil {
		fmt.Printf("failed to unmarshal %x to interface{}: %s\n", b, err)
	}

	// Output:
	// failed to unmarshal c2d549010000000000000000 to interface{}: cbor: tag number 2 or 3 must be followed by byte string, got tag
}

Expected: The code snippet should produce valid CBOR data 2(h'010000000000000000').

Got: The code snippet produces invalid CBOR data 2(21(h'010000000000000000')).

cc: @benluddy

EDIT: s/malformed/invalid

@fxamacker fxamacker added the bug Something isn't working label May 29, 2024
@benluddy
Copy link
Contributor

That's unfortunate. This is a validity error, but the output is not malformed, correct?

Marshaling a cbor.Tag can't guarantee that it produces valid output in general, especially when it comes to tag content type validity errors. You can of course produce invalid output by calling something like Marshal(Tag{Number: 0, Content: int64(52)}), or by doing a similar roundtrip with an unrecognized tag. For example, using BigIntConvertNone, the decimal fraction 0xc4823bffffffffffffffff00 (4([-18446744073709551616, 0])) roundtrips to 0xc482c348ffffffffffffffff00 (4([3(h'ffffffffffffffffff'), 0])), which is invalid (the exponent must have either type 0 or 1).

It's not clear to me what should be done about this.

I think it makes sense to check tag validity on the output of encodeTag for the built-in tag numbers before returning. If Marshal returns a nil error, the output should always be well-formed, and it shouldn't have any known validity errors by default.

Roundtripping an enclosed data item through interface{} and back without considering the tag number will never be completely safe, but I can imagine an argument to at least decouple cbor.Tag from the options that control encode/decode to/from interface{} (as in #503). WDYT?

@fxamacker
Copy link
Owner Author

fxamacker commented May 30, 2024

@benluddy

That's unfortunate. This is a validity error, but the output is not malformed, correct?

Yes, it is a validity error.

Roundtripping an enclosed data item through interface{} and back without considering the tag number will never be completely safe, but I can imagine an argument to at least decouple cbor.Tag from the options that control encode/decode to/from interface{} (as in #503). WDYT?

I think decoupling cbor.Tag from options makes sense since cbor.Tag is designed to represent CBOR tag as is. However, I need to double check already released (v2.6.0) options in case this is a breaking change.

@fxamacker fxamacker changed the title Using ByteSliceMode (unreleased new feature) can encode malformed data Using ByteSliceMode (unreleased new feature) can encode invalid CBOR data May 30, 2024
@benluddy
Copy link
Contributor

Yes, I expect it would be a breaking change since we can come up with a value of Content for most of the encode options whose output depends on the option.

Now that I've had time to consider this more, I want to implement a fix. I also want to make sure that I'm addressing the right issue, so please let me try to break down my understanding of the problem:

  1. Marshaling an arbitrary cbor.Tag is today already not guaranteed to produce valid CBOR, even for recognized tag numbers (e.g. https://go.dev/play/p/shjm6tv09hi or https://go.dev/play/p/gXPumodsVDb). I think it would be responsible to check for and return any tag validity errors for tag numbers with built-in support. Doing this would be backwards-incompatible, though, and I don't think breaking that by default is as clear-cut as bug: Marshal can produce malformed CBOR when marshaling a cbor.Marshaler that produces malformed CBOR #482 since the output is still well-formed.

  2. For unrecognized tag numbers, the library does not have enough information to detect tag validity errors. The application ultimately has to be responsible for validity errors in any special Tag values it directly creates and marshals.

  3. Users can unmarshal into cbor.Tag, or by default get a cbor.Tag when unmarshaling unrecognized tags into interface{}. When this happens, the tag number is extracted and the enclosed data item is unmarshaled into interface{}, independent of the enclosing tag, and subject to any decode options that affect decoding into interface{}. Unlike cbor.RawTag, the original encoded form from the input is not preserved. You may end up with (for example) a Go value of type int64, uint64, or math/big.Int after decoding a data item with major type 0 depending on the value and the decode options in effect.

  4. Because information about the original encoding is lost on unmarshal, and because the marshaling of the tag content value depends on encode options, roundtripping CBOR to Tag and back can result in a different sequence of bytes.

Given all this, I don't think that the requirement is that cbor.Tag must never marshal a data item with validity errors. And since it's already possible to do the same thing without the new option (as in https://go.dev/play/p/ZuLgThz5c3U), I think what makes this problematic parts is that the invalid output can be produced by a CBOR-to-Tag-to-CBOR roundtrip and that it affects a built-in tag number.

To address the immediate issue, I propose special-casing this in encodeTag (and maintaining similar special cases as necessary to make Tag marshaling "do the right thing" for the built-in tags). I hope that in the longer term, progress on topics like #507 will reach the point where user-registered types and "built-ins" can be configured using the same mechanisms, and that that will reduce the need for special cases across the board.

Later, a new "strongly encouraged opt-in" mode for UnrecognizedTagToAny could decode unrecognized tags to interface{} as RawTag instead of Tag. That would guarantee faithful roundtrips unless an application actively decides to process and change the unrecognized tag.

I'll try to get a PR open today for consideration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants