Using `ByteSliceMode` (unreleased new feature) can encode invalid CBOR data #545

fxamacker · 2024-05-29T23:59:57Z

ByteSliceMode is a new feature planned for v2.7.0 (not yet released) and was added by:

Support automatic conversion between text and binary string representations #476

Unfortunately, using the non-default encoding setting for ByteSliceMode can produce invalid CBOR data in some edge cases.

This problem was discovered by fuzzing.

Reproducer

Steps:

Create an encoding mode with ByteSliceMode set to any non-default option
Encode cbor.Tag with tag content of []byte type

func main() {
	data := []byte{
		0xc2, 0x49, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	} // 2(h'010000000000000000')

	var v1 cbor.Tag
	err := cbor.Unmarshal(data, &v1)
	if err != nil {
		fmt.Printf("failed to unmarshal %x cbor.Tag: %s\n", data, err)
		return
	}

	em, _ := cbor.EncOptions{
		ByteSlice: cbor.ByteSliceToByteStringWithExpectedConversionToBase64URL,
	}.EncMode()
	b, err := em.Marshal(v1)
	if err != nil {
		fmt.Printf("failed to marshal: %s\n", err)
		return
	}

	var v2 interface{}
	err = cbor.Unmarshal(b, &v2)
	if err != nil {
		fmt.Printf("failed to unmarshal %x to interface{}: %s\n", b, err)
	}

	// Output:
	// failed to unmarshal c2d549010000000000000000 to interface{}: cbor: tag number 2 or 3 must be followed by byte string, got tag
}

Expected: The code snippet should produce valid CBOR data 2(h'010000000000000000').

Got: The code snippet produces invalid CBOR data 2(21(h'010000000000000000')).

cc: @benluddy

EDIT: s/malformed/invalid

The text was updated successfully, but these errors were encountered:

benluddy · 2024-05-30T02:43:18Z

That's unfortunate. This is a validity error, but the output is not malformed, correct?

Marshaling a cbor.Tag can't guarantee that it produces valid output in general, especially when it comes to tag content type validity errors. You can of course produce invalid output by calling something like Marshal(Tag{Number: 0, Content: int64(52)}), or by doing a similar roundtrip with an unrecognized tag. For example, using BigIntConvertNone, the decimal fraction 0xc4823bffffffffffffffff00 (4([-18446744073709551616, 0])) roundtrips to 0xc482c348ffffffffffffffff00 (4([3(h'ffffffffffffffffff'), 0])), which is invalid (the exponent must have either type 0 or 1).

It's not clear to me what should be done about this.

I think it makes sense to check tag validity on the output of encodeTag for the built-in tag numbers before returning. If Marshal returns a nil error, the output should always be well-formed, and it shouldn't have any known validity errors by default.

Roundtripping an enclosed data item through interface{} and back without considering the tag number will never be completely safe, but I can imagine an argument to at least decouple cbor.Tag from the options that control encode/decode to/from interface{} (as in #503). WDYT?

fxamacker · 2024-05-30T03:00:50Z

@benluddy

That's unfortunate. This is a validity error, but the output is not malformed, correct?

Yes, it is a validity error.

Roundtripping an enclosed data item through interface{} and back without considering the tag number will never be completely safe, but I can imagine an argument to at least decouple cbor.Tag from the options that control encode/decode to/from interface{} (as in #503). WDYT?

I think decoupling cbor.Tag from options makes sense since cbor.Tag is designed to represent CBOR tag as is. However, I need to double check already released (v2.6.0) options in case this is a breaking change.

benluddy · 2024-05-31T19:46:33Z

Yes, I expect it would be a breaking change since we can come up with a value of Content for most of the encode options whose output depends on the option.

Now that I've had time to consider this more, I want to implement a fix. I also want to make sure that I'm addressing the right issue, so please let me try to break down my understanding of the problem:

Marshaling an arbitrary cbor.Tag is today already not guaranteed to produce valid CBOR, even for recognized tag numbers (e.g. https://go.dev/play/p/shjm6tv09hi or https://go.dev/play/p/gXPumodsVDb). I think it would be responsible to check for and return any tag validity errors for tag numbers with built-in support. Doing this would be backwards-incompatible, though, and I don't think breaking that by default is as clear-cut as bug: Marshal can produce malformed CBOR when marshaling a cbor.Marshaler that produces malformed CBOR #482 since the output is still well-formed.
For unrecognized tag numbers, the library does not have enough information to detect tag validity errors. The application ultimately has to be responsible for validity errors in any special Tag values it directly creates and marshals.
Users can unmarshal into cbor.Tag, or by default get a cbor.Tag when unmarshaling unrecognized tags into interface{}. When this happens, the tag number is extracted and the enclosed data item is unmarshaled into interface{}, independent of the enclosing tag, and subject to any decode options that affect decoding into interface{}. Unlike cbor.RawTag, the original encoded form from the input is not preserved. You may end up with (for example) a Go value of type int64, uint64, or math/big.Int after decoding a data item with major type 0 depending on the value and the decode options in effect.
Because information about the original encoding is lost on unmarshal, and because the marshaling of the tag content value depends on encode options, roundtripping CBOR to Tag and back can result in a different sequence of bytes.

Given all this, I don't think that the requirement is that cbor.Tag must never marshal a data item with validity errors. And since it's already possible to do the same thing without the new option (as in https://go.dev/play/p/ZuLgThz5c3U), I think what makes this problematic parts is that the invalid output can be produced by a CBOR-to-Tag-to-CBOR roundtrip and that it affects a built-in tag number.

To address the immediate issue, I propose special-casing this in encodeTag (and maintaining similar special cases as necessary to make Tag marshaling "do the right thing" for the built-in tags). I hope that in the longer term, progress on topics like #507 will reach the point where user-registered types and "built-ins" can be configured using the same mechanisms, and that that will reduce the need for special cases across the board.

Later, a new "strongly encouraged opt-in" mode for UnrecognizedTagToAny could decode unrecognized tags to interface{} as RawTag instead of Tag. That would guarantee faithful roundtrips unless an application actively decides to process and change the unrecognized tag.

I'll try to get a PR open today for consideration.

fxamacker added the bug Something isn't working label May 29, 2024

fxamacker changed the title ~~Using ByteSliceMode (unreleased new feature) can encode malformed data~~ Using ByteSliceMode (unreleased new feature) can encode invalid CBOR data May 30, 2024

benluddy mentioned this issue May 31, 2024

Disable conflicting encode options when marshaling cbor.Tag. #546

Merged

8 tasks

fxamacker closed this as completed in #546 Jun 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using `ByteSliceMode` (unreleased new feature) can encode invalid CBOR data #545

Using `ByteSliceMode` (unreleased new feature) can encode invalid CBOR data #545

fxamacker commented May 29, 2024 •

edited

Loading

benluddy commented May 30, 2024

fxamacker commented May 30, 2024 •

edited

Loading

benluddy commented May 31, 2024

Using ByteSliceMode (unreleased new feature) can encode invalid CBOR data #545

Using ByteSliceMode (unreleased new feature) can encode invalid CBOR data #545

Comments

fxamacker commented May 29, 2024 • edited Loading

Reproducer

benluddy commented May 30, 2024

fxamacker commented May 30, 2024 • edited Loading

benluddy commented May 31, 2024

Using `ByteSliceMode` (unreleased new feature) can encode invalid CBOR data #545

Using `ByteSliceMode` (unreleased new feature) can encode invalid CBOR data #545

fxamacker commented May 29, 2024 •

edited

Loading

fxamacker commented May 30, 2024 •

edited

Loading