feature: inspect first/outer "kind" without full decode #440

Open
extemporalgenome opened this issue Nov 13, 2023 · 1 comment
Labels
enhancement New feature or request


extemporalgenome commented Nov 13, 2023

Is your feature request related to a problem? Please describe.

A nice property of the json.RawMessage design is that it's fairly trivial to safely inspect the broad kind of JSON data with:

// A properly decoded json.RawMessage always
// starts with a non-space token byte.
switch theRawMessage[0] {
case '{': // object
case '[': // array
case '"': // string
case 'n': // null
case 'f': // false
case 't': // true
default:  // number
}

This can also be done by inspecting the leading byte of a cbor.RawMessage, but there are many more possible leading bytes, and they're much less memorable (i.e. the application would need to implement a partial CBOR decoder to work around this package not providing kind detection as a cheap capability).
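To illustrate the kind of partial decoding an application has to carry today, here is a minimal sketch that only extracts the CBOR major type (the top three bits of the initial byte, per RFC 8949). The names majorType and majorTypeNames are hypothetical and not part of this package.

```go
package main

import "fmt"

// majorTypeNames lists the eight CBOR major types from RFC 8949, indexed by
// the top three bits of a data item's initial byte.
var majorTypeNames = [8]string{
	"unsigned integer",
	"negative integer",
	"byte string",
	"text string",
	"array",
	"map",
	"tag",
	"simple value or float",
}

// majorType is a hypothetical helper (not part of any library). It returns
// the major type (0-7) held in the top three bits of the first byte of data.
// ok is false when data is empty.
func majorType(data []byte) (mt byte, ok bool) {
	if len(data) == 0 {
		return 0, false
	}
	return data[0] >> 5, true
}

func main() {
	items := [][]byte{
		{0x83, 0x01, 0x02, 0x03},           // array of three small integers
		{0x65, 'h', 'e', 'l', 'l', 'o'},    // five-byte text string
		{0xf6},                             // null (a major type 7 simple value)
	}
	for _, item := range items {
		if mt, ok := majorType(item); ok {
			fmt.Printf("0x%02x: major type %d (%s)\n", item[0], mt, majorTypeNames[mt])
		}
	}
}
```

Even this is coarser than the JSON switch above: null, booleans, and floats all share major type 7, so telling them apart needs a second look at the low five bits.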

Decoding into any to just check the kind is often undesirable because:

  1. It's expensive, especially in terms of garbage.
  2. The contract is not stable, and it is hard to account for exhaustively using type assertions, since DecOptions can yield uint64 vs int64 variations, many possible map and slice combinations, etc. Use of reflect provides more stability, but is unwieldy.

Describe the solution you'd like

Introduce a cbor.Kind type, with values like cbor.KindInt. It's unclear whether distinctions between int vs uint vs big int, or the different size variants, should be represented, though bit-field-style constants (e.g. cbor.KindNumber = cbor.KindInt | cbor.KindFloat | ..., cbor.KindInt = cbor.KindInt8 | ...) or helper methods (func (Kind) IsNumber() bool) could address this.
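As a rough sketch of the bit-field idea, assuming one bit per leaf kind with composite kinds built by OR-ing them together: every name below is hypothetical, and the split into KindUint and KindNegInt (rather than size variants such as KindInt8) is just one possible choice.

```go
package main

import "fmt"

// Kind is a hypothetical bit field; it is not an existing type in the package.
// The zero value has no bits set and can act as "invalid".
type Kind uint16

const (
	KindUint Kind = 1 << iota
	KindNegInt
	KindFloat
	KindBytes
	KindText
	KindArray
	KindMap
	KindBool
	KindNull

	// Composite kinds are unions of the leaf kinds above.
	KindInt    = KindUint | KindNegInt
	KindNumber = KindInt | KindFloat
)

// IsNumber reports whether k has any numeric bit set.
func (k Kind) IsNumber() bool { return k&KindNumber != 0 }

func main() {
	for _, k := range []Kind{KindUint, KindFloat, KindText} {
		fmt.Printf("kind %#04x: IsNumber=%v\n", uint16(k), k.IsNumber())
	}
}
```

With this layout a caller can test either a precise kind (k == KindNegInt) or a family (k&KindInt != 0) without the API committing to how finely integers are subdivided.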

A function such as func DetectKind([]byte) (Kind, error) could be used to obtain a Kind value. If a const KindInvalid Kind = 0 is available, such a function would not need to return an error.
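A minimal sketch of the error-free variant, assuming a simple enumerated Kind whose zero value is KindInvalid. It looks only at the initial byte and does not check that the rest of the item is well-formed; the Kind values and the DetectKind body are illustrative assumptions, not an existing API.

```go
package main

import "fmt"

// Kind is a hypothetical enumeration; KindInvalid is its zero value.
type Kind uint8

const (
	KindInvalid Kind = iota // empty input or an unusable initial byte
	KindInt
	KindBytes
	KindText
	KindArray
	KindMap
	KindTag
	KindBool
	KindNull
	KindFloat
	KindOther // undefined and other simple values
)

// DetectKind is a hypothetical sketch. It classifies a CBOR data item by its
// initial byte only and never allocates.
func DetectKind(data []byte) Kind {
	if len(data) == 0 {
		return KindInvalid
	}
	major, info := data[0]>>5, data[0]&0x1f
	if info >= 28 && info <= 30 {
		return KindInvalid // reserved additional-information values
	}
	switch major {
	case 0, 1:
		return KindInt
	case 2:
		return KindBytes
	case 3:
		return KindText
	case 4:
		return KindArray
	case 5:
		return KindMap
	case 6:
		return KindTag
	}
	// Major type 7: simple values and floating-point numbers.
	switch info {
	case 20, 21:
		return KindBool
	case 22:
		return KindNull
	case 25, 26, 27:
		return KindFloat
	case 31:
		return KindInvalid // "break" stop code is not a data item by itself
	}
	return KindOther
}

func main() {
	fmt.Println(DetectKind([]byte{0xa0}) == KindMap)  // 0xa0 is an empty map
	fmt.Println(DetectKind([]byte{0xf5}) == KindBool) // 0xf5 is true
	fmt.Println(DetectKind(nil) == KindInvalid)
}
```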

A companion DetectTagKind function that returns (uint64, Kind), or similar, may also be useful.
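One way the tag variant might look, again purely as a sketch: read the tag number from the head of a major type 6 item, then classify the first byte of the tagged content. To keep the example short, Kind here is a hypothetical stand-in equal to the content's major type plus one (so zero stays invalid); readHead and kindOf are likewise made-up helpers.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Kind is a hypothetical stand-in: zero is invalid, any other value is the
// CBOR major type of the item plus one.
type Kind uint8

const KindInvalid Kind = 0

// kindOf maps an initial byte to the stand-in Kind.
func kindOf(b byte) Kind { return Kind(b>>5) + 1 }

// readHead decodes the argument in the head at the start of data and returns
// it with the head length in bytes. n is 0 when data is truncated or the
// additional information does not encode a plain argument.
func readHead(data []byte) (arg uint64, n int) {
	if len(data) == 0 {
		return 0, 0
	}
	info := data[0] & 0x1f
	switch {
	case info < 24:
		return uint64(info), 1
	case info == 24 && len(data) >= 2:
		return uint64(data[1]), 2
	case info == 25 && len(data) >= 3:
		return uint64(binary.BigEndian.Uint16(data[1:])), 3
	case info == 26 && len(data) >= 5:
		return uint64(binary.BigEndian.Uint32(data[1:])), 5
	case info == 27 && len(data) >= 9:
		return binary.BigEndian.Uint64(data[1:]), 9
	}
	return 0, 0
}

// DetectTagKind is a hypothetical sketch. It returns the tag number and the
// Kind of the immediately enclosed item, or (0, KindInvalid) if data does not
// start with a complete tag head followed by at least one content byte.
func DetectTagKind(data []byte) (uint64, Kind) {
	if len(data) == 0 || data[0]>>5 != 6 {
		return 0, KindInvalid
	}
	num, n := readHead(data)
	if n == 0 || n >= len(data) {
		return 0, KindInvalid
	}
	return num, kindOf(data[n])
}

func main() {
	// Tag 55799 (self-described CBOR) wrapping an empty array.
	num, k := DetectTagKind([]byte{0xd9, 0xd9, 0xf7, 0x80})
	fmt.Println(num, k)
}
```

Nested tags are reported as-is in this sketch: the returned Kind describes the next item, which may itself be a tag.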

Describe alternatives you've considered

It seems there is a branch or effort to expose a streaming tokenizer. If so, that could provide equivalent functionality, where the above case would merely involve a peek at the next token, potentially followed by a normal decode or token consumption.

@fxamacker fxamacker added the enhancement New feature or request label Dec 3, 2023
@fxamacker
Owner

@extemporalgenome Thanks for opening this issue! This sounds useful and makes sense. I'll look into it this month.
