clarify strictness
DavidBuchanan314 committed Mar 1, 2024
1 parent e85c6d1 commit 723e4f5
Showing 2 changed files with 24 additions and 20 deletions.
21 changes: 19 additions & 2 deletions README.md
@@ -1,7 +1,9 @@
# dag-cbrrr
Convert between DAG-CBOR and Python objects at hundreds of megabytes per second. Take a look at the [benchmarks](https://github.com/DavidBuchanan314/dag-cbor-benchmark)

Other than speed, a distinguishing feature is that it operates *non-recursively*. This means you can decode or encode arbitrarily deeply nested objects without running out of call stack (although of course you might still run out of heap)
Other than speed, a distinguishing feature is that it operates *non-recursively*. This means you can decode or encode arbitrarily deeply nested objects without running out of call stack (although of course you might still run out of heap).

Finally, cbrrr aims to be maximally strict regarding DAG-CBOR canonicalization rules. See [below](#strictness) for further details.

## Installation

@@ -69,7 +71,22 @@ def encode_dag_cbor(

"atjson_mode" refers to the representation used in atproto HTTP APIs, documented [here](https://atproto.com/specs/data-model#json-representation). It is *not* a round-trip-safe representation.

### Using `multiformats.CID`
## Strictness

cbrrr aims to conform to all the [strictness rules](https://ipld.io/specs/codecs/dag-cbor/spec/#strictness) set out in the DAG-CBOR specification.

It decodes strictly, and there is no non-strict mode available. This means, among other things:

- No duplicate map keys are allowed
- No non-canonically sorted map keys are allowed
- No non-string map keys are allowed
- Only 64-bit floats are allowed
- All integers/lengths must be minimally encoded
- Only tag type 42 is allowed (NOTE: For now, CID values themselves are not validated)

In its default configuration, valid DAG-CBOR should round-trip perfectly, i.e. `encode_dag_cbor(decode_dag_cbor(data)) == data`. (This is not necessarily true if you specify `atjson_mode=True`, or if you pass a custom CID type (see below) that misbehaves in some way.)
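
To make the map key rules concrete, here is a minimal pure-Python sketch of the DAG-CBOR ordering (shorter encoded keys first, then bytewise comparison). It is not cbrrr's actual implementation; `dag_cbor_key_order` and `check_map_keys` are hypothetical names used only for illustration. Requiring each key to be strictly greater than the previous one rejects duplicates and misordered keys with the same check.

```python
def dag_cbor_key_order(key: str) -> tuple[int, bytes]:
    # Hypothetical helper: sort key for DAG-CBOR map keys.
    # Keys compare by encoded length first, then bytewise.
    encoded = key.encode("utf-8")
    return (len(encoded), encoded)


def check_map_keys(keys: list[str]) -> None:
    # Hypothetical helper: raise if keys are not in strictly increasing
    # canonical order. Strictness means a repeated key fails too.
    for prev, cur in zip(keys, keys[1:]):
        if dag_cbor_key_order(cur) <= dag_cbor_key_order(prev):
            raise ValueError(
                f"non-canonical map key ordering ({cur!r} <= {prev!r})"
            )


# Shorter keys sort first, so "x" precedes "aaa".
canonical = sorted(["def", "abc", "x", "aaa"], key=dag_cbor_key_order)
check_map_keys(canonical)  # passes silently
```

Calling `check_map_keys(["abc", "abc"])`, `check_map_keys(["def", "abc"])`, or `check_map_keys(["aaa", "x"])` raises `ValueError` in this sketch, mirroring the three malformed maps exercised in the tests.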

## Using `multiformats.CID`

cbrrr brings its own performance-oriented CID class, but it's relatively bare-bones (supporting only base32, for now). If you want more features and broader compatibility, you can use the CID class from [hashberg-io/multiformats](https://github.com/hashberg-io/multiformats) like so:

23 changes: 5 additions & 18 deletions tests/test_cbrrr.py
@@ -94,14 +94,10 @@ def test_duplicate_map_keys(self):
dup += cbor_head(MajorType.TEXT_STRING, 3) + b"abc"
dup += cbor_head(MajorType.UNSIGNED_INT, 2)

with self.assertRaises(ValueError) as cm:
# nb: my dupe detection logic is a consequence of my key order enforcement logic
with self.assertRaisesRegex(ValueError, "non-canonical"):
cbrrr.decode_dag_cbor(dup)

self.assertEqual(
str(cm.exception),
"non-canonical map key ordering ('abc' <= 'abc')"
)

def test_unsorted_map_keys(self):
# {"def": 1, "abc": 2}
obj = cbor_head(MajorType.MAP, 2)
@@ -110,13 +106,8 @@ def test_unsorted_map_keys(self):
obj += cbor_head(MajorType.TEXT_STRING, 3) + b"abc"
obj += cbor_head(MajorType.UNSIGNED_INT, 2)

with self.assertRaises(ValueError) as cm:
with self.assertRaisesRegex(ValueError, "non-canonical"):
cbrrr.decode_dag_cbor(obj)

self.assertEqual(
str(cm.exception),
"non-canonical map key ordering ('abc' <= 'def')"
)

# {"aaa": 1, "x": 2} (shorter string should sort first)
obj = cbor_head(MajorType.MAP, 2)
@@ -125,13 +116,9 @@ def test_unsorted_map_keys(self):
obj += cbor_head(MajorType.TEXT_STRING, 1) + b"x"
obj += cbor_head(MajorType.UNSIGNED_INT, 2)

with self.assertRaises(ValueError) as cm:
with self.assertRaisesRegex(ValueError, "non-canonical"):
cbrrr.decode_dag_cbor(obj)

self.assertEqual(
str(cm.exception),
"non-canonical map key ordering (len('x') < len('aaa'))"
)


if __name__ == '__main__':
unittest.main(module="tests.test_cbrrr")
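
The tests above build malformed CBOR by hand using `cbor_head` and `MajorType`, whose definitions fall outside the hunks shown. The following is a sketch of what such a helper might look like, assuming it emits the minimally encoded CBOR head described in RFC 8949; the real test helper may differ, and only the three major types used in these hunks are included.

```python
from enum import IntEnum


class MajorType(IntEnum):
    # Assumed values, taken from the CBOR major type numbering (RFC 8949).
    UNSIGNED_INT = 0
    TEXT_STRING = 3
    MAP = 5


def cbor_head(major_type: int, value: int) -> bytes:
    # Assumed behaviour: emit the shortest head that can hold `value`.
    if value < 0:
        raise ValueError("head argument must be non-negative")
    if value < 24:
        # Small values are stored directly in the low 5 bits.
        return bytes([major_type << 5 | value])
    # Additional info 24..27 selects a 1, 2, 4 or 8 byte big-endian argument.
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([major_type << 5 | info]) + value.to_bytes(size, "big")
    raise ValueError("head argument does not fit in 64 bits")


# {"abc": 1, "abc": 2}: a map with a duplicated key, as in the first test.
dup = cbor_head(MajorType.MAP, 2)
dup += cbor_head(MajorType.TEXT_STRING, 3) + b"abc"
dup += cbor_head(MajorType.UNSIGNED_INT, 1)
dup += cbor_head(MajorType.TEXT_STRING, 3) + b"abc"
dup += cbor_head(MajorType.UNSIGNED_INT, 2)
```

Assembling the bytes this way lets a test produce input that a conforming encoder would refuse to emit, which is why the tests do not simply call `encode_dag_cbor`.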
