marshall module ideas #8

pfalcon · 2018-01-07T10:42:20Z

It seems that MsgPack is a viable choice for implementing the marshall encoding: https://github.com/msgpack/msgpack/blob/master/spec.md

Possibly, an ad hoc serialization format would be even more efficient, but at least MsgPack can differentiate bytes from strs, etc.

pfalcon · 2018-01-07T10:44:04Z

Problems would be: no differentiation between tuple and list, or between dict and OrderedDict.

pfalcon · 2018-01-07T10:45:12Z

Also, there's no array encoding with an 8-bit length: it jumps from 4 bits (fixarray) straight to 16 bits (array 16). The same goes for maps.
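To make the jump concrete, here is a minimal Python sketch of how a MsgPack array or map header is chosen by element count, following the fixarray/array 16/array 32 and fixmap/map 16/map 32 formats in the spec linked above. The function names are illustrative only, not from any msgpack library.

```python
import struct


def msgpack_array_header(n):
    # fixarray: 1001xxxx, count packed into the low 4 bits (0..15)
    if n < 16:
        return bytes([0x90 | n])
    # array 16: 0xdc + big-endian uint16 count (no 8-bit variant exists)
    if n < 0x10000:
        return b"\xdc" + struct.pack(">H", n)
    # array 32: 0xdd + big-endian uint32 count
    return b"\xdd" + struct.pack(">I", n)


def msgpack_map_header(n):
    # Same layout for maps: fixmap 1000xxxx, map 16 (0xde), map 32 (0xdf)
    if n < 16:
        return bytes([0x80 | n])
    if n < 0x10000:
        return b"\xde" + struct.pack(">H", n)
    return b"\xdf" + struct.pack(">I", n)
```

So any array of 16 to 255 elements pays a 3-byte header where an 8-bit length form would need only 2.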

pfalcon · 2018-01-07T11:25:49Z

There's also CBOR, and the drama between it and MsgPack: msgpack/msgpack#129

pfalcon · 2018-01-07T11:29:54Z

CBOR is used in CoAP, so it would arguably be "more useful" than MsgPack...

pfalcon · 2018-01-07T11:34:01Z

MsgPack has a random gap in:

fixstr	101xxxxx	0xa0 - 0xbf
bin 8	11000100	0xc4

I.e., only short text strs can be encoded efficiently; bytestrs always require an explicit length byte.
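A minimal sketch of the asymmetry, assuming the str 8/16/32 and bin 8/16/32 formats from the MsgPack spec (function names are hypothetical): a text string of up to 31 bytes carries its length in the type byte, while even a 1-byte bytes object needs 0xc4 plus a length byte.

```python
import struct


def msgpack_str(s):
    """Encode text: fixstr covers up to 31 bytes with no separate length byte."""
    data = s.encode("utf-8")
    n = len(data)
    if n < 32:
        head = bytes([0xA0 | n])                # fixstr: 101xxxxx
    elif n < 0x100:
        head = b"\xd9" + bytes([n])             # str 8
    elif n < 0x10000:
        head = b"\xda" + struct.pack(">H", n)   # str 16
    else:
        head = b"\xdb" + struct.pack(">I", n)   # str 32
    return head + data


def msgpack_bin(b):
    """Encode bytes: there is no "fixbin", so a length field is always present."""
    n = len(b)
    if n < 0x100:
        head = b"\xc4" + bytes([n])             # bin 8
    elif n < 0x10000:
        head = b"\xc5" + struct.pack(">H", n)   # bin 16
    else:
        head = b"\xc6" + struct.pack(">I", n)   # bin 32
    return head + bytes(b)
```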

CBOR doesn't have that "limitation": https://tools.ietf.org/html/rfc7049#appendix-B (Of course, it must encode something else less efficiently instead, since MsgPack already uses every initial byte value except one reserved.)
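For comparison, a minimal sketch of CBOR's uniform item head per RFC 7049: a 3-bit major type plus 5 bits of additional info, where values 0..23 are stored inline and 24..27 announce a 1-, 2-, 4- or 8-byte argument. Byte strings (major type 2) and text strings (major type 3) get the same short form, and arrays (major type 4) get an 8-bit length form. Helper names are hypothetical.

```python
import struct


def cbor_head(major, value):
    # Initial byte: major type in the top 3 bits; requires 0 <= value < 2**64
    mt = major << 5
    if value < 24:
        return bytes([mt | value])                        # argument inline
    if value < 0x100:
        return bytes([mt | 24, value])                    # 1-byte argument
    if value < 0x10000:
        return bytes([mt | 25]) + struct.pack(">H", value)
    if value < 0x100000000:
        return bytes([mt | 26]) + struct.pack(">I", value)
    return bytes([mt | 27]) + struct.pack(">Q", value)


def cbor_bytes(b):
    return cbor_head(2, len(b)) + bytes(b)   # major type 2: byte string


def cbor_text(s):
    data = s.encode("utf-8")
    return cbor_head(3, len(data)) + data    # major type 3: text string


def cbor_array_head(n):
    return cbor_head(4, n)                   # major type 4: array of n items
```

Both `cbor_bytes(b"abc")` and `cbor_text("abc")` cost one header byte (0x43 and 0x63), and a 200-element array header is 2 bytes (0x98 0xc8).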

pfalcon · 2018-01-07T12:11:28Z

Note that the motivation for the marshall module is encoding data rows for a btree database. I.e., the reasoning is: "need to serialize tuples for btree db" -> "why not do that by implementing a marshall module, which can be used for many other things too".

That adds another requirement: being able to compare serialized arrays efficiently (i.e., without fully decoding them).
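One common way to meet that requirement, which is not something this thread settles on, is an order-preserving key encoding, where plain bytewise comparison (as a btree does) agrees with tuple comparison. CBOR as-is doesn't give this for negative numbers: 0 encodes as 0x00 but -1 as 0x20, so -1 sorts after 0. A minimal sketch, assuming keys are tuples of signed 64-bit ints only (function names hypothetical):

```python
import struct

BIAS = 1 << 63  # maps signed 64-bit range [-2**63, 2**63) onto [0, 2**64)


def encode_int(n):
    # Big-endian unsigned bytes compare in the same order as the numbers
    return struct.pack(">Q", n + BIAS)


def encode_key(t):
    # Fixed-width fields: comparing the concatenated bytes matches Python's
    # element-by-element tuple comparison, and a shorter prefix sorts first
    return b"".join(encode_int(n) for n in t)


def decode_key(b):
    return tuple(struct.unpack(">Q", b[i:i + 8])[0] - BIAS
                 for i in range(0, len(b), 8))
```

Comparison can stop at the first differing byte, so no field needs to be decoded at all. Variable-length types (str, bytes) would need an escaping scheme on top of this.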

pfalcon · 2018-01-07T12:23:53Z

CBOR defines encodings for bignums, for example. Looks like it's a winner.
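A minimal sketch of CBOR integer encoding including bignums, following RFC 7049: major type 0 for unsigned, major type 1 storing -1 - n for negatives, and, outside the 64-bit range, tag 2 (positive bignum) or tag 3 (negative bignum) wrapping a big-endian byte string. The helper names are hypothetical.

```python
import struct


def cbor_head(major, value):
    # Initial byte plus argument; requires 0 <= value < 2**64
    mt = major << 5
    if value < 24:
        return bytes([mt | value])
    if value < 0x100:
        return bytes([mt | 24, value])
    if value < 0x10000:
        return bytes([mt | 25]) + struct.pack(">H", value)
    if value < 0x100000000:
        return bytes([mt | 26]) + struct.pack(">I", value)
    return bytes([mt | 27]) + struct.pack(">Q", value)


def cbor_int(n):
    if 0 <= n < (1 << 64):
        return cbor_head(0, n)            # major type 0: unsigned integer
    if -(1 << 64) <= n < 0:
        return cbor_head(1, -1 - n)       # major type 1: stores -1 - n
    # Bignum: tag 2 wraps n, tag 3 wraps -1 - n, as a big-endian byte string
    tag, m = (2, n) if n > 0 else (3, -1 - n)
    data = m.to_bytes((m.bit_length() + 7) // 8, "big")
    return cbor_head(6, tag) + cbor_head(2, len(data)) + data
```

The output can be checked against RFC 7049 Appendix A, e.g. `cbor_int(2**64)` yields `c249010000000000000000`.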

hardkrash · 2018-04-23T04:54:39Z

CBOR tags are quite extensible; the CBOR working group is looking at adding fixed-point types and arrays for ADCs:
https://cbor-wg.github.io/array-tags/
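A minimal sketch of what a typed array for ADC samples could look like, assuming the tag numbering of RFC 8746 (the published form of that draft), where tag 69 means "uint16, little endian" typed array wrapping a plain byte string. The tag value is an assumption here and the function name is hypothetical.

```python
import struct


def cbor_head(major, value):
    # Initial byte plus argument; requires 0 <= value < 2**64
    mt = major << 5
    if value < 24:
        return bytes([mt | value])
    if value < 0x100:
        return bytes([mt | 24, value])
    if value < 0x10000:
        return bytes([mt | 25]) + struct.pack(">H", value)
    if value < 0x100000000:
        return bytes([mt | 26]) + struct.pack(">I", value)
    return bytes([mt | 27]) + struct.pack(">Q", value)


UINT16_LE_TAG = 69  # assumed RFC 8746 tag: uint16 little-endian typed array


def cbor_uint16le_array(samples):
    # Raw fixed-width payload; each sample must fit in 0..65535
    data = struct.pack("<%dH" % len(samples), *samples)
    return cbor_head(6, UINT16_LE_TAG) + cbor_head(2, len(data)) + data
```

For three samples this is larger than a plain CBOR array of ints. The benefit for long sample buffers is that the payload is raw fixed-width data that a decoder can copy into an `array('H')` buffer without decoding each element.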

smurfix · 2021-05-10T17:49:43Z

> MsgPack has random gap in

Umm, no? 0xc0 through 0xc3 are None, (never used), False, and True. CBOR also has gaps in it …

A more relevant advantage of CBOR is that you can prefix an item with a fairly simple "use the following data as input to ‹class›.__setstate__()" tag, where the class name is encoded in the tag. If you want to do the same thing with MsgPack, whose ext types must state their payload length up front, you need either an in-memory copy of the object's encoded bytestring or two passes over the data structure that __getstate__() returns. Shorter tags could be used for more common distinctions, e.g. between tuple and list: just specify a "read-only hint" tag, and possibly a "the following data is ordered" tag for OrderedDict.
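A minimal single-pass encoder sketch of that idea. All tag numbers below (40000 for the tuple hint, 40001 for a sample class) and the `CLASS_TAGS` registry are hypothetical, chosen for illustration rather than taken from the IANA registry; a real design would want shorter tags, as suggested above. The decoding side (calling `__setstate__()`) is not shown.

```python
import io
import struct


def cbor_head(major, value):
    # Initial byte plus argument; requires 0 <= value < 2**64
    mt = major << 5
    if value < 24:
        return bytes([mt | value])
    if value < 0x100:
        return bytes([mt | 24, value])
    if value < 0x10000:
        return bytes([mt | 25]) + struct.pack(">H", value)
    if value < 0x100000000:
        return bytes([mt | 26]) + struct.pack(">I", value)
    return bytes([mt | 27]) + struct.pack(">Q", value)


TUPLE_TAG = 40000   # hypothetical "read-only sequence" hint tag
CLASS_TAGS = {}     # hypothetical registry: class -> tag number


def encode(obj, out):
    """Write obj to binary stream `out` in one pass (64-bit ints only)."""
    tag = CLASS_TAGS.get(type(obj))
    if tag is not None:
        # The tag names the class; the state follows directly, so no payload
        # length has to be known (or buffered) before writing it
        out.write(cbor_head(6, tag))
        encode(obj.__getstate__(), out)
    elif isinstance(obj, int):
        out.write(cbor_head(0, obj) if obj >= 0 else cbor_head(1, -1 - obj))
    elif isinstance(obj, str):
        data = obj.encode("utf-8")
        out.write(cbor_head(3, len(data)) + data)
    elif isinstance(obj, bytes):
        out.write(cbor_head(2, len(obj)) + obj)
    elif isinstance(obj, (list, tuple)):
        if isinstance(obj, tuple):
            out.write(cbor_head(6, TUPLE_TAG))  # distinguishes tuple from list
        out.write(cbor_head(4, len(obj)))
        for item in obj:
            encode(item, out)
    elif isinstance(obj, dict):
        out.write(cbor_head(5, len(obj)))
        for k, v in obj.items():
            encode(k, out)
            encode(v, out)
    else:
        raise TypeError("unsupported type: %r" % type(obj))


def dumps(obj):
    buf = io.BytesIO()
    encode(obj, buf)
    return buf.getvalue()


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __getstate__(self):
        return [self.x, self.y]


CLASS_TAGS[Point] = 40001  # hypothetical tag for Point
```

Here `dumps([1, 2])` is a plain array, while `dumps((1, 2))` is the same array preceded by the hint tag.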

Another advantage would be the ability to encode indefinite-length data (basically impossible with MsgPack), though I have no idea whether that is actually a relevant use case for MicroPython/Pycopy.
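A minimal sketch of CBOR's indefinite-length array per RFC 7049: the initial byte 0x9f (major type 4 with additional info 31) says "length not given", items follow one by one, and a 0xff "break" byte ends the array. This lets an encoder stream items from a generator without knowing the count. For brevity the sketch only handles non-negative ints, and the function name is hypothetical.

```python
import struct


def cbor_head(major, value):
    # Initial byte plus argument; requires 0 <= value < 2**64
    mt = major << 5
    if value < 24:
        return bytes([mt | value])
    if value < 0x100:
        return bytes([mt | 24, value])
    if value < 0x10000:
        return bytes([mt | 25]) + struct.pack(">H", value)
    if value < 0x100000000:
        return bytes([mt | 26]) + struct.pack(">I", value)
    return bytes([mt | 27]) + struct.pack(">Q", value)


def encode_indefinite_array(items):
    out = bytearray(b"\x9f")      # array, length not announced up front
    for n in items:               # items may be an iterator of unknown length
        out += cbor_head(0, n)    # non-negative ints only in this sketch
    out.append(0xFF)              # "break" stop code closes the array
    return bytes(out)
```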


pfalcon mentioned this issue Sep 3, 2018: "Finishing Pycopy" TODO #15 (open, 15 tasks)

pfalcon mentioned this issue Sep 23, 2019: error when pickle dict with float (point sign) and tuple pfalcon/pycopy-lib#40 (open)

pfalcon mentioned this issue Aug 6, 2020: "Beyond finishing" ideas #18 (open, 5 tasks)
