How are property values that are objects/arrays encoded in vector tiles? #75

mapsam · 2016-05-03T21:37:49Z

It would be great to add an example of encoding an Array or Object from the original GeoJSON to the PBF value (from the 4.5 example) to make it clear how the v2 spec should encode these values. Are the values encoded as string values and required to be parsed by the user?

i.e. for an array value:

values: {
  string_value: "[\"one\",\"two\",3]"
}

Related to https://github.com/mapbox/vector-tiles/issues/12

cc @flippmoke @danswick

The text was updated successfully, but these errors were encountered:

jfirebaugh · 2016-05-03T22:36:27Z

My understanding is that array and object values are not part of the specification, and if you want to use them you need to invent your own serialization protocol and ensure that the encoders and decoders in your part of the ecosystem understand it. In other words, there is no "should" here at the level of the VT spec.

mapsam · 2016-05-16T23:57:45Z

Thanks @jfirebaugh - this is the case.

e-n-f · 2016-05-17T00:09:51Z

Mapbox Studio Classic and recent versions of Tippecanoe write them as stringified JSON, with no way to tell whether the value originally came from a string or an object.

mapsam · 2016-05-17T00:10:59Z

with no way to tell whether the value originally came from a string or an object

Interesting @ericfischer. Does that mean parsers must try to parse as JSON first, then assume it's a string if the JSON cannot be parsed?

e-n-f · 2016-05-17T00:15:49Z

@mapsam I think most parsers just treat them as strings and don't make any attempt to reconstitute objects. But @ingalls may have more details on what ultimately happens with the values that prompted mapbox/tippecanoe#155.

mapsam · 2016-05-17T00:17:20Z

Ah, so it's up the user then to know their string/object-string values. Makes sense!

mcwhittemore · 2016-05-17T14:42:08Z

Here is one place this conversation is happening in mapbox-gl-js.

ingalls · 2016-05-17T15:00:11Z

@mapsam our use case is such that we expect either an array or a stringified array given the property name so for us parsing is relatively easy.

mcwhittemore · 2016-05-17T22:51:22Z

@mapsam mentioned reopening this to have a convo about adding how to handle objects and arrays into the spec.

@flippmoke what are your thoughts on adding this in?

pnorman · 2016-08-12T21:17:07Z

+1 to adding it in - multiple people are adding ways to represent lists (e.g. tilezen/mapbox-vector-tile#64)

nyurik · 2016-09-26T18:30:16Z

I have been talking to a large number of people about this at SOTM 2016. The biggest hurdle to store the unlimited key-values (object/dictionary/hstore) and the array support is that Mapnik is fairly rigid in terms of non-simple-type columns, and the number of those columns, so adding this support to Mapnik would require substantial efforts. I would like to propose both the Mapnik workaround as well as the spec change.

To generate an object or an array in vtile using Mapnik, one could generate the data as a single JSON-encoded string value, and add a custom server-side vector post-processing step that decodes JSON and converts it into values. This way, "name": {"en":"London","hu":"London","fr":"Londres"} would be stored as a long string, with multiple identical strings repeating, and the post-processor would convert it to 3 individual tags - "name:en":"London", "name:hu":"London", "name:fr":"Londres" (thus deduping the values).

There are two ways to actually store it:

Simple way that does not alter this spec -- simply add all keys and values as the layer message's keys and values, and update the tags fields. This might break some software's expectation that the list of tag columns is predefined, but otherwise no spec change is needed. One drawback of this approach is that the keys would be much longer - "name:en" instead of "en". Once implemented, a collate() function needs to be implemented to handle multiple languages - collate(tags,"name:fr","name:hu","name:en") would get the first existing value.
A more complex way is to extend the schema with two additional value types: dictionary and list.

The Value message should have two additional fields:

    // A list of key-value pairs. Just like Feature's "tags", the repeated pairs
    // of integers point to Layer's "keys" (1st integer) and "values" (2nd integer).
    repeated uint32 object_value = 8 [ packed = true ];

    // An array of values. Similar to Feature's "tags", each integer represents
    // an index of the Layer's "values" array.
    repeated uint32 list_value = 9 [ packed = true ];

So if the value contains object_value, the odd integers are the indexes of the Layer's keys array, and the even indexes refer to the Layer's values array. Arrays are implemented the same way, only this time list_value indexes only refer to the Layer's values array.

One of the drawback is that it is now possible to have circular dependencies, so to avoid this and simplify client's data validation, data may only be two levels deep (root+one sublevel). Another unenforced but possibly valuable requirement might be to restrict all values in both object and list to have the same type.

flippmoke · 2016-09-27T20:42:45Z

Sorry for my lack of responses on this ticket: Let me summarize some thoughts I have on this currently.

Mapnik Free Vector Tiles

I would like to propose both the Mapnik workaround as well as the spec change.

Eventually we do want to pull out all Mapbox vector tile encoding and decoding such that it is not required to be a part of Mapnik. In general mapnik-vector-tile issues should probably be discussed there. Here on the spec trying to specifically just cover ideas specific to the specification.

Lists

List support could be added by changing section 4.4 - which currently reads as follows:

Feature attributes are encoded as pairs of integers in the tag field of a feature. The first integer in each pair represents the zero-based index of the key in the keys set of the layer to which the feature belongs. The second integer in each pair represents the zero-based index of the value in the values set of the layer to which the feature belongs. Every key index MUST be unique within that feature such that no other attribute pair within that feature has the same key index. A feature MUST have an even number of tag fields. A feature tag field MUST NOT contain a key index or value index greater than or equal to the number of elements in the layer's keys or values set, respectively.

In theory we could allow keys not to be unique per a feature and if more then one occurs we assume the item to be a list. This is not ideal for compression necessarily but could solve the problem.

I do not think this change could qualify as anything but a major version incrementation (though we could debate on this).

Objects

Objects scare me quite a bit more because we suddenly are adding depth "values". This could be something that was very bad for performance overall if not implemented with thought and with some testing.

Example Implementation:

       message Object {
               // Same as feature tags currently
               repeated uint32 tags = 2 [ packed = true ];
       }

       message Value {
                // Exactly one of these values must be present in a valid message
                optional string string_value = 1;
                optional float float_value = 2;
                optional double double_value = 3;
                optional int64 int_value = 4;
                optional uint64 uint_value = 5;
                optional sint64 sint_value = 6;
                optional bool bool_value = 7;
                optional Object object_value = 8; 
                extensions 9 to max;
        }

Final Thoughts

I am going to leave this ticket open for continued discussion. Thanks for the feedback everyone!

flippmoke · 2016-09-27T20:50:31Z

@nyurik part of the challenge here with both of our ideas is that that means that all non object values MUST be encoded first. This would put a variety of different requirements on both the encoders and decoders.

nyurik · 2016-09-27T20:54:51Z

@flippmoke could you elaborate on your last comment? I'm not sure I understood.

flippmoke · 2016-09-27T21:05:31Z

@nyurik

Lets just assume the following structure:

{
    "key1": value1(int)
    "key2": value2(object) {
           "key3": value3(object) {
                 "key4": value4(int),
                 "key5": value5(int)
           },
           "key6": value6(int)
     },
     "key7": value7(int)
}

Because Value now has an "Object" that must know the index of keys and values. The order of encoding becomes critical.

value4 and value5 would have to be in the before value3.
value3 and value6 would have to be encoded before value2.

Edit: by encoded here I mean placed into values on the Layer.

                // Dictionary encoding for keys
                repeated string keys = 3;

                // Dictionary encoding for values
                repeated Value values = 4;

nyurik · 2016-09-27T21:42:44Z

@flippmoke ah, yes, we (@pnorman and I) were discussing this. I don't think we actually want to allow unlimited nest-ness. It has very limited practical value, whereas having just one level of unlimited key-value storage has a huge practical benefit - primarily for storing all possible translations of a given object (key=name, value={language-code => string}. The moment we put this restriction in place, the whole system becomes much more manageable and easy to implement:

No need to validate invalid recursive tags (value3 being the same as value2 in your example)
Encoding all the object's keys & values is a linear process - you add them to the Layer's keys & values, record their location, and save it to tags. This is identical to how the tags field is being created now.

e-n-f · 2016-09-27T21:58:35Z

If we're going to allow nesting at all, and it requires a version bump, why not allow arbitrary JSON objects?

Values are already tagged with an integer for their type, and only 1 through 7 are defined. So type 8 could denote that the value is an array (encoded as an integer array of sub-values) and type 9 denote that it is a hash (encoded as an integer array of sub-keys alternating with sub-values).

So a feature with this attribute:

"nested" : { "one" : "two", "three" : [ 4, 5, 6 ] }

would have these keys in the layer:

0. nested
1. one
2. three

and these values:

0. two (string)
1. 4 (int)
2. 5 (int)
3. 6 (int)
4. 1,2,3 (list)
5. 1,0,2,4 (hash)

with an outer key-value list of

0,5

e-n-f · 2016-09-27T22:12:28Z

I just realized that this was what @nyurik already said. Sorry for the needless duplication.

nyurik · 2016-09-27T22:42:02Z

@ericfischer I'm not sure it's the same - I actually think that we should NOT allow arbitrary nested levels, as it would create too many encoding and decoding problems, while not giving us much practical benefits. After all, vector tiles is a very domain-specific data storage format. My proposal is either to not change the spec at all (except that now a vector tile would have any number of non-predefined keys), or, if we change the spec, we only allow two new types (list and object) at the top level. Neither list nor object will allow storing other lists or objects inside of them for the reasons outlined above.

nyurik · 2016-09-28T12:34:14Z

Encoding example

{ "Features": [
  { "str": "aaa",
    "dic": { "en": "aaa", "fr": "bbb"},
    "arr": [ 100, 200 ]
  }, {
    "dic": { "en": "ccc", "gr": "bbb" },
    "arr": [ 100, 300 ]
  }
]}

VTile structure:

      # 0      1     2     3      4      5
keys: [ "str", "en", "fr", "dic", "arr", "gr" ]
values: [
  "aaa"     (string) # 0
  "bbb"     (string) # 1
                     #    key value pairs, value MUST NOT be object or array,
  [1,0,2,1] (object) # 2  all value indexes MUST be less than the index of this element.
  100       (int)    # 3
  200       (int)    # 4
  [3,4]     (array)  # 5  list of value indexes. Same restrictions as for Object
  "ccc"     (string) # 6
  [1,6,5,1] (object) # 7
  300       (int)    # 8
  [3,8]     (array)  # 9
]
features: [
  { tags: [0,0, 3,2, 4,5] },
  { tags: [3,7, 4,9] }
]

@flippmoke, I'm not so sure that repeating keys to make a list is better than adding a new value type. The decoder would need to do an additional key lookup for every new key, to make sure it hasn't seen it before (performance), and because the key needs to be repeated (compression).
UPDATE: @flippmoke yes, the values contained in the object have to be encoded before the object itself (same for arrays). But because nesting is not allowed, this would not be a problem.

e-n-f · 2016-09-28T18:57:24Z

Maybe to make validation easier, there could be a rule that a value can only refer to values that are numbered lower than itself? This would prevent reference loops.

nyurik · 2016-09-28T19:26:02Z

@ericfischer, while it would make the validation easier, it would also make encoding slightly slower and harder. Are there any practical reasons to allow infinite nesting? @pnorman mentioned that allowing nesting inside arrays would make multidimensional (Jagged) arrays possible, but I haven't seen a usage example yet.

e-n-f · 2016-09-28T19:47:10Z

I have seen one case where someone embedded an entirely different GeoJSON feature as an attribute of a GeoJSON feature. I don't know how common this need is in reality though.

From my perspective, guaranteeing that sub-values will be numbered below the value that refers to them seems easy, because you can't build your list of references until you have already assigned IDs to the sub-values, so the ID that then gets assigned to the list of references, once it is complete, will have to be greater than the IDs that were assigned earlier in the process to the things that went into it. But there might be reasons to want to write it the other way around instead. (I should try writing an implementation of this to make sure I'm not missing something important.)

e-n-f · 2016-09-28T22:03:31Z

@nyurik Whether or not it is a good idea, here is my experimental implementation of it in Tippecanoe: mapbox/tippecanoe#303

nyurik · 2016-09-29T11:27:41Z

@ericfischer, thanks. I will attempt to do the same with the https://github.com/mapbox/vector-tile-js .

While we are introducing a major change, question for proto-buf gurus: does it make sense to make keys and values to have lower ID number than features? Would it force protobuf to store features at the end, ensuring that a parser could read content sequentially, instead of skipping features and then coming back? I just looked at the https://github.com/mapbox/pbf/blob/master/index.js#L183 - sees skipping values is a linear traversal.

flippmoke · 2016-09-29T13:30:05Z

@ericfischer @nyurik could you both drop sample .proto files to make it easier to understand exactly what you are doing

pnorman · 2016-10-01T14:10:09Z

I also added a null type as 10 so that all the JSON types are covered.

I recommend not adding null

@flippmoke Can you create a dev branch in the repo which is the same as the master, except with 2.1 copied to 3.0? This will let us make PRs against it

flippmoke · 2016-10-03T14:36:53Z

@pnorman yes, I will setup a 3.0 dev branch. I have a feeling that a 3.0 branch will be much harder to find a consensus though compared to the 2.0. So don't expect any quick merges

flippmoke · 2016-10-03T14:40:37Z

@pnorman https://github.com/mapbox/vector-tile-spec/tree/v3.0-development

nyurik · 2016-10-03T21:47:28Z

I just re-posted my proposed change as #89. In the future, we should switch to 4 space tabs - easier to view diffs

e-n-f · 2016-10-03T21:52:17Z

@pnorman I have no objection to dropping key-value pairs where the value is null, which Tippecanoe (and, I think, Mapnik) does now when it encounters them. What do you think the right thing is to do when someone specifies null as an element of an array, though?

kkaefer · 2017-06-02T12:44:26Z

I recommend not adding null

@pnorman why is that?

@ericfischer Our current JS implementation interprets a missing value (e.g. a pbf message with no field set) as null, and I think that'd be preferable to having an explicit null value.

e-n-f · 2017-06-05T18:02:48Z

Thanks @kkaefer. I've updated mapbox/tippecanoe#303 to encode nulls as empty messages rather than as their own type.

e-n-f · 2018-08-14T19:14:03Z

The problem with this that I just discovered is that empty packed arrays in protocol buffers are indistinguishable from an empty message, so empty lists and hashes are mistaken for null values. I think we need magic empty-list and empty-hash messages (or a magic value index that means the same).

e-n-f · 2018-08-14T20:32:05Z

In mapbox/tippecanoe@9319f2e1b2022 I propose:

                // 0 signifies an empty hash, 1 signifies an empty list
                optional uint64 empty_value = 10;

as a way to represent empty lists and hashes.

mapsam closed this as completed May 16, 2016

mcwhittemore reopened this May 17, 2016

This was referenced Sep 26, 2016

Support encoding and decoding lists tilezen/mapbox-vector-tile#64

Open

Writing vector tiles mapbox/vector-tile-js#25

Open

nyurik mentioned this issue Feb 8, 2017

[proposal] multi language support ; language neutrality ; hstore for all osm name ; simple config openmaptiles/openmaptiles#80

Closed

4 tasks

jfirebaugh mentioned this issue Apr 5, 2017

How can I put a nested field name while defining a data driven styling? mapbox/mapbox-gl-js#4544

Closed

anandthakker mentioned this issue May 23, 2017

An expression to check if an array contains a given value mapbox/mapbox-gl-js#4698

Closed

waissbluth mentioned this issue Oct 25, 2017

Custom parsing/processing/modification of vector tiles after loading mapbox/mapbox-gl-js#5527

Closed

anandthakker mentioned this issue Oct 26, 2017

Support nested objects and arrays for GeoJSON features in query*Features mapbox/mapbox-gl-js#2434

Open

e-n-f mentioned this issue Nov 27, 2017

Any chance to include geojson properties? mapbox/tile-count#49

Open

andrewharvey mentioned this issue Dec 23, 2017

Support gradient-colored shapes mapbox/mapbox-gl-js#4095

Open

anandthakker mentioned this issue Mar 20, 2018

[2.1] Feature ID datatype #94

Open

e-n-f mentioned this issue Mar 26, 2018

Raster Field in Features #50

Open

ARolek mentioned this issue Mar 28, 2018

PostGIS JSON field go-spatial/tegola#380

Closed

jfirebaugh mentioned this issue Apr 6, 2018

Add Non-scalar properties to the version 3.0 #109

Closed

e-n-f mentioned this issue Apr 11, 2018

EXPERIMENTAL support for arbitrary JSON objects as feature attributes mapbox/tippecanoe#303

Open

flippmoke mentioned this issue Jun 26, 2018

Extension for per-node attributes #93

Open

flippmoke added the 3.0 Specification label Jul 16, 2018

e-n-f mentioned this issue Jul 16, 2018

Proposed representations of version 3 extensions #115

Closed

This was referenced Jul 19, 2018

Maps and Lists in feature properties #117

Closed

Complex Properties Overhaul #121

Closed

kkaefer mentioned this issue Jan 25, 2019

Add 'to-json' expression mapbox/mapbox-gl-js#7796

Closed

4 tasks

matteblair mentioned this issue Apr 22, 2019

Add support to parse some or all MVT feature properties as JSON tangrams/tangram#715

Merged

indus mentioned this issue Feb 28, 2023

Arrays in properties felt/tippecanoe#77

Open

stepankuzmin mentioned this issue Sep 16, 2024

Mouse events not working right on circle layers with sub-properties mapbox/mapbox-gl-js#9469

Open

How are property values that are objects/arrays encoded in vector tiles? #75

How are property values that are objects/arrays encoded in vector tiles? #75

Comments

mapsam commented May 3, 2016

jfirebaugh commented May 3, 2016

mapsam commented May 16, 2016

e-n-f commented May 17, 2016

mapsam commented May 17, 2016

e-n-f commented May 17, 2016

mapsam commented May 17, 2016

mcwhittemore commented May 17, 2016

ingalls commented May 17, 2016

mcwhittemore commented May 17, 2016

pnorman commented Aug 12, 2016 • edited Loading

nyurik commented Sep 26, 2016 • edited Loading

flippmoke commented Sep 27, 2016

Mapnik Free Vector Tiles

Lists

Objects

Final Thoughts

flippmoke commented Sep 27, 2016

nyurik commented Sep 27, 2016

flippmoke commented Sep 27, 2016 • edited Loading

nyurik commented Sep 27, 2016

e-n-f commented Sep 27, 2016

e-n-f commented Sep 27, 2016

nyurik commented Sep 27, 2016

nyurik commented Sep 28, 2016 • edited Loading

e-n-f commented Sep 28, 2016

nyurik commented Sep 28, 2016

e-n-f commented Sep 28, 2016

e-n-f commented Sep 28, 2016

nyurik commented Sep 29, 2016

flippmoke commented Sep 29, 2016

pnorman commented Oct 1, 2016 • edited Loading

flippmoke commented Oct 3, 2016

flippmoke commented Oct 3, 2016

nyurik commented Oct 3, 2016 • edited Loading

e-n-f commented Oct 3, 2016

kkaefer commented Jun 2, 2017

e-n-f commented Jun 5, 2017

e-n-f commented Aug 14, 2018

e-n-f commented Aug 14, 2018

pnorman commented Aug 12, 2016 •

edited

Loading

nyurik commented Sep 26, 2016 •

edited

Loading

flippmoke commented Sep 27, 2016 •

edited

Loading

nyurik commented Sep 28, 2016 •

edited

Loading

pnorman commented Oct 1, 2016 •

edited

Loading

nyurik commented Oct 3, 2016 •

edited

Loading