Add support to parse some or all MVT feature properties as JSON #715

bcamper · 2019-04-21T21:48:43Z

Sometimes source data includes properties with a richer object format than just strings or numbers -- e.g. arrays, nested objects, etc. The MVT spec doesn't prescribe explicit behavior for these cases, but notes that common tools such as Tippecanoe and Mapnik will encode these properties as stringified JSON.

This requires client-side support for parsing these fields. This PR adds support for this on MVT data sources through a new parse_json property:

If parse_json is true, then each property will be checked to see if it "looks like" stringified JSON (defined as a string with first character being { or [); if so, it is parsed as JSON (with JSON.parse).
If parse_json is an array of property names, only those specific properties are checked for JSON parsing. This is preferred to the above, because it limits the parsing impact to only the fields that need it.
If parse_json is undefined/null/false, no special parsing of properties is done (i.e. the current behavior).
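
The three modes above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the names parseJsonProperties, looksLikeJson and tryParse are hypothetical, and both the decision to leave unparseable strings untouched and the decision to apply the "looks like" check to named properties are assumptions.

```javascript
// Hypothetical helpers; names and error handling are assumptions, not Tangram's actual code.

// "Looks like" stringified JSON: a string whose first character is { or [
function looksLikeJson(value) {
    return typeof value === 'string' && (value[0] === '{' || value[0] === '[');
}

// Parse with JSON.parse, falling back to the original string on failure (assumption)
function tryParse(value) {
    try {
        return JSON.parse(value);
    } catch (e) {
        return value;
    }
}

function parseJsonProperties(properties, parseJson) {
    let keys = [];
    if (parseJson === true) {
        keys = Object.keys(properties); // check every property
    } else if (Array.isArray(parseJson)) {
        keys = parseJson; // check only the named properties
    }
    // undefined/null/false: keys stays empty, so nothing is parsed

    for (const key of keys) {
        if (looksLikeJson(properties[key])) {
            properties[key] = tryParse(properties[key]);
        }
    }
    return properties;
}
```

With parse_json: [prop_a], a feature with properties { prop_a: '["x","y"]', prop_b: '{"k":1}' } would end up with prop_a as an array while prop_b stays a string.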

Example usage:

sources:
  tiles:
    type: MVT
    url: https://...
    parse_json: [prop_a, prop_b] # treat feature properties 'prop_a' and 'prop_b' as stringified JSON

These parsed properties will now be properly returned as full JS objects through functions such as scene.getFeatureAt() or scene.queryFeatures().

To make use of these properties in filtering and styling, you will need to write a JS function to access the complex properties, e.g. if prop_a is an array of values, you could filter on it containing the value x like so:

filter: function() { return feature.prop_a.indexOf('x') > -1; }

bcamper · 2019-04-21T21:49:50Z

@nvkelso I think we also had some values similar to this in Tilezen for a while (for multiple shields?), but removed them (for unrelated reasons).

nvkelso · 2019-04-22T04:01:22Z

Tilezen still exports the array properties in GeoJSON and TopoJSON, but currently strips them out in MVT. We can start exporting them in MVT when Tangram grows support. Primarily these are multishields (road, transit, walk, bike).

bcamper · 2019-04-22T13:29:04Z

Oh great! Bring it on :)

bcamper · 2019-04-22T21:23:18Z

@matteblair thoughts on this on the ES side?

matteblair · 2019-04-22T23:38:27Z

This would be a bigger change for Tangram ES since we currently model feature properties explicitly as a mapping of strings to scalar values. However I can see the value in supporting more structured data for properties.

This is also a topic of discussion for the next version of the MVT spec (mapbox/vector-tile-spec#75). It does not seem like the proposed solution in MVT would be mutually exclusive with the one here, but it would make this one redundant. The MVT proposal also has the advantage that it's a much more compact representation of structured data: values can be de-duplicated in MVT but in stringified JSON they must be duplicated.

bcamper · 2019-04-23T02:13:23Z

I agree the eventual MVT v3 solution will be better, however after some time (that ticket was opened in 2016...) I decided to finally move ahead with this because: 1) there are popular tools that implement this JSON stringify behavior, particularly Tippecanoe, so it's a reality for a lot of existing data, and 2) MVT v3 is going to be a substantial upgrade (tons of other things in the spec) that will require widespread updates to both encoding and decoding libs and I think a lot of MVT v2 data will persist.

matteblair · 2019-04-23T05:56:19Z

That makes sense. We probably shouldn't count on the MVT v3 spec being finished any time soon, and even with that approach we would have to update the feature data model in Tangram ES. This feature sounds good to me.

matteblair · 2019-04-23T06:10:20Z

I'm also interested in finding ways to make these properties available to filters in a "native" way, since using a JS function in a filter makes it an order of magnitude slower in Tangram ES. Having built-in accessors for structured properties could also implicitly define which ones need to be parsed, and allow us to do it lazily.

bcamper · 2019-04-23T14:57:54Z

Agree on adding support for more native syntax. I'm mostly concerned about how to do it in a way that doesn't impact performance, since the assumption of a 1:1 mapping from YAML string to feature property key is fast. But I'm sure we can use some clues from data source definition and/or caching strategy to mitigate this.

bcamper · 2019-05-01T15:14:01Z

OK @matteblair inspired me to take this further. I've added support both for accessing nested properties with native syntax in filters, and for doing array-to-array filter operations. I'm not 100% sure of the syntax/naming decisions here, so feedback/alternatives welcome.

Nested properties in filters

Dot notation with . can be used to access nested feature properties (these properties could have been encoded in a GeoJSON/TopoJSON source, or parsed from MVT stringified JSON properties as covered by this PR):

Given a feature property a: { b: { c: 'test' } }, this filter will match:

filter: { a.b.c: test }

Feature property names that include a . can be escaped with \., e.g. a feature property named 'd.e.f': 'escaped' will match with:

filter: { d\.e\.f: escaped }

These could be mixed, e.g. a property { 'a.b': { c: 'mixed' } } would match with:

filter: { a\.b.c: mixed }
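
One way to picture the key handling described above: split the filter key on unescaped dots once, when the filter is built, then walk the resulting path on each feature. The sketch below is only an illustration under that assumption; splitKey and getNested are hypothetical names, not functions from this branch. Note that a YAML key written as d\.e\.f corresponds to the JS string literal 'd\\.e\\.f'.

```javascript
// Hypothetical helpers illustrating dot notation with '\.' escaping.

// Split a filter key on unescaped dots; an escaped '\.' becomes a literal '.' in the segment
function splitKey(key) {
    const parts = [];
    let current = '';
    for (let i = 0; i < key.length; i++) {
        if (key[i] === '\\' && key[i + 1] === '.') {
            current += '.';
            i++; // skip the escaped dot
        } else if (key[i] === '.') {
            parts.push(current); // unescaped dot ends the current segment
            current = '';
        } else {
            current += key[i];
        }
    }
    parts.push(current);
    return parts;
}

// Walk the path through nested objects; returns undefined if any level is missing
function getNested(properties, key) {
    let value = properties;
    for (const part of splitKey(key)) {
        if (value == null || typeof value !== 'object') {
            return undefined;
        }
        value = value[part];
    }
    return value;
}
```

For the three examples above, getNested would be called with 'a.b.c', 'd\\.e\\.f' and 'a\\.b.c' respectively.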

Array-to-array filters

We have existing syntax for checking if a single value matches one of a list of values, e.g.:

Does value x match a, b, or c?

filter: { x: [a, b, c] }

As part of this new support for "complex" property values that may be arrays, we need to expand this syntax. This branch adds two keywords for this, by extending the same filter object pattern we use for min/max range filter syntax:

To check if an array a contains one or more of the values p, q, r:

filter: { a: { includes: [p, q, r] } }

Note: while it would be possible to check the type of a to see if it's an array, and apply this logic without additional syntax, I was wary of introducing an additional run-time check that can occur on many, many features (vs. the other additions in this branch, which only occur once per filter, lazily when the filter is first parsed/built), especially for a case that is only needed for a small subset of feature properties. On the other hand, jsperf says my browser can perform 500 million array type checks per second, so this concern may be unfounded.

There are also cases where you'd like to test that an array contains all of a list of values. An includes_all variant was added for this. For instance, to check if an array a contains the values p, q, AND r:

filter: { a: { includes_all: [p, q, r] } }
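
The intended semantics of the two keywords can be sketched as plain JS predicates. This is an illustration only: matchIncludes, matchIncludesAll and toList are hypothetical names, and treating a non-array property as "no match" is an assumption rather than confirmed behavior.

```javascript
// Hypothetical predicates illustrating the two array-to-array filter keywords.

// Accept either a single value or a list of values on the filter side
function toList(values) {
    return Array.isArray(values) ? values : [values];
}

// includes: true if the property array contains one or more of the values
function matchIncludes(propertyValue, values) {
    if (!Array.isArray(propertyValue)) {
        return false; // assumption: non-array properties never match
    }
    return toList(values).some(v => propertyValue.indexOf(v) > -1);
}

// includes_all: true only if the property array contains every one of the values
function matchIncludesAll(propertyValue, values) {
    if (!Array.isArray(propertyValue)) {
        return false; // assumption: non-array properties never match
    }
    return toList(values).every(v => propertyValue.indexOf(v) > -1);
}
```

For a feature where a is ['p', 'q'], matchIncludes(a, ['p', 'q', 'r']) is true while matchIncludesAll(a, ['p', 'q', 'r']) is false.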

One could check for multiple array values with multiple filters, but supporting direct syntax for them would be much more efficient than this, which will execute 3 separate array tests, e.g.:

filter:
  all:
    - a: { includes: p }
    - a: { includes: q }
    - a: { includes: r }

Note: includes (implicit "or"/"one of", like existing single-value-to-array filters) vs. includes_all doesn't feel entirely right, but I couldn't come up with something I liked better, suggestions welcome!

matteblair · 2019-05-02T18:08:14Z

I like the property accessors! Should be both more performant and easier to use for authoring filters. I'd suggest includes_any instead of includes just to make the meaning more obvious.

matteblair · 2019-05-02T18:26:24Z

An alternative way of accessing nested properties in YAML would be to use arrays for the filter keys, like:

filter: { [a, b, c]: test }

But I don't think this works in the JS YAML model - plus it's not very natural to read.

The \. escaping just seems a little awkward so I'm trying to think of other syntax options. Hopefully it should be very rare that we encounter data with property keys including ., so it's probably not worth worrying about.

nvkelso · 2019-05-02T18:51:45Z

I dig it, and @matteblair's suggestion for includes_any.

filter: { d.e.f: escaped }

For the complicated property names, could we allow that to be quoted like "d.e.f", too?

nvkelso · 2019-05-02T19:00:19Z

@bcamper can you add an example on how to access the multishields in the all_shields properties in Tilezen's GeoJSON output, which are arrays (so first element in the array, or for each element in the array)?

Will Tangram abstract over the fact that GeoJSON and MVT store it in different ways? I think Tangram will expand the MVT shorthand to the GeoJSON form internally?

bcamper · 2019-05-02T19:01:54Z

Agree on includes_any, will update.

Yep, property names can be quoted or unquoted (e.g. d.e.f or "d.e.f"), as per standard YAML syntax (it's just personal preference).

@matteblair you're following my thinking too -- I considered an array syntax, but you're correct that the JS YAML parser doesn't support it, and then it seemed just as complicated or more to parse it out manually from a string key. And this type of dot notation would be valid JS. Agree \. is a bit awkward, but I concluded it was common enough as a string escaping pattern in other cases. It could also use JS-style bracket notation like a[b][c], but this similarly felt a little more involved to parse and still requires some form of escaping (to support [ and ]) if you're really trying to be complete.

bcamper · 2019-05-02T19:07:08Z

@nvkelso can you provide an example tile/location where that property shows up? Not finding it in a quick search.

nvkelso · 2019-05-06T22:32:07Z

"all_networks":["US:US","US:CA"],"all_shield_texts":["101","128"]

12-648-1568.topojson.json.zip

bcamper · 2019-05-07T15:31:24Z

@nvkelso

"all_networks":["US:US","US:CA"],"all_shield_texts":["101","128"]

Can you elaborate on how you think we would want to filter on and use these array fields? For example, are the index positions 0, 1, etc. going to reliably reflect a fixed network "level" (national highway, state highway, etc.)? It seems like we couldn't necessarily guarantee that for all data?

It's easy enough to match on the values, like "does all_networks contain US:CA", but I don't have an immediate answer for how we would then match that index to all_shield_texts to get the desired shield text out (this could be done in a JS function, but we are first seeing what/if we can create native syntax support). We could put the whole object into one array, e.g. something like:

[{ network: 'US:US', text: '101' }, { network: 'US:CA', text: '128' }]

But that also doesn't fit simple filter matching syntax (it's more like the complex destructuring matching offered by JS and other languages... which is cool but likely overkill here!); so we'd probably want a JS filter function for it as currently stored.
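
As a rough sketch of what such a JS function might do with the data as currently stored: look up a network in all_networks and return the entry at the same position in all_shield_texts. The helper name shieldTextFor is hypothetical, and the code assumes the two arrays are index-aligned, which is exactly the guarantee questioned above.

```javascript
// Hypothetical helper; assumes all_networks[i] and all_shield_texts[i] describe the same shield.
function shieldTextFor(properties, network) {
    const networks = properties.all_networks;
    const texts = properties.all_shield_texts;
    if (!Array.isArray(networks) || !Array.isArray(texts)) {
        return undefined; // properties missing or not parsed into arrays
    }
    const index = networks.indexOf(network);
    return index > -1 ? texts[index] : undefined;
}
```

With the example feature above, shieldTextFor(feature, 'US:CA') would pick the second entry of all_shield_texts.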

Let's figure out how we want to style them, then determine what additional scene syntax and/or data format changes are needed (since they aren't in use anyway, right?).

@matteblair other thoughts?

nvkelso · 2019-05-07T16:12:33Z

Mostly I'm interested in multishield support, for roads, walking paths, bicycle, and transit. As long as this gets us part way there, and it's possible to step all the way there with an extension then I'm good.

matteblair · 2019-05-07T17:27:01Z

I don't know the full range of data for shields, but for the case above we might be able to use an object property like:

"networks": { "US:US": "101", "US:CA": "128" }

Then do filtering and styling like:

filter: { networks.US:US : true }
draw:
  text:
    text_source: networks.US:US

This implies that property accessors work in styling expressions as well as filtering expressions - not sure if that was planned or not.
Encoding this way is also not very compact if we're stringifying the whole JSON object :\

nvkelso · 2019-05-07T21:24:27Z

😂 I think this is taking a complex turn when I'm just interested in multi-shields ;)

bcamper · 2019-05-07T22:47:28Z

It's the tangram way. I feel comfortable we can prototype multi-shields with this data and some of the new syntax, and then go from there.

matteblair · 2019-05-08T02:15:05Z

@nvkelso I don't think I understand what you mean when you say "multi-shields". Can you explain?

nvkelso · 2019-05-08T02:54:49Z

The section of road given as an example above has data to support 2 shields, a US route and a California route. On many maps both routes would have a shield drawn, so 2 shields total in this example. Often this is zoom 13+ as mid zooms can only accommodate 1 shield for density issues. But both routes will be signed in real life, on the ground.

bcamper added this to the v0.19.0 milestone Apr 21, 2019

matteblair mentioned this pull request Apr 23, 2019

Add support for complex properties from MVT formats tangrams/tangram-es#1618

Open

bcamper added 2 commits April 30, 2019 12:08

optionally parse some or all MVT feature properties as JSON

176927c

only check if property values look like JSON for parse_json: true

2f8f269

assume specified property names can be parsed when using `parse_json: []` form

bcamper force-pushed the mvt-json-decode branch from 46ca815 to 2f8f269 Compare April 30, 2019 16:10

bcamper added 2 commits April 30, 2019 21:37

enable filters to access nested feature properties with dot notation

4d86068

use '\.' to escape dot notation on feature properties in filters

5f13a07

(unescaped '.' will be interpreted as a nested feature property)

bcamper requested review from matteblair and nvkelso May 1, 2019 15:18

matteblair approved these changes May 2, 2019

bcamper added 2 commits May 3, 2019 13:11

add filter syntax to match if a feature property that's an array cont…

c552ce9

…ains some or all of a list of values

includes -> includes_any for new filter syntax for array values

cc378cb

bcamper force-pushed the mvt-json-decode branch from 0a5cbd2 to cc378cb Compare May 3, 2019 17:13

bcamper merged commit ef88117 into master May 30, 2019

bcamper deleted the mvt-json-decode branch May 30, 2019 12:59
