Switch from flatgeobuffer to geomedea for GTFS #8

dabreegster · 2024-07-18T16:59:52Z

CC @michaelkirk, I'm trying out geomedea for the use case I described in Discord!

Metric	flatgeobuffer	geomedea
File size	99MB	53MB
Bristol	3.6MB in 23 requests	5MB in 20 requests
Elephant & Castle	6.4MB in 935 requests, 1.76 minutes	9.4MB in 24 requests, 8.3s

Bristol doesn't have many GTFS trip shapes intersecting the area, while E&C in London has loads.

Unless I'm measuring something wrong, the current approach with geomedea uses more bandwidth, but with far fewer requests and much lower latency.

dabreegster · 2024-07-18T17:03:26Z

backend/src/gtfs/gmd.rs

+                    .collect(),
+            ));
+            let mut props = Properties::empty();
+            // TODO bincode or something else?


The size difference probably comes from here; I need to rethink how to encode this. Most of the size currently comes from a bunch of chrono::NaiveTimes, which get encoded in a pretty naive way.

Is variant some pre-encoding of all your properties into a single byte stream?

Internally, geomedea uses bincode to encode properties and geometries, and then each page is zstd-compressed.

Varints probably make sense for property data, but probably not currently for geometry data, until/unless I also implement delta encoding for geometries. I'd like to implement delta encoding but it'll require reworking some API internals.
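For context on the term: a varint (the base-128 scheme used by protobuf and LEB128) stores an unsigned integer in 7-bit groups, setting the high bit of each byte while more bytes follow, so any value below 128 takes a single byte. A minimal stdlib-only sketch; the function names are hypothetical, and this is not geomedea's or bincode's implementation.

```rust
/// Append `value` to `out` as an unsigned LEB128 varint.
/// Each byte carries 7 bits of payload; the high bit means "more bytes follow".
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decode one varint from the front of `bytes`.
/// Returns the value and the number of bytes consumed, or None if truncated/overlong.
fn read_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        if i >= 10 {
            return None; // a u64 never needs more than 10 bytes
        }
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // ran out of input while the continuation bit was still set
}
```

Small values shrink the most: 300 becomes the two bytes 0xAC 0x02, while a fixed-width u64 always costs 8 bytes.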

I'm using serde_json::to_vec on

    // Per stop, (original ID and name)
    pub stop_info: Vec<(orig_ids::StopID, String)>,
    // Each one has an arrival time per stop
    pub trips: Vec<Vec<NaiveTime>>,
    // Metadata
    pub route: Route,
}

The space is dominated by the times. Dropping some precision and adding delta encoding would make a lot of sense there.
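A sketch of "drop some precision, then delta-encode", assuming each trip's arrival times are first converted to whole seconds since midnight (dropping sub-second precision) and are nondecreasing. Each value is stored as the difference from the previous one, written as a varint, so any gap up to 16,383 s (about 4.5 hours) fits in at most two bytes. The names are hypothetical, and this is not necessarily the encoding the PR ended up with.

```rust
/// Append `value` as an unsigned LEB128 varint (7 payload bits per byte,
/// high bit set when more bytes follow).
fn write_varint(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Read one varint starting at `*pos`, advancing `*pos` past it.
fn read_varint(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut value = 0u32;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift > 28 {
            return None; // more than 5 bytes cannot be a valid u32
        }
        value |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

/// Encode one trip's arrival times (whole seconds since midnight, nondecreasing):
/// a count, then the first time, then each time minus the previous one.
fn encode_times(times: &[u32]) -> Vec<u8> {
    let mut out = Vec::new();
    write_varint(times.len() as u32, &mut out);
    let mut prev = 0u32;
    for &t in times {
        let delta = t.checked_sub(prev).expect("times must be nondecreasing");
        write_varint(delta, &mut out);
        prev = t;
    }
    out
}

/// Inverse of `encode_times`: a running sum of the decoded deltas.
fn decode_times(bytes: &[u8]) -> Option<Vec<u32>> {
    let mut pos = 0;
    let count = read_varint(bytes, &mut pos)?;
    let mut times = Vec::new();
    let mut prev = 0u32;
    for _ in 0..count {
        prev = prev.checked_add(read_varint(bytes, &mut pos)?)?;
        times.push(prev);
    }
    Some(times)
}
```

For four times between 08:00:00 and 08:20:00 this comes to 10 bytes in total, where the same four values as quoted "HH:MM:SS" JSON strings take 40 bytes before separators.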

I'll also try manually using a PropertyValue::Vec of integers instead of this.

https://docs.rs/chrono/latest/src/chrono/naive/time/serde.rs.html#5

OK, NaiveTime currently gets serialized by serde as a string. Amazing. Switching to something more appropriate...
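To put a number on that: for whole-second times, NaiveTime's serde impl produces a string such as "08:15:00", which serde_json writes as 10 bytes including the quotes. The stdlib-only sketch below (helper names hypothetical) parses that form into seconds since midnight and compares it with the size of the same value stored as a varint.

```rust
/// Parse "HH:MM:SS" into whole seconds since midnight. Hours above 23 are
/// accepted because GTFS lets trips run past midnight (e.g. "25:10:00").
fn parse_hms(s: &str) -> Option<u32> {
    let mut parts = s.split(':');
    let h: u32 = parts.next()?.parse().ok()?;
    let m: u32 = parts.next()?.parse().ok()?;
    let sec: u32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || m > 59 || sec > 59 {
        return None;
    }
    h.checked_mul(3600)?.checked_add(m * 60 + sec)
}

/// Bytes serde_json spends on a time serialized as a quoted string.
fn json_string_len(s: &str) -> usize {
    s.len() + 2 // the two quote characters
}

/// Bytes needed to store `v` as an unsigned LEB128 varint.
fn varint_len(mut v: u32) -> usize {
    let mut n = 1;
    while v >= 0x80 {
        v >>= 7;
        n += 1;
    }
    n
}
```

So "08:15:00" costs 10 bytes as a JSON string but 3 bytes as a varint of 29,700 seconds, before any delta encoding.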

michaelkirk · 2024-07-18T17:22:28Z

backend/Cargo.toml


 [target.'cfg(not(target_arch = "wasm32"))'.dependencies]
+geomedea = { git = "https://github.com/michaelkirk/geomedea" }


btw this should compile in wasm now.

In WASM, I want to disable the writer feature, but enable it everywhere else. I fiddled around with specifying features based on architecture and landed on this as something that works, but I'll take another look at whether there's a clearer way to express it...
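On the Rust side, the same target predicate can gate code, so anything that needs the writer is simply not compiled for WASM. A minimal sketch with hypothetical function names; it does not touch geomedea's actual writer API.

```rust
/// True on native targets. `cfg!` evaluates the same predicate as the
/// `[target.'cfg(not(target_arch = "wasm32"))'.dependencies]` table.
fn native_build() -> bool {
    cfg!(not(target_arch = "wasm32"))
}

/// Native builds: the (hypothetical) writer path is available.
#[cfg(not(target_arch = "wasm32"))]
fn writer_status() -> &'static str {
    "writer enabled"
}

/// wasm32 builds: only this definition is compiled, so browser code
/// has no way to reach the writer path.
#[cfg(target_arch = "wasm32")]
fn writer_status() -> &'static str {
    "read-only"
}
```

Exactly one `writer_status` exists in any given build, which mirrors enabling the writer feature only under the non-wasm32 target table.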

michaelkirk · 2024-07-18T17:31:08Z

> Unless I'm measuring something wrong, the current approach with geomedea incurs more bandwidth, but through way less requests and latency.

It's not entirely surprising that geomedea might request more data.

In FGB, there is a single buffer of uncompressed features. Since there is no compression, the index tells us exactly where each feature is in the file. Using this, I implemented smart feature batching: adjacent features are merged into a single request only if they are "close enough".
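A rough sketch of that kind of batching, assuming each feature's byte range is known from the index and the ranges arrive sorted by offset: a range joins the current request when the gap to it is at most `max_gap` bytes, trading a few unneeded bytes for fewer round trips. The names and threshold are hypothetical; this is not the flatgeobuf crate's code.

```rust
use std::ops::Range;

/// Merge byte ranges (sorted by start) into request ranges, joining a range onto
/// the current request when the gap between them is at most `max_gap` bytes.
fn batch_ranges(ranges: &[Range<u64>], max_gap: u64) -> Vec<Range<u64>> {
    let mut batches: Vec<Range<u64>> = Vec::new();
    for r in ranges {
        if let Some(last) = batches.last_mut() {
            // saturating_sub: an overlapping or touching range counts as gap 0.
            if r.start.saturating_sub(last.end) <= max_gap {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        // First range, or too far from the previous request: start a new one.
        batches.push(r.clone());
    }
    batches
}
```

`max_gap` is the whole trade-off: 0 gives one request per non-adjacent feature, while a very large value collapses everything into one request with more unused bytes.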

To take advantage of compression, geomedea groups features into pages, so you have to download an entire page even if you only need one feature in it. Because geomedea's features live in compressed pages, request batching works a little differently: it can still be done, but it would be "page batches" rather than "feature batches". I haven't implemented this yet, but it should be doable in a non-breaking way.
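One way "page batches" could work, as a sketch under stated assumptions: pages are stored back to back, so each page's byte range follows from a list of page offsets; the features a query needs are mapped to page indices, duplicates are dropped (one download per page, however many features it holds), and runs of consecutive pages become a single request. The offset layout and function name are hypothetical, not geomedea's actual format or API.

```rust
use std::ops::Range;

/// Byte ranges to fetch for the pages holding the needed features.
/// `page_offsets[i]..page_offsets[i + 1]` is the byte range of page `i`
/// (hypothetical layout: pages stored back to back).
/// `needed_pages` holds the page index of every feature the query needs;
/// an index without a following offset panics.
fn page_requests(page_offsets: &[u64], needed_pages: &[usize]) -> Vec<Range<u64>> {
    // A compressed page is fetched whole, however many of its features we
    // need, so collapse duplicate page indices first.
    let mut pages = needed_pages.to_vec();
    pages.sort_unstable();
    pages.dedup();

    let mut requests: Vec<Range<u64>> = Vec::new();
    let mut prev_page: Option<usize> = None;
    for p in pages {
        let range = page_offsets[p]..page_offsets[p + 1];
        let consecutive = prev_page.map_or(false, |prev| prev + 1 == p);
        if consecutive {
            // The next page starts where the last request ends: extend it.
            if let Some(last) = requests.last_mut() {
                last.end = range.end;
            }
        } else {
            requests.push(range);
        }
        prev_page = Some(p);
    }
    requests
}
```

A smarter batcher could also bridge small gaps between non-consecutive pages, the same trade-off as feature batching in FGB.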

michaelkirk · 2024-07-18T18:19:56Z

Could you do me a favor?

Run with RUST_LOG=debug and give me the lines matching:

Finished using an HTTP client. used_bytes

e.g.:

Finished using an HTTP client. used_bytes=839712, wasted_bytes=293690, req_count=4
Finished using an HTTP client. used_bytes=17, wasted_bytes=0, req_count=1

wasted_bytes should correspond to the bytes that could be gained by having more clever page-batching.
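Since one query can log several of these lines, a small helper for totalling them may be handy when comparing runs. This is a hypothetical stdlib-only sketch that relies only on the key=value format of the example lines above, assuming plain (non-ANSI-coloured) log output; it is not part of geomedea.

```rust
/// Running totals over all "Finished using an HTTP client." debug lines.
#[derive(Debug, Default, PartialEq)]
struct HttpTotals {
    used_bytes: u64,
    wasted_bytes: u64,
    req_count: u64,
}

/// Sum used_bytes, wasted_bytes and req_count over every matching line.
/// Lines without the marker text are ignored.
fn total_http_usage(log: &str) -> HttpTotals {
    let mut totals = HttpTotals::default();
    for line in log.lines().filter(|l| l.contains("Finished using an HTTP client.")) {
        // Fields look like "used_bytes=839712," so trim the trailing comma.
        for field in line.split_whitespace() {
            if let Some((key, value)) = field.trim_end_matches(',').split_once('=') {
                if let Ok(n) = value.parse::<u64>() {
                    match key {
                        "used_bytes" => totals.used_bytes += n,
                        "wasted_bytes" => totals.wasted_bytes += n,
                        "req_count" => totals.req_count += n,
                        _ => {}
                    }
                }
            }
        }
    }
    totals
}
```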

michaelkirk · 2024-07-18T22:53:29Z

> wasted_bytes should correspond to the bytes that could be gained by having more clever page-batching.

I had a go at "more clever page-batching" here:
michaelkirk/geomedea#12

michaelkirk · 2024-07-19T00:02:27Z

I was looking at the network traffic for your existing FGB integration, and I feel like there must be a bug in the FGB client. All those small nearby requests (4 bytes?!) make no sense.

I'm looking into that now.

dabreegster · 2024-07-19T11:21:31Z

After updating geomedea to the latest commit, 417d4f43cd35aa98aea19a0b17632c8309b50466:

Bristol reads 2.6MB over 25 requests, 3.6s total (measured against a perfectly fast localhost; I could also try yocalhost, or a real wifi connection to Cloudflare or something). I see 3 logs:
- Finished using an HTTP client. used_bytes=156716, wasted_bytes=0, req_count=5
- Finished using an HTTP client. used_bytes=3717536, wasted_bytes=1570173, req_count=19
- Finished using an HTTP client. used_bytes=17, wasted_bytes=0, req_count=1

Elephant & Castle reads 6.5MB, also over exactly 25 requests:
- Finished using an HTTP client. used_bytes=352492, wasted_bytes=0, req_count=10
- Finished using an HTTP client. used_bytes=5062757, wasted_bytes=2614402, req_count=37
- Finished using an HTTP client. used_bytes=17, wasted_bytes=0, req_count=1

These two cases are now competitive with FGB, so I'm almost certainly going to switch to this. :)

dabreegster · 2024-07-19T12:09:23Z

With the new property encoding...

Elephant & Castle reads 6.3MB over 23 requests:
- Finished using an HTTP client. used_bytes=156716, wasted_bytes=0, req_count=5
- Finished using an HTTP client. used_bytes=3776001, wasted_bytes=1490472, req_count=17
- Finished using an HTTP client. used_bytes=17, wasted_bytes=0, req_count=1

Bristol reads 2.5MB over 25 requests:
- Finished using an HTTP client. used_bytes=144956, wasted_bytes=1344, req_count=9
- Finished using an HTTP client. used_bytes=1078601, wasted_bytes=473493, req_count=15
- Finished using an HTTP client. used_bytes=17, wasted_bytes=0, req_count=1

So the new encoding doesn't give a huge advantage, but it still opens the way to something nicer later, like delta encoding.

I'm going to merge this in now and keep playing with encoding/perf later on. It's a huge improvement for little work, so thanks so much for the new format, the WASM support, and these page-batching fixes!

michaelkirk · 2024-07-23T19:14:15Z

Here's Elephant & Castle with flatgeobuf/flatgeobuf#376

tl;dr: there was a bad bug in the HTTP fetch implementation, triggered by those 1.05MB requests. It hadn't come up with the shape of my own data and requests, so thanks for helping to uncover it.

With the bug fix, the two formats seem to be in the same ballpark of network transfer for your queries.

Edit: for completeness, here's the same with geomedea (one more request, 15% fewer bytes transferred):

Timeline:
- 04a4357 Switch from flatgeobuffer to geomedea for GTFS
- dabreegster commented Jul 18, 2024
- michaelkirk reviewed Jul 18, 2024
- e3864b4 Update geomedea
- dabreegster mentioned this pull request Jul 19, 2024: Performance and optimizing bandwidth, acteng/will-it-fit#8 (open, 3 tasks)
- 59c0b45 Manually encode RouteVariants in properties
- d0275d8 Fix URL
- dabreegster merged commit 0df81d1 into main Jul 19, 2024
- dabreegster deleted the geomedea branch July 19, 2024 12:17
- michaelkirk mentioned this pull request Jul 23, 2024: Fix too-small request sizing after making a large request, flatgeobuf/flatgeobuf#376 (merged)
