Rewrite tile expiry (management of expired tiles) #709
Comments
Something that'd be really nice to have would be an osm2pgsql that didn't run out of memory with relatively high values of EXPIRY_MAXZOOM set in the openstreetmap-tiles-update-expire script. I particularly noticed this when using zoom levels > 20, but from looking at the expiry code (and then backing away very quickly) I suspect it could be made much more efficient at lower max zooms too. See links from https://wiki.openstreetmap.org/wiki/User:SomeoneElse/mod_tile_28 for more info. |
If only one zoom level is requested for expiry, then the behavior is the same. For multiple levels, people using scripts for the combined tiles will suddenly do a lot of unnecessary expiries. So it will increase the load but won't break anything. You mentioned an increase in size by a factor of 20. For how many zoom levels was that, and what are the absolute file sizes in the result? An increase from 100 kB to 2 MB probably doesn't matter. However, going from 100 MB to 2 GB is a problem. @SomeoneElseOSM The number of tiles grows exponentially with each zoom level. I don't think there is much we can do about expiry at zoom level 28. |
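To put the exponential growth in numbers: in the usual slippy-map tiling scheme each tile splits into four at the next zoom level, so a zoom level z holds 4^z tiles worldwide. A minimal sketch (the function name is made up for illustration and is not part of osm2pgsql):

```python
def tiles_at_zoom(zoom: int) -> int:
    """Total number of tiles at one zoom level in the standard slippy-map scheme."""
    # The world is a grid of 2**zoom by 2**zoom tiles, i.e. 4**zoom tiles.
    return (2 ** zoom) ** 2


# Print the tile count for a few zoom levels mentioned in this thread.
for z in (12, 16, 20, 28):
    print(z, tiles_at_zoom(z))
```

Each additional zoom level multiplies the worst-case size of an expiry list by four, which is why very high maximum zooms are hard to handle regardless of the data structure.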
I've never used the expiry feature myself, so it is difficult to judge the impact of such a change. Could people who are using expiry comment here how they use it? What zoom parameters do you use and at what intervals do you update your database? |
The OSMF servers don't use osm2pgsql expiry, but instead have expire.rb. I'm not sure which method it is closer to. render_expired is another way, written by @woodpeck which consumes a tile list. I'm not sure if it recurses up/down or not. WMF uses tilerator in pyramid mode which parses the osm2pgsql output but requires just a single zoom level (i.e. osm2pgsql is run with @zerebubuth, can you add some comment on how mapzen is using the osm2pgsql tile expiry? tilezen/vector-datasource#118 points at https://github.com/mapzen/vagrant-tiles but it's deprecated, so I'm not sure what is currently being done. |
@Nakaner if I'm reading your description right, the two methods are the same for outputting just a single zoom, right? |
@giggls wrote me that the German tile server uses the following:
MINZOOM=13
osm2pgsql ... -e $(($MINZOOM-3))-16 -o $EXPIRELOG
..
expiremeta.pl --map=mapnikde --minzoom=$MINZOOM <$EXPIRELOG
expiremeta.pl is part of tirex. |
@pnorman wrote: Yes, they are not almost the same. The screen shot shows the expired tiles from the same dataset as used above on zoom level 12. Red tiles were written to the output file by both implementations. Blue tiles were only written by my implementation. Both implementations used the same code to detect which tiles should be marked. The database was set back to the state before the application of the diff, i.e. testing conditions were equal. |
That looks like a difference in the selection of tiles (out of scope), not subtiles. |
Mapzen has a process which takes the list of expired tiles which |
Using expiremeta.pl is not set in stone. With the current code my feeling is that way too many tiles are scheduled for re-rendering. |
Nathaniel here from Mapzen. It's always nice to see old code given some love, thanks for your proposal and PR :) Mapzen currently use a pegged version of slightly older osm2pgsql for Tilezen in our production vector tile service. We don't have issues with the expiry file getting larger (it is just on one machine as a previous commenter pointed out so no network issues) – but I'm curious about the resulting list of changed files. |
@nvkelso wrote:
The format is the same. The only important difference is relevant if you produce tile expiry lists for multiple zoom levels.
If you produce tile expiry lists for zoom levels 10 to 11, the old tile expiry will only output tile 10/612/321 if all its four subtiles (11/1224/642, 11/1224/643, 11/1225/642, 11/1225/643) have changed. If there are only changes in 11/1224/642, only this tile will be written to the output file. If you produce the tile expiry list for zoom levels 10 to 12 and there are only changes in 12/2449/1285 but no other sub-sub-tiles of 10/612/321, only 12/2449/1285 will be written to the output file.
My code will output 10/612/321 and 11/1224/642 if you requested the zoom levels 10 to 11 and the only map change happened in 12/2449/1285. If you requested zoom levels 10 to 12 and the only map change happened in 12/2449/1285, it will output 10/612/321, 11/1224/642 and 12/2449/1285.
Use the Tile Calculator of Geofabrik if you need a map showing you the tile IDs on top of the map. Have you had a look at the shape files (and the raw tile expiry lists) I linked in my initial posting? They are the source of the images I posted.
Yes.
It seems that the current implementation of a tree drops some tiles. I did not investigate further because it is hidden in the old, ugly implementation. Red tiles were written to the output file by both implementations. Blue tiles were only written by my implementation. |
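The difference between the two output rules described in this comment can be sketched as follows. This is only my reading of the description, not the osm2pgsql code: the function names are hypothetical, and I assume that the old rule writes a parent tile instead of its four subtiles when all four are expired.

```python
def ancestors_up_to(zoom, x, y, min_zoom):
    """Yield the tile itself and its parents, stopping at min_zoom."""
    while zoom >= min_zoom:
        yield (zoom, x, y)
        zoom, x, y = zoom - 1, x >> 1, y >> 1


def expire_all_levels(changed, min_zoom, max_zoom):
    """New rule: every requested zoom level lists the tile containing a change.

    changed is an iterable of (x, y) tiles at max_zoom.
    """
    out = set()
    for x, y in changed:
        out.update(ancestors_up_to(max_zoom, x, y, min_zoom))
    return out


def expire_collapsed(changed, min_zoom, max_zoom):
    """Old rule (assumed): a parent is written only if all four subtiles expired."""
    current = set(changed)  # (x, y) tiles at the zoom level being processed
    out = set()
    for zoom in range(max_zoom, min_zoom, -1):
        by_parent = {}
        for x, y in current:
            by_parent.setdefault((x >> 1, y >> 1), []).append((x, y))
        complete = set()
        for parent, kids in by_parent.items():
            if len(kids) == 4:
                complete.add(parent)  # collapse the full quad into its parent
            else:
                out.update((zoom, kx, ky) for kx, ky in kids)
        current = complete
    out.update((min_zoom, x, y) for x, y in current)
    return out


print(sorted(expire_all_levels([(1224, 642)], 10, 11)))
print(sorted(expire_collapsed([(1224, 642)], 10, 11)))
```

With the single changed tile 11/1224/642 and zoom levels 10 to 11, the first call lists both 10/612/321 and 11/1224/642, while the second lists only 11/1224/642.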
I notice that the blue tiles appear to be twice as large as the red tiles, suggesting that they're at one zoom level lower. Is that the case? Has the output changed if only a single zoom level is requested (e.g: What confused me is this exchange:
If the two methods are not (almost) the same for outputting just a single zoom, then I think I'd want to understand the difference in the selection of tiles (more tiles, fewer tiles, by what proportion?) and the reason why before I updated to the new expiry algorithm. Not that a change is necessarily bad, and I don't think it should hold up this issue being resolved. I think it would be really useful to have a thorough description of what's changed and what differences users can expect in the changelog / release notes. |
@zerebubuth wrote:
The old tree implementation silently drops some tiles (the blue ones) at any point in the code. I don't know why it does so. The unordered set does not drop anything. Dropping the tiles is not documented, neither in the user documentation nor in the source code. I don't know if it is a bug or a feature. I myself expected the tile expiry to behave as my implementation does.
Pull request #747 does not change the selection of the tiles based on the geometry of the changed features in the database. That is still done by the legacy code ( Following text could be used for the release notes:
license of the text and the image: CC-0 |
@Nakaner Very helpful README.md text and image, thank you! :) |
@pnorman wrote: Well, half of the work is still left. I am rewriting the part which determines which tiles intersect with a geometry. Work is done on branch tile-selection-rewrite. The branch might look stale, but I continued work on it on Friday; it's just not yet in a state which can be pushed. (Some unit tests are missing, some are failing due to bugs in the implementation.) |
I pushed my work to my fork repository a few minutes ago. Feedback on the code is appreciated. Performance has not been tested yet. |
My code has no special handling for boundary relations. They are treated as polygons. If the bounding box of a polygon is larger than the maximum bounding box, it is treated as a multilinestring and its outer and inner rings will be expired. This means that there is no change in behaviour compared to the old implementation. If someone wants to treat polygons always as linestrings, he just has to set the maximum bounding box size to 0. |
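The bounding box rule described above boils down to a single decision per polygon. A rough sketch under stated assumptions: the function name and return values are invented for illustration, and I assume "larger" means that the bounding box is wider or taller than the limit, which the comment does not spell out.

```python
def expiry_mode(bbox, max_bbox):
    """Decide how a polygon is expired.

    bbox is (min_x, min_y, max_x, max_y) in projected units;
    max_bbox is the maximum bounding box size in the same units.
    """
    width = bbox[2] - bbox[0]
    height = bbox[3] - bbox[1]
    # Assumption: exceeding the limit in either direction counts as "larger".
    if width > max_bbox or height > max_bbox:
        return "rings"  # treat as multilinestring: expire outer and inner rings
    return "area"  # expire the polygon as an area


print(expiry_mode((0, 0, 100, 100), 20000))
print(expiry_mode((0, 0, 30000, 100), 20000))
```

Setting the limit to 0 sends every polygon with a non-zero extent down the "rings" path, which matches the remark about always treating polygons as linestrings.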
I am currently testing my new code (branch tile-selection-rewrite in my fork of this repository) which selects which tiles should be marked as expired. The branch has a brand new selection algorithm for tile expiry. Polygons will no longer be expired by their bounding box. The algorithm tries some approximation (but still uses bounding boxes in some form).
You won't see any differences on medium zoom levels (10 to 14), but higher zoom levels show a small difference. z16: 5584865 (master) vs. 5168071 (my branch), i.e. -7.5%. The maximum bounding box size was set to 20,000 (projected Mercator units).
A first test indicated that my branch takes about 25 % longer to apply a daily diff of Europe, but I am not sure if there was anything else running on the machine which disturbed my measurements. Therefore, I will repeat the measurements and report my results here.
An example, zoom level 16, changes between 2018-02-04 20:00 UTC and 2018-02-05 20:00 UTC, blue is new, grey is old:
After looking at the images, I got the impression that some performance improvements might help:
If we ignore relations, we will get much shorter expiry lists. I calculated expiry lists with a self-written PostgreSQL importer over a year ago. The following images are from my master's thesis. One shows the expiries (summed up over more than a month of hourly diffs) without relations, one with relations. Please note that the algorithm being used there only expired the perimeter of polygons, not the whole area. |
Does this mean that the current algorithm expires something like the blue box and the new one would do something like the green box, assuming the black/pink shape is a new polygon? |
The perimeter is generally safe. Not doing them at all might be a problem.
This would be a problem in many cases. For example, if the ref, network, or other properties used for rendering change, then the tiles containing the linestring of the relation would need to be dirtied. |
@pnorman wrote:
Yes, that's almost correct. Here is an improved drawing of your example (it is difficult to distinguish the green and the blue line in the lower right corner). More details can be found in the comments in the header and source files.
Some map styles don't use route relations at all. I would add a command line option and let the users decide (good explanation and documentation is necessary of course). |
Here are the performance measurements. I used the same machine as for the performance tests of my first pull request. The database was imported using The commands in the order I performed them. There was a break of more than a day between the second and the third test. After each test, I shut down the PostgreSQL daemon using Systemd, restored a backup of the tablespace and restarted the daemon again. I also restored the flatnodes file.
|
I am currently rewriting the tile expiry of osm2pgsql because its code is not nice, is difficult to understand, and it seems to me that it does not do what it is expected to do. As part of my master's thesis I wrote a database import tool which has a tile expiry. @lonvia asked me a few months ago to port its tile expiry to osm2pgsql.
The work is done in my fork of this repository in the expire-as-set branch. I have already replaced the self-made tree structure by an unordered set of uint64_t.
Because the new tile expiry is not fully backward-compatible, I'm opening this issue. The format of the written expiry file is the same: one tile per line, each line follows the pattern zoom/x/y. Unlike the old code, my code writes out all tiles on all requested zoom levels if something in the tile[1] has changed. The old tile expiry, however, writes out a tile on a zoom level only if all its subtiles are expired or it is a tile on the maximum zoom level.
If you convert the expiry log files into shape files and visualize them using QGIS, it will look like this (area fill is semi-transparent):
My code would generate an expiry list which will look like this (rendered without an area fill):
EDIT: The lower image does not show all zoom levels because otherwise the image would only contain black lines and it would be almost opaque.
The raw data of the images can be downloaded here.
I can see only one advantage in this behaviour – reducing the size of the file. But this behaviour is not documented anywhere and there is a disadvantage: Every program which reads the tile expiry list, has to derive the list of subtiles on its own. I think that the size of the file does not matter that much because you usually don't transfer it over a network.
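To illustrate the extra work a consumer of the old format has, here is a minimal sketch of deriving the subtiles of a listed tile at a deeper zoom level. The function name is hypothetical; the arithmetic is the standard slippy-map parent/child relation.

```python
def subtiles(zoom, x, y, target_zoom):
    """Return all descendants of tile zoom/x/y at target_zoom as (zoom, x, y)."""
    shift = target_zoom - zoom
    if shift < 0:
        raise ValueError("target_zoom must not be lower than zoom")
    n = 1 << shift  # number of descendant tiles along each axis
    return [
        (target_zoom, (x << shift) + dx, (y << shift) + dy)
        for dx in range(n)
        for dy in range(n)
    ]


# The four subtiles of 10/612/321 at zoom level 11.
print(subtiles(10, 612, 321, 11))
```

A reader of the old-style list would have to run something like this for every tile that is not on the maximum zoom level.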
[1] i.e. the method expire_tiles::expire_tile(uint32_t x, uint32_t y) has been called. The selection of tiles is also handled by my rewrite but out of scope of this issue.
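For readers unfamiliar with the idea, storing expired tiles as a set of 64-bit integers could look roughly like this. The issue text does not say how x and y are mapped to a uint64_t, so the packing below (x in the upper 32 bits, y in the lower 32 bits) is purely an assumption for illustration, as are all function names.

```python
def pack(x, y):
    """Combine two 32-bit tile coordinates into one 64-bit key (assumed layout)."""
    return (x << 32) | y


def unpack(key):
    """Split a 64-bit key back into its (x, y) tile coordinates."""
    return key >> 32, key & 0xFFFFFFFF


def format_lines(zoom, keys):
    """Format the keys as expiry list lines following the pattern zoom/x/y."""
    return [f"{zoom}/{x}/{y}" for x, y in sorted(unpack(k) for k in keys)]


expired = set()
# Marking the same tile twice leaves only one entry in the set.
for x, y in [(1224, 642), (1224, 642), (1225, 643)]:
    expired.add(pack(x, y))

print("\n".join(format_lines(11, expired)))
```

A set never stores a tile twice and has no parent/child logic, so nothing can be silently dropped when tiles are added.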