"pmtiles merge" CLI command? #403

Open
mgibbs189 opened this issue Jun 6, 2024 · 13 comments

Comments

@mgibbs189

I have a 120GB raster dataset, split into 2600 geotiffs. Extracting a .pmtiles from it hasn't been easy 🙃

Would you consider adding a pmtiles merge CLI command?

See below for the different approaches I've tried, and how a pmtiles merge command could help.

Option 1:
gdal_merge of the geotiffs, then gdal_translate to .mbtiles

[-] Single-threaded
[-] painfully slow
[-] gdal requires a ludicrous amount of free space (7+ TB)

Option 2:
Convert each geotiff to its own .mbtiles file, then merge the .mbtiles files together

[+] multi-threaded, minus the merging
[-] tippecanoe's tile-join doesn't support rasters

Option 3:
gdal2tiles -> directory of .pngs, then mb-util to generate merged .mbtiles, then .pmtiles

[+] png creation multi-threaded
[-] creating millions of pngs is torture on the filesystem
[-] slow

Option 4 (proposed):
Convert each geotiff into its own .mbtiles file, then pmtiles convert for each, then pmtiles merge to stitch all the pmtiles together

[+] multi-threaded
[+] fast(?)

@bdon
Member

bdon commented Jun 8, 2024

Option 1:
gdal requires a ludicrous amount of free space (7+ TB)

Does this also happen if you create a .vrt using gdalbuildvrt instead of using gdal_merge?

Then gdal_translate to pmtiles if you're on 3.8+: https://gdal.org/drivers/vector/pmtiles.html

@mgibbs189
Author

@bdon here's the best approach I've found thus far:

gdalbuildvrt worldcover_v1.vrt Extracted/*.tif
gdal_translate -of MBTiles -co RESAMPLING=NEAREST worldcover_v1.vrt worldcover.mbtiles
pmtiles convert worldcover.mbtiles worldcover.pmtiles --tmpdir=./tmp/

re: gdal's pmtiles driver (vectors only?):

This driver supports reading and writing PMTiles datasets containing vector tiles, encoded in the Mapbox Vector Tiles (MVT) format.

The issue is that gdal_translate is single-threaded. The above process takes 4 days for a 120GB dataset.

If I could process each .tif separately (.tif -> .mbtiles -> .pmtiles), then merge them all together, that could potentially save a TON of time. Your pmtiles CLI commands seem to be way more optimized than gdal.

@bdon
Member

bdon commented Jun 8, 2024

The issue is that gdal_translate is single-threaded. The above process takes 4 days for a 120GB dataset.

Maybe https://github.com/mapbox/rio-mbtiles is a multithreaded option? I've been exploring a derivative of that that can generate PMTiles directly.

@mgibbs189
Author

mgibbs189 commented Jun 9, 2024

The rio-mbtiles command is suited for small to medium (~1 GB) raster sources.

It seems like every solution thus far has caveats.

rio-mbtiles doesn't help much here, because I could just run gdal_translate -of MBTiles individually on each tif file, then run pmtiles convert individually on each mbtiles file to get a bunch of pmtiles files.

The problem is still the same... squashing all the pmtiles into one.

Let's assume the simplest use-case (merging pmtiles with no overlaps). What does that process look like to build out pmtiles merge? Is it technically complex? Could I maybe sponsor it?

@bdon
Member

bdon commented Jun 11, 2024

Let's assume the simplest use-case (merging pmtiles with no overlaps)

In most cases "no overlaps" is not actually feasible. If you have 2600 GeoTIFFs, and each one is being tiled to zoom 0, then all the resulting .pmtiles will have data at z=0, meaning pmtiles merge is not enough, we will also need to do image operations to mosaic tiles. z=0 is an extreme example but it's pretty likely that there is some spatial overlap between lower zooms, unless your use case is specific enough that there's really no overlap.
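The overlap argument can be checked with ordinary XYZ tile arithmetic. The following is a minimal sketch, not part of any PMTiles tooling: the two bounding boxes are hypothetical 3°×3° extents that do not touch each other, and the helper names are made up for illustration. It counts how many Web Mercator tiles both extents fall into at each zoom level.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) XYZ tile containing a lon/lat point at a zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far east/south edge stay inside the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, zoom):
    """Return the set of (z, x, y) tiles touched by a bounding box."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    return {
        (zoom, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    }


def shared_tiles(bbox_a, bbox_b, zoom):
    """Tiles that both bounding boxes would write data into."""
    return tiles_for_bbox(*bbox_a, zoom) & tiles_for_bbox(*bbox_b, zoom)


# Hypothetical extents (west, south, east, north), 3 degrees apart.
BBOX_A = (0.0, 45.0, 3.0, 48.0)
BBOX_B = (6.0, 45.0, 9.0, 48.0)

for zoom in range(0, 8):
    print(zoom, len(shared_tiles(BBOX_A, BBOX_B, zoom)))
```

With these example extents the two sources share a tile at every zoom from 0 through 5 and only become disjoint at zoom 6, even though the source rasters themselves never overlap.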

@bdon
Member

bdon commented Jun 11, 2024

^ My take on the above is that any correct solution needs to start with a .vrt, so that GDAL handles the raster mosaicking prior to tiling. I suggested rio-mbtiles because it has parallelism, and if we identify the limits on raster sources we ought to be able to work around them. One solution may be to use gdaladdo on the VRT prior to tiling.
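To illustrate why a tile-level merge would need image operations, here is a toy sketch. It does not use the PMTiles format or any real API: an "archive" is assumed to be a plain dict from (z, x, y) to a decoded tile, a tile is a nested list of pixel values, and `None` stands in for nodata. Tiles present in only one archive can be copied as-is, but tiles present in both must be decoded and composited pixel by pixel.

```python
def composite(tile_a, tile_b):
    """Pixel-wise mosaic: keep tile_a's pixel unless it is nodata (None)."""
    return [
        [pa if pa is not None else pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(tile_a, tile_b)
    ]


def merge_archives(archive_a, archive_b):
    """Merge two toy archives keyed by (z, x, y)."""
    merged = dict(archive_a)
    for tile_id, tile in archive_b.items():
        if tile_id in merged:
            # Shared tile: a byte-level copy would drop one source's data.
            merged[tile_id] = composite(merged[tile_id], tile)
        else:
            # Tile only exists in archive_b: a plain copy is enough.
            merged[tile_id] = tile
    return merged


# Two hypothetical 2x2 tiles at z=0 that each cover half of the tile.
west_half = {(0, 0, 0): [[1, None], [1, None]]}
east_half = {(0, 0, 0): [[None, 2], [None, 2]], (1, 1, 0): [[2, 2], [2, 2]]}

merged = merge_archives(west_half, east_half)
print(merged[(0, 0, 0)])
```

For real PNG or WebP tiles the compositing step also means decoding and re-encoding every shared tile, which is the work a .vrt lets GDAL do once, before tiling.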

@mgibbs189
Author

I'll give rio-mbtiles another try and will report back.

Thanks for the clarification re: pmtiles merge.

@mgibbs189
Author

@bdon For the 120GB dataset, rio mbtiles is showing 900 hours to finish... on a monster machine with 30 cores processing it.

Again, I love PMTiles but this hasn't been a great experience 🙃

The fastest approach I've found thus far is to create the millions of tiles via gdal2tiles.py (bleh), then mb-util to clump them together, then finally pmtiles convert.

Really, really hoping that you could give us raster users some better tooling, whether it involves pmtiles merge, some custom rio extension, etc.

@bdon
Member

bdon commented Jun 12, 2024

Again, I love PMTiles but this hasn't been a great experience

The problems you are encountering seem to be related to your specific challenging dataset and tiling it, regardless of tile archive format. Could you make your dataset and current tooling available so others can pitch in to help?

@mgibbs189
Author

It's the ESA WorldCover dataset: https://worldcover2021.esa.int/

The best approach thus far:

  1. gdalbuildvrt to generate a .vrt from the 2600+ extracted .tifs
  2. gdal2tiles.py to create Tiles directory from the .vrt (millions of png tiles)
  3. mb-util to build an .mbtiles file from the Tiles directory
  4. pmtiles convert to input the .mbtiles file and output a .pmtiles file

Every other solution I've tried has either failed (see the first comment in this thread), required absurd resources, or would've taken many days (weeks) to complete.
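The four steps above could be scripted roughly as follows. This is a sketch under assumptions, not the exact commands used in this thread: the working directory, output file names, zoom range, and process count are placeholders, and the flags shown (`--xyz`, `--zoom`, `--processes` for gdal2tiles.py; `--image_format` for mb-util) should be checked against the installed versions. The script only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def build_pipeline(tifs, workdir="work", zoom="0-10", processes=8):
    """Return the four commands as argv lists; all paths are placeholders."""
    vrt = f"{workdir}/mosaic.vrt"
    tiles = f"{workdir}/tiles"
    mbtiles = f"{workdir}/mosaic.mbtiles"
    pmtiles = f"{workdir}/mosaic.pmtiles"
    return [
        # 1. Virtual mosaic of all source GeoTIFFs (no pixel data is copied).
        ["gdalbuildvrt", vrt, *tifs],
        # 2. Render PNG tiles in parallel, using XYZ tile numbering.
        ["gdal2tiles.py", "--xyz", f"--zoom={zoom}",
         f"--processes={processes}", vrt, tiles],
        # 3. Pack the tile directory into one MBTiles file.
        ["mb-util", "--image_format=png", tiles, mbtiles],
        # 4. Convert the MBTiles file to PMTiles.
        ["pmtiles", "convert", mbtiles, pmtiles],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


commands = build_pipeline(["a.tif", "b.tif"], workdir="w", zoom="0-4", processes=2)
run_pipeline(commands)
```

The `--xyz` flag is there on the assumption that mb-util is left at its default `xyz` scheme; if gdal2tiles.py writes its default TMS numbering instead, mb-util would need `--scheme=tms` so that the y coordinates are not flipped.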

@bdon maybe you could give it a try as a proof-of-concept 😇

@larsmaxfield
Contributor

larsmaxfield commented Aug 2, 2024

Have you looked into using libvips to join the TIFFs and then generate a tileset? Something like this:

  • libvips with arrayjoin to merge the TIFFs, then with dzsave format=google to create a zyx tileset directory
  • pmtiles with disk_to_pmtiles to convert that zyx directory to PMTiles

libvips works as a pipeline. It's very quick and does not require keeping images in memory when manipulating them. There's also a pyvips binding. I use that.

@mgibbs189
Author

@larsmaxfield thanks for the suggestion. That's very similar to Option 3 above (I'm hoping to avoid the tileset generation step). There's got to be a better way around it, but the likelihood of a pmtiles merge command isn't looking promising 😬

@bdon
Member

bdon commented Aug 3, 2024

If you want the ESA WorldCover as tiles you can modify this script https://github.com/OvertureMaps/overture-tiles/blob/main/profiles/Base.java to convert Overture GeoParquet's ESA Worldcover dataset into tiles.
