spatialx-project/docker-spark-geolake

Spark + Geolake Quickstart Image

This is a Docker environment for quickly bringing up a Geolake service with:

  • a JDBC catalog (Postgres)
  • a compute engine (Spark 3.3)
  • a storage backend (local file system)

Geolake is built on top of Apache Iceberg, and this project is adapted from docker-spark-iceberg.
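For orientation, here is a rough sketch of how a JDBC catalog is usually wired into Spark. It uses the upstream Apache Iceberg class and property names, on the assumption that Geolake keeps them unchanged. The helper `jdbc_catalog_conf`, the catalog name `demo`, the JDBC URI, the credentials, and the warehouse path are all hypothetical placeholders, not values read from this image.

```python
def jdbc_catalog_conf(catalog, jdbc_uri, user, password, warehouse):
    """Build Spark properties for an Iceberg-style JDBC catalog.

    Property and class names follow upstream Apache Iceberg; whether
    Geolake uses exactly these names is an assumption.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.jdbc.JdbcCatalog",
        f"{prefix}.uri": jdbc_uri,
        f"{prefix}.jdbc.user": user,
        f"{prefix}.jdbc.password": password,
        f"{prefix}.warehouse": warehouse,
    }


if __name__ == "__main__":
    # Every value below is a hypothetical placeholder.
    conf = jdbc_catalog_conf(
        catalog="demo",
        jdbc_uri="jdbc:postgresql://postgres:5432/demo_catalog",
        user="admin",
        password="password",
        warehouse="file:///tmp/warehouse",
    )
    # Print the properties in the form accepted by spark-submit / spark-shell.
    for key, value in conf.items():
        print(f"--conf {key}={value}")
```

In this image the configuration is already set up, so nothing here needs to be run; the sketch only illustrates how the three components listed above relate to each other.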

Start

docker compose up

Open http://localhost:8888 in a browser. There are two notebooks inside:

  • geolake-scala-demo: shows how to use the Geolake Scala API to read data from and write data to Geolake.

  • benchmark-portotaxi: runs a benchmark on the portotaxi dataset, which has 1.7M records. It shows the read and write performance of three Parquet formats for spatial data (GeoLake Parquet, GeoParquet, and GeoParquet (bbox)), and how the partition resolution parameter affects performance.

Ports 4041, 4042 and 4043 are also forwarded, so you can access the Spark Web UI if necessary.
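To see which of these ports are reachable once the containers are up, a small check like the following may help. It is a minimal sketch that assumes the ports are published on localhost; the helper names `is_port_open` and `check_ports` are illustrative and not part of this project.

```python
import socket

# Ports named in this README: 8888 serves the notebooks,
# 4041-4043 are forwarded for the Spark Web UI.
PORTS = {
    8888: "notebooks",
    4041: "Spark Web UI",
    4042: "Spark Web UI",
    4043: "Spark Web UI",
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host="localhost"):
    """Map each known port to whether it currently accepts connections."""
    return {port: is_port_open(host, port) for port in PORTS}


if __name__ == "__main__":
    for port, reachable in check_ports().items():
        status = "reachable" if reachable else "not reachable"
        print(f"{port} ({PORTS[port]}): {status}")
```

A Spark Web UI port typically responds only while a Spark session is running, so an unreachable 4041 to 4043 before any notebook has started a session does not necessarily indicate a problem.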
