[SEDONA-323] Add keplergl wrapper #898

Merged
25 commits
16f39f5
Update apache sedona version to 1.4.1
iGN5117 Jul 4, 2023
c132894
Refactored notebook imports and added unified SedonaContext entry points
iGN5117 Jul 4, 2023
ff9238c
Replaced geopandas plot with KeplerGL visualization.
iGN5117 Jul 4, 2023
38ae4bd
Merge branch 'master' into develop_Nilesh_1.5.0_NotebookVisualization
iGN5117 Jul 4, 2023
40135b3
Add keplerGL to pipfile
iGN5117 Jul 5, 2023
c7a62a2
Merge branch 'develop_Nilesh_1.5.0_NotebookVisualization' of https://…
iGN5117 Jul 5, 2023
6629120
Merge branch 'master' into develop_Nilesh_1.5.0_NotebookVisualization
iGN5117 Jul 5, 2023
3095099
try adding dependencies in pipfile
iGN5117 Jul 6, 2023
10adc55
Merge branch 'develop_Nilesh_1.5.0_NotebookVisualization' of https://…
iGN5117 Jul 6, 2023
088aaf4
Add keplergl import
iGN5117 Jul 6, 2023
8e22147
Add env.yml for binder (copied from leafmap)
iGN5117 Jul 7, 2023
d9d3cbb
Force version 3.6.4 of jupyter lab in pipfile
iGN5117 Jul 11, 2023
196ac82
revert adding environment.yml
iGN5117 Jul 11, 2023
0f0015c
add sedona.maps with SedonaKepler, a wrapper for KeplerGl visualizati…
iGN5117 Jul 11, 2023
92578dc
Merge branch 'sedona-master' into develop_Nilesh_1.5.0_NotebookVisual…
iGN5117 Jul 11, 2023
bb46de5
Refactor SedonaKepler wrapper to take SedonaDataFrames as data input
iGN5117 Jul 12, 2023
349c855
moved sedonakepler tests to a separate folder
iGN5117 Jul 13, 2023
a566ee0
Add apache license header to newly created py files
iGN5117 Jul 13, 2023
f7964ba
refactor private method signature to be in line with naming convention
iGN5117 Jul 13, 2023
5786f2a
Added comment explaining usage of _repr_html()
iGN5117 Jul 13, 2023
4ce9f22
Update documentation
iGN5117 Jul 14, 2023
084e0dd
fix typo
iGN5117 Jul 14, 2023
b4295e2
Add copies of datasets to get around keplergl bug
iGN5117 Jul 14, 2023
2e822bf
Refactor SedonaKepler constructor to add support for pandas with pyth…
iGN5117 Jul 17, 2023
bea9973
Missed pushing gif
iGN5117 Jul 18, 2023
346 changes: 327 additions & 19 deletions binder/ApacheSedonaSQL_SpatialJoin_AirportsPerCountry.ipynb

Large diffs are not rendered by default.

Binary file added docs/image/sedona_customization.gif
75 changes: 75 additions & 0 deletions docs/tutorial/sql.md
@@ -505,6 +505,81 @@ The details of a join query is available here [Join query](../api/sql/Optimizer.

Many other functions can be combined with these queries. Please read [SedonaSQL functions](../api/sql/Function.md) and [SedonaSQL aggregate functions](../api/sql/AggregateFunction.md).

## Visualize query results

==Sedona >= 1.5.0==


Spatial query results can be visualized in Jupyter lab/notebook using SedonaKepler.

SedonaKepler exposes APIs to create interactive and customizable map visualizations using [KeplerGl](https://kepler.gl/).

### Creating a map object using SedonaKepler.create_map

SedonaKepler exposes a create_map API with the following signature:

```python
create_map(df: SedonaDataFrame=None, name: str='unnamed', geometry_col: str='geometry', config: dict=None) -> map
```

The parameter 'name' is the identifier under which the passed SedonaDataFrame is registered in the map object; any config applied to the map refers to the dataframe by this name. It is recommended that you pass a unique name for each dataframe.

The parameter 'geometry_col' identifies the column that contains the geometry. It is required if that column has a name other than the default 'geometry'.

!!!Note
    Failing to pass the correct geometry column name (when it is something other than 'geometry') prevents the map object from being created.

If no SedonaDataFrame object is passed, an empty map (with the config applied, if one is passed) is returned. A SedonaDataFrame can be added later using the method `add_df`.

A map config can optionally be passed to apply customizations to the map when it is created.

!!!Note
    The map config references every customization by the name assigned to the SedonaDataFrame being displayed. If the names do not match, the config will not be applied to the map object.


!!! abstract "Example usage (Referenced from Sedona Jupyter examples)"

    === "Python"
        ```python
        map = SedonaKepler.create_map(df=groupedresult, name="AirportCount", geometry_col="country_geom")
        map
        ```

### Adding SedonaDataFrame to a map object using SedonaKepler.add_df
SedonaKepler exposes an add_df API with the following signature:

```python
add_df(map, df: SedonaDataFrame, name: str='unnamed', geometry_col='geometry')
```

This API adds a SedonaDataFrame to an already created map object. The map object passed in is mutated in place and nothing is returned.

The parameters 'name' and 'geometry_col' follow the same rules as in 'create_map'.

!!!Tip
    This method can be called repeatedly to add multiple dataframes to a map object so that they can be visualized together.

!!! abstract "Example usage (Referenced from Sedona Jupyter examples)"
    === "Python"
        ```python
        map = SedonaKepler.create_map()
        SedonaKepler.add_df(map, groupedresult, name="AirportCount", geometry_col="country_geom")
        map
        ```

### Setting a config via the map
A map rendered by accessing the map object created by SedonaKepler includes a config panel, which can be used to customize the map.

<img src="../../image/sedona_customization.gif" width="1000">
Member


@iGN5117 This doc references to this image but this image was not committed together with this PR.

Contributor Author


Sorry about that, pushed the gif



### Saving and setting config

A map object's current config can be read from its 'config' attribute, as in `map.config`. This config can be saved for later use, or reused across notebooks, when the exact same map is to be rendered every time.
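As a rough sketch of how a config might be kept for reuse, the snippet below writes a config dict to a JSON file and reads it back. The helper names `save_map_config` and `load_map_config` and the file name are hypothetical and not part of SedonaKepler; the sketch also assumes the config holds only JSON-serializable values.

```python
import json
from pathlib import Path


def save_map_config(config: dict, path: str) -> None:
    """Write a map config dict (for example the value of `map.config`) to a JSON file."""
    Path(path).write_text(json.dumps(config, indent=2), encoding="utf-8")


def load_map_config(path: str) -> dict:
    """Read back a config previously written by save_map_config."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical usage in a notebook (file name is an assumption):
#   save_map_config(map.config, "airport_map_config.json")
#   saved = load_map_config("airport_map_config.json")
```

The loaded dict could then be passed as the `config` argument of `create_map`, provided the dataframe is given the same name it had when the config was saved.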

!!!Note
    The map config references each applied customization by the name given to the dataframe, and hence works only on maps whose dataframe was supplied with the same name.
    For more details, refer to the keplerGl documentation [here](https://docs.kepler.gl/docs/keplergl-jupyter#6.-match-config-with-data).
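One way to catch a name mismatch before rendering is to compare the names a config refers to with the names given to the dataframes. The sketch below assumes the config layout shown in the linked kepler.gl documentation, where each entry under `config.visState.layers` carries a `dataId` in its own `config`. The function names and the sample config are illustrative only and are not part of SedonaKepler.

```python
def layer_data_ids(config: dict) -> set:
    """Collect the dataset names (dataId) referenced by the layers of a map config."""
    layers = config.get("config", {}).get("visState", {}).get("layers", [])
    ids = set()
    for layer in layers:
        data_id = layer.get("config", {}).get("dataId")
        if data_id is not None:
            ids.add(data_id)
    return ids


def unmatched_data_ids(config: dict, dataset_names) -> set:
    """Return the names used by the config that match none of the given dataset names."""
    return layer_data_ids(config) - set(dataset_names)


# Trimmed, hand-written sample; a real config has many more keys.
sample_config = {
    "version": "v1",
    "config": {
        "visState": {
            "layers": [
                {"type": "geojson", "config": {"dataId": "AirportCount"}},
            ]
        }
    },
}

print(unmatched_data_ids(sample_config, ["AirportCount"]))  # set()
print(unmatched_data_ids(sample_config, ["unnamed"]))       # {'AirportCount'}
```

An empty result suggests that every layer in the config points at a dataframe name that is actually in use; a non-empty result lists the names the config expects but the map would not have.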
## Save to permanent storage

To save a Spatial DataFrame to some permanent storage such as Hive tables and HDFS, you can simply convert each geometry in the Geometry type column back to a plain String and save the plain DataFrame to wherever you want.
1 change: 1 addition & 0 deletions python/Pipfile
@@ -17,6 +17,7 @@ geopandas="<=0.10.2"
pyspark=">=2.3.0"
attrs="*"
pyarrow="*"
keplergl = "==0.3.2"

[requires]
python_version = "3.7"
69 changes: 69 additions & 0 deletions python/sedona/maps/SedonaKepler.py
jiayuasu marked this conversation as resolved.
@@ -0,0 +1,69 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

from keplergl import KeplerGl
import geopandas as gpd
Kontinuation marked this conversation as resolved.


class SedonaKepler:

    @classmethod
    def create_map(cls, df=None, name="unnamed", geometry_col="geometry", config=None):
        """
        Creates a map visualization using kepler, optionally taking a sedona dataFrame as data input
        :param df: [Optional] SedonaDataFrame to plot on the map
        :param name: [Optional] Name to be associated with the given dataframe, if a df is passed with no name, a default name of 'unnamed' is set for it.
        :param geometry_col: [Optional] Custom name of geometry column in the sedona data frame,
                            if no name is provided, it is assumed that the column has the default name 'geometry'.
        :param config: [Optional] A map config to be applied to the rendered map
        :return: A map object
        """
        kepler_map = KeplerGl()
        if df is not None:
            SedonaKepler.add_df(kepler_map, df, name, geometry_col)

        if config is not None:
            kepler_map.config = config

        return kepler_map

    @classmethod
    def add_df(cls, kepler_map, df, name="unnamed", geometry_col="geometry"):
        """
        Adds a SedonaDataFrame to a given map object.
        :param kepler_map: Map object to add SedonaDataFrame to
        :param df: SedonaDataFrame to add
        :param name: [Optional] Name to assign to the dataframe, default name assigned is 'unnamed'
        :param geometry_col: [Optional] Custom name of geometry_column if any, if no name is provided, a default name of 'geometry' is assumed.
        :return: Does not return anything, adds df directly to the given map object
        """
        geo_df = SedonaKepler._convert_to_gdf(df, geometry_col)
        kepler_map.add_data(geo_df, name=name)

    @classmethod
    def _convert_to_gdf(cls, df, geometry_col="geometry"):
        """
        Converts a SedonaDataFrame to a GeoPandasDataFrame and also renames geometry column to a standard name of 'geometry'
        :param df: SedonaDataFrame to convert
        :param geometry_col: [Optional]
        :return:
        """
        pandas_df = df.toPandas()
        geo_df = gpd.GeoDataFrame(pandas_df, geometry=geometry_col)
        if geometry_col != "geometry":
            geo_df = geo_df.rename(columns={geometry_col: "geometry"})
        return geo_df
16 changes: 16 additions & 0 deletions python/sedona/maps/__init__.py
@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
3 changes: 2 additions & 1 deletion python/sedona/spark/__init__.py
@@ -38,4 +38,5 @@
from sedona.utils import KryoSerializer
from sedona.utils import SedonaKryoRegistrator
from sedona.register import SedonaRegistrator
from sedona.spark.SedonaContext import SedonaContext
from sedona.spark.SedonaContext import SedonaContext
from sedona.maps.SedonaKepler import SedonaKepler
16 changes: 16 additions & 0 deletions python/tests/maps/__init__.py
@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.