Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Geoserver importer fails to import GeoJSON with properties containing string arrays #323

Open
giohappy opened this issue Jan 25, 2022 · 3 comments

Comments

@giohappy
Copy link

giohappy commented Jan 25, 2022

The importer raises an exception when trying to import a GeoJSON with properties containing arrays of strings, as in this example.
The GeoJSON specification (RFC7947) says that any valid JSON object is supported as property values, and the JSON data interchange format (RFC7159) includes arrays.

Here below the relevant section of the error log.
Apparently the importer detects a mix of strings and arrays, but I couldn't spot this mixture in the test file.

2022-01-25 10:06:45,597 ERROR [geoserver.rest] - Found conflicting types ArrayList and String for property lines
java.lang.IllegalStateException: Found conflicting types ArrayList and String for property lines
	at org.geotools.geojson.feature.FeatureTypeHandler.primitive(FeatureTypeHandler.java:160)
	at org.json.simple.parser.JSONParser.parse(JSONParser.java:471)
	at org.json.simple.parser.JSONParser.parse(JSONParser.java:312)
	at org.geotools.geojson.GeoJSONUtil.parse(GeoJSONUtil.java:280)
	at org.geotools.geojson.feature.FeatureJSON.readFeatureCollectionSchema(FeatureJSON.java:503)
	at org.geoserver.importer.format.GeoJSONFormat.task(GeoJSONFormat.java:152)
	at org.geoserver.importer.format.GeoJSONFormat.list(GeoJSONFormat.java:142)
	at org.geoserver.importer.Importer.createTasks(Importer.java:687)
	at org.geoserver.importer.Importer.initForDirectory(Importer.java:584)
	at org.geoserver.importer.Importer.addTasks(Importer.java:501)
	at org.geoserver.importer.Importer.update(Importer.java:474)
	at org.geoserver.importer.rest.ImportTaskController.acceptData(ImportTaskController.java:294)
	at org.geoserver.importer.rest.ImportTaskController.taskPost(ImportTaskController.java:194)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
@giohappy giohappy added the bug label Apr 12, 2022
@simboss simboss added needs estimate A plan needs to be proposed for the fix or ehancement adn estimated GeoNode labels Apr 12, 2022
@aaime
Copy link
Member

aaime commented Apr 13, 2022

The topic is ... complicated, there are multiple issues from different angles.

First one, you cite the GeoJSON example as a "valid GeoJSON file" according to spec. GeoTools will not be able to handle a lot of valid GeoJSON files, as they are completely schemaless, change shape for each feature, be arbitrarily nested and so on.
Generally speaking, you can expect a GeoJSON to be parsed if it fits a "shapefile data model", fixed structure, and simple attributes.

Now, can we handle arrays? GeoTools has some growing support for them, as values in a simple feature.
The importer is failing to do so, while using the gt-geojson unsupported module to parse the GeoJSON.
That module has been abandoned for years, and in the last few months, a migration has begun to switch from gt-geojson to a new gt-geojsonstore module that Ian Turton wrote (which is also in unsupported, mind!).

So, on 2.21.x, we have the importer using gt-geojsonstore, and on 2.19.x, we have it use gt-geojson. Two separate code bases.
I've tried both:

  • On 2.21.x, the GeoJSON is successfully parsed, but fails to import in PostGIS, as there is not enough type information for the datastore to create an array column (the GeoJSON parse comes out with a generic java.util.List, instead of a String[]) and the store just does not know how to deal with an array column in "create table" anyways (can read and write on tables with arrays, but cannot yet create one).
  • On 2.19.x, the GeoJSON does not parse correctly due to a bug, but if it did, it would still have the same problem as 2.21.x, it would not import in PostGIS anyways.

I guess we want to have a fix for 2.19.x, but also have it working in the future, right?
An assumption: the array will contain uniform types of data (if it starts with a String, it's going to be a string till the end, if it's a number, same, numbers till the end of the array, and in all array values of the same attribute, across the features of the same collection).

Based on the assumption are some guess-stimates:

  • Fix parsing of GeoJSON with arrays in gt-geojson, with backports down to GeoTools 25.x (used by GS 2.19.x): 1 day
  • Change the parsing result from List to String[], so that we have enough type information, in both gt-geojson and gt-geojsonstore (with backports): 0.5 days
  • Make gt-jdbc-postgis handle correctly the creation of tables with array columns, with backports: 1.5 days.
  • Also fix postgis store type cache corruption reported here: 0.5 day.

So a grand total of 3.5 days. I know it's a lot, but we have a library switch in the middle and some basic infrastructure that's missing and needs to be built.
Also, the estimates are valid for someone that is familiar with both gt-geojson and gt-geojsonstore, and with array handling in postgis (I'm afraid it's just me...). Pad generously if you plan to assign to someone else.

@aaime aaime removed the needs estimate A plan needs to be proposed for the fix or ehancement adn estimated label Apr 13, 2022
@aaime
Copy link
Member

aaime commented Apr 15, 2022

After some consideration, tightened the estimates a bit (down to 3.5 days)

@giohappy
Copy link
Author

I don't know if we want to keep this open, beyond the original requirement for GeoNode.
In GeoNode we're adopting a different approach for GeoJSON (and other formats), so it doesn't need this fix anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants