Values in registry that should be numbers/integers are indexed as strings in JSON API response #153
Comments
@tloubrieu-jpl could I bug you for more details?
|
Example document/fields:
{
  "pds:Axis_Array/pds:sequence_number": [
    "1",
    "2"
  ],
  "em16_tgo_cas:Detector_Information/em16_tgo_cas:pixel_width": [
    "10.0"
  ]
} |
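A quick way to spot fields like the ones in this example document is to scan a document for string values that parse as numbers. The following is only a minimal sketch: the function names `looks_numeric` and `find_stringified_numbers` are hypothetical and are not part of harvest, the registry API, or the sweepers. It assumes a document is a flat mapping of field names to lists of values, as in the example.

```python
def looks_numeric(text):
    """Return True if the string parses as a Python float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def find_stringified_numbers(document):
    """Return {field: [values]} for string values that look like numbers.

    Assumption: `document` maps field names to lists of values, which is
    the shape of the example document in this thread.
    """
    suspects = {}
    for field, values in document.items():
        if not isinstance(values, list):
            values = [values]
        numeric_strings = [
            v for v in values if isinstance(v, str) and looks_numeric(v)
        ]
        if numeric_strings:
            suspects[field] = numeric_strings
    return suspects


example = {
    "pds:Axis_Array/pds:sequence_number": ["1", "2"],
    "em16_tgo_cas:Detector_Information/em16_tgo_cas:pixel_width": ["10.0"],
}
print(find_stringified_numbers(example))
```

Note that `float()` also accepts strings such as "nan" and "inf", so a real check would probably need a stricter parser.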
Of note is the fact that this document may not have had Simplest way to confirm is probably just to write a test and run it against various versions (current, and all pre-versioning versions) of the repairkit sweeper. If it isn't due to a bug in |
@alexdunnjpl 👍 . not sure where this bug is. that being said, there is definitely some funkiness happening across the schemas based upon this warning I have seen from our EN cross-cluster search: This is from the Index Patterns page where it is trying to bring all the indexes fields together, and it is colliding with their types. I imagine something went weird here with past versions of Harvest (hopefully not the current), and we just need to do a scrub of the types. |
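The kind of collision described in this comment (one field carrying different types in different indexes) can be detected offline once the mapping of each index has been exported. This is a sketch under assumptions: `find_type_conflicts` is a hypothetical helper, the index names `node-a` and `node-b` are made up, and the input is assumed to be the `properties` section of each index mapping.

```python
from collections import defaultdict


def find_type_conflicts(properties_by_index):
    """Return {field: {type: [index, ...]}} for fields with more than one type.

    Assumption: `properties_by_index` maps an index name to the `properties`
    section of its mapping, i.e. {field_name: {"type": "..."}}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for index_name, properties in properties_by_index.items():
        for field, spec in properties.items():
            # Fields without an explicit type are object fields in a mapping.
            seen[field][spec.get("type", "object")].append(index_name)
    return {
        field: {t: sorted(indexes) for t, indexes in types.items()}
        for field, types in seen.items()
        if len(types) > 1
    }


# Hypothetical index names and mappings, for illustration only.
mappings = {
    "node-a": {
        "cart:Geodetic_Model/cart:a_axis_radius": {"type": "double"},
        "pds:File/pds:records": {"type": "keyword"},
    },
    "node-b": {
        "cart:Geodetic_Model/cart:a_axis_radius": {"type": "keyword"},
        "pds:File/pds:records": {"type": "keyword"},
    },
}
print(find_type_conflicts(mappings))
```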
@tloubrieu-jpl @jordanpadams I'm softly skeptical that this is new/erroneous behaviour and that OpenSearch was intended to use float/int types when storing numerically-defined fields.
@jordanpadams if you're seeing collisions, does that suggest that some field is differently-typed across different nodes? If so, that would suggest that at some point, harvest changed from (I assume) varied field types to Thoughts? |
Hi @alexdunnjpl, thanks for looking at that. Then I removed these tests when I transitioned the integration tests to the I&T team, so that we could restart from something less chaotic than what the test suite had turned into. So I am not sure when the numbers stopped being numbers. So you are saying that none of the values are stored as numbers in OpenSearch? Does the schema confirm that? Is the current version of harvest pushing documents with numbers converted to strings? Thanks |
Sorry @alexdunnjpl, I think you already answered my question about harvest not converting values to strings. |
This ElasticSearch issue seems to indicate that while numeric values are stored as-is in It seems fraught to intentionally write numerics to keyword-typed fields, on that basis alone. @tloubrieu-jpl going the other direction, could you give an example of a node, document and numeric field which is in OpenSearch as a numeric type rather than a string? Regarding the query in the tests, I'm not familiar with the internals of our query-language parsing, but I'd expect that OpenSearch will happily take a numeric type in a query and cast it when doing a comparison, if necessary. I'll test that assumption now. |
Confirmed: OS will accept a numeric to compare with a So it looks like there are two potential issues:
Looking at the
"cart:Equirectangular/cart:latitude_of_projection_origin": {
  "type": "double"
},
"cart:Equirectangular/cart:longitude_of_central_meridian": {
  "type": "double"
},
"cart:Equirectangular/cart:standard_parallel_1": {
  "type": "double"
},
"cart:Geo_Transformation/cart:upperleft_corner_x": {
  "type": "double"
},
"cart:Geo_Transformation/cart:upperleft_corner_y": {
  "type": "double"
},
"cart:Geodetic_Model/cart:a_axis_radius": {
  "type": "double"
},
"cart:Geodetic_Model/cart:b_axis_radius": {
  "type": "double"
},
"cart:Geodetic_Model/cart:c_axis_radius": {
  "type": "double"
},
but the field I'm using as a test example is keyword:
"em16_tgo_cas:Detector_Information/em16_tgo_cas:pixel_width": {
  "type": "keyword"
},
so either:
in either case, seems necessary to
|
Suggest moving this ticket to |
@alexdunnjpl @jordanpadams I moved the ticket to the harvest repository, from what was said before I suggest 2 actions:
|
The properties which we expect to be number are:
As seen in request https://pds.nasa.gov/api/search/1/products |
@tloubrieu-jpl @jordanpadams I'm trying to inspect the relevant index, but I'm not having much luck tracking it down. The first product returned by that request is Is there a way to quickly determine which OS node is serving a particular document? |
@alexdunnjpl I have no idea what is going on here... Will try out all the registries. not sure how this is possible |
It looks like the collection is there, but the product is not... |
@alexdunnjpl actually I lied.
|
Sorry @jordanpadams - didn't mean to send you off to do the thing I was lazily avoiding! So the issue with the "missing" property mappings was that I forgot to convert them The good-ish news is that those are all mapped as |
More good-ish news. Searches appear to respect the mapping type, i.e. when performing comparisons the values aren't compared lexicographically and therefore aren't returning erroneous results. ex. this query
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "pds:File/pds:records": {
              "gte": 0,
              "lte": 35
            }
          }
        }
      ]
    }
  },
  "_source": {
    "includes": ["pds:File/pds:records"]
  },
  "size": 3,
  "track_total_hits": true
}
returns the following (note presence of value
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 150,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "registry",
        "_type": "_doc",
        "_id": "urn:nasa:pds:compil-comet:nuc_properties:extinct::1.0",
        "_score": 1.0,
        "_source": {
          "pds:File/pds:records": [
            "35"
          ]
        }
      },
      {
        "_index": "registry",
        "_type": "_doc",
        "_id": "urn:nasa:pds:gbo-kpno:mosaic-9p:reduced_2005_july0_conv_stand_july04_tab::1.0",
        "_score": 1.0,
        "_source": {
          "pds:File/pds:records": [
            "7"
          ]
        }
      },
      {
        "_index": "registry",
        "_type": "_doc",
        "_id": "urn:nasa:pds:gbo-kpno:mosaic-9p:reduced_200_standards_centers_july04_tab::1.0",
        "_score": 1.0,
        "_source": {
          "pds:File/pds:records": [
            "31"
          ]
        }
      }
    ]
  }
} |
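To see why the hits in this response point to a numeric rather than a lexicographic comparison, the two behaviours can be compared on the three returned values. This is a plain-Python illustration of the two comparison modes, not a description of how OpenSearch evaluates range queries internally; the helper names are hypothetical.

```python
# The three `pds:File/pds:records` values returned in the response.
records = ["35", "7", "31"]
low, high = 0, 35  # the `gte` / `lte` bounds of the range query


def in_range_lexicographic(values, low, high):
    """Keep values whose string form sorts between the bounds as strings."""
    return [v for v in values if str(low) <= v <= str(high)]


def in_range_numeric(values, low, high):
    """Keep values whose integer form lies between the bounds."""
    return [v for v in values if low <= int(v) <= high]


print(in_range_lexicographic(records, low, high))  # ['35', '31']
print(in_range_numeric(records, low, high))        # ['35', '7', '31']
```

Under string ordering, "7" sorts after "35", so a lexicographic range would have dropped it. Its presence in the hits is consistent with the comparison being numeric.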
Testing via the API appears to indicate that comparisons are being correctly performed:
Note that records @tloubrieu-jpl @jordanpadams with this in mind, what disconnect, if any, still exists between current and desired behaviour, and is it a regression, or a new feature (if that's known for certain)? |
@alexdunnjpl, @tloubrieu-jpl may have a specific instance where this was breaking, but, in general, I think this is very tightly coupled with NASA-PDS/registry#230. For this particular case, it may have been something we broke in the schema during re-indexing or repair kit or ???, but, in general, I think we may need a way to validate our OpenSearch schema types match the expectations of our released information model schemas, and repair those schema fields where there is a mismatch. I will try to poke through Kibana to identify where OpenSearch sees a mismatch in schema types across all the clusters. |
Roger that - thanks Jordan! |
@alexdunnjpl per my comment on our Sprint Tag, it doesn't look like there is a way for me to look this up. I just see the screenshot from above that there is something out there. Is there any way we can just develop a generic sweeper that goes through fields and checks their type matches what we have in the schema types? |
@jordanpadams sure can! You just want detection/report, or with automatic sync and re-index? What should the source of truth be for mapping fields onto their correct opensearch index mapping type? |
@alexdunnjpl the source of truth is the JSON files online (e.g. this one). see Harvest for how it gets those files. I'm thinking this ticket may actually be 2 parts:
|
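The detection/report half of such a sweeper might look roughly like the sketch below. Everything here is an assumption made for illustration: `report_type_mismatches` is a hypothetical name, `expected_types` stands in for whatever would be derived from the online JSON files, and the expected `double` type for `pixel_width` is a guess rather than something confirmed in this thread.

```python
def report_type_mismatches(index_properties, expected_types):
    """Return (field, expected, actual) tuples where the mapping disagrees.

    Assumptions: `index_properties` is the `properties` section of an index
    mapping ({field: {"type": "..."}}), and `expected_types` maps field names
    to the OpenSearch type they should have. Fields absent from the index
    are reported with actual == "missing".
    """
    mismatches = []
    for field, expected in sorted(expected_types.items()):
        actual = index_properties.get(field, {}).get("type")
        if actual is None:
            mismatches.append((field, expected, "missing"))
        elif actual != expected:
            mismatches.append((field, expected, actual))
    return mismatches


# Mapping types as quoted earlier in this thread.
index_properties = {
    "cart:Geodetic_Model/cart:a_axis_radius": {"type": "double"},
    "em16_tgo_cas:Detector_Information/em16_tgo_cas:pixel_width": {"type": "keyword"},
}
# Hypothetical expectations; the real source of truth would be the JSON files.
expected_types = {
    "cart:Geodetic_Model/cart:a_axis_radius": "double",
    "em16_tgo_cas:Detector_Information/em16_tgo_cas:pixel_width": "double",
}
for field, expected, actual in report_type_mismatches(index_properties, expected_types):
    print(f"{field}: expected {expected}, found {actual}")
```

Keeping detection separate from any sync or re-index step means the report can be reviewed before anything in the index is changed.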
@alexdunnjpl @tloubrieu-jpl have we figured out what is going on here? are we good to close this out or is there still work to be done? |
@jordanpadams implementation of the two items in your previous comment is still outstanding. We can close this ticket iff tickets exist for those two items; otherwise this should stay open. Probably best to split into new tickets, since 1 will take much less time than 2. |
@jordanpadams the registry-mgr |
@alexdunnjpl correct. I don't think this is really a registry-mgr or harvest thing to date. They both work with LDDs and trying to strongly type things in the schema, but nothing specifically dealing with manipulating or reviewing the schema after the fact |
created NASA-PDS/registry-sweepers#128 and NASA-PDS/registry-sweepers#129. closing this one. |