-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
German postcodes via geo search #110
base: master
Are you sure you want to change the base?
German postcodes via geo search #110
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this suggestion. It looks like a very promising direction, and would definitely be useful. (For example, in order to improve the search by german postcodes in Qwant Maps!)
I made a few comments, mostly about the integration in the existing implementation. Depending on which kind of zones should be affected by this change, the postcodes lookup may need to be moved further in the generation steps, after the zone_type
has been computed. But overall your approach looks good to me.
I would agree to make this an optional feature, possibly enabled by default. In many countries little data will be affected, but it's not clear for me yet if this additional step could cause a significant performance hit for large input files.
src/zone_ext.rs
Outdated
.split(';') | ||
.filter(|s| !s.is_empty()) | ||
.map(|s| s.to_string()) | ||
.sorted() | ||
.collect(); | ||
if let Some(boundary) = boundary.as_ref() { | ||
if let Some(bbox) = bbox { | ||
if (zip_codes.is_empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should all zones get their postcodes completed here ?
For large zones (countries, states, and possibly any zone with type larger than ZoneType::City
) this lookup could be expensive and of little use. The problem is the zone_type
is still unknown at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @amatissart :-)
So far, I thought that if an object which already has zip_codes (the 3rd if condition), we will not add additional ones.
The checks above are only to make the Rust compiler happy 👍 - I'd be glad to do this differently :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah. now I get what you mean I guess. I'll think through this some more.
src/zone_ext.rs
Outdated
.filter(|postcode_bbox| { | ||
//info!(" - Candidate Postcode: {:?}", postcode_bbox.get_postcode().zipcode); | ||
|
||
let overlap_between_postcode_and_area = BooleanOp::intersection(boundary, postcode_bbox.get_postcode().get_boundary()); | ||
|
||
// anteil überlappender Bereiches / Postcode: "Wieviel % des Postcodes sind von dieser Fläche befüllt" | ||
let overlap_percentage_relative_to_postcode = overlap_between_postcode_and_area.unsigned_area() / postcode_bbox.get_postcode().unsigned_area(); | ||
|
||
//info!(" CHOSEN {} {:?}", overlap_percentage_relative_to_postcode, overlap_percentage_relative_to_postcode > 0.05); | ||
// at least 5% des Postcodes müssen in der genannten Fläche liegen | ||
overlap_percentage_relative_to_postcode > 0.05 | ||
|
||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly is the justification for this check ? Should we expect that some boundary=postal_code
relations do not represent their exact coverage because of approximate boundaries ? Or do you chose to focus on the main postcodes for each zone, and deliberately ignore the ones that cover only a small part of the area ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey :)
- only checking the bounding boxes (as is done by the RTree) is not enough to determine whether postcodes match or not.
- Thus, I first did a "contains" check between postcode exact coordinates and the underlying geometry.
- However, this check had the problem that sometimes, even if the postcode and the shape shared only a single boundary line the contains check returned
true
. - Thus, I came up with this "heuristic"; which seems to improve real-world results quite a bit.
Maybe we make the heuristic opt-in, or we make the percentage configurable?
All the best,
Sebastian
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of contains
check do you refer to exactly ?
We had similar issues when implementing the hierarchy lookup among administrative areas. Unfortunately, the geo
crate does not provide extensive algorithms to implement such checks natively yet. That's why Zone
implements a .contains()
method that relies on the .covers()
method provided by the GEOS library. It's supposed to handle such edge cases nicely even when geometries share boundaries. So if that's not what you've already tried, implementing a similar .contains()
method for Postcode
could be worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coool, I'll try this :) I've checked the contains method of the simple geo
Crate types.
Hey @amatissart , thank you so much for your detailed review ❤️
Could you point me to the place where zone_type is computed? I'm afraid I don't understand the overall structure well enough to move that code block around :-) Thanks and all the best, |
Hey @amatissart , PS: I just figured out that my changes greatly slow down the system on big instances :-( I guess I somehow need to parallelize the work. I've seen that the hierarchy building in Cosmogony is using all 32 cores which I have. Can you give me a pointer how you exploit this parallelism? (A short link to docs or source code would be awesome). Thanks so much ❤️ , |
I must say that this structure of this file
To parallelize the process, Cosmogony is using |
@amatissart Hey, Thank you so much 👍 👍 Your pointers are greatly appreciated - I'll try to implement this as soon as possible (but might take 2-3 weeks). BTW - if I can somewhen return the favor of your guidance, let me know ;-) I am a Neos CMS core dev, so if you happen to have a problem in the Open Source Content Management area anytime soon, let me know ;-) All the best, |
Hey @amatissart , A quick status update: I fixed a few of the issues; the main issue with performance still being a problem; and I still need to move postcode generation around to a later stage. I'll do that in the upcoming days. All the best, |
Hi, and thanks for your contribution, it looks indeed very promising! About the performance issues I suppose updating to a simpler "contains" mechanism could be helpful? I am probably wrong but I was wondering if we could get less candidates from the Rtree if we queried for zones that contain an arbitrary point inside of the envelope instead of zones that intersect with its bbox? As you are new to Rust I'll slip a few comments about the ecosystem that could be helpful:
|
Hey @remi-dupre , ah, is the rtree capable of doing this; or would I need to iterate over all points of my feature here? IMHO the problem might be that if you have the following situation, we would not find the postcode correctly: /**
*/ Or am I missing anything? 😄 Thank you so much for your comments, keep them coming - they are extremely helpful for me 👍 👍 I'll try that out ASAP! All the best, |
Oh your right, that's where I was wrong: I thought the inclusion was checked in the other direction :/ Sorry about that ! |
No need to be sorry ❤️ Thanks for your work and your comments! All the best, |
@remi-dupre @amatissart btw - which IDE are you using? I'm currently trying out IntelliJ IDEA with Rust support, but I am not yet fully sure on it... |
Current status: I think we are getting closer to a possible release. Still missing are the CLI flag to enable/disable the algorithm; and maybe the other "contains" algorithm to speed up the process some more. :) |
I personally use Vim with the Coc plugin. I think people typically use visual code (
That's very cool, thanks again for your energy! |
Hey, All the best, |
Hey everybody,
I've tried to include german Postcodes in the database, which means we cannot read the post_code attribute, but instead have to read an OSM boundary relation: https://wiki.openstreetmap.org/wiki/DE:Relation:boundary - so I've implemented a geo overlap search to determine the postcodes.
So far the code seems to work, but is quite ugly (I am just learning rust).
I want to polish this PR for a merge-able state, but I'd need your feedback on the following things:
Thanks for your work,
Sebastian