Proposal: region simplification for data size reduction #3

Open
zverok opened this issue Feb 1, 2016 · 8 comments

Comments

@zverok
Owner

zverok commented Feb 1, 2016

This is a proposal awaiting someone's ideas/verification.

The idea:

  • 80 MiB of data is rather bad for a gem;
  • this can be fixed by simplifying regions (reducing the number of points);
  • ...which has the drawback of reduced precision near the borders;
  • ...which, for borders with the sea, can be fixed by "if no region matches, use the nearest one";
  • ...but can't be fixed for borders between states.
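The simplification step in the list above could look roughly like the following. This is a sketch of the Ramer-Douglas-Peucker algorithm, one common way to reduce the number of points in a polyline; it is an assumption for illustration, not necessarily what the simplify branch or script/simplify.rb does, and `simplify` / `perpendicular_distance` are hypothetical helpers. The `tolerance` parameter (in coordinate units, i.e. degrees for lng/lat) plays a role similar to a simplification factor, but how it maps onto the factor used in the branch is unknown.

```ruby
# Ramer-Douglas-Peucker line simplification (illustrative sketch only).
# Points are [x, y] pairs such as [lng, lat]; tolerance uses the same units.

# Distance from point p to the infinite line through a and b.
def perpendicular_distance(p, a, b)
  dx = b[0] - a[0]
  dy = b[1] - a[1]
  length = Math.hypot(dx, dy)
  # Degenerate chord (e.g. a closed ring where first == last): use point distance.
  return Math.hypot(p[0] - a[0], p[1] - a[1]) if length.zero?

  (dy * p[0] - dx * p[1] + b[0] * a[1] - b[1] * a[0]).abs / length
end

def simplify(points, tolerance)
  return points.dup if points.size < 3

  first = points.first
  last = points.last

  # Find the interior point farthest from the first-to-last chord.
  max_distance = 0.0
  split = 0
  (1...points.size - 1).each do |i|
    d = perpendicular_distance(points[i], first, last)
    if d > max_distance
      max_distance = d
      split = i
    end
  end

  # Everything is within tolerance of the chord: keep only the endpoints.
  return [first, last] if max_distance <= tolerance

  # Otherwise keep the farthest point and simplify each half recursively;
  # the split point appears in both halves, so drop one copy.
  left = simplify(points[0..split], tolerance)
  right = simplify(points[split..-1], tolerance)
  left[0...-1] + right
end

border = [[0.0, 0.0], [1.0, 0.05], [2.0, -0.05], [3.0, 0.0], [4.0, 2.0]]
p simplify(border, 0.1) # prints [[0.0, 0.0], [3.0, 0.0], [4.0, 2.0]]
```

Every interior point closer than `tolerance` to the simplified outline is dropped. That is why shared borders between states lose precision: a location near a removed vertex can end up on the wrong side of the border.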

So, if somebody could look at the simplify branch and share some opinions, it would be really cool.

For example, some amount of simplification can be seen here: it is the world map with simplification factor 0.1. You can experiment with the factor in script/simplify.rb.

@jotolo
Contributor

jotolo commented Feb 13, 2019

@zverok What happened with the simplify branch?
It's good to go, right?

@bf4

bf4 commented Oct 19, 2020

Another idea is to break the gem into a meta-gem that has all the logic and lets you specify, via other means, which data to release and package. For example, maybe only pull in the 'data' dir if there's no file in some known location, or tar.gz each file in the data folder when packaging to RubyGems and only untar a given file when needed, etc.?
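The per-file compression idea could be sketched like this, assuming the data directory holds one GeoJSON file per zone. The `*.geojson` naming, the `Europe-Berlin` file name, `gzip_data_dir`, and `LazyZoneData` are all hypothetical, not part of wheretz. Plain gzip per file is used instead of tar, since each archive would contain just one file. Only the zones actually requested are decompressed and parsed, and each is cached after first use.

```ruby
require 'json'
require 'zlib'

# Hypothetical packaging step: replace every data/*.geojson with data/*.geojson.gz.
def gzip_data_dir(dir)
  Dir.glob(File.join(dir, '*.geojson')).each do |path|
    Zlib::GzipWriter.open("#{path}.gz") { |gz| gz.write(File.read(path)) }
    File.delete(path)
  end
end

# Hypothetical lazy loader: decompress a single zone file only when asked,
# then keep the parsed result in memory for later lookups.
class LazyZoneData
  def initialize(dir)
    @dir = dir
    @cache = {}
  end

  def [](name)
    @cache[name] ||= begin
      path = File.join(@dir, "#{name}.geojson.gz")
      JSON.parse(Zlib::GzipReader.open(path, &:read))
    end
  end
end
```

Packaging would run `gzip_data_dir` before building the gem; at runtime, `LazyZoneData.new(dir)['Europe-Berlin']` reads and inflates only that one file.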

@zverok
Owner Author

zverok commented Dec 5, 2020

@jotolo (Sorry for the late response... well, almost 2 years late; I am not sure you are still interested.) It turns out that for, say, Europe (lots of complicated borders), simplified data is absolutely unacceptable, as it might miss large cities and even whole countries.

@bf4 Actually, it seems to me that a much more efficient encoding (than just plain JSON) is possible and necessary for any serious usage, but it seems almost nobody uses the gem, so... I don't actually have much incentive to work on optimizing it.
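To illustrate what a more compact encoding than JSON might look like (an assumption, not something the gem implements), coordinates can be stored as fixed-point 32-bit integers with Ruby's `Array#pack`. At a scale of 1e6 each value keeps six decimal places (about 0.1 m of latitude) and takes exactly 4 bytes, while the same number written as JSON text takes 9 or more characters plus separators. `SCALE`, `encode_ring`, and `decode_ring` are hypothetical names, and the sample coordinates are arbitrary values, not real border data.

```ruby
require 'json'

# Hypothetical fixed-point encoding: 1e-6 degree steps.
SCALE = 1_000_000

# Flatten [[lng, lat], ...] and pack each value as a little-endian signed int32.
# 180 * 1e6 fits comfortably in the int32 range.
def encode_ring(points)
  points.flatten.map { |c| (c * SCALE).round }.pack('l<*')
end

# Reverse the packing and regroup the values into [lng, lat] pairs.
def decode_ring(binary)
  binary.unpack('l<*').map { |i| i.to_f / SCALE }.each_slice(2).to_a
end

# Arbitrary sample ring (closed: first point repeated at the end).
ring = [[13.088345, 52.419877], [13.761161, 52.338265], [13.088345, 52.419877]]
packed = encode_ring(ring)
puts "JSON: #{JSON.generate(ring).bytesize} bytes, packed: #{packed.bytesize} bytes"
```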

@trevorturk

@zverok I used the gem happily in the past, but switched to the geonames.org API to keep my memory use on Heroku down. I came back to look at the options and I think yours is still a good one; thank you! I noticed no Ruby option is listed here: https://github.com/evansiroky/timezone-boundary-builder#lookup-libraries. Do you think your gem is still reasonable to use, or is there perhaps a different recommended way now? If you're interested, please drop me a line and I'd be happy to sponsor some development, as I did have good luck using your gem in the past!

@zverok
Owner Author

zverok commented May 2, 2021

@trevorturk Honestly, I don't know a proper answer to this. I am trying to maintain the library (it is simple enough for that), and, like, today I pushed 0.0.6 (updating the data to 2020d... released last November), and it works properly under all supported Ruby versions.

Other than that, I don't know. The task of making it more efficient (in both performance and memory) is interesting enough, but I have a long list of projects I am interested in, and I typically work on projects/ideas related to each other. wheretz was created during the work on reality, which is now dormant (because I became disenchanted with the idea of a set of open real-world data libraries in Ruby).

So, as you can see, it doesn't come down to sponsorship (I have a pretty well-paying job, and all of my side projects are just for fun, or for a cause I believe in at the time).

Honestly, if somebody finds it useful and this is still the best option for Ruby, that could be a good enough cause to try to make it better... But I am not sure how much effort it would require.

@trevorturk

No worries at all, everything makes sense. Do you mind if I submit a PR to add yours to the list here? https://github.com/evansiroky/timezone-boundary-builder#lookup-libraries -- I just wasn't sure if you decided not to maintain the library anymore. In any case, thank you, it's been a great resource! 😎👍

@terryyin

terryyin commented May 5, 2021

BTW, since 0.0.6 it has become much slower.

@zverok
Owner Author

zverok commented May 5, 2021

@terryyin I'll try to look into it on the weekend. But the code hasn't changed at all, only the data was updated, so it could be something to do with the data itself (more detailed borders with more points). If you can provide some specific request examples that became slower, it would be very helpful.
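The suspicion that more detailed borders mean slower lookups is consistent with how point-in-polygon tests typically scale. Below is the classic even-odd ray-casting test, a generic sketch rather than necessarily what wheretz does internally (`point_in_ring?` is a hypothetical helper). It visits every vertex of a ring once, so a border with twice as many points costs roughly twice as much per lookup.

```ruby
# Even-odd ray casting: cast a horizontal ray from (x, y) to the right and
# count how many ring edges it crosses; an odd count means "inside".
# Runs in O(n) for a ring of n vertices.
def point_in_ring?(x, y, ring)
  inside = false
  j = ring.size - 1 # previous vertex, starting with the last one
  ring.each_with_index do |(xi, yi), i|
    xj, yj = ring[j]
    # The edge straddles the ray's y (this also skips horizontal edges, so no
    # division by zero), and the crossing point lies to the right of x.
    if (yi > y) != (yj > y) &&
       x < (xj - xi) * (y - yi) / (yj - yi).to_f + xi
      inside = !inside
    end
    j = i
  end
  inside
end

square = [[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]] # closed ring, as in GeoJSON
puts point_in_ring?(2, 2, square) # prints true
puts point_in_ring?(5, 2, square) # prints false
```

A common mitigation is to check each zone's bounding box first and run the full test only on the few candidates whose box contains the point.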


5 participants