Seems like a lot of duplicate effort #12
It's a fair point, but I think it extends to more than just those two packages. SpeciesDistributionToolkit.jl implements a bunch of features that are already in
Basically, to make a package that makes it easy to actually fit ensemble models, evaluate them, and project them, similar to e.g. biomod2 in R. The most obvious way to do this (it seems to me) is to depend on MLJ. I needed that functionality for my own research and figured I might as well make a package. SDT and related packages offer some tools to load and manipulate spatial and occurrence data, but don't actually fit models (and explicitly say that fitting models is not the goal). The reason I started it as a separate package is that I would much rather depend on packages like Rasters.jl and MLJBase.jl, which have a lot of uses and users, than on packages like SimpleSDMLayers.jl. That said, I can see I ended up adding some tools here (e.g. spatial thinning) that could also fit in SDT. Does that make sense?
Sure, if it makes things easier for your research you should use whatever gets the job done most efficiently; that makes sense. That being said, over the next couple of years of my postdoc fellowship, a major goal is to add interfaces to MLJ (and other computer-vision-specific tools) in SDT, likely building on the API in the (very recently released) SDeMO.jl subpackage of SDT.
Happy to accept any contributions (and spatial thinning would be a great one).
I didn't know about SDeMO.jl, and I'm happy to hear that you will add interfaces to MLJ. I have tried as much as possible to fit new functionality into existing packages, so that this package can be as small as possible (and to avoid siloing and duplicated effort). The MLJ ecosystem already has a lot of really useful functionality. One example is a whole package that deals with confusion matrices and performance measures: https://github.com/JuliaAI/StatisticalMeasures.jl, which I can see SDeMO.jl re-implements from scratch. I can see a use case for a very lightweight package with a few functions that really are specific to SDMs (and that both this package and SDT could depend on). But I can't think of very many functions that wouldn't fit into some other already existing package.
I would love to offer things like BIOCLIM, which I can see you have implemented. But that would be much easier if it were registered as a standalone package (ideally with an interface to MLJ, but I understand if that's not a priority), so that depending on it would be much more lightweight. For instance, I implemented Maxnet as a completely separate package: https://github.com/tiemvanderdeure/maxnet.jl
I would just like to point out that @tiemvanderdeure is a serial contributor to Rasters.jl and the wider geospatial ecosystem, RasterDataSources.jl, and the MLJ.jl ecosystem, precisely because this package is intentionally a small component of a wider, functioning ecosystem. It's not really fair to say what this package should be without understanding that context. See: Maxnet.jl was specifically written for this package to build on, but you can use it too. To get faster point extraction here, Tiem and I fixed To me it seems that in the long term the approach taken here is better for science in Julia than the approach of SpeciesDistributionToolkit.jl (and involves much less duplication of effort in total). In Julia we are limited by a small community, but our strength is the ability to reuse code across domains by building small, modular components that leverage other packages. See https://www.youtube.com/watch?v=kc9HwsxE1OY
I think we're speaking past each other a bit here. All of the subpackages within SDT.jl are registered separately (though I get why having a The reason for combining each package into the meta-package SDT.jl (like Tidier.jl) is that SDT.jl is not designed to be used by seasoned Julia programmers; it is largely for teaching purposes. Most people who work with SDMs do not know (or perhaps have never even heard of!) Julia. SDT is meant as an 'all-in-one' introduction to the language for the median SDM practitioner who works exclusively in R. People who know Julia will know how to install the specific subpackages they need for a given purpose.
I agree the modular approach is generally better for scientific software and the Julia ecosystem as a whole. Still, I've found that teaching people to use these tools requires meeting them halfway, which is why I think convenient (though inefficient) meta-packages like SDT.jl will bring more users to the language.
Agreed. It'd be nice to have a common API for operations on
Rasters.jl and SpeciesDistributionModels.jl are also not designed only for seasoned programmers; they are pretty easy to use these days. To smooth the transition from other languages, we are also part of cross-language initiatives, like this work-in-progress book: https://geocompx.org/jl
I'm not sure. The thing that almost always brings people to Julia is a paradigm shift in scale or performance. To focus on the topic of the issue: what I look for to reduce duplication of effort is a focus on the key underlying tools we share and that are important in the wider ecosystem. That means fixing bugs and performance in MLJ.jl and other stats packages, or helping us on the geospatial packages you rely on in SpeciesDistributionToolkit.jl, which we currently maintain with a few others in our spare time. Making high-level packages like SDM/SDT work together is harder and has lower returns.
I know most people who use SDMs use R, and I really hope to contribute to changing that. I just don't think making a whole ecosystem of packages that runs completely parallel to other Julia packages is the way to go, though. It doesn't take me long to find code in SDT.jl that broadcasts over rasters, masks them, and reads in climate data, while Rasters and RasterDataSources already do that. I think that's a shame, both because it's a huge amount of duplicated effort to write and maintain, and because it's never going to compete with Rasters.jl in terms of speed (or indeed user-friendliness), because Rasters.jl has thousands of people using it and maybe dozens contributing. So to go back to your question of what this package should do that SDT.jl doesn't: I wanted to make a package that works with the existing ecosystem of spatial data and machine learning packages. I knew that SDT.jl existed, but didn't see it as an option because it exists in a parallel universe.
There is a pretty robust codebase in https://github.com/PoisotLab/SpeciesDistributionToolkit.jl that already does many of the stated goals of this package. What functionality are you interested in that isn't in SDT? We could always work to add that functionality to SDT rather than duplicate a bunch of effort...