
Lighten the MLJ dependency? #1

Closed
ablaom opened this issue Nov 30, 2023 · 2 comments

@ablaom

ablaom commented Nov 30, 2023

Just noticed that you have MLJ as a dep here. Depending on your objectives, you may be able to lighten that. MLJ itself just imports a bunch of components. So, for example, maybe you just need MLJBase and StatisticalMeasures.

Here is what the various components do:

help?> MLJ
search: MLJ MLJType MLJFlow MLJOpenML MLJ_VERSION MLJIteration multitarget_l2

  MLJ

  MLJ (https://alan-turing-institute.github.io/MLJ.jl/dev/) is a Machine
  Learning toolbox for Julia. It collects together functionality from the
  following packages, which can be loaded separately:

    •  MLJBase.jl: The machine interface, tools to partition and unpack
       datasets, evaluate/evaluate! for model performance, |> pipeline
       syntax, TransformedTargetModel wrapper, general model composition
       syntax (learning networks), synthetic data generators, scitype and
       schema methods (from ScientificTypes.jl) for checking how MLJ
       interprets your data

    •  StatisticalMeasures.jl: MLJ-compatible measures (metrics) for
       machine learning, confusion matrices, ROC curves.

    •  MLJModels.jl: Common transformers for data preprocessing,
       searching the model registry, loading models with @load

    •  MLJTuning.jl: Hyperparameter optimization via TunedModel wrapper

    •  MLJIteration.jl: IteratedModel Wrapper for controlling iterative
       models

    •  MLJEnsembles.jl: Homogeneous model ensembling, via the
       EnsembleModel wrapper

    •  MLJBalancing.jl: Incorporation of oversampling/undersampling
       methods in pipelines, via the BalancedModel wrapper

    •  OpenML.jl: Tool for grabbing datasets from OpenML.org

If you only need a few 3rd party models, you can import them directly from their interface packages (see below) and skip the @load convenience loader from MLJModels:

julia> import MLJDecisionTreeInterface  # binds the module name so it can be qualified below
julia> Tree = MLJDecisionTreeInterface.DecisionTreeClassifier
julia> tree = Tree()
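
Putting the two suggestions together, here is a minimal sketch of a workflow that never loads MLJ itself. It assumes a recent MLJBase (one where measures live in StatisticalMeasures) and that MLJBase, StatisticalMeasures and MLJDecisionTreeInterface are all in the project environment. The synthetic data and the choice of `log_loss` are only for illustration.

```julia
using MLJBase                    # machine, evaluate!, CV, make_blobs
using StatisticalMeasures        # measures such as log_loss
import MLJDecisionTreeInterface  # model interface package, used without @load

Tree = MLJDecisionTreeInterface.DecisionTreeClassifier

# Synthetic two-class data from MLJBase's generator (illustrative only)
X, y = make_blobs(100, 2; centers=2)

# Bind the model to data and estimate performance with 3-fold cross-validation
mach = machine(Tree(), X, y)
result = evaluate!(mach; resampling=CV(nfolds=3), measure=log_loss, verbosity=0)
```

Whether these two packages are enough depends on which other MLJ features the package uses; tuning, for example, would also need MLJTuning.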
ablaom changed the title from "Perhaps you can lighten the MLJ dependency" to "Lighten the MLJ dependency?" on Nov 30, 2023
@tiemvanderdeure
Owner

Thanks for chipping in @ablaom!

I think you are right and we can easily get away with using just a few parts of the MLJ ecosystem.

Good to see it is so easy to get rid of @load.

There are just 5 or maybe 6 classifiers that are commonly used in species distribution modelling. We want to make it very straightforward for people to find the models they need, with settings and names similar to what they are used to from comparable packages in R. One possibility is to just add them as dependencies; we also discussed having something like a load_recommended() function.
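
As a rough sketch of the first option, the model packages could be direct dependencies and a small helper could expose them under familiar names. Everything here is hypothetical: the body of `load_recommended`, the `decision_tree` key, and the choice of DecisionTreeClassifier as the only entry are placeholders, not a decided design.

```julia
import MLJDecisionTreeInterface  # assumed to be a direct dependency of the package

# Hypothetical helper: maps user-facing names to model types that are
# already loaded as dependencies, so no @load call is needed.
function load_recommended()
    return (; decision_tree = MLJDecisionTreeInterface.DecisionTreeClassifier)
end

models = load_recommended()
tree = models.decision_tree(max_depth = 3)  # construct a model with one setting changed
```

Further classifiers would be added as extra entries in the returned named tuple, one per interface package.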

In any case, being able to build this on top of MLJ is really convenient as it will be super easy to add more models.

@tiemvanderdeure
Owner

Solved by #4
