Problem with tribuo as a transitive dependency #1

Closed
kirahowe opened this issue Jun 12, 2024 · 8 comments

Comments

@kirahowe
Member

When scicloj.ml.tribuo is included as a dependency in another project, starting a REPL fails with the following error:

Could not start nREPL server: Error building classpath. Could not find artifact org.tribuo:tribuo-all:jar:4.2.0 in central (https://repo1.maven.org/maven2/)

I believe this is due to the lack of support for BOM deps in tools.deps. I don't know whether Leiningen has the same issue. One possible workaround (insofar as it counts as one at all) would be to update this project's readme to mention that users have to include tribuo-all explicitly in their own deps.edn for this library to work.

An actual solution would be to track down all of the components of tribuo-all that are (or could possibly be) used, given the Tribuo features this library supports, and list them all explicitly as deps.

I believe it would be worth at least doing the readme update, since the error message is somewhat misleading if you don't already know what the underlying issue is.
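
A minimal sketch of the readme workaround described above, as it might appear in a consuming project's deps.edn. The `:extension "pom"` coordinate key is an assumption about how tools.deps can be pointed at a pom-packaged artifact, and it has not been verified against this library. The coordinate for scicloj.ml.tribuo itself is left as a comment because no version is given in this thread.

```clojure
;; deps.edn of the consuming project (sketch, not a verified configuration)
{:deps
 {;; ... the scicloj.ml.tribuo coordinate goes here ...

  ;; The error shows the resolver looking for org.tribuo:tribuo-all as a jar.
  ;; Assumption: declaring tribuo-all at the top level with an explicit
  ;; "pom" extension lets the classpath build succeed.
  org.tribuo/tribuo-all {:mvn/version "4.2.0" :extension "pom"}}}
```

The trade-off is that every consumer has to repeat this entry, which is why it is closer to a documentation fix than a real solution.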

@behrica
Member

behrica commented Jun 12, 2024

I fully agree that we need documentation.
This is complicated by the fact that the correct dependencies to add depend on the model actually used in the model specification:

:tribuo-components [{:name "trainer"
                     :type "org.tribuo.classification.dtree.CARTClassificationTrainer"}]
:tribuo-trainer-name "trainer"}))

Each model type needs a different dependency.

@behrica
Member

behrica commented Jun 12, 2024

I suggest we identify the core dependencies and add them as fixed dependencies, for sure.

@behrica
Member

behrica commented Jun 12, 2024

The imports are done by tech.ml.dataset.tribuo:

(:import [org.tribuo.classification Label LabelFactory]
         [org.tribuo DataSource Output OutputFactory Trainer Model MutableDataset
          Prediction]
         [org.tribuo.impl ArrayExample]
         [org.tribuo.provenance SimpleDataSourceProvenance]
         [org.tribuo.regression RegressionFactory Regressor]
         [org.tribuo.regression.evaluation RegressionEvaluator RegressionEvaluation]
         [com.oracle.labs.mlrg.olcut.config ConfigurationManager]
         [com.oracle.labs.mlrg.olcut.config.json JsonConfigFactory])

The code here uses:

(:import [org.tribuo.regression.evaluation RegressionEvaluator]
         [org.tribuo.regression Regressor]))
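
As a rough illustration, the Tribuo packages imported above could be covered by explicit core dependencies along these lines. This is a sketch only: the artifact names are the Tribuo modules that, as far as I know, contain those packages, and the list is not taken from any commit in this repository. The OLCUT artifacts behind the `com.oracle.labs.mlrg.olcut` imports are left out because their coordinates and versions are not given in this thread.

```clojure
;; deps.edn fragment (sketch): explicit core Tribuo modules instead of tribuo-all
{:deps
 {;; org.tribuo, org.tribuo.impl, org.tribuo.provenance
  org.tribuo/tribuo-core {:mvn/version "4.2.0"}
  ;; org.tribuo.classification (Label, LabelFactory)
  org.tribuo/tribuo-classification-core {:mvn/version "4.2.0"}
  ;; org.tribuo.regression and org.tribuo.regression.evaluation
  org.tribuo/tribuo-regression-core {:mvn/version "4.2.0"}}}
```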

@behrica
Member

behrica commented Jun 12, 2024

I added the needed "core" dependencies to the deps in this branch:
6f2bae5

This still requires a user of scicloj.ml.tribuo to add the deps for the model they use, as I did in the test:

org.tribuo/tribuo-regression-sgd {:mvn/version "4.2.0"}

So we need to document this here.
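
To make the per-model requirement concrete, here is a sketch of what a consuming project's deps.edn might contain. The regression SGD entry is the one quoted above. The classification tree entry is an assumption about which Tribuo module provides `org.tribuo.classification.dtree.CARTClassificationTrainer`, and should be checked against Tribuo's package overview. The scicloj.ml.tribuo coordinate is left as a comment because no version is given in this thread.

```clojure
;; deps.edn of a project using scicloj.ml.tribuo (sketch)
{:deps
 {;; ... the scicloj.ml.tribuo coordinate goes here ...

  ;; Needed when the model spec uses a regression SGD trainer
  org.tribuo/tribuo-regression-sgd {:mvn/version "4.2.0"}

  ;; Assumed module for org.tribuo.classification.dtree.CARTClassificationTrainer
  org.tribuo/tribuo-classification-tree {:mvn/version "4.2.0"}}}
```

Only the modules for the trainers actually named in `:tribuo-components` need to be listed.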

@behrica
Member

behrica commented Jun 12, 2024

Which model is in which "component", and therefore in which dep, is fully documented in this table:
https://github.com/oracle/tribuo/blob/main/docs/PackageOverview.md
So we can refer to it.

@behrica
Member

behrica commented Jun 12, 2024

@kiramclean
I propose to merge this in:
https://github.com/scicloj/scicloj.ml.tribuo/tree/unpackTribuoDeps

and then you can try it.

@kirahowe
Member Author

This makes sense. It's cool that they split the deps out into smaller jars that can be included individually to minimize the final package size, but sad that it creates this downstream problem for libraries that consume Tribuo. Anyway, this seems like a reasonable solution. We should update the readme and include a link to that list to make it clear which libs need to be included for which models. I can also make a note of that in all the upcoming tutorials, book chapters, etc. Thanks for looking into this! I think Tribuo is a good solution for ML models going forward. It seems to work smoothly and well across platforms.

@behrica
Member

behrica commented Jun 29, 2024

Fixed in #2.
The documentation was updated in f4ebf1e.

behrica closed this as completed Jun 29, 2024