
Minimum size of training set #13

Open
muammar opened this issue Nov 23, 2021 · 0 comments

muammar commented Nov 23, 2021

There is empirical evidence that Chemprop can learn meaningful representations from a dataset of at least 1K SMILES/property pairs; that has held for most of the experiments I have carried out. With evidential deep learning, however, this no longer seems to hold. My understanding is that the output layer now predicts the parameters of a Normal Inverse-Gamma distribution, and fitting that distribution may simply require more data (which I am fine with). Is this assumption correct?
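For concreteness, here is a minimal sketch of the kind of evidential output head I mean (plain PyTorch for illustration, not Chemprop's actual implementation): a single linear layer producing the four Normal Inverse-Gamma parameters, with softplus constraints keeping them in their valid ranges, following Amini et al., "Deep Evidential Regression" (2020).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialHead(nn.Module):
    """Maps a learned molecular representation to the four parameters
    (gamma, nu, alpha, beta) of a Normal Inverse-Gamma distribution."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # One raw output per NIG parameter.
        self.linear = nn.Linear(hidden_dim, 4)

    def forward(self, h: torch.Tensor):
        gamma, raw_nu, raw_alpha, raw_beta = self.linear(h).chunk(4, dim=-1)
        nu = F.softplus(raw_nu)              # nu > 0 (virtual observations of the mean)
        alpha = F.softplus(raw_alpha) + 1.0  # alpha > 1 so the predicted variance is finite
        beta = F.softplus(raw_beta)          # beta > 0
        return gamma, nu, alpha, beta
```

Each of the extra parameters is another degree of freedom the loss has to constrain, which would be consistent with needing more data than a plain single-output regression.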

How did I get to this point? I took a dataset of 1.2K data points and randomly partitioned it 80%/20% into training and test sets, respectively. If I use Chemprop for a regression task without evidential learning, the metrics I use to evaluate predictive power (MAE, RMSE, and R2) are decent. But if I train on the same dataset with evidential learning, the model cannot predict the test set. To its credit, it also reports that it is very uncertain about its predictions, but I was surprised to see such a degradation in generalization.
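The setup, sketched with pandas/scikit-learn for illustration (the CSV path and column names here are placeholders, not my actual files):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def random_split(csv_path: str, test_size: float = 0.2, seed: int = 0):
    """Random 80/20 partition of a SMILES/property CSV into train/test frames."""
    df = pd.read_csv(csv_path)
    return train_test_split(df, test_size=test_size, random_state=seed)

def report_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """The three metrics mentioned above, computed on held-out predictions."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "R2": r2_score(y_true, y_pred),
    }
```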

Any thoughts would be appreciated.

Best,
