There is empirical evidence that Chemprop can learn meaningful representations from a dataset of as few as ~1K SMILES/property pairs; that has been the case in most of the experiments I have carried out. However, when applying evidential deep learning, this no longer seems to hold. My understanding is that this might be because the output layer now predicts the parameters of a normal-inverse-gamma distribution, and modeling that may require more data (I am fine with that). Is this assumption correct?
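To make sure I understand what the evidential head is doing, here is a minimal sketch of the idea as I picture it (plain PyTorch, not Chemprop's actual code): instead of one mean per target, the last layer predicts the four normal-inverse-gamma parameters.

```python
# Minimal sketch of an evidential (normal-inverse-gamma) output head.
# Illustration only -- names and layer shapes are assumptions, not Chemprop internals.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EvidentialHead(nn.Module):
    """Predicts the four NIG parameters (gamma, nu, alpha, beta) per target."""

    def __init__(self, hidden_size: int, num_targets: int = 1):
        super().__init__()
        # Four outputs per target instead of the single mean used in plain regression.
        self.linear = nn.Linear(hidden_size, 4 * num_targets)
        self.num_targets = num_targets

    def forward(self, h: torch.Tensor):
        out = self.linear(h).view(-1, self.num_targets, 4)
        gamma = out[..., 0]                    # predicted mean
        nu = F.softplus(out[..., 1])           # > 0, "virtual" evidence for the mean
        alpha = F.softplus(out[..., 2]) + 1.0  # > 1, keeps the variance finite
        beta = F.softplus(out[..., 3])         # > 0
        return gamma, nu, alpha, beta


# From the NIG parameters one can read off:
#   aleatoric variance  = beta / (alpha - 1)
#   epistemic variance  = beta / (nu * (alpha - 1))
```

So the network has to fit four coupled quantities per target rather than one, which is why I suspect the data requirement goes up.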
How did I get to this point? I took a dataset of 1.2K data points and randomly partitioned it 80%/20% into training and test sets, respectively. If I use Chemprop for a regression task without evidential learning, the metrics I use to evaluate predictive power (MAE, RMSE, and R2) are decent. But if I train on the same dataset with evidential learning, the model cannot predict the test set. To be fair, it also tells me that it is very uncertain about its predictions, but I was surprised to see such a degradation in generalization. A rough sketch of how I evaluate both runs is below.
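For concreteness, this is roughly the evaluation I run on the predictions from both models (file paths and column names are placeholders for my actual data):

```python
# Rough sketch of the test-set evaluation used for both runs.
# "test.csv" / "preds.csv" and the "target" column are placeholders.
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

test = pd.read_csv("test.csv")    # held-out 20% split
preds = pd.read_csv("preds.csv")  # Chemprop predictions for the same SMILES, in the same order

y_true = test["target"].to_numpy()
y_pred = preds["target"].to_numpy()

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
r2 = r2_score(y_true, y_pred)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

The only thing that changes between the two runs is switching the loss to the evidential one; the data, split, and evaluation are identical.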
Any thoughts would be appreciated.
Best,