
New functions for better support for different scores and metrics #281

Open
avehtari opened this issue Feb 1, 2025 · 2 comments
avehtari (Collaborator) commented Feb 1, 2025

This is related to issues #223, #220, and #201

We have discussed rewriting the CRPS and SCRPS functions to use a better computation (the probability weighted moment form; see the sketch after the list below). As new functions are needed anyway, this has led to thinking about an overall improvement of the metric and score functions. Some thoughts:

  • Keep the current loo for elpd, as it does not have arguments for the predictions and observations
  • a) Deprecate loo_predictive_metric and create new loo_predictive_measure, or
    b) create new loo_predictive_score
    • (a) might be better if we change the output
    • Noa thinks we could drop loo_
    • I think we could drop predictive_ but should not drop loo_
  • For all measures, return a loo object with pointwise values and estimates
    • With pointwise information, we can do model comparisons
    • Should we extend the current loo object or make a new object type?
    • One challenge might be subsampling loo, which increases the amount of work
      when implementing other measures
  • Extend loo_compare to work with other measures, as we know how to compute diff_se for all
    current metrics and scores (see the comparison sketch after this list). Currently,
    loo_predictive_metric returns only the estimate and SE for one model's predictions
    • Or do we need a new function?
  • Include MAE, RMSE, MSE, R2, ACC, balanced ACC, and Brier score in the metrics
  • Include RPS, SRPS, CRPS, SCRPS, and log score in the scores
  • When computing S?C?RPS or the current metrics, maybe store function-specific diagnostics?
  • All measures need a psis object or log weights, and thus
    should we always compute p_loo, too? Or should we compute a measure-specific value
    describing the amount of fitting?
  • I have a minimal function that computes loo versions of RPS, SRPS, CRPS, and SCRPS and includes documentation for the equations with references (but it has no argument checking, computes the score only for a single scalar y, etc.)
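For concreteness, here is a minimal sketch of what the probability weighted moment (pwm) form could look like for CRPS and SCRPS, for a single scalar y and equal weights; the loo versions would replace the plain means with PSIS-weighted ones. The function names are hypothetical, and the SCRPS form assumed here is the scaled one from Bolin and Wallin (2023):

```r
# Sketch: CRPS via the pwm form for a single scalar y, where x is a
# vector of S draws from the predictive distribution (equal weights).
# Sorting gives O(S log S) instead of the O(S^2) pairwise form.
crps_pwm <- function(x, y) {
  S <- length(x)
  xs <- sort(x)
  b0 <- mean(xs)                                    # E[X]
  b1 <- sum((seq_len(S) - 1) * xs) / (S * (S - 1))  # ~ E[X F(X)]
  # E|X - X'| = 4 b1 - 2 b0, and CRPS = E|X - y| - E|X - X'| / 2
  mean(abs(x - y)) - (2 * b1 - b0)
}

# Sketch: SCRPS in the scaled form of Bolin and Wallin (2023):
# -E|X - y| / E|X - X'| - log(E|X - X'|) / 2
scrps_pwm <- function(x, y) {
  S <- length(x)
  xs <- sort(x)
  b0 <- mean(xs)
  b1 <- sum((seq_len(S) - 1) * xs) / (S * (S - 1))
  EXX <- 4 * b1 - 2 * b0
  -mean(abs(x - y)) / EXX - 0.5 * log(EXX)
}
```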
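And to illustrate the loo_compare point: once every measure returns pointwise values, the paired difference and its SE can be computed the same way loo_compare already does for elpd. A sketch (function name hypothetical):

```r
# Paired comparison of two models from pointwise values of any measure,
# mirroring how loo_compare computes elpd_diff and se_diff from the
# pointwise elpd values of the same n observations.
compare_pointwise <- function(pw1, pw2) {
  stopifnot(length(pw1) == length(pw2))
  n <- length(pw1)
  d <- pw2 - pw1
  c(diff = sum(d), diff_se = sqrt(n) * sd(d))
}
```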

tagging @n-kall and @jgabry

n-kall (Contributor) commented Feb 1, 2025

I think dropping "loo_" could only make sense if the functions could also calculate the measure for the in-sample case, not just the loo case

avehtari (Collaborator, Author) commented Feb 1, 2025

Functions can calculate the in-sample measure if the weights are equal, and we want to make it easy to compute in-sample values
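For example, a self-normalized weighted mean of pointwise values reduces to the plain in-sample mean when the log weights are constant (a sketch; the function name is hypothetical):

```r
# Self-normalized importance-weighted mean; with constant log weights
# (e.g., all zero) this reduces to the plain in-sample mean.
weighted_measure <- function(values, log_weights) {
  w <- exp(log_weights - max(log_weights))
  sum(w * values) / sum(w)
}

values <- rnorm(10)
all.equal(weighted_measure(values, rep(0, 10)), mean(values))
#> TRUE
```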
