RandomSurvivalForest is not consistent making predictions of survival functions at certain times #249
Comments
There's indeed an inconsistency. For |
#363 adds a test describing (my understanding of) the current behaviour described in this issue. Wondering what a fix might look like?
|
@cpoerschke I guess the question goes to @sebp. However, in my view all models should support the full range of times. If they give wrong predictions outside the trained range, so be it. |
If by "all-times ranges" you mean the range of the training data (disregarding censoring status), I would agree. It would also be worthwhile to check how the random survival forest or gradient boosting implementations in R handle this. |
Thank you very much, @sebp. I don't know how those implementations handle it in R. However, scikit-learn's random forest model doesn't put constraints on users who want to go outside the training data (even though the results are obviously not good outside the training region). Here is a gist of that behavior that you can also run in Colab: https://gist.github.com/alonsosilvaallende/ef813de35a8b0f4328b451aea46b9c48 |
414354a explores clipping inputs on an opt-in basis. |
@cpoerschke Thanks for adding the clipping option. This might be a good option for advanced users that are aware of the risks. For time points that are smaller than the smallest time point in the training data, it would actually be okay to predict a survival probability of 1. For time points beyond the far end of the time axis, predictions would be very speculative. Regarding that |
Values outside the interval specified by `self.domain` will raise an exception. Values in `x` that are in the interval `[self.domain[0]; self.x[0]]` get mapped to `self.y[0]`. By default, the lower bound is set to 0. Fixes #249
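The rule in the commit message above can be illustrated with a minimal sketch. `DomainStepFunction` is a hypothetical class written only for this illustration, not the library's implementation; it assumes a right-continuous step function, a default lower bound of 0, and an upper bound that falls back to the last entry of `x` when not given.

```python
from bisect import bisect_right


class DomainStepFunction:
    """Hypothetical right-continuous step function with a restricted domain."""

    def __init__(self, x, y, domain=(0.0, None)):
        if len(x) != len(y) or len(x) == 0:
            raise ValueError("x and y must be non-empty and of equal length")
        if any(a >= b for a, b in zip(x, x[1:])):
            raise ValueError("x must be strictly increasing")
        self.x = list(x)
        self.y = list(y)
        # Assumption: a bound given as None falls back to the first/last entry of x.
        lower = self.x[0] if domain[0] is None else domain[0]
        upper = self.x[-1] if domain[1] is None else domain[1]
        self.domain = (lower, upper)

    def __call__(self, t):
        lower, upper = self.domain
        # Values outside the domain raise an exception.
        if not lower <= t <= upper:
            raise ValueError(f"t={t} is outside the domain [{lower}; {upper}]")
        # Values in [domain[0]; x[0]] are mapped to y[0].
        if t <= self.x[0]:
            return self.y[0]
        # Otherwise return the value of the last step at or before t.
        return self.y[bisect_right(self.x, t) - 1]


surv = DomainStepFunction(x=[5.0, 10.0, 20.0], y=[0.9, 0.6, 0.3])
print(surv(0.0), surv(7.0), surv(20.0))
```

With these inputs, `surv(0.0)` falls in `[domain[0]; x[0]]` and returns `y[0]`, while `surv(21.0)` or `surv(-1.0)` raises a `ValueError`.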
#375 ensures that all estimators behave the same. It does not allow for evaluating the survival function (or CHF) beyond the maximum time point in the training data. Evaluating below the minimum time point is allowed, if the value is non-negative. @cpoerschke Feel free to open a new issue/PR to discuss the clipping approach you proposed. |
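Given that behaviour, callers may want to restrict their evaluation times before asking for a survival function. A minimal sketch, assuming the valid range is `[0, max training time]` as described above; `valid_eval_times` is a hypothetical helper, not part of the library.

```python
def valid_eval_times(candidate_times, train_times):
    """Keep only candidate times t with 0 <= t <= max(train_times).

    Hypothetical helper: it mirrors the rule that evaluation beyond the
    maximum training time is rejected, while non-negative times below the
    minimum training time are allowed.
    """
    upper = max(train_times)
    return [t for t in candidate_times if 0 <= t <= upper]


train_times = [3.0, 30.0, 182.0]
candidates = [-5.0, 0.0, 1.0, 90.0, 182.0, 200.0]
print(valid_eval_times(candidates, train_times))
```

Here `-5.0` is dropped because it is negative and `200.0` because it exceeds the maximum training time; `0.0` and `1.0` are kept even though they lie below the minimum training time.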
Describe the bug
RandomSurvivalForest is inconsistent when predicting survival functions (predict_survival_function) at certain times. I would expect RandomSurvivalForest to be able to predict survival functions at any time within the range of the training and validation times. However, while both the training and validation times span [0,182], RandomSurvivalForest cannot predict the survival function on times between [0,175]. Note that this doesn't happen with CoxPHSurvivalAnalysis or GradientBoostingSurvivalAnalysis.
Code Sample to Reproduce the Bug
Expected Results
Minimum training time: 0.0
Maximum training time: 182.0
Minimum validation time: 0.0
Maximum validation time: 182.0
Integrated Brier Score: 0.21497683508486282
Minimum training time: 0.0
Maximum training time: 182.0
Minimum validation time: 0.0
Maximum validation time: 182.0
Integrated Brier Score: 0.21497683508486282
Actual Results
Minimum training time: 0.0
Maximum training time: 182.0
Minimum validation time: 0.0
Maximum validation time: 182.0
Integrated Brier Score: 0.21497683508486282
Minimum training time: 0.0
Maximum training time: 182.0
Minimum validation time: 0.0
Maximum validation time: 182.0
Versions
Please execute the following snippet and paste the output below.
System:
python: 3.7.12 (default, Jan 15 2022, 18:48:18) [GCC 7.5.0]
executable: /usr/bin/python3
machine: Linux-5.4.144+-x86_64-with-Ubuntu-18.04-bionic
Python dependencies:
pip: 21.1.3
setuptools: 57.4.0
sklearn: 1.0.2
numpy: 1.21.5
scipy: 1.4.1
Cython: 0.29.27
pandas: 1.3.5
matplotlib: 3.2.2
joblib: 1.1.0
threadpoolctl: 3.1.0
Built with OpenMP: True
sksurv: 0.17.0
cvxopt: 1.2.7
cvxpy: 1.0.31
numexpr: 2.8.1
osqp: 0.6.2