-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/min impurity decrease #165
Conversation
Codecov Report
@@ Coverage Diff @@
## master #165 +/- ##
=======================================
Coverage 98.33% 98.33%
=======================================
Files 37 37
Lines 3126 3132 +6
Branches 460 461 +1
=======================================
+ Hits 3074 3080 +6
Misses 28 28
Partials 24 24
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for helping to improve sksurv. The changes look reasonable. However, you would also need to modify RandomSurvivalForest
to include the new min_logrank_split
parameter. In addition, you need to add tests to assure that the intended change does what it is supposed to.
I'd suggest to create some instances where the min_logrank_split
gets triggered at different depth of the tree building process. You can check the log-rank statistic in a fully grown tree and design tests based on that. You might find the LogrankTreeBuilder class helpful in that regard.
Please let me know if you have any questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, I think the min log-rank check in the unit test differs from what LogrankCriterion
does.
Second, I'm not sure whether the unit test is sufficient in the sense that this condition is actually triggered.
s, p = compare_survival(y, groups) | ||
abs_z = abs(norm.ppf(p/2)) | ||
if s > best_stat and abs_z >= min_logrank: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LogrankCriterion
class uses the absolute value of the log-rank test statistic as splitting criterion, whereas here, the condition is based on the p-value.
@agnesbao Are you still going to work on improving this PR, or should I consider it as abandoned? |
I'm closing this PR as no further information has been provided. Please feel free to reopen this PR if you can provide the requested updates. |
Checklist
min_impurity_decrease
in survival tree for early stoppping? #144What does this implement/fix? Explain your changes
Add
min_logrank_split
parameter toSurvivalTree
which function in place asmin_impurity_decrease
in sklearn tree model. Like in DecisionTree we set min threshold on whether to keep splitting based on impurity decrease, here we set min threshold on the abs logrank stats (how separate the survival curves are)I'm still working on writing a functional test for this param, but want to open this PR first to seek feedback on whether it makes sense.