Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/min impurity decrease #165

Closed
wants to merge 6 commits into from

Conversation

agnesbao
Copy link

@agnesbao agnesbao commented Jan 20, 2021

Checklist

What does this implement/fix? Explain your changes
Add min_logrank_split parameter to SurvivalTree which function in place as min_impurity_decrease in sklearn tree model. Like in DecisionTree we set min threshold on whether to keep splitting based on impurity decrease, here we set min threshold on the abs logrank stats (how separate the survival curves are)

I'm still working on writing a functional test for this param, but want to open this PR first to seek feedback on whether it makes sense.

@codecov
Copy link

codecov bot commented Jan 20, 2021

Codecov Report

Merging #165 (2db2b11) into master (ccaf30e) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #165   +/-   ##
=======================================
  Coverage   98.33%   98.33%           
=======================================
  Files          37       37           
  Lines        3126     3132    +6     
  Branches      460      461    +1     
=======================================
+ Hits         3074     3080    +6     
  Misses         28       28           
  Partials       24       24           
Impacted Files Coverage Δ
sksurv/ensemble/forest.py 100.00% <100.00%> (ø)
sksurv/tree/tree.py 99.30% <100.00%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ccaf30e...2db2b11. Read the comment docs.

Copy link
Owner

@sebp sebp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for helping to improve sksurv. The changes look reasonable. However, you would also need to modify RandomSurvivalForest to include the new min_logrank_split parameter. In addition, you need to add tests to assure that the intended change does what it is supposed to.

I'd suggest to create some instances where the min_logrank_split gets triggered at different depth of the tree building process. You can check the log-rank statistic in a fully grown tree and design tests based on that. You might find the LogrankTreeBuilder class helpful in that regard.

Please let me know if you have any questions.

@agnesbao agnesbao requested a review from sebp June 25, 2021 02:16
Copy link
Owner

@sebp sebp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, I think the min log-rank check in the unit test differs from what LogrankCriterion does.

Second, I'm not sure whether the unit test is sufficient in the sense that this condition is actually triggered.

Comment on lines +146 to +148
s, p = compare_survival(y, groups)
abs_z = abs(norm.ppf(p/2))
if s > best_stat and abs_z >= min_logrank:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LogrankCriterion class uses the absolute value of the log-rank test statistic as splitting criterion, whereas here, the condition is based on the p-value.

@sebp
Copy link
Owner

sebp commented Oct 26, 2021

@agnesbao Are you still going to work on improving this PR, or should I consider it as abandoned?

@sebp
Copy link
Owner

sebp commented Nov 23, 2021

I'm closing this PR as no further information has been provided. Please feel free to reopen this PR if you can provide the requested updates.

@sebp sebp closed this Nov 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider opening up min_impurity_decrease in survival tree for early stoppping?
2 participants