Question about Support for Survival/Time-to-Event Data #1285

lict99 · 2024-11-24T08:41:29Z

I am writing to express my appreciation for the excellent work on the package, which has greatly facilitated causal inference in Python. As a user of the package, I have been able to successfully apply it to various datasets and problems.

However, I was wondering if it would be possible to extend DoWhy's capabilities to support survival or time-to-event data? Currently, the package appears to focus on traditional outcomes such as binary, continuous, or count responses. Time-to-event data is a common outcome type in many fields (e.g., medicine, economics, sociology), and I believe that supporting this would greatly enhance the utility of DoWhy.

I understand that adding new features can be a significant undertaking, but I was hoping to get some insight into whether there are any plans to support survival analysis or if you could recommend alternative packages or methods for causal inference with time-to-event data. Any advice or resources you could share would be greatly appreciated.

Thank you again for your hard work on the package.

amit-sharma · 2024-11-24T09:57:45Z

Can you provide a motivating example or dataset on which you'd like to run DoWhy?

Supporting new kinds of data is significant work, so we can try to do this step by step: first, let's understand a popular, high-impact scenario where we can extend DoWhy, and later we can support survival analysis fully.

lict99 · 2024-11-25T12:30:12Z

Survival data typically comprises two key components: time (the duration from the start of an observation period to an event, the end of the study, loss of contact, or withdrawal) and status (indicating whether the event occurred or the observation was censored). I've found several popular datasets on Kaggle. Specifically:

The Breast Cancer Survival Dataset provides the patient's status (Patient_Status column), and time can be computed as the interval between Date_of_Surgery and Date_of_Last_Visit. The other variables in this dataset can be used as potential predictors.
The Cirrhosis Patient Survival Prediction dataset features status (Status column) and time (N_Days column), with other variables available for use in predictive modeling.
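To make the (time, status) structure concrete, here is a minimal sketch of turning one Breast Cancer record into a survival pair. It assumes the dates have already been parsed into `datetime.date` objects (e.g. with `datetime.strptime` and the file's actual format) and that Patient_Status uses the labels "Alive" and "Dead"; both are assumptions to check against the real file. The helper `to_survival_record` is hypothetical, not part of DoWhy.

```python
from datetime import date

# Assumed status labels; verify against the actual Patient_Status values.
STATUS_TO_EVENT = {"dead": 1, "alive": 0}

def to_survival_record(surgery: date, last_visit: date, patient_status: str):
    """Return (time_in_days, event) for one patient.

    event = 1 means the death was observed; event = 0 means the patient
    is right-censored (still alive at the last visit).
    """
    time_days = (last_visit - surgery).days
    if time_days < 0:
        raise ValueError("last visit precedes surgery")
    key = patient_status.strip().lower()
    if key not in STATUS_TO_EVENT:
        raise ValueError(f"unexpected status label: {patient_status!r}")
    return time_days, STATUS_TO_EVENT[key]

# Two made-up rows (surgery date, last visit date, status).
records = [
    (date(2018, 1, 10), date(2019, 3, 2), "Alive"),
    (date(2017, 6, 5), date(2017, 11, 20), "Dead"),
]
survival = [to_survival_record(*r) for r in records]
print(survival)  # [(416, 0), (168, 1)]
```

For the Cirrhosis dataset the time column (N_Days) already exists, so only the Status column would need mapping to 0/1.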

Additionally, I've found a helpful introduction to survival analysis on the wiki, which provides a solid starting point for understanding this topic.

Thank you for your attention to this matter.😊

github-actions · 2024-12-26T01:59:15Z

This issue is stale because it has been open for 30 days with no activity.

samblechman · 2025-01-06T14:43:38Z

Adding an additional request here for this functionality. I understand this would be a significant amount of work, but agree that it would be extremely useful for many applications (e.g., medical).

For example, the outcome of interest is often 30-day mortality after treatment. Patients who died any time after 30 days, or never died, are "right-censored": to study the effect of treatment or covariates on 30-day mortality, their survival time is imputed as 30 days. However, without a method that accounts for right-censoring, imputing a survival time of 30 days would distort the treatment effect estimate.
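A small sketch of why censoring matters for a 30-day outcome: censoring everyone still alive at day 30 is standard, but patients who leave follow-up before day 30 cannot simply be counted as 30-day survivors. The Kaplan-Meier estimator handles this by keeping censored patients in the risk set only while they are observed. The data and the helper names (`censor_at_horizon`, `km_survival_at`) are hypothetical, and the estimator assumes censoring is independent of survival time.

```python
def censor_at_horizon(time, event, horizon=30):
    """Administratively censor at `horizon`: anyone event-free then is censored."""
    if time > horizon:
        return horizon, 0
    return time, event

def km_survival_at(times, events, t):
    """Kaplan-Meier estimate of S(t) = P(T > t) under independent right-censoring."""
    s = 1.0
    death_times = sorted({ti for ti, ei in zip(times, events) if ei == 1 and ti <= t})
    for u in death_times:
        # Patients still under observation just before u (censored at u count as at risk).
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == u and ei == 1)
        s *= 1.0 - deaths / at_risk
    return s

# Hypothetical follow-up (days; 1 = death observed, 0 = censored).
# The patients censored at days 12 and 20 were lost before day 30.
raw_times = [5, 12, 12, 20, 25, 40, 60, 90]
raw_events = [1, 0, 1, 0, 1, 1, 0, 0]
pairs = [censor_at_horizon(t, e) for t, e in zip(raw_times, raw_events)]
times = [p[0] for p in pairs]
events = [p[1] for p in pairs]

naive = 1 - sum(events) / len(events)  # counts early drop-outs as 30-day survivors
km = km_survival_at(times, events, 30)
print(round(naive, 4), round(km, 4))  # 0.625 0.5625
```

The naive proportion overstates 30-day survival because the two early drop-outs are treated as if they were known to survive; any treatment effect built on such per-arm proportions inherits that error.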

emrekiciman · 2025-01-06T17:08:00Z

Hey folks, I wanted to add a link to this discussion of survival analysis in the discord: https://discord.com/channels/818456847551168542/818456856137170996/1221611463823720588

Notably, Paidamoyo Chapfuwa has published a counterfactual survival analysis notebook that could be integrated into PyWhy and extended with its identification algorithms and/or CATE estimators, etc. She was looking for someone who might push the integration forward. Would make a "good first project" for a person interested in getting more involved.

https://github.com/paidamoyo/counterfactual_survival_analysis
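As an illustration of what a causal estimate on a censored outcome can look like, here is a sketch of one textbook approach: inverse-probability-of-treatment weighting (IPTW) combined with a weighted Kaplan-Meier estimator. It is not the method in the linked notebook and not a DoWhy API; the function names and data are hypothetical. It assumes a single discrete confounder, no unmeasured confounding, positivity, and censoring independent of survival time within each arm.

```python
from collections import defaultdict

def weighted_km_at(times, events, weights, t):
    """Weighted Kaplan-Meier estimate of S(t) = P(T > t)."""
    s = 1.0
    death_times = sorted({ti for ti, ei in zip(times, events) if ei == 1 and ti <= t})
    for u in death_times:
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= u)
        deaths = sum(w for ti, ei, w in zip(times, events, weights) if ti == u and ei == 1)
        s *= 1.0 - deaths / at_risk
    return s

def iptw_survival_difference(times, events, treated, confounder, t):
    """IPTW-adjusted S_1(t) - S_0(t); propensities estimated per confounder stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for a, x in zip(treated, confounder):
        counts[x][0] += a
        counts[x][1] += 1
    propensity = {x: n1 / n for x, (n1, n) in counts.items()}
    if any(p in (0.0, 1.0) for p in propensity.values()):
        raise ValueError("positivity violated: some stratum has only one arm")

    curves = {}
    for arm in (0, 1):
        idx = [i for i, a in enumerate(treated) if a == arm]
        # Treated units get weight 1/e(x); controls get 1/(1 - e(x)).
        w = [1 / propensity[confounder[i]] if arm == 1
             else 1 / (1 - propensity[confounder[i]]) for i in idx]
        curves[arm] = weighted_km_at([times[i] for i in idx],
                                     [events[i] for i in idx], w, t)
    return curves[1] - curves[0], curves

# Hypothetical 30-day data: x = 1 marks sicker patients, who are treated more often.
times      = [30, 10, 30, 30, 15, 30, 30, 8]
events     = [0,  1,  0,  0,  1,  0,  0,  1]
treated    = [1,  1,  1,  1,  0,  0,  0,  0]
confounder = [0,  1,  1,  1,  0,  0,  0,  1]

effect, curves = iptw_survival_difference(times, events, treated, confounder, 30)
# Unadjusted comparison: first four rows are treated, last four are controls.
naive = (weighted_km_at(times[:4], events[:4], [1] * 4, 30)
         - weighted_km_at(times[4:], events[4:], [1] * 4, 30))
print(round(naive, 4), round(effect, 4))  # 0.25 0.5
```

Weighting reweights each arm to the full sample's confounder distribution, so the adjusted difference is larger than the naive one here: the treated arm is skewed toward sicker patients. With many covariates the stratum frequencies would be replaced by a fitted propensity model, which is where an identification step (such as choosing a backdoor adjustment set) would plug in.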

lict99 added the "question" label on Nov 24, 2024

amit-sharma added the "enhancement" label on Nov 24, 2024

lict99 closed this as not planned on Nov 25, 2024

lict99 reopened this on Nov 25, 2024

github-actions bot added the "stale" label on Dec 26, 2024

github-actions bot removed the "stale" label on Jan 7, 2025
