Summarize numeric columns #24

Open
iaindillingham opened this issue May 27, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@iaindillingham
Member

In the dataset-report notebook, @robinyjpark summarizes numeric columns with a CDF (cumulative distribution function). Suggestions from @wjchulme in this Slack thread:

  • use a stepped line (plt.step())
  • start the curve at 0
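Here is a minimal sketch of the two suggestions above. It assumes numpy and matplotlib, and it reads "start the curve at 0" as meaning that the curve's y-value begins at 0. The helper names `ecdf_points` and `plot_ecdf` are hypothetical and are not taken from the dataset-report notebook.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def ecdf_points(values):
    """Return (x, y) arrays for an empirical CDF whose y-values start at 0.

    Hypothetical helper. NaNs are dropped because they have no position
    on the x-axis.
    """
    x = np.asarray(values, dtype=float)
    x = np.sort(x[~np.isnan(x)])
    n = x.size
    if n == 0:
        raise ValueError("no non-missing values to plot")
    # Proportion of observations <= each sorted value: 1/n, 2/n, ..., 1
    y = np.arange(1, n + 1) / n
    # Prepend (min, 0) so the stepped line starts at y = 0 before its first rise
    x = np.concatenate([[x[0]], x])
    y = np.concatenate([[0.0], y])
    return x, y


def plot_ecdf(ax, values, label=None):
    """Draw a stepped ECDF of `values` on a matplotlib Axes (hypothetical helper)."""
    x, y = ecdf_points(values)
    # where="post": each y-value holds from its x until the next x
    ax.step(x, y, where="post", label=label)
    ax.set_ylim(0, 1.05)  # keep the y-axis anchored at 0
    ax.set_ylabel("Cumulative proportion")
    return ax
```

The sketch passes `where="post"` because it matches the usual ECDF definition. The curve jumps at each observed value and then stays flat until the next one, so ties appear as a single taller step.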
@iaindillingham
Member Author

@HelenCEBM also suggests summarizing numeric columns including and excluding missing values. This was based on her experience of using cohort-report, which may replace numpy.nan with zeros (check this!). When would a numeric column include a zero that wasn't a measured value?
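The next sketch shows how a column's summary shifts depending on how missing values are handled. It assumes numpy. "Including" missing values is modelled here as replacing NaN with 0, which mimics the cohort-report behaviour described above; that behaviour has not been confirmed. The function name `summarize_numeric` is hypothetical.

```python
import numpy as np


def summarize_numeric(values):
    """Summarize a numeric column with and without its missing values.

    Hypothetical helper: "missing as zero" mimics the unconfirmed
    cohort-report behaviour of replacing numpy.nan with 0.
    """
    x = np.asarray(values, dtype=float)
    missing = np.isnan(x)
    observed = x[~missing]
    filled = np.where(missing, 0.0, x)  # NaN -> 0, as cohort-report may do
    return {
        "n_total": int(x.size),
        "n_missing": int(missing.sum()),
        # Zeros that were actually recorded, before any NaN -> 0 filling
        "n_zero_observed": int((observed == 0).sum()),
        "mean_excluding_missing": float(observed.mean()) if observed.size else float("nan"),
        "mean_missing_as_zero": float(filled.mean()) if x.size else float("nan"),
    }
```

For `[2.0, 4.0, nan, 0.0]`, the mean is 2.0 when the missing value is excluded and 1.5 when it is treated as zero. Once NaN has been filled with 0, the measured zero and the missing value can no longer be told apart. That is why counting missing values before any filling matters.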

@robinyjpark
Contributor

Update: The dataset-report notebook is now up to date with Will's suggestion (stepped line, start at 0)!

@iaindillingham moved this to Todo in Overview on May 30, 2022
@iaindillingham added the enhancement (New feature or request) label on Jan 4, 2024