fix: do not impute isProteinCoding #902

Merged
merged 3 commits into from
Nov 5, 2024

Conversation

addramir
Contributor

@addramir addramir commented Nov 5, 2024

✨ Context

Removing isProteinCoding from the list of columns to impute.

🛠 What does this PR implement

🙈 Missing

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g. poetry run pre-commit run --all-files)?

@github-actions github-actions bot added the bug (Something isn't working), Dataset, and size-XS labels Nov 5, 2024
@addramir addramir marked this pull request as ready for review November 5, 2024 15:05
cols_to_impute = [
    "proteinGeneCount500kb",
    "geneCount500kb",
    "credibleSetConfidence",
Contributor

Are there nulls for this column? I fixed that in a previous PR.

Taking the mean of isProteinCoding will always give 1 or 0, because the value is unanimous wherever the data is present.

Contributor Author

Yes, there are nulls for credibleSetConfidence, for the same reason as for the gene counts. For protein coding, imputing by studyLocusId is not valid; imputing by geneId would be more reasonable, but it is fine to use 0.
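
To illustrate the point above, here is a minimal sketch in plain Python, not the project's actual implementation. The helpers `impute_group_mean` and `fill_constant` and the toy rows are hypothetical; only the column names come from this PR. It shows why a per-studyLocusId mean can be reasonable for a count column but mixes unrelated genes for isProteinCoding, and what filling with 0 looks like instead.

```python
from collections import defaultdict
from statistics import fmean


def impute_group_mean(rows, group_key, column):
    """Fill None in `column` with the mean of the non-null values in the same group.

    Hypothetical helper for illustration only. Rows whose whole group is
    null are left as None.
    """
    observed = defaultdict(list)
    for row in rows:
        if row[column] is not None:
            observed[row[group_key]].append(row[column])

    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        if new_row[column] is None:
            values = observed.get(new_row[group_key])
            new_row[column] = fmean(values) if values else None
        result.append(new_row)
    return result


def fill_constant(rows, column, value=0):
    """Fill None in `column` with a fixed value (hypothetical helper)."""
    return [
        {**row, column: value if row[column] is None else row[column]}
        for row in rows
    ]


# Toy data (assumed, not taken from the real dataset): one study locus, three genes.
rows = [
    {"studyLocusId": "sl1", "geneId": "g1", "geneCount500kb": 4, "isProteinCoding": 1},
    {"studyLocusId": "sl1", "geneId": "g2", "geneCount500kb": None, "isProteinCoding": None},
    {"studyLocusId": "sl1", "geneId": "g3", "geneCount500kb": 6, "isProteinCoding": 0},
]

# A locus-level mean is a plausible stand-in for a missing count.
counts = impute_group_mean(rows, "studyLocusId", "geneCount500kb")

# The same grouping averages a gene-level flag across different genes.
mixed = impute_group_mean(rows, "studyLocusId", "isProteinCoding")

# The fallback described in the reply: treat a missing flag as 0.
filled = fill_constant(rows, "isProteinCoding")
```

In this toy case the locus-level mean for g2's isProteinCoding comes from g1 and g3, which says nothing about g2 itself. Grouping by geneId would avoid that mixing, and a constant fill of 0 sidesteps the question entirely.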

@ireneisdoomed ireneisdoomed changed the title from "fix: fix col names for imputation" to "fix: do not impute isProteinCoding" Nov 5, 2024
Contributor

@ireneisdoomed ireneisdoomed left a comment

Agree that imputing isProteinCoding doesn't make sense

@addramir addramir merged commit 6ec0d45 into dev Nov 5, 2024
5 checks passed
@addramir addramir deleted the yt_fix_col_to_impute branch November 5, 2024 15:31