feat: l2g feature to indicate if gene is protein-coding or not #873

Merged
xyg123 merged 12 commits into dev on Oct 25, 2024

Conversation

xyg123
Contributor

@xyg123 xyg123 commented Oct 24, 2024

✨ Context

Adds a feature that tells the L2G model whether a gene is protein-coding.
opentargets/issues#3509

🛠 What does this PR implement

A new binary feature for each study-locus-to-gene pair: 1 when the gene is protein-coding, 0 otherwise.
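
As a rough illustration of the behaviour described above, here is a minimal plain-Python sketch. It is not the code in this PR: the function name `is_protein_coding`, the identifiers `SL1`, `GENE_A` and so on, and the `protein_coding` biotype label are all assumptions made for the example. It also assumes that a gene missing from the lookup gets 0.

```python
# Hypothetical sketch: names and data are illustrative, not the PR's implementation.
PROTEIN_CODING = "protein_coding"  # assumed Ensembl-style biotype label


def is_protein_coding(pairs, gene_biotypes):
    """Map each (study_locus_id, gene_id) pair to 1 if protein-coding, else 0."""
    return {
        (study_locus_id, gene_id): int(gene_biotypes.get(gene_id) == PROTEIN_CODING)
        for study_locus_id, gene_id in pairs
    }


# Made-up identifiers for illustration only.
gene_biotypes = {"GENE_A": "protein_coding", "GENE_B": "lncRNA"}
pairs = [("SL1", "GENE_A"), ("SL1", "GENE_B"), ("SL2", "GENE_C")]

# GENE_C has no known biotype, so it falls back to 0 under this sketch's assumption.
features = is_protein_coding(pairs, gene_biotypes)
```

The value depends only on the gene, so the same gene gets the same flag for every study locus it is paired with.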

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g poetry run pre-commit run --all-files)?

@@ -81,6 +81,51 @@ def common_genecount_feature_logic(
)


def common_protein_coding_feature_logic(

Since this is only going to be one feature, I don't see the need for a common function.

@addramir
Contributor

Looks good to me. @ireneisdoomed please have a look.
BTW why is it named isProteinCoding1mb? Why 1mb?

Contributor

@ireneisdoomed ireneisdoomed left a comment

LGTM

@xyg123
Contributor Author

xyg123 commented Oct 25, 2024

> Looks good to me. @ireneisdoomed please have a look. BTW why is it named isProteinCoding1mb? Why 1mb?

The feature is generated for each study locus, and it’s set to 1mb because, by default, VEP and other related annotation tools use this range to define potential effector genes.

@ireneisdoomed
Contributor

@xyg123 I agree with @addramir that the window only matters for building the list of variant/gene pairs. Whether the gene is protein-coding does not depend on the window.
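
To make that point concrete, here is a small plain-Python sketch, assuming a hypothetical TSS-distance rule for picking candidate genes. The helper names, positions and gene identifiers are invented for illustration and are not taken from the PR. Changing the window changes which genes are paired with the variant, but not the flag any given gene receives.

```python
# Hypothetical sketch: helper names, positions and gene IDs are made up.
def genes_in_window(variant_pos, gene_tss, window):
    """Return genes whose TSS lies within `window` base pairs of the variant."""
    return {g for g, tss in gene_tss.items() if abs(tss - variant_pos) <= window}


def protein_coding_flag(gene_id, gene_biotypes):
    """Return 1 if the gene's biotype is protein_coding, else 0."""
    return int(gene_biotypes.get(gene_id) == "protein_coding")


gene_tss = {"GENE_A": 1_200_000, "GENE_B": 1_900_000, "GENE_C": 2_400_000}
gene_biotypes = {
    "GENE_A": "protein_coding",
    "GENE_B": "lncRNA",
    "GENE_C": "protein_coding",
}
variant_pos = 1_500_000

# The window decides which genes become candidates for this variant.
narrow = genes_in_window(variant_pos, gene_tss, 500_000)
wide = genes_in_window(variant_pos, gene_tss, 1_000_000)

# The flag is looked up per gene, so it is the same under either window.
flags_narrow = {g: protein_coding_flag(g, gene_biotypes) for g in narrow}
flags_wide = {g: protein_coding_flag(g, gene_biotypes) for g in wide}
```

Here the wider window adds `GENE_C` as a candidate, while `GENE_A` and `GENE_B` keep identical flags in both results.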

@xyg123 xyg123 merged commit ee96c11 into dev Oct 25, 2024
5 checks passed
Labels: Dataset, documentation, Feature, Method, size-M, Step

3 participants