Update BenchmarkSVs workflow(s) with some new features #199

Merged: 14 commits, Feb 5, 2025

rickymagner
Contributor

This PR includes a bunch of refactoring, improvements, and new features for the process of benchmarking SV VCFs. At a high level, this includes:

  • Adds an option to use truvari refine and truvari ga4gh to collapse (or "harmonize") similar events in truth/query down to one event. This may improve benchmarking statistics where calls get mismatched due to extreme fuzziness in the calling step. It includes an alignment step using mafft, as outlined in the truvari documentation on the process. The docker image is updated to include newer versions of truvari as well as mafft. This option requires the input files to be phased.
  • Splits multiallelic sites before running truvari, since the tool expects this (previously users were expected to preprocess their inputs this way, but this adds a quick convenience).
  • The QC tasks were split out into a separate workflow to keep benchmarking apart from the QC/counting tasks. Some common tasks were moved to a third file and are imported by both.
  • The Dockstore yml has been updated so that both workflows can be imported there.
  • The benchmarking stats also include a breakdown by HET vs HOM sites (and both together).

Collaborator

@kachulis kachulis left a comment

looks good! only question is whether you want to add any tests.

Array[String] base_sample_names
File base_vcf
File base_vcf_index
String base_sample_name
Collaborator

so is this changing to only support benchmarking a single sample at a time?

Contributor Author

Yeah I wanted to keep things simpler in the WDL / Terra UI and made this just single sample for now. It would be easy to write a wrapper in the future if there was demand.

@rickymagner
Contributor Author

I'll put tests on the infinite to-do list and get this merged for now. Thanks for checking!

@rickymagner rickymagner merged commit bff59b5 into main Feb 5, 2025
4 checks passed
@rickymagner rickymagner deleted the rm_sv_refine branch February 5, 2025 20:47
2 participants