Update BenchmarkSVs workflow(s) with some new features #199

rickymagner · 2024-11-07T19:15:13Z

This PR includes a bunch of refactoring, improvements, and new features for the process of benchmarking SV VCFs. At a high level, this includes:

Adds an option to use truvari refine and truvari ga4gh to collapse (or "harmonize") similar events in truth/query down to one event. The aim is to improve benchmarking statistics where calls might get mismatched due to extreme fuzziness in the calling step. This includes an alignment step using mafft, as outlined in the truvari documentation on the process. The docker image is updated to include newer versions of truvari as well as mafft. This option requires the input files to be phased.
Splits multiallelic sites before running truvari, since the tool expects this (previously users were expected to preprocess their inputs this way; this adds a quick convenience).
Splits the QC tasks out into a separate workflow, to keep benchmarking apart from the QC/counting tasks. Some common tasks were moved to a third file and are imported.
Updates the Dockstore yml so that both workflows can be imported there.
The benchmarking stats now also include a breakdown by HET vs HOM sites (and both together).
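
To illustrate the HET vs HOM breakdown in the last item, here is a minimal sketch of classifying genotype values and tallying them. The function names and category labels are hypothetical and not taken from the workflow's scripts. It also assumes that a haploid non-reference call counts as HOM. The `str()` coercion is there because a genotype value can arrive as an int rather than a string, similar in spirit to the pandas edge case fixed in one of the commits.

```python
import re
from collections import Counter


def classify_gt(gt):
    """Classify a VCF genotype value as 'HET', 'HOM', 'REF', or 'MISSING'.

    Hypothetical helper: the labels and the haploid handling are assumptions.
    """
    # str() guards against a genotype such as 1 having been parsed as an int.
    alleles = re.split(r"[/|]", str(gt))
    if any(a in (".", "") for a in alleles):
        return "MISSING"
    if all(a == "0" for a in alleles):
        return "REF"
    # All alleles identical and non-reference (includes haploid calls).
    if len(set(alleles)) == 1:
        return "HOM"
    return "HET"


def count_by_gt_type(gts):
    """Count HET and HOM genotypes, plus both together under 'ALL'."""
    counts = Counter(classify_gt(gt) for gt in gts)
    return {
        "HET": counts["HET"],
        "HOM": counts["HOM"],
        "ALL": counts["HET"] + counts["HOM"],
    }
```

For example, `count_by_gt_type(["0/1", "1/1", "0|1", 1, "0/0"])` counts two HET and two HOM sites and ignores the reference call.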

kachulis

looks good! only question is whether you want to add any tests.

kachulis · 2025-02-04T16:39:01Z

BenchmarkSVs/BenchmarkSVs.wdl

-        Array[String] base_sample_names
+        File base_vcf
+        File base_vcf_index
+        String base_sample_name


so is this changing to only support benchmarking a single sample at a time?

Yeah I wanted to keep things simpler in the WDL / Terra UI and made this just single sample for now. It would be easy to write a wrapper in the future if there was demand.
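
A rough sketch of what such a wrapper could look like at the inputs level: build one set of workflow inputs per sample from a multi-sample VCF. Only `base_vcf`, `base_vcf_index`, and `base_sample_name` come from the diff above. The `BenchmarkSVs.` prefix (assuming the workflow is named after its file), the example file names, and the helper name are assumptions, and any other required inputs are left out.

```python
def per_sample_inputs(base_vcf, base_vcf_index, sample_names, workflow="BenchmarkSVs"):
    """Return one inputs dict per sample for a single-sample workflow.

    Hypothetical helper: the workflow name used as the key prefix is an
    assumption, and only the three inputs shown in the diff are filled in.
    """
    return [
        {
            f"{workflow}.base_vcf": base_vcf,
            f"{workflow}.base_vcf_index": base_vcf_index,
            f"{workflow}.base_sample_name": name,
        }
        for name in sample_names
    ]


# Example: two samples sharing the same multi-sample VCF (placeholder paths).
inputs = per_sample_inputs("base.vcf.gz", "base.vcf.gz.tbi", ["sampleA", "sampleB"])
```

Each dict could then be written out as its own inputs JSON and submitted as a separate run.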

rickymagner · 2025-02-05T20:45:50Z

I'll put tests on the infinite to-do list and get this merged for now. Thanks for checking!

rickymagner added 14 commits November 4, 2024 22:01

Refactor SV into benchmarking and QC separately and simplify

e812e48

Allow optional truvari refine step in benchmarking

9386284

Update dockstore yml

ebce557

Fix dockstore links

50f3009

Update BenchmarkSVs and some dockers for SVs + phasing

e7871ba

Split MAs before truvari

912d2d0

Typo in truvari refine command

0e74926

Add ga4gh harmonization; split counts by GT type

ac81673

Fix path for harmonized outputs

6a9b00e

Update versions; use refine output for truvari task

d44a02d

Update README

9db5280

Fix edge case where pandas parses 1 GT as int

c14ffc6

Add optional flag for collect closest stats

8905b5a

Expose additional annotated truvari vcf outputs to top level

eeed291

kachulis approved these changes Feb 4, 2025


rickymagner merged commit bff59b5 into main Feb 5, 2025
4 checks passed

rickymagner deleted the rm_sv_refine branch February 5, 2025 20:47
