snakemake validation #18

aryarm · 2020-06-30T15:17:06Z

There are a lot of config options, and it might be hard for a user to know how to fill them in, despite all of the documentation I've provided. It would be great if we could validate the config files that the user provides, so that the user gets instant feedback on whether they filled everything out correctly.

Snakemake lets us use JSON schemas to do this, but I think we'll need something a lot more robust. While JSON schemas might allow us to conditionally require options based on other options, I doubt they will let us validate the format and content of some of the pipeline's inputs in the ways I'd like. For example, it would be great if we could notify the user in advance if:

- Their BAM files don't have read group information.
- Their trained RF model doesn't support the columns in the datasets for which they want to predict variants.
- The files they provided are not in the correct TSV format, or are otherwise missing some columns.
- etc.
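To make the checks listed above more concrete, here is a minimal, standard-library-only Python sketch. The function names are hypothetical and not part of the pipeline. The read-group check assumes the header text has already been extracted (for example, the output of `samtools view -H`), and the model check assumes the trained model's feature column names are available as a plain list.

```python
import csv
import io


def missing_read_groups(sam_header: str) -> bool:
    """Return True if a SAM/BAM header (as text) contains no @RG read group lines."""
    return not any(line.startswith("@RG") for line in sam_header.splitlines())


def unsupported_model_columns(model_columns, dataset_columns):
    """Return the model's feature columns that the dataset lacks, in model order."""
    available = set(dataset_columns)
    return [col for col in model_columns if col not in available]


def tsv_problems(tsv_text: str, required_columns):
    """Check that a TSV has all required header columns and consistent row widths."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    problems = []
    missing = [col for col in required_columns if col not in header]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    # Line numbers are 1-based and the header is line 1.
    for lineno, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"line {lineno}: expected {len(header)} fields, got {len(row)}")
    return problems
```

Each check returns its findings instead of raising, so a caller could gather every problem and report them to the user in one pass.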

It feels like those sorts of checks will require much more complicated validation logic than JSON schemas provide. Perhaps the best way to proceed would be to create a validation Python module that uses argparse or something similar? We could import that module in the Snakefiles.
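A validation module along these lines might look like the following sketch. The config keys (`data`, `out`, `train`, `truth`) are hypothetical placeholders, not the pipeline's real options, and the command-line entry point reads JSON only so the example stays within the standard library; Snakemake configs are usually YAML, which would need a YAML parser. The `train`/`truth` check illustrates the kind of option-dependent requirement mentioned earlier for JSON schemas.

```python
import argparse
import json
import sys


def validate_config(config: dict) -> list:
    """Collect human-readable problems instead of stopping at the first one."""
    errors = []
    # Hypothetical required keys, used only for illustration.
    for key in ("data", "out"):
        if key not in config:
            errors.append(f"missing required option '{key}'")
    # Hypothetical dependent option: 'truth' is only needed when training.
    if config.get("train") and "truth" not in config:
        errors.append("'truth' is required when 'train' is enabled")
    return errors


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Check a pipeline config before running Snakemake.")
    parser.add_argument("config", help="path to a JSON config file")
    args = parser.parse_args(argv)
    with open(args.config) as fh:
        config = json.load(fh)
    errors = validate_config(config)
    for err in errors:
        print("config error:", err, file=sys.stderr)
    return 1 if errors else 0


if __name__ == "__main__":
    sys.exit(main())
```

Because a Snakefile is Python, the same `validate_config` function could be imported there (for example from a hypothetical `validate.py`) and called on Snakemake's global `config` dict, raising an error before any rule runs if the returned list is non-empty. This only works for checks that need nothing beyond what the Snakemake environment itself provides.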

Update (10/22/20): There is an alternative to importing the validation module in the Snakefiles. Instead, we could create a single Python script, run.py, that executes Snakemake (much like run.bash). Then we could import the validation module there. This would also let us place more complicated validation/preparation logic there in the future.
Wait, no: that won't work, because we won't have access to the dependencies we need within that validation module unless it's running as a rule or checkpoint.


aryarm · 2021-07-04T17:42:10Z

OK, looking back on this now, I think it would be best to combine this work with aryarm/as_analysis#72.
Then we could create a custom Python script to read from the new samples.yml file and implement any validation of the inputs and config options.
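A sketch of validating the parsed samples file might look like this. The layout is hypothetical: it assumes samples.yml maps each sample name to a mapping of input paths, with only a `bam` field required here. Parsing is left to a YAML loader (for example PyYAML's `yaml.safe_load`), so the function takes the already-loaded object.

```python
import os


def validate_samples(samples, required_fields=("bam",)):
    """Validate a parsed samples.yml: sample name -> mapping of input paths.

    The structure is a hypothetical example, not the real samples.yml format.
    """
    if not isinstance(samples, dict) or not samples:
        return ["samples file must map at least one sample name to its fields"]
    errors = []
    for name, fields in samples.items():
        if not isinstance(fields, dict):
            errors.append(f"{name}: expected a mapping of fields")
            continue
        for field in required_fields:
            value = fields.get(field)
            if value is None:
                errors.append(f"{name}: missing '{field}'")
            elif not isinstance(value, str) or not os.path.isfile(value):
                errors.append(f"{name}: '{field}' is not an existing file: {value!r}")
    return errors
```

Returning a list of errors rather than raising on the first one fits the goal of giving the user feedback on every problem at once.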

aryarm added the enhancement (New feature or request) and low-priority labels Jun 30, 2020

aryarm added this to the VarCA v2.0.0 milestone Jul 4, 2021

aryarm self-assigned this Jul 4, 2021

aryarm added the breaking (this will break backwards-compatibility) label Jul 13, 2021
