snakemake validation #18

Open
aryarm opened this issue Jun 30, 2020 · 1 comment
Assignees
Labels
breaking (this will break backwards-compatibility), enhancement (New feature or request), low-priority
Milestone

Comments

@aryarm
Owner

aryarm commented Jun 30, 2020

There are a lot of config options, and it might be hard for a user to know how to fill them out, despite all of the documentation I've provided. It would be great if we could validate the config files that the user provides, so that they get instant feedback on whether they filled everything out correctly.

Snakemake lets us use JSON schemas to do this, but I think we'll need something a lot more robust. While JSON schemas might allow us to conditionally require options based on other options, I doubt they will allow us to validate the format and content of some of the pipeline's inputs in the way that I want. For example, it would be great if we could notify the user in advance if

  1. Their BAM files don't have read group information.
  2. Their trained RF model doesn't support columns in the datasets for which they want to predict variants.
  3. The files they provided are not in the correct TSV format, or are otherwise missing some columns.
  4. etc.

It feels like those sorts of checks will require much more complicated validation logic than JSON schemas provide. Perhaps the best way to proceed would be to create a Python validation module that uses argparse or something similar? We could import that module in the Snakefiles.
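To make the idea concrete, here is a minimal sketch of what such a validation module might contain, using only the standard library. Everything in it is an assumption for illustration: the function names (`missing_columns`, `has_read_groups`, `validate_inputs`) are hypothetical, and the read group check expects the BAM header as text (for example, the output of `samtools view -H`) rather than reading the BAM file directly. The same column check could be pointed at the feature names a trained model expects.

```python
import csv
from pathlib import Path


def missing_columns(tsv_path, required):
    """Return the required column names absent from the TSV's header row."""
    with open(tsv_path, newline="") as handle:
        # An empty file yields an empty header, so every column is "missing".
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in required if name not in header]


def has_read_groups(sam_header_text):
    """Report whether a SAM/BAM text header contains at least one @RG line."""
    return any(line.startswith("@RG\t") for line in sam_header_text.splitlines())


def validate_inputs(tsv_path, required_columns, sam_header_text):
    """Collect every problem found, so the user sees them all at once."""
    problems = []
    if not Path(tsv_path).is_file():
        problems.append(f"{tsv_path}: file not found")
    else:
        missing = missing_columns(tsv_path, required_columns)
        if missing:
            problems.append(f"{tsv_path}: missing columns {missing}")
    if not has_read_groups(sam_header_text):
        problems.append("BAM header has no @RG (read group) lines")
    return problems
```

Returning a list of messages instead of raising on the first failure means a user can fix all of their inputs in one pass.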

Update (10/22/20): There is an alternative to importing the validation module in the Snakefiles. Instead, we could create a single Python script, run.py, that executes Snakemake (much like run.bash), and import the validation module there. This would also let us place more complicated validation/preparation logic there in the future.
Wait, no, that won't work: we won't have access to the dependencies that the validation module needs unless it's running as a rule or checkpoint.

@aryarm aryarm added enhancement New feature or request low-priority labels Jun 30, 2020
@aryarm
Owner Author

aryarm commented Jul 4, 2021

OK, looking back on this now, I think it would be best to combine this work with aryarm/as_analysis#72.
Then, we could create a custom Python script to read from the new samples.yml file and implement any validation of the input and config options.
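As a rough sketch of that script's core, the function below checks an already-parsed samples mapping (for example, the result of `yaml.safe_load` on the file). The layout it assumes, `{"sample_name": {"bam": "path/to.bam"}}`, and the names `validate_samples` and `required_keys` are hypothetical; the real samples.yml format isn't defined in this issue.

```python
from pathlib import Path


def validate_samples(samples, required_keys=("bam",)):
    """Check a parsed samples mapping and return a list of error messages.

    `samples` is assumed to look like {"sample_name": {"bam": "path/to.bam"}};
    this layout is hypothetical, not the real samples.yml format.
    """
    if not samples:
        return ["samples file is empty"]
    errors = []
    for name, entry in samples.items():
        if not isinstance(entry, dict):
            errors.append(f"{name}: expected a mapping of options")
            continue
        for key in required_keys:
            path = entry.get(key)
            if path is None:
                errors.append(f"{name}: missing required option '{key}'")
            elif not Path(path).is_file():
                errors.append(f"{name}: {key} file not found: {path}")
    return errors
```

An empty return value means every sample passed; otherwise, the script could print the messages and exit before Snakemake is ever invoked.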

@aryarm aryarm added this to the VarCA v2.0.0 milestone Jul 4, 2021
@aryarm aryarm self-assigned this Jul 4, 2021
@aryarm aryarm added the breaking this will break backwards-compatibility label Jul 13, 2021