There are a lot of config options, and it might be hard for a user to know how to fill them out, despite all of the documentation I've provided. It would be great if we could validate the config files that the user provides, so that the user has instant feedback on whether they filled everything out correctly.
Snakemake lets us use JSON schemas to do this, but I think we'll need something a lot more robust. While JSON schemas might allow us to conditionally require options based on other options, I doubt they will allow us to validate the format and content of some of the pipeline's inputs in the way that I want. For example, it would be great if we could notify the user in advance if:
- their BAM files don't have read group information
- their trained RF model doesn't support columns in the datasets for which they want to predict variants
- the files they provided are not in the correct TSV format, or are otherwise missing some columns
- etc.
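To make the TSV check concrete, here's a minimal sketch of the kind of validation logic a JSON schema can't express. The column names (`sample`, `bam`, `vcf`) and the helper name are hypothetical placeholders, not the pipeline's real requirements:

```python
# Sketch: verify a user-provided TSV has the columns we expect.
# Required column names here are hypothetical examples only.
import csv
import io


def check_tsv_columns(handle, required_cols):
    """Return a list of required column names missing from a TSV header."""
    reader = csv.reader(handle, delimiter="\t")
    try:
        header = next(reader)
    except StopIteration:
        # An empty file is missing every required column
        return list(required_cols)
    return [col for col in required_cols if col not in header]


# Example: a samples table that lacks the (hypothetical) 'vcf' column
sample_tsv = io.StringIO("sample\tbam\nA\tA.bam\n")
missing = check_tsv_columns(sample_tsv, ["sample", "bam", "vcf"])
```

The BAM read-group check would look similar but need an extra dependency (e.g. pysam to read the `@RG` header lines), which is part of why plain schemas won't cut it.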
It feels like those sorts of checks will require much more complicated validation logic than JSON schemas provide. Perhaps the best way to proceed would be to create a Python validation module that uses argparse or something similar? We could import that module in the Snakefiles.
Update (10/22/20): There is an alternative to importing the validation module in the Snakefiles. Instead, we could create a single Python script `run.py` that executes Snakemake (much like `run.bash`). And then, we could import the validation module there. This would also offer us the benefit of being able to place more complicated validation/preparation logic there in the future.
Wait, no, that won't work, because we won't have access to the dependencies that we need within the validation module unless it's running as a rule or checkpoint.
OK, looking back on this now, I think it would be best to combine this work with aryarm/as_analysis#72.
Then, we could create a custom Python script to read from the new `samples.yml` file and implement any validation of the input and config options.
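A rough sketch of what that script's validation step could look like, assuming the `samples.yml` file has already been parsed into a dict (e.g. with PyYAML's `yaml.safe_load`). The per-sample keys (`bam`, `read_group`) are hypothetical examples, not the real schema:

```python
# Sketch: validate options loaded from the proposed samples.yml.
# Key names below are hypothetical; the real ones would come from
# the samples.yml format decided in aryarm/as_analysis#72.
def validate_samples(config):
    """Raise ValueError listing every missing per-sample key."""
    errors = []
    for name, sample in config.get("samples", {}).items():
        for key in ("bam", "read_group"):  # hypothetical required keys
            if key not in sample:
                errors.append(f"sample {name!r} is missing {key!r}")
    if errors:
        raise ValueError("; ".join(errors))


# Example: one sample missing its (hypothetical) read_group entry
cfg = {"samples": {"A": {"bam": "A.bam"}}}
```

Collecting all the errors before raising, rather than failing on the first one, gives the user the "instant feedback on everything" experience described at the top of this issue.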