feat: new inphared-db wrapper #1550
base: master
Is the config file really necessary?
This looks good already. But of course, I have a bunch of suggestions. Maybe we can work on those together during the symposium or even the workshop.
rule get_inphareddb:
    output:
        expand("{date}{suffix}", date=config["date"], suffix=config["suffix"])
One of the things we try with these wrappers is to have them work with arbitrary file names. So the `config[]` entries should not be used in this example Snakefile, but rather in the `wrapper.py` file (via `snakemake.params.date`, for example).
Suggested change:

    - expand("{date}{suffix}", date=config["date"], suffix=config["suffix"])
    + "resources/inphared.fasta"

To add the values in the config variables back into the file name, the users of the wrapper should then add Python code around it. But I also like the idea of directly showcasing how to do that here. So maybe we could have two versions of calling the wrapper here, one with a fixed file name (like I suggest here), and one that contains the `config[]` entries.
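
To illustrate the two calling styles mentioned above, here is a minimal Python sketch. It is not the wrapper itself: `make_rule_view` is a hypothetical helper that only mimics the `output` and `params` attributes a wrapper script would see on the `snakemake` object, and the config values are the ones from the example `config.yaml` in this PR.

```python
from types import SimpleNamespace

# Values mirroring the example config.yaml in this PR.
config = {"date": "2Jul2023", "suffix": "_refseq_genomes.fa"}


def make_rule_view(output_path, config):
    """Hypothetical stand-in for what a wrapper sees: output list plus params."""
    return SimpleNamespace(
        output=[output_path],
        params=SimpleNamespace(date=config["date"], suffix=config["suffix"]),
    )


# Variant 1: fixed output name, independent of the database version.
fixed = make_rule_view("resources/inphared.fasta", config)

# Variant 2: output name assembled from the config entries by the user.
from_config = make_rule_view(
    f"resources/{config['date']}{config['suffix']}", config
)

for variant in (fixed, from_config):
    # The remote file name always comes from params, never from the output path.
    remote_file = f"{variant.params.date}{variant.params.suffix}"
    print(f"download {remote_file} -> {variant.output[0]}")
```

In both variants the remote file is selected via `params`, while the local name is whatever the user put into `output:`.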
@@ -0,0 +1,4 @@
name: inphared-db
description: Download sequence file from the Inphared database (https://github.com/RyanCook94/inphared/blob/main/README.md), and store them in a single .fasta file. Please check the current database available at the above link and adjust the config file.
Suggested change:

    - description: Download sequence file from the Inphared database (https://github.com/RyanCook94/inphared/blob/main/README.md), and store them in a single .fasta file. Please check the current database available at the above link and adjust the config file.
    + description: Download sequence file from the [inphared database](https://github.com/RyanCook94/inphared/blob/main/README.md), and store them in a single .fasta file. Please check the above link for available database version and adjust the config file.

    output:
        expand("{date}{suffix}", date=config["date"], suffix=config["suffix"])
    params:
        prefix = config["prefix"],
Maybe rename this to `url:`, as that might make its purpose a bit clearer?
Suggested change:

    - prefix = config["prefix"],
    + url = config["url"],

Obviously, the same rename then applies in the `config.yaml` file.
@@ -0,0 +1,29 @@
rule get_genome:
This file can be deleted here, right? It's simply a copy-paste leftover, if I understand this correctly.
@@ -0,0 +1,80 @@
__author__ = "Johannes Köster"
This file can simply be deleted here, right?
@@ -0,0 +1,30 @@
rule get_genome:
This file can simply be deleted here, right?
shell:
    "curl {params.prefix}{params.date}{params.suffix} -o {params.date}{params.suffix}"
Some little things here:
- You have to reference anything from the snakemake rule via the `snakemake` object. So for example `snakemake.params.date`.
- We should only use the full `output` file name here, so that users can put in there whatever they want.
- We have to call the `shell()` function here, as this is not the body of an actual rule, but rather a plain Python script.
- We have to use the `f""` construct here to use format strings to fill in the variables in this context. Here, we are not dealing with snakemake wildcards, but rather Python variables. The `{}` syntax is the same, so this is very confusing...
- I would manually put in the separator between URL and file name here, just for a slightly clearer structure of the download link. Then, we can remove the trailing `/` in the url in `config.yaml`.
Suggested change:

    - shell:
    -     "curl {params.prefix}{params.date}{params.suffix} -o {params.date}{params.suffix}"
    + shell(f"curl {snakemake.params.url}/{snakemake.params.date}{snakemake.params.suffix} -o {snakemake.output}")

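
The points above can be sketched in plain Python as follows. This is only an illustration under assumptions: `build_curl_command` is a hypothetical helper, and the `snakemake` object is replaced by a `SimpleNamespace` stand-in so the snippet runs outside of Snakemake. In a real `wrapper.py`, Snakemake injects the `snakemake` object and the command would be passed to `shell()` from `snakemake.shell`.

```python
import shlex
from types import SimpleNamespace


def build_curl_command(url, date, suffix, output):
    """Assemble the download command from params and the full output path."""
    # Add the separator explicitly, so the URL works with or without a trailing "/".
    source = f"{url.rstrip('/')}/{date}{suffix}"
    # Quote both parts so unusual characters in paths do not break the shell call.
    return f"curl {shlex.quote(source)} -o {shlex.quote(str(output))}"


# Stand-in for the object Snakemake provides to wrapper scripts (assumption:
# only the attributes used below are mimicked).
snakemake = SimpleNamespace(
    params=SimpleNamespace(
        url="https://millardlab-inphared.s3.climb.ac.uk",
        date="2Jul2023",
        suffix="_refseq_genomes.fa",
    ),
    output=["resources/inphared.fasta"],
)

command = build_curl_command(
    snakemake.params.url,
    snakemake.params.date,
    snakemake.params.suffix,
    snakemake.output[0],
)
print(command)
# In the actual wrapper this would be followed by: shell(command)
```

Keeping the string assembly in a small function makes it easy to check the resulting command without touching the network.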
prefix:
    "https://millardlab-inphared.s3.climb.ac.uk/"
To keep this in sync with the suggestions elsewhere, we would change this to:
Suggested change:

    - prefix:
    -     "https://millardlab-inphared.s3.climb.ac.uk/"
    + url:
    +     "https://millardlab-inphared.s3.climb.ac.uk"

    "2Jul2023"

suffix:
    "_refseq_genomes.fa"
Is there a smaller reference FASTA file (e.g. of some small subset of viruses) that could be used in the example? Ideally, this gets executed regularly in the CI tests, so it shouldn't download much, if possible.
Also, we should still add an actual test run for this wrapper.
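
As a rough idea of what such a test run could verify after the download, here is a small Python sketch. It is not the repository's test harness: `count_fasta_records` is a hypothetical helper, and the tiny FASTA content is made up purely for illustration.

```python
import tempfile
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA header lines (starting with '>'); fail if there are none."""
    records = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                records += 1
    if records == 0:
        raise ValueError(f"{path} does not look like a FASTA file")
    return records


# Made-up two-record FASTA file standing in for a small downloaded subset.
with tempfile.TemporaryDirectory() as tmpdir:
    example = Path(tmpdir) / "tiny.fasta"
    example.write_text(">phage_a\nACGT\n>phage_b\nGGCC\n")
    n_records = count_fasta_records(example)

print(n_records)
```

A check like this would catch the case where the download URL is wrong and `curl` stores an error page instead of sequence data.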
This PR was marked as stale because it has been open for 6 months with no activity.
Description
Adding a new wrapper to download the current inphared database.
QC
For all wrappers added by this PR,
- `input:` and `output:` file paths in the resulting rule can be changed arbitrarily,
- […] `threads: x` statement with `x` being a reasonable default,
- […] `map_reads` for a step that maps reads),
- `environment.yaml` specifications follow the respective best practices,
- […] `input:` or `output:`),
- […] `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.),
- `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool,
- […] `tempfile.gettempdir()` points to (see here; this also means that using any Python `tempfile` default behavior works),
- `meta.yaml` contains a link to the documentation of the respective tool or command,
- `Snakefile`s pass the linting (`snakemake --lint`),
- `Snakefile`s are formatted with snakefmt,