feat: new inphared-db wrapper #1550

nacnoriko · 2023-07-20T14:43:12Z

Description

Adding a new wrapper to download the current inphared database.

QC

I confirm that:

For all wrappers added by this PR,

there is a test case which covers any introduced changes,
input: and output: file paths in the resulting rule can be changed arbitrarily,
either the wrapper can only use a single core, or the example rule contains a threads: x statement with x being a reasonable default,
rule names in the test case are in snake_case and somehow tell what the rule is about or match the tool's purpose or name (e.g., map_reads for a step that maps reads),
all environment.yaml specifications follow the respective best practices,
wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:),
all fields of the example rules in the Snakefiles and their entries are explained via comments (input:/output:/params: etc.),
stderr and/or stdout are logged correctly (log:), depending on the wrapped tool,
temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to (see the documentation of Python's tempfile module; this also means that using any Python tempfile default behavior works),
the meta.yaml contains a link to the documentation of the respective tool or command,
Snakefiles pass the linting (snakemake --lint),
Snakefiles are formatted with snakefmt,
Python wrapper scripts are formatted with black.
Conda environments use a minimal number of channels, in the recommended order. E.g. for bioconda, use (conda-forge, bioconda, nodefaults), as conda-forge should have the highest priority and the defaults channels are usually not needed, because most packages are on conda-forge nowadays.

fgvieira · 2023-07-28T13:13:38Z

Is the config file really necessary?
What is the old_wrapper.py file?

dlaehnemann

This looks good already. But of course, I have a bunch of suggestions. Maybe we can work on those together during the symposium or even the workshop.

dlaehnemann · 2023-08-01T08:03:49Z

bio/reference/inphared-db/test/Snakefile

+
+rule get_inphareddb:
+    output:
+        expand("{date}{suffix}", date=config["date"], suffix=config["suffix"])    


One of the goals of these wrappers is to have them work with arbitrary file names. So the config[] entries should not be used to build the output file name in this example Snakefile; instead, they should reach the wrapper.py file as params (e.g. via snakemake.params.date).

Suggested change

expand("{date}{suffix}", date=config["date"], suffix=config["suffix"])

"resources/inphared.fasta"

To add the values of the config variables back into the file name, users of the wrapper would then add Python code around it. But I also like the idea of directly showcasing how to do that here. So maybe we could have two versions of calling the wrapper here: one with a fixed file name (like I suggest here), and one that contains the config[] entries.
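A minimal Python sketch of the two naming options, assuming the test config.yaml keeps the date and suffix keys from this PR. The helper name inphared_output and the resources/ directory are hypothetical. In a Snakefile, the f-string expression would go directly into the output: of the second example rule; either way, the wrapper itself only ever sees snakemake.output.

```python
# Hypothetical config values, mirroring the date/suffix entries of the test config.yaml in this PR.
config = {"date": "2Jul2023", "suffix": "_refseq_genomes.fa"}

# Version 1: a fixed output file name that does not change between database releases.
FIXED_OUTPUT = "resources/inphared.fasta"


def inphared_output(cfg: dict, outdir: str = "resources") -> str:
    """Version 2 (hypothetical helper): build a release-specific file name from config entries."""
    return f"{outdir}/{cfg['date']}{cfg['suffix']}"


print(FIXED_OUTPUT)             # resources/inphared.fasta
print(inphared_output(config))  # resources/2Jul2023_refseq_genomes.fa
```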

dlaehnemann · 2023-08-01T08:03:53Z

bio/reference/inphared-db/meta.yaml

@@ -0,0 +1,4 @@
+name: inphared-db
+description: Download sequence file from the Inphared database (https://github.com/RyanCook94/inphared/blob/main/README.md), and store them in a single .fasta file. Please check the current database available at the above link and adjust the config file. 


Suggested change

description: Download sequence file from the Inphared database (https://github.com/RyanCook94/inphared/blob/main/README.md), and store them in a single .fasta file. Please check the current database available at the above link and adjust the config file.

description: Download sequence files from the [inphared database](https://github.com/RyanCook94/inphared/blob/main/README.md) and store them in a single .fasta file. Please check the above link for available database versions and adjust the config file accordingly.

dlaehnemann · 2023-08-01T08:06:08Z

bio/reference/inphared-db/test/Snakefile

+    output:
+        expand("{date}{suffix}", date=config["date"], suffix=config["suffix"])    
+    params:
+        prefix = config["prefix"], 


Maybe rename this to url:, as that might make its purpose a bit clearer?

Suggested change

prefix = config["prefix"],

url = config["url"],

Obviously, the same rename then applies in the config.yaml file.

dlaehnemann · 2023-08-01T08:06:36Z

bio/reference/inphared-db/test/old_release.smk

@@ -0,0 +1,29 @@
+rule get_genome:


This file can be deleted here, right? It's simply a copy-paste leftover, if I understand this correctly.

dlaehnemann · 2023-08-01T08:07:00Z

bio/reference/inphared-db/old_wrapper.py

@@ -0,0 +1,80 @@
+__author__ = "Johannes Köster"


This file can simply be deleted here, right?

dlaehnemann · 2023-08-01T08:07:14Z

bio/reference/inphared-db/test/old_snakefile.smk

@@ -0,0 +1,30 @@
+rule get_genome:


This file can simply be deleted here, right?

dlaehnemann · 2023-08-01T08:13:39Z

bio/reference/inphared-db/wrapper.py

+    shell:
+        "curl {params.prefix}{params.date}{params.suffix} -o {params.date}{params.suffix}"


Some little things here:

You have to reference anything from the Snakemake rule via the snakemake object, for example snakemake.params.date.

We should only use the full output file name here (snakemake.output), so that users can choose whatever file name they want.

We have to call the shell() function here (imported via from snakemake.shell import shell), as this is not the body of an actual rule, but a plain Python script.

I would use the f"" construct here, so that Python format strings fill in the variables in this context. Here, we are not dealing with Snakemake wildcards, but with Python variables. The {} syntax is the same, so this is very confusing...

I would manually put in the separator between URL and file name here, just for a slightly clearer structure of the download link. Then, we can remove the trailing / from the url in config.yaml.

Suggested change

shell:

"curl {params.prefix}{params.date}{params.suffix} -o {params.date}{params.suffix}"

shell(f"curl {snakemake.params.url}/{snakemake.params.date}{snakemake.params.suffix} -o {snakemake.output}")
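A minimal sketch of the command that wrapper.py could assemble once these suggestions are applied, written as a plain function so it can be tried outside Snakemake. The helper name build_download_command is hypothetical, and so are two additions not in the PR: the curl flags -fsSL (fail on HTTP errors instead of writing the error response to the output file, no progress meter, still show errors, follow redirects) and a stderr redirect to a log file, which assumes the example rule gains a log: entry as the QC checklist asks. shlex.quote only matters if users pick paths containing spaces or shell metacharacters.

```python
import shlex


def build_download_command(url: str, date: str, suffix: str, output: str, log: str) -> str:
    """Assemble the curl call for one inphared release file (hypothetical helper).

    url, date and suffix mirror the params suggested in this review (url without a
    trailing slash); output and log stand in for snakemake.output[0] and snakemake.log[0].
    """
    # The "/" separator between URL and file name is inserted here, as suggested above.
    remote = f"{url}/{date}{suffix}"
    # -f: exit non-zero on HTTP errors, -sS: silent but still report errors,
    # -L: follow redirects; curl's stderr ends up in the rule's log file.
    return (
        f"curl -fsSL {shlex.quote(remote)} -o {shlex.quote(output)} "
        f"2> {shlex.quote(log)}"
    )


# In wrapper.py, this would be used roughly as follows (not executed here):
#   from snakemake.shell import shell
#   shell(build_download_command(snakemake.params.url, snakemake.params.date,
#                                snakemake.params.suffix, snakemake.output[0],
#                                snakemake.log[0]))
```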

dlaehnemann · 2023-08-01T08:14:35Z

bio/reference/inphared-db/test/config.yaml

+prefix:
+    "https://millardlab-inphared.s3.climb.ac.uk/"


To keep this in sync with the suggestions elsewhere, we would change this to:

Suggested change

prefix:

"https://millardlab-inphared.s3.climb.ac.uk/"

url:

"https://millardlab-inphared.s3.climb.ac.uk"

dlaehnemann · 2023-08-01T08:16:09Z

bio/reference/inphared-db/test/config.yaml

+    "2Jul2023"
+
+suffix:
+    "_refseq_genomes.fa"


Is there a smaller reference FASTA file (e.g. of some small subset of viruses) that could be used in the example? Ideally, this example gets executed regularly in the CI tests, so it shouldn't download much, if possible.

Also, we should still add an actual test run for this wrapper.
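For the test run, a cheap plausibility check on the downloaded file could complement simply checking that it exists. A minimal sketch, assuming the test has access to the output path of the example rule; both helper names are hypothetical. The check only looks at the first non-blank line, so it would catch an error page saved in place of the FASTA file (e.g. when curl runs without -f), but not a truncated download.

```python
from pathlib import Path
from typing import Iterable


def first_record_is_fasta(lines: Iterable[str]) -> bool:
    """True if the first non-blank line is a FASTA header starting with '>'."""
    for line in lines:
        if line.strip():
            return line.startswith(">")
    return False  # no content at all


def looks_like_fasta(path: str) -> bool:
    """Hypothetical test helper: apply the header check to a file on disk."""
    with Path(path).open() as handle:
        return first_record_is_fasta(handle)
```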

github-actions · 2024-02-01T01:26:34Z

This PR was marked as stale because it has been open for 6 months with no activity.

nacnoriko and others added 2 commits July 20, 2023 16:13

add first inphared-db wrapper (a7dba06)

Merge branch 'master' into inphared-wrapper (fc53af2)

dlaehnemann requested changes Aug 1, 2023


github-actions bot added the Stale label Feb 1, 2024
