Expose resource files and reference genome to users #7

Open
cgpu opened this issue Nov 25, 2019 · 2 comments

Comments

@cgpu
Collaborator

cgpu commented Nov 25, 2019

This is an enhancement request that would make it possible to run GenomeChronicler with any reference genome.

It would also help avoid GATK conflicts with Sarek resources, especially for non-canonical chromosomes.

The idea is that all the current files (listed below) are kept as defaults but are also exposed to the user as parameters. This would let me expose them as parameters of the GenomeChronicler Nextflow process and keep them in sync with the resource files used by the preceding processes.

Going further, we could split each step of the main Perl script into its own process, giving N processes in a fully Nextflow-ified version. As a first grooming step, we could fill in the table below with the GenomeChronicler step that uses each reference file/resource, and continue from there.

| {process placeholder} | resource file |
| --- | --- |
| process | 1kGP_GRCh38_exome.bed |
| process | 1kGP_GRCh38_exome.bim |
| process | 1kGP_GRCh38_exome.fam |
| process | GRCh38_full_analysis_set_plus_decoy_hla_noChr.dict |
| process | GRCh38_full_analysis_set_plus_decoy_hla_noChr.fa |
| process | GRCh38_full_analysis_set_plus_decoy_hla_noChr.fa.fai |
| process | clinvar.db |
| process | genosetDependencies.txt |
| process | getevidence.db |
| process | gnomad.db |
| process | parsedGenosets.txt |
| process | snpedia.db |
| process | snps.19-114.unique.nochr.bed |
| process | snps.19-114.unique.nochr.bed.gz |
| process | snps.19-114.unique.nochr.bed.gz.tbi |
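A minimal sketch of the "keep as defaults, but expose as parameters" idea, written in Python purely for illustration (it is not Nextflow or the existing Perl code). The default file names are the ones listed in this issue; the parameter keys such as `reference_fasta` and the helper `resolve_resources` are hypothetical, not existing GenomeChronicler options.

```python
from typing import Dict, Optional

# Hypothetical parameter names mapped to the current default resource files.
# The file names come from the list in this issue; the keys are made up.
DEFAULT_RESOURCES: Dict[str, str] = {
    "exome_bed": "1kGP_GRCh38_exome.bed",
    "exome_bim": "1kGP_GRCh38_exome.bim",
    "exome_fam": "1kGP_GRCh38_exome.fam",
    "reference_dict": "GRCh38_full_analysis_set_plus_decoy_hla_noChr.dict",
    "reference_fasta": "GRCh38_full_analysis_set_plus_decoy_hla_noChr.fa",
    "reference_fai": "GRCh38_full_analysis_set_plus_decoy_hla_noChr.fa.fai",
    "clinvar_db": "clinvar.db",
    "genoset_dependencies": "genosetDependencies.txt",
    "getevidence_db": "getevidence.db",
    "gnomad_db": "gnomad.db",
    "parsed_genosets": "parsedGenosets.txt",
    "snpedia_db": "snpedia.db",
    "snps_bed": "snps.19-114.unique.nochr.bed",
    "snps_bed_gz": "snps.19-114.unique.nochr.bed.gz",
    "snps_bed_gz_tbi": "snps.19-114.unique.nochr.bed.gz.tbi",
}


def resolve_resources(overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return the default resources with any user-supplied paths substituted in."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_RESOURCES)
    if unknown:
        # Fail loudly instead of silently ignoring a misspelled parameter.
        raise ValueError(f"unknown resource parameter(s): {sorted(unknown)}")
    resolved = dict(DEFAULT_RESOURCES)  # copy so the defaults are never mutated
    resolved.update(overrides)
    return resolved


if __name__ == "__main__":
    # Swap in a different reference genome; everything else keeps its default.
    params = resolve_resources({"reference_fasta": "/data/my_genome.fa"})
    print(params["reference_fasta"])
    print(params["clinvar_db"])
```

Rejecting unknown keys is a deliberate choice in this sketch: with fifteen similarly named files, a typo that silently fell back to the GRCh38 default would be hard to spot.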

@afonsoguerra feel free to add the enhancement label, since this is not a bug but a nice-to-have addition.

@cgpu changed the title from "Expose resource file to user" to "Expose resource files and reference genome to users" on Nov 25, 2019
@afonsoguerra
Member

Moving forward, I think the different scripts need to be linked by a per-run configuration file that holds all the parameters, instead of passing them on the command line. That way an arbitrary number of parameters can be set and passed only once. Once the manuscript is out the door, I'll look into doing that.
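One way a per-run configuration file could work, sketched in Python with a JSON file. The file format, the `load_run_config` helper and the parameter names (`output_dir`, `threads`) are assumptions for illustration only; the point is that every script reads the same file once, so adding a parameter means adding a key rather than another command-line flag.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

# Hypothetical defaults shared by every script in a run.
DEFAULTS: Dict[str, Any] = {
    "reference_fasta": "GRCh38_full_analysis_set_plus_decoy_hla_noChr.fa",
    "gnomad_db": "gnomad.db",
    "output_dir": "results",
}


def load_run_config(path: Path) -> Dict[str, Any]:
    """Read one JSON config for the run and layer it over the defaults."""
    with path.open() as fh:
        user = json.load(fh)
    if not isinstance(user, dict):
        raise ValueError("run config must be a JSON object")
    config = dict(DEFAULTS)
    config.update(user)  # any number of parameters, set once per run
    return config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg_path = Path(tmp) / "run.json"
        cfg_path.write_text(json.dumps({"output_dir": "sample42", "threads": 4}))
        cfg = load_run_config(cfg_path)
        print(cfg["output_dir"], cfg["threads"], cfg["gnomad_db"])
```

Unknown keys such as `threads` are accepted here, in keeping with the idea of an arbitrary number of parameters; a stricter version could validate them against a known set.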

@cgpu
Collaborator Author

cgpu commented Nov 27, 2019

Sounds good to me! Since you are already familiar with Sarek, we can work on this together and implement it the pure Nextflow way: a nextflow.config with defaults that also lets users specify their own values, either completely freely or from sensible options (e.g. reference and resource bundles curated by us).
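A rough Python sketch of the "curated bundle or free choice" option, similar in spirit to picking a named genome in Sarek. The bundle name `GRCh38_noChr`, its contents and the `select_resources` helper are hypothetical placeholders, not bundles that GenomeChronicler actually ships.

```python
from typing import Dict, Optional

# Hypothetical curated bundles; the name and contents are placeholders.
BUNDLES: Dict[str, Dict[str, str]] = {
    "GRCh38_noChr": {
        "reference_fasta": "GRCh38_full_analysis_set_plus_decoy_hla_noChr.fa",
        "reference_fai": "GRCh38_full_analysis_set_plus_decoy_hla_noChr.fa.fai",
        "reference_dict": "GRCh38_full_analysis_set_plus_decoy_hla_noChr.dict",
    },
}


def select_resources(bundle: Optional[str] = None,
                     custom: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Start from a curated bundle (if one is named), then apply free-form custom paths."""
    if bundle is not None and bundle not in BUNDLES:
        raise ValueError(f"unknown bundle {bundle!r}; choose from {sorted(BUNDLES)}")
    # Copy the bundle so user overrides never modify the curated definition.
    resources = dict(BUNDLES[bundle]) if bundle is not None else {}
    resources.update(custom or {})
    return resources


if __name__ == "__main__":
    # Curated bundle only.
    print(select_resources("GRCh38_noChr")["reference_fasta"])
    # Curated bundle with one file swapped out.
    print(select_resources("GRCh38_noChr", {"reference_fasta": "my.fa"})["reference_fasta"])
    # Completely free specification, no bundle.
    print(select_resources(custom={"gnomad_db": "g.db"}))
```

Layering custom paths on top of a bundle covers both cases from the comment above with one code path: an empty `custom` gives the curated bundle, and no `bundle` gives a fully user-specified set.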

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants