clean up ngi-genomes folder #1007

Open
anoronh4 opened this issue Mar 13, 2024 · 0 comments

Comments

@anoronh4
Collaborator

Several files in the ngi-igenomes folder on juno do not exist in the remote reference repository, which makes it difficult to recreate this reference folder in any other environment. Many of these paths are listed in the tempo references configuration file. Here is a list of files that are newer than Nov 16, 2018:

$ find $PWD -mtime -1930 -type f -exec ls -l {} \;
-rw-r----- 1 gongy cmopipeline 242018150 Mar 10  2022 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/hg19/1000G_phase1.indels.hg19.sites.vcf
-rw-r----- 1 gongy cmopipeline 90196895 Mar 10  2022 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf
-rw-r----- 1 gongy cmopipeline 1484596 Mar 10  2022 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.idx
-rw-r----- 1 gongy cmopipeline 12381528 Mar 10  2022 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/hg19/dbsnp_138.hg19.vcf.idx
-rw-r----- 1 gongy cmopipeline 1238920 Mar 10  2022 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/hg19/1000G_phase1.indels.hg19.sites.vcf.idx
-rw-r----- 1 gongy cmopipeline 10796220779 Mar 10  2022 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/hg19/dbsnp_138.hg19.vcf
-rw-r--r-- 1 socci cmopipeline 1517 Mar 18  2019 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/GRCh37/Annotation/intervals/human.b37.genome.bed
-rw-r--r-- 1 socci cmopipeline 1360930446 Mar  7  2019 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta.microsatellites.list
-rw-rw-r-- 1 socci cmopipeline 3189750467 Feb 27  2019 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta
-rw-r--r-- 1 socci cmopipeline 67108864 Apr 22  2019 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta.index
-rw-r--r-- 1 noronhaa cmopipeline 16854 Jun 29  2022 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.dict
-rw-r--r-- 1 noronhaa cmopipeline 1176551519 Jun 30  2022 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.gridsscache
-rw-rw-r-- 1 socci cmopipeline 3189750467 Jul  1  2019 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta
-rw-r--r-- 1 wooh cmopipeline 2813 Jun  7  2021 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.fai
-rw-r--r-- 1 socci cmopipeline 9040952644 Mar  5  2019 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/b37/dbsnp_137.b37__RmDupsClean__plusPseudo50__DROP_SORT.vcf
-rw-r--r-- 1 socci cmopipeline 1015019014 Mar  5  2019 /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/b37/dbsnp_137.b37__RmDupsClean__plusPseudo50__DROP_SORT.vcf.gz
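The modification-time filter above is only a proxy for "not in the remote repository". A more direct check could compare the local tree against a list of remote paths. The following is a minimal sketch under stated assumptions: the function name list_local_only is hypothetical, and the manifest is assumed to be a plain text file with one remote path per line, relative to the igenomes root, produced by whatever listing tool is available for the remote.

```shell
# list_local_only LOCAL_DIR MANIFEST
# Hypothetical helper: prints files under LOCAL_DIR (as relative paths)
# that do not appear in MANIFEST. MANIFEST is assumed to hold one remote
# path per line, relative to the same root as LOCAL_DIR.
list_local_only() (
    local_dir=$1
    manifest=$2
    tmp=$(mktemp -d) || exit 1

    # Relative paths of all local files, sorted bytewise for comm.
    (cd "$local_dir" && find . -type f) | sed 's|^\./||' | LC_ALL=C sort > "$tmp/local"

    # Remote paths, sorted the same way and deduplicated.
    LC_ALL=C sort -u "$manifest" > "$tmp/remote"

    # comm -23 keeps only lines unique to the first (local) list.
    LC_ALL=C comm -23 "$tmp/local" "$tmp/remote"
    status=$?

    rm -rf "$tmp"
    return $status
)
```

For example, `list_local_only /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes remote_manifest.txt` would print the local-only paths, where remote_manifest.txt is a placeholder name for the assumed manifest file.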

Some files, such as human.b37.genome.bed, human_g1k_v37_decoy.fasta.microsatellites.list, and dbsnp_137.b37__RmDupsClean__plusPseudo50__DROP_SORT.vcf*, can be relocated somewhere outside of the igenomes directory. The fasta, fai, and dict files in /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/ can be cleaned up or ignored, because I don't believe tempo uses them.

Some of the VCF files are also unzipped in the juno folder, whereas on igenomes they exist only as zipped files. This might cause confusion as well.
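To see which VCFs are affected, something like the following could help. It is a rough sketch, not an established tool: the function name list_uncompressed_vcfs is hypothetical, and it assumes that a compressed copy, where one exists, sits next to the uncompressed file with a .gz suffix.

```shell
# list_uncompressed_vcfs DIR
# Hypothetical helper: prints each uncompressed .vcf under DIR followed by
# a tab and either "has_gz" (a sibling FILE.vcf.gz exists) or "no_gz".
list_uncompressed_vcfs() {
    find "$1" -type f -name '*.vcf' | LC_ALL=C sort | while IFS= read -r vcf; do
        if [ -e "$vcf.gz" ]; then
            printf '%s\thas_gz\n' "$vcf"
        else
            printf '%s\tno_gz\n' "$vcf"
        fi
    done
}
```

Files reported as has_gz are candidates for removing the uncompressed copy, while no_gz files would need to be compressed or checked against the remote first.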
