question about regionStart - regionEnd #20

Open
aheritas opened this issue Nov 9, 2022 · 3 comments

Comments

@aheritas

aheritas commented Nov 9, 2022

Hi Robbie,
This is my first time using QUILT, so I apologise if these are naive questions. I am on the first step, preparing and reformatting the reference panel, and I would like to do that for each chromosome. I have downloaded the reference haplotype, legend and genetic map files from: https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html (I would also appreciate your view on whether these are OK for an initial genome-wide test.)

However, I am unsure what I should be inputting for these parameters: --regionStart= --regionEnd= . I would think they correspond to the start (1) and end (length) of each chromosome, but I am not sure. I am also not sure whether leaving these parameters blank, which is the default, makes QUILT read the entire chromosome.

Thank you in advance for your time and for developing this tool.

@rwdavies
Owner

Hey,

Apologies, I never saw this originally.

I think the data above is OK, especially for a genome-wide test, but I would consider newer data, like the 1000 Genomes Project NYGC re-sequencing effort:
https://www.internationalgenome.org/data-portal/data-collection/30x-grch38

I actually imputed some samples for a colleague here in Oxford using that resource, and put some scripts here:
https://github.com/rwdavies/QUILT-wrap
It's written using snakemake. I'm not sure if it's easily generalizable, but hopefully you can read enough of the "main.smk" file to get a sense of what it's doing.

For regionStart, regionEnd and buffer, I would recommend imputing in regions of ~5 Mbp for a human panel of this size, with a buffer of maybe 500000 bp. So e.g. you might do
--chr=chr1 --regionStart=1 --regionEnd=5000000 --buffer=500000
--chr=chr1 --regionStart=5000001 --regionEnd=10000000 --buffer=500000
etc
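
The windowing pattern above can be generated rather than typed out by hand. The following is a minimal sketch, not part of QUILT: the function name `quilt_windows` and its arguments are hypothetical, and the chromosome length has to be supplied by you (for example from the reference genome's index), since nothing here reads it from your files. It only prints the flag strings for consecutive windows, clipping the last one to the chromosome length.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a QUILT command): print one line of region flags
# per fixed-size window along a chromosome.
# Arguments: chromosome name, chromosome length in bp (user-supplied),
# optional window size (default 5000000), optional buffer (default 500000).
quilt_windows() {
    local chr="$1" chr_len="$2" size="${3:-5000000}" buffer="${4:-500000}"
    local start=1 end
    while [ "$start" -le "$chr_len" ]; do
        end=$(( start + size - 1 ))
        # Clip the final window so it does not run past the chromosome end.
        if [ "$end" -gt "$chr_len" ]; then
            end="$chr_len"
        fi
        echo "--chr=${chr} --regionStart=${start} --regionEnd=${end} --buffer=${buffer}"
        start=$(( end + 1 ))
    done
}

# Example with a made-up 12 Mbp chromosome length: prints three windows.
quilt_windows chr1 12000000
```

Each printed line can then be appended to whatever QUILT command you already run for a single region, so the windows stay contiguous and non-overlapping while the buffer handles the edges.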

Again, sorry for the slow reply. Every once in a while I miss these, especially during term.

Best wishes
Robbie

@bbdragon1

Hi
I am using QUILT2 now. If I impute across an entire chromosome, I can get the chromosome's total span from the VCF file, e.g. for human chromosome 1. But how can I best determine the chunk size, and is there a simpler way to supply the start and end of each region? (Currently, I'm using a for loop to read them.)

@PMuchina

PMuchina commented Oct 31, 2024

To determine the chunk size, you can use this in R:
dat <- QUILT::quilt_chunk_map("chr20", "package/CEU-chr20-final.b38.txt.gz")
str(dat)


4 participants