overlapping sliding window #176

Open
michaeljmetzger opened this issue Dec 5, 2020 · 2 comments

Comments

@michaeljmetzger

This is great! Currently it appears that the windows are non-overlapping (i.e., 1-1000, 1001-2000, 2001-3000, etc.). I was wondering if you have developed any way to modify the method to use an overlapping sliding window (i.e., 1-1000, 101-1100, 201-1200, etc.). We were thinking this could allow for more precise definitions of the copy number breakpoints, while still using data from a large window size.
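To make the request concrete, here is a minimal Python sketch of how overlapping window coordinates could be generated from a window size and a step size. It is only an illustration: `sliding_windows` and its parameters are hypothetical names, not part of vcfR, and coordinates are assumed to be 1-based and inclusive.

```python
def sliding_windows(seq_length, win_size, step):
    """Yield 1-based, inclusive (start, end) windows along a sequence.

    Hypothetical helper for illustration; not a vcfR function.
    The last window is truncated at seq_length.
    """
    if win_size <= 0 or step <= 0:
        raise ValueError("win_size and step must be positive")
    start = 1
    while start <= seq_length:
        end = min(start + win_size - 1, seq_length)
        yield (start, end)
        if end == seq_length:
            break  # the sequence end has been reached
        start += step


# Overlapping windows: size 1000, step 100
print(list(sliding_windows(1300, 1000, 100)))
# Non-overlapping windows are the special case step == win_size
print(list(sliding_windows(3000, 1000, 1000)))
```

Setting the step equal to the window size recovers the current non-overlapping behavior, so a step parameter would be a strict generalization.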

Thanks,
Michael

@knausb
Owner

knausb commented Dec 7, 2020

Hi Michael,

I'm not sure I follow you. Do you think you could come up with an example? I think you could come up with overlapping windows by altering the winsize parameter, but I'm not sure how to combine the different runs. Also note that the more windows you have, the more computational time it will require. Because this is an analysis of heterozygous positions, it needs CNVs that are large relative to the rate of heterozygosity in your organism, so it will miss small features. If you're interested in precise identification of breakpoints, you may want to include coverage data, such as samtools pileup, with the caveat that Illumina coverage data is highly variable, so it has its challenges as well.
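As a rough illustration of the coverage idea, the Python sketch below averages per-position read depth over sliding windows. It assumes the depth values have already been parsed into a plain list (index 0 holding position 1); the function name `window_mean_depth` and its parameters are hypothetical and not part of vcfR or samtools.

```python
def window_mean_depth(depths, win_size, step):
    """Return (start, end, mean_depth) for 1-based, inclusive sliding windows.

    depths: list of per-position depth values, depths[0] is position 1.
    Hypothetical helper for illustration only.
    """
    if win_size <= 0 or step <= 0:
        raise ValueError("win_size and step must be positive")
    n = len(depths)
    # Prefix sums let each window total be computed in constant time.
    prefix = [0]
    for d in depths:
        prefix.append(prefix[-1] + d)
    out = []
    start = 1
    while start <= n:
        end = min(start + win_size - 1, n)
        total = prefix[end] - prefix[start - 1]
        out.append((start, end, total / (end - start + 1)))
        if end == n:
            break
        start += step
    return out


# Toy data: depth doubles halfway along an 8-position sequence.
toy_depths = [10, 10, 10, 10, 20, 20, 20, 20]
print(window_mean_depth(toy_depths, win_size=4, step=2))
```

In the toy data the middle window straddles the change in depth and reports an intermediate mean, which is the kind of signal that might help narrow down a breakpoint. Real Illumina depth is much noisier than this.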

Good luck!
Brian

@michaeljmetzger
Author

Thanks for your response. My understanding of vcfR is that it breaks the genome into non-overlapping segments (windows). The winsize parameter is the length of each of these. So for a window size of 1000, the first segment would be 1-1000 and the second would be 1001-2000. For a sliding window, there would be two parameters: window size and step size. For example, with a window size of 1000 and a step size of 100, the first segment would be 1-1000 and the second would be 101-1100. It would require a different calculation of the final coverage, as each position would be covered by multiple windows. It sounds like this has not been implemented in this program. If we can get it working, we will let you know.
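One possible way to handle positions covered by multiple windows is to average the values of every window that overlaps each position. The Python sketch below shows that idea under stated assumptions: each window is given as a hypothetical `(start, end, value)` tuple with 1-based inclusive coordinates, and a simple unweighted mean is used. This is not how vcfR works, only one candidate approach.

```python
def per_position_average(windows, seq_length):
    """Average the values of all windows covering each position.

    windows: iterable of (start, end, value), 1-based and inclusive,
             with end <= seq_length.
    Returns a list of length seq_length; entry i is for position i + 1
    and is None where no window covers the position.
    Hypothetical helper for illustration only.
    """
    sums = [0.0] * seq_length
    counts = [0] * seq_length
    for start, end, value in windows:
        for pos in range(start, end + 1):
            sums[pos - 1] += value
            counts[pos - 1] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Two overlapping windows on a 7-position toy sequence.
toy_windows = [(1, 4, 2.0), (3, 6, 4.0)]
print(per_position_average(toy_windows, 7))
```

Positions 3 and 4 fall in both toy windows and receive the mean of the two values, while position 7 is uncovered. Other combination rules (a median, or weighting by distance from the window center) would fit the same structure.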
Thanks,
Michael
