[Issue] pileup "feature" #47

Open
vpbrendel opened this issue May 15, 2022 · 4 comments

Comments

@vpbrendel

The pileup.c code implicitly assumes that the @SQ entries in the input SAM/BAM header are in alphabetical order, at least for the tsv output.

The data in the tsv file are written in rows that follow the order of the @SQ entries in the header. However, the names in those rows are always alphabetized.

As a result, names are associated with the wrong data whenever the header entries are not in alphabetical order.

How to reproduce: create a BAM input whose header has @SQ lines for chr1 and chr2, with corresponding data, and run pileup. Then re-run after swapping only the two @SQ header lines (chr2 before chr1). The tsv output will then show the chr1 data under the name chr2 and the chr2 data under the name chr1.
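The failure mode described above can be sketched in a few lines of C. This is only an illustration, not the actual pileup.c code: `row_t`, `mean_beta`, and `build_rows_mismatched` are hypothetical names. The labels are sorted alphabetically while the data stays in header order, so a label can end up next to another sequence's data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical output row; not the structure used in pileup.c. */
typedef struct {
    const char *name;   /* label printed in the tsv row */
    double mean_beta;   /* stand-in for the row's data */
} row_t;

static int cmp_label(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Reproduces the mismatch: labels are alphabetized on their own,
 * data is left in header order. Returns 0 on success, -1 on failure. */
int build_rows_mismatched(const char **header_names, const double *header_data,
                          int n, row_t *out) {
    assert(header_names && header_data && out && n >= 0);
    if (n == 0) return 0;

    const char **labels = malloc((size_t)n * sizeof *labels);
    if (labels == NULL) return -1;
    memcpy(labels, header_names, (size_t)n * sizeof *labels);
    qsort(labels, (size_t)n, sizeof *labels, cmp_label);

    for (int i = 0; i < n; i++) {
        out[i].name = labels[i];           /* alphabetical order */
        out[i].mean_beta = header_data[i]; /* header order */
    }
    free(labels);
    return 0;
}
```

With a header ordered chr2, chr1, the first row produced this way carries the label chr1 but the data that belongs to chr2.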

@zwdzwd
Collaborator

zwdzwd commented May 15, 2022

Thanks for your report. Could you post this to https://github.com/huishenlab/biscuit? I don't quite understand the question, though: the SAM header fully describes the chromosome names, so if you switch the entries manually, the output will be displayed differently.

@vpbrendel
Author

This came up with an existing Bismark-generated BAM input file whose header had sequence names NOT in alphabetical order. The pileup code includes a sorting function that puts the names in alphabetical order, and that is what shows in the tsv file. However, the data entries remain in the order of the BAM header, so the table gives wrong associations.
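One way to keep names and data together is to sort a permutation of header indices instead of sorting the names by themselves. The sketch below assumes that approach; it is not the fix applied in pileup.c, and `name_tid_t` and `alphabetical_order` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pairing of a sequence name with its header index. */
typedef struct {
    const char *name;
    int tid;            /* position of the @SQ entry in the header */
} name_tid_t;

static int cmp_name_tid(const void *a, const void *b) {
    const name_tid_t *x = a;
    const name_tid_t *y = b;
    return strcmp(x->name, y->name);
}

/* Fills order[0..n-1] with header indices such that
 * header_names[order[i]] is in ascending alphabetical order.
 * Returns 0 on success, -1 on allocation failure. */
int alphabetical_order(const char **header_names, int n, int *order) {
    assert(header_names && order && n >= 0);
    if (n == 0) return 0;

    name_tid_t *pairs = malloc((size_t)n * sizeof *pairs);
    if (pairs == NULL) return -1;
    for (int i = 0; i < n; i++) {
        pairs[i].name = header_names[i];
        pairs[i].tid = i;   /* remember where the name came from */
    }
    qsort(pairs, (size_t)n, sizeof *pairs, cmp_name_tid);

    for (int i = 0; i < n; i++)
        order[i] = pairs[i].tid;
    free(pairs);
    return 0;
}
```

A caller would then print row `i` as `header_names[order[i]]` next to `data[order[i]]`, so the label and the data are always read through the same header index, whatever the header order is.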

@zwdzwd
Collaborator

zwdzwd commented May 16, 2022

Thanks for reporting. I think you are right. Can you confirm that the tsv you mentioned is the meth average stats tsv, not the methylation call output? If so, I think I know how to fix it (it should only affect the meth average stats).

@vpbrendel
Author

Correct. Methylation average stats.
