Haplotagged CRAM and bedmethyl #355

Open
umranyaman opened this issue Jan 27, 2025 · 4 comments
Labels
question Looking for clarification on inputs and/or outputs

Comments


umranyaman commented Jan 27, 2025

Hello,

I have a haplotagged CRAM file and phased outputs from the epi2melabs/wf-human-variation workflow, which we ran with the --mod and --phase parameters. I have bedmethyl files for haplotype 1, haplotype 2, and the ungrouped reads. Would it make sense to sum these bedmethyl files with bedtools unionbedg to produce a combined bedmethyl, instead of running modkit pileup --combine-strands on the CRAM file again? When I run modkit pileup on the CRAM file, it warns: "non-BAM.. CRAM may be unstable"

bedtools unionbedg -i hap1.bed hap2.bed ungrouped.bed > combined.bed

Thanks very much!

Contributor

ArtRand commented Jan 28, 2025

Hello @umranyaman,

It sounds like you want a bedMethyl with all of the counts aggregated, as though you had run pileup without partitioning on the HP tag. I would use modkit bedmethyl merge to combine the bedMethyl files you already have. You can include the untagged ones or omit them. Regarding CRAM, it is supported; it's just a bit slower and not tested quite as heavily, so Modkit warns you. I could probably get rid of this warning, to be honest.
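
To make "all of the counts aggregated" concrete, here is a minimal Python sketch of the idea. It is not Modkit's implementation: `merge_bedmethyl` is a hypothetical helper, and it assumes the 18-column bedMethyl layout (N_valid_cov in column 10, percent modified in column 11, then the seven count columns). Counts are summed per (chrom, start, end, mod code, strand) and the percent modified is recomputed from the summed counts rather than averaged.

```python
from collections import defaultdict

# 0-based indices into a whitespace-split bedMethyl row (assumed 18-column layout).
N_VALID = 9
COUNT_COLS = range(11, 18)  # N_mod, N_canonical, N_other_mod, N_delete, N_fail, N_diff, N_nocall


def merge_bedmethyl(*tables):
    """Sum counts across bedMethyl tables (iterables of text lines).

    Hypothetical helper for illustration only; rows are keyed on
    (chrom, start, end, mod code, strand).
    """
    totals = defaultdict(lambda: [0] * 8)  # N_valid_cov, then the seven other counts
    for table in tables:
        for line in table:
            fields = line.split()
            if len(fields) < 18:
                continue  # skip blank or malformed lines
            key = (fields[0], int(fields[1]), int(fields[2]), fields[3], fields[5])
            counts = totals[key]
            counts[0] += int(fields[N_VALID])
            for i, col in enumerate(COUNT_COLS, start=1):
                counts[i] += int(fields[col])

    merged = []
    for (chrom, start, end, code, strand), c in sorted(totals.items()):
        n_valid, n_mod = c[0], c[1]
        # Recompute the percentage from summed counts; averaging percentages would be wrong.
        pct = round(100.0 * n_mod / n_valid, 2) if n_valid else 0.0
        merged.append((chrom, start, end, code, strand, n_valid, pct, *c[1:]))
    return merged
```

Passing the hap1 and hap2 tables (and optionally the ungrouped one) to this function gives the same kind of per-site totals you would expect from a pileup that ignored the HP tag, under the assumptions above.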

ArtRand added the question label Jan 28, 2025
Author

umranyaman commented Jan 28, 2025

Thank you! It does make sense to aggregate hap1 and hap2. I am still not sure about discarding the ungrouped calls. Is there any reason for modkit pileup to include the ungrouped reads, e.g. a case where that would be useful?

Sorry for the confusion.

Thank you!

Contributor

ArtRand commented Jan 29, 2025

Hello @umranyaman,

Whether to use the ungrouped reads depends on what you're trying to do. The ungrouped reads are the ones that don't have the haplotag, so I imagine this means the phasing algorithm couldn't assign them confidently or they don't overlap any HET variants. If you give me a few more details on what you're doing, maybe I can be of more help.

Author

Thanks @ArtRand!

I am considering the downstream analyses. I will perform differential methylation analysis across multiple samples and identify genes associated with the trait using aggregated methylation. From there, I would investigate whether each gene also has allele-specific methylation. I am not yet sure how to do this across samples, but the end goal is to have aggregated vs. haplotype-specific methylation levels. Comparing hap1 vs. hap2 across samples already discards the ungrouped reads, so I am wondering whether I should discard them in the aggregated version as well.
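
To illustrate the quantities I mean, here is a rough Python sketch for a single site. `summarize_site` is a hypothetical helper, the inputs are assumed to be (N_mod, N_valid_cov) pairs taken from the hap1, hap2, and ungrouped bedmethyl rows, and it only shows how the two aggregated versions differ, not which one is the right choice.

```python
import math


def fraction_modified(n_mod, n_valid):
    """Fraction of valid calls that are modified; NaN when there is no coverage."""
    return n_mod / n_valid if n_valid else math.nan


def summarize_site(hap1, hap2, ungrouped):
    """Each argument is an (n_mod, n_valid_cov) pair for one site (hypothetical input shape)."""
    phased_mod = hap1[0] + hap2[0]
    phased_valid = hap1[1] + hap2[1]
    f1 = fraction_modified(*hap1)
    f2 = fraction_modified(*hap2)
    return {
        "hap1": f1,
        "hap2": f2,
        "hap_diff": f1 - f2,  # allele-specific signal, phased reads only
        "aggregated_phased_only": fraction_modified(phased_mod, phased_valid),
        "aggregated_all_reads": fraction_modified(
            phased_mod + ungrouped[0], phased_valid + ungrouped[1]
        ),
    }
```

For example, `summarize_site((8, 10), (2, 10), (9, 10))` gives an aggregated level of 0.5 from the phased reads alone, while including the ungrouped reads changes it to 19/30.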

Thanks very much
