Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get location for analysed sites #16

Open
ZKai0801 opened this issue Aug 17, 2020 · 2 comments
Open

get location for analysed sites #16

ZKai0801 opened this issue Aug 17, 2020 · 2 comments

Comments

@ZKai0801
Copy link

ZKai0801 commented Aug 17, 2020

Hi @Beifang and @owehann ,

Thank you for developing such a wonderful tool!

I would like to get all locations of analysed sites and I found someone asked the same question here:
ding-lab/msisensor#49

I tried to retrieve all counted locations in your mentioned manner (by default i.e. sites where reads depth >= 20), however, it didn't generate correct output as it does not match the number shown on .prefix file.

So I assume MSIsensor and MSIsensor2 used different criteria for selecting readily analysed sites. Is there a way I could retrieve those sites?

Thank you very much!

@owehann
Copy link
Contributor

owehann commented Aug 17, 2020

Thanks for your attention on MSIsensor2
Actually, MSIsensor and MSIsensor2 used the same criteria for selecting microsatellites. You can easily get all analysed sites with their distributions in prefix_dis file. Samtools can also used to get the detail information of a microsatellite in bam files, and the premise is to pay attention that left and right flank must match with the reference.

@ZKai0801
Copy link
Author

Hi @owehann ,

Thank you so much for your reply!

That seems quite odds, because what I did is simply scanning through the prefix_dis file. And whenever I encounter a location contain a number that >= 20, then its cooresponding location will be directed to a output file.

Here is a Python script I used to retrieve sites:

import sys

if __name__ == "__main__":
    with open(sys.argv[1], "r") as fh:
        for line in fh:
            if line.startswith("chr"):
                coordinate_line = line
                continue
            depths = [int(i) for i in line.strip().lstrip("T:").split() if int(i) >= 20]
            if not depths:
                continue
            print(coordinate_line, end="")

To be specific, with the use of this script, sometimes I get less analysed locations than the number shown on prefix file. Is there something wrong with my script? Or would you mind if I sent you an example BAM file?

Thanks for your helping!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants