DNA Searching #19

Open
averagehat opened this issue Aug 11, 2015 · 9 comments

Comments

@averagehat
Collaborator

From Lewis:

There are a couple of approaches to DNA searching we could take with SEQR. One is a translating, blastx-type search: take a DNA sequence, convert it to its six possible protein sequences through a six-frame translation (three reading frames on each strand), then search those protein sequences against known protein sequences. This is a nice approach because a good protein hit gives you reasonable assurance that your DNA really does translate into a protein in nature. Protein searches also tend to be more sensitive.

That is already in progress here
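
For readers following along, here is a minimal sketch of what a six-frame translation does. It is an illustration only, not SEQR's implementation: the function names are hypothetical, it assumes the standard genetic code (NCBI translation table 1), and it maps any codon containing an ambiguous base to `X`.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict of the six translations, keyed '+1'..'+3' and '-1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Each of the six resulting protein strings would then be searched against the protein index separately, with stop codons (`*`) typically used to split a frame into candidate open reading frames.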

The other approach is to do a straight DNA-to-DNA search. We need to do a bit of research on this, but it should be uncomplicated. The only tricky bit is that DNA sequences can be as large as a chromosome, so the search has to incorporate some locality rather than treat each sequence as a single unit. Fortunately, this is quite easy to do.

This would let a lot of people use seqr as a drop-in replacement
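
One possible reading of "locality", sketched below purely as an assumption (the thread does not say how SEQR would do it): cut a long reference into overlapping windows and index k-mers per window, so a hit points at a region rather than at a whole chromosome. The function names and the `k`, `window`, and `step` defaults are made up for illustration.

```python
from collections import Counter, defaultdict


def build_window_index(reference, k=8, window=1000, step=500):
    """Map each k-mer to the set of window start offsets whose window contains it."""
    index = defaultdict(set)
    for start in range(0, max(len(reference) - k + 1, 1), step):
        chunk = reference[start:start + window]
        for i in range(len(chunk) - k + 1):
            index[chunk[i:i + k]].add(start)
    return index


def search(index, query, k=8, top=3):
    """Rank windows by how many distinct query k-mers they share with the query."""
    hits = Counter()
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    for kmer in query_kmers:
        for start in index.get(kmer, ()):
            hits[start] += 1
    return hits.most_common(top)
```

Because the windows overlap by `window - step` bases, a match that straddles a window boundary still falls entirely inside at least one window, as long as the query is shorter than the overlap.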

@seandavi

So very neat!

@pcantalupo

SEQR team,

Yes, this would be super cool as one who uses a BLAST pipeline for annotation of viral metagenomes. I had to ditch BLASTX in favor of Rapsearch (https://github.com/zhaoyanswill/RAPSearch2) since BX became way too slow for annotation of thousands of sequences. Even BLASTN (mega) is starting to feel slow.

Will there be an announcement when a beta version of SEQR is ready for user testing? I'd love to see how it compares.

Thank you,

Paul

@lewisg-ncbi
Collaborator

Hi Paul,

Yes absolutely! We should aim to do the translating search (blastx) as it's easy to do.

Best,
Lewis



@DCGenomics
Contributor

This is great! You guys are awesome!

Another big use case is to search metagenomes (in SRA) for viruses.

Some seminal work (these folks did not get nearly as far with their project
as you did) is outlined here:

https://github.com/DCGenomics/hackathon_v001_metagenomics

Cheers!

Ben


@DCGenomics
Contributor

I'm also cc'ing Keith on this thread, as he may have some constructive ideas.

Cheers!

Ben


@pcantalupo

@DCGenomics wrote:

> search metagenomes (in SRA) for viruses.

I'm already on it ;) that's part of my computational work in the lab.

@averagehat
Collaborator Author

I don't have very good tests yet, but from what I can tell, the frame-translation is working as expected thanks to @crashfrog's work at the hackathon.

I think a good next step might be to support BLAST-type file output formats. Edit: Thanks to @nabihlme's work, we have a lot of file formats already; we probably want to test that these are faithful to BLAST's.

This might also be a good opportunity for people to voice what features (commandline options) are high-priority for use in existing pipelines.

@pcantalupo

@averagehat
Here are options that I would be interested in seeing (listed in decreasing order of importance):

-num_threads
-outfmt 6 std with these additional specifiers: qcovs, qcovhsp, qlen, slen
-show_gis
-evalue
-lcase_masking
-max_target_seqs
-word_size
-perc_identity
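
To make the `-outfmt` request concrete: in BLAST+, `-outfmt "6 std qcovs qcovhsp qlen slen"` produces tab-separated rows with the twelve standard columns followed by the four extra ones. The sketch below shows how a downstream pipeline might read such rows; the `parse_outfmt6` helper is hypothetical, and nothing here implies SEQR already emits this format.

```python
import csv

# The 12 columns that "std" expands to in BLAST+ tabular output.
STD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
EXTRA_COLUMNS = ["qcovs", "qcovhsp", "qlen", "slen"]
TEXT_COLUMNS = {"qseqid", "sseqid"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_outfmt6(handle, columns=STD_COLUMNS + EXTRA_COLUMNS):
    """Yield one dict per hit from tab-separated, outfmt-6-style lines."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = {}
        for name, value in zip(columns, row):
            if name in TEXT_COLUMNS:
                record[name] = value
            elif name in FLOAT_COLUMNS:
                record[name] = float(value)
            else:
                record[name] = int(value)  # coordinates, lengths, coverage
        yield record
```

A parser like this could also back a fidelity test: run the same query through BLAST and through SEQR, parse both outputs, and compare the column values hit by hit.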

Thank you, Paul

@averagehat
Collaborator Author

I made a separate issue for the options discussion here: #22
