How does a small database affect the taxonomic assignment? #2023

Fmicro23 · 2024-09-18T10:00:28Z

Good morning!
I am working on marine aerobic methanotrophs, a group of microorganisms that oxidize methane in the presence of oxygen. I used dada2 to analyse pmoA gene (encoding methane monooxygenase) sequences that I took from a published study. That study used a specific pmoA gene database to assign taxonomy, and I wondered IF and HOW a small database can affect the algorithm behind the "assignTaxonomy" function. If it does, can you suggest how to solve this problem?

Many many thanks in advance!

benjjneb · 2024-09-19T15:45:53Z

There are two ways. The first is that you can only assign to what is in your database. The second, and more important, is that without outgroup sequences you can make spurious assignments to what is in the database even when the query sequence is not very similar to anything in it. This is because of the way the naive Bayesian classifier method (implemented by assignTaxonomy) calculates the certainty of an assignment: it subsamples the query sequence and checks how often those subsamples get assigned the same taxonomy as the full query. In a small database with no outgroup sequences, the subsamples are much more likely to match the same reference sequence as the full query, simply because no other remotely similar sequences are in the database.
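To make the effect concrete, here is a toy sketch in Python. It is not dada2's implementation: it replaces the naive Bayesian scoring with a simple count of shared k-mers, counts ties in favour of the full-query winner, and uses made-up 16-base sequences (`ref_A`, `ref_B`, `outgroup` and the query are all hypothetical). It only illustrates why bootstrap support can be high while the query is dissimilar to its best match.

```python
import random

K = 4  # toy word size, chosen only because the example sequences are short


def kmers(seq, k=K):
    """Return the list of overlapping k-mers in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def score(words, ref_words):
    """Count how many of the given words occur in a reference's k-mer set."""
    return sum(w in ref_words for w in words)


def classify(query, refs, n_boot=100, seed=0):
    """Return (best reference, bootstrap support, k-mer similarity to best)."""
    rng = random.Random(seed)
    ref_sets = {name: set(kmers(seq)) for name, seq in refs.items()}
    q = kmers(query)

    # Assignment using the full query.
    full = {name: score(q, words) for name, words in ref_sets.items()}
    best = max(full, key=full.get)

    # Bootstrap: resample a fraction of the query's k-mers and check whether
    # the full-query winner is still unbeaten (ties count as agreement).
    m = max(1, len(q) // 4)
    agree = 0
    for _ in range(n_boot):
        sample = [rng.choice(q) for _ in range(m)]
        s = {name: score(sample, words) for name, words in ref_sets.items()}
        if s[best] == max(s.values()):
            agree += 1

    return best, agree / n_boot, full[best] / len(q)


# Hypothetical toy sequences, not real pmoA data.
QUERY = "ATGGCTTTCAGAACCA"
SMALL_DB = {"ref_A": "ATGGCTAAGTCTGCTG", "ref_B": "GTGCGCATCGACGTGG"}
LARGE_DB = dict(SMALL_DB, outgroup="ATGGCTTTCAGAACCT")

for label, db in [("small", SMALL_DB), ("with outgroup", LARGE_DB)]:
    name, support, sim = classify(QUERY, db)
    print(f"{label}: {name} support={support:.2f} similarity={sim:.2f}")
# small: ref_A support=1.00 similarity=0.23
# with outgroup: outgroup support=1.00 similarity=0.92
```

In the small database the query shares only 3 of its 13 k-mers with `ref_A`, yet every subsample still favours `ref_A` because `ref_B` shares none. Adding a closer outgroup reference moves the assignment away from `ref_A`.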

I don't know about pmoA references, but you could consider a method like IdTaxa in the DECIPHER package as an alternative approach. This method directly considers sequence similarity between query and best reference match when making assignments, and is more robust to the type of error I described above.
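The general idea of checking similarity to the best match can be sketched in Python as follows. This is not IdTaxa's algorithm or API: the function name, the shared k-mer similarity measure, the `min_similarity` threshold of 0.5 and the toy sequences are all assumptions made for illustration.

```python
def kmer_set(seq, k=4):
    """Return the set of overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_with_similarity_check(query, refs, min_similarity=0.5, k=4):
    """Assign to the best reference only if the query is similar enough to it.

    Similarity is the fraction of the query's distinct k-mers found in the
    reference (a crude stand-in for a real sequence-similarity measure).
    """
    q = kmer_set(query, k)
    if not q:  # query shorter than k: nothing to compare
        return "unclassified", 0.0
    sims = {name: len(q & kmer_set(seq, k)) / len(q)
            for name, seq in refs.items()}
    best = max(sims, key=sims.get)
    if sims[best] < min_similarity:
        return "unclassified", sims[best]
    return best, sims[best]


# Hypothetical toy references, not real pmoA data.
refs = {"ref_A": "ATGGCTAAGTCTGCTG", "ref_B": "GTGCGCATCGACGTGG"}

# Distant query: best match is ref_A, but similarity is too low to assign.
print(assign_with_similarity_check("ATGGCTTTCAGAACCA", refs)[0])  # unclassified
# Query differing from ref_A by one base: assigned.
print(assign_with_similarity_check("ATGGCTAAGTCTGCTC", refs)[0])  # ref_A
```

With a check like this, a query that merely happens to be closest to one reference in a small database is left unclassified instead of receiving a confident but spurious label.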
