
Output multiple artifacts per primer, similar to Cutadapt's demultiplexing method #60

Open
lina-kim opened this issue Nov 6, 2023 · 4 comments

@lina-kim

lina-kim commented Nov 6, 2023

Addition Description
It would be useful to bin reads by primer prior to primer removal. I'd like to separate a single FASTQ-based artifact (containing several different primers) into multiple output artifacts by primer; each output artifact would be characterized by a single primer. This would be helpful for meta-analyses in which sequences with multiple primers/variable regions may be found in a single QIIME artifact.

This is possible with native Cutadapt (as of v4.5) using its demultiplexing steps, but not with the QIIME 2 plugin, whose inputs are restricted to specific semantic types.

Current Behavior

  • The QIIME 2 plugin performs a similar function with qiime cutadapt demux (based on adapter sequence), but generates only a single output for demultiplexed sequences. It also requires an input artifact of type MultiplexedSingleEndBarcodeInSequence and does not accept SampleData[Single/PairedEndSequencesWithQuality].
  • qiime cutadapt trim-single / trim-paired could technically do this by running the command once per primer (or primer pair), but that is quite inefficient.

Proposed Behavior

  • q2-cutadapt would take as input 1) a FASTQ artifact of SampleData[Single/PairedEndSequencesWithQuality], which contains N different primer sequences among its many reads, and 2) a tab-separated metadata file containing the N primer names and corresponding primer sequences.
  • As output, it would generate N artifacts of SampleData[Single/PairedEndSequencesWithQuality]; each output artifact would contain reads with the same primer sequence. There would also be an output artifact (also SampleData[Single/PairedEndSequencesWithQuality]) of sequences that did not match any of the N primer sequences.
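The proposed binning step can be sketched in plain Python. This is a hypothetical illustration, not the q2-cutadapt or Cutadapt implementation: it assumes a tab-separated primer file with a header row and columns named "name" and "sequence" (column names are an assumption), requires an exact match at the 5' end of each read (IUPAC degenerate codes allowed), and assigns each read to the first primer that matches. Cutadapt tolerates mismatches via its error rate, so real results would differ.

```python
from __future__ import annotations

import csv
import io
import re
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str]  # (read_id, sequence)

# IUPAC nucleotide codes mapped to regex character classes, so degenerate
# primers can be matched without expanding every variant.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def load_primers(tsv_text: str) -> Dict[str, re.Pattern]:
    """Parse a TSV with hypothetical 'name' and 'sequence' columns."""
    primers: Dict[str, re.Pattern] = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        name, seq = row["name"].strip(), row["sequence"].strip().upper()
        try:
            pattern = "".join(IUPAC[base] for base in seq)
        except KeyError as err:
            raise ValueError(f"non-IUPAC base {err} in primer {name!r}") from None
        primers[name] = re.compile(pattern)
    return primers


def bin_reads(reads: Iterable[Read], primers: Dict[str, re.Pattern],
              unmatched: str = "unmatched") -> Dict[str, List[Read]]:
    """Assign each read to the first primer found at its 5' end (exact match)."""
    if unmatched in primers:
        raise ValueError(f"primer name {unmatched!r} is reserved")
    bins: Dict[str, List[Read]] = {name: [] for name in primers}
    bins[unmatched] = []
    for read_id, seq in reads:
        # re.match anchors at position 0, so the primer must start the read.
        for name, pattern in primers.items():
            if pattern.match(seq.upper()):
                bins[name].append((read_id, seq))
                break
        else:  # no primer matched this read
            bins[unmatched].append((read_id, seq))
    return bins


if __name__ == "__main__":
    # Illustrative primers (515F, ITS1F) and toy reads.
    tsv = "name\tsequence\n515F\tGTGYCAGCMGCCGCGGTAA\nITS1F\tCTTGGTCATTTAGAGGAAGTAA\n"
    reads = [
        ("r1", "GTGCCAGCAGCCGCGGTAAACGT"),
        ("r2", "CTTGGTCATTTAGAGGAAGTAAGTC"),
        ("r3", "ACGTACGTACGT"),
    ]
    bins = bin_reads(reads, load_primers(tsv))
    print({name: [rid for rid, _ in recs] for name, recs in bins.items()})
    # {'515F': ['r1'], 'ITS1F': ['r2'], 'unmatched': ['r3']}
```

Each key of the returned dict corresponds to one of the N proposed output artifacts, plus the unmatched bin. The sketch covers single-end reads only; for SampleData[PairedEndSequencesWithQuality] the same idea would apply per read pair, for example requiring the forward primer on R1 and the reverse primer on R2.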

Questions

  1. Does QIIME 2 allow for variable numbers of output artifacts? If not, I suppose that would be a blocker to implementation.

References

  1. Cutadapt manual, "Demultiplexing"
  2. QIIME 2 docs, qiime cutadapt trim-paired
@ebolyen
Member

ebolyen commented Jul 17, 2024

This is totally possible now with Collection[...] as an output. Is this something you would be interested in working on @lina-kim?

@lina-kim
Author

Great to know, thanks @ebolyen! Yes, I would be more than happy to work on it. Is Collection[...] a semantic type found in q2-types / the base QIIME 2 installation? I'm not seeing much documentation for it at first glance.

@gregcaporaso
Member

Hey @lina-kim, I am actually working on some tutorial content that includes Collection right now. You can see the working draft here. Note that you'll only be able to access this tutorial page through this link, as it's built from a pull request (so you won't find this content if you navigate from https://develop.qiime2.org yet). This link will also break once the corresponding PR is merged.

You can also find the new API docs on Collection here.

Want to take a look at that and let us know if you have questions about how to use Collection?

@lina-kim
Author

Perfect, thanks for the resources @gregcaporaso! I'll check them out and get back to you with any questions.

Labels: None yet
Projects: Status: Backlog
Development: No branches or pull requests
Participants: 3