🚧 WIP: Batch run #125
base: master
I believe we can and should simplify by always using the unpaired mode of bowtie2 and not using the --split-files option of fastq-dump. That way, the same command line should work for any SRA dataset AFAIK. Artem can correct me if I'm wrong here. It also means we only need one pipe, with no need for named pipes.
I believe the only option we need for bowtie2 is --very-sensitive-local, with /dev/stdin for unpaired FASTQ input.
I would suggest the following simplification & optimization. Combine the bowtie2, prefetch, fastq-dump and samtools binary files, summarizer.py and the bowtie2 index files into one tarball on S3. When the container starts, install aws cli and python3 base only. Then copy the tarball and decompress it. At that point the container is ready to do

    prefetch SRA12345
    fastq-dump SRA12345 | bowtie2 | summarizer.py | samtools > output.bam # single pipe
    aws s3 cp output.bam s3://serratus-public/out/...
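The steps in the comment above can be sketched as a dry run that prints each command rather than executing it. Several details are assumptions that the thread does not confirm: that fastq-dump needs `--stdout` to write reads to the pipe, that summarizer.py reads SAM on stdin and writes it to stdout, and that samtools converts the stream with `view -b -`. The function name `print_steps` is hypothetical, and the S3 destination is left elided exactly as in the comment.

```shell
#!/bin/sh
# Dry-run sketch: prints the three steps instead of running them.
# Assumed, not confirmed in the thread: --stdout for fastq-dump, a
# stdin-to-stdout summarizer.py, and "samtools view -b -" for BAM output.

print_steps() {
  acc=$1    # SRA accession, e.g. the SRA12345 placeholder from the thread
  index=$2  # basename of the bowtie2 index files
  dest=$3   # S3 destination; the thread elides the full path
  echo "prefetch $acc"
  echo "fastq-dump --stdout $acc | bowtie2 -x $index --very-sensitive-local -U /dev/stdin | summarizer.py | samtools view -b - > output.bam"
  echo "aws s3 cp output.bam $dest"
}

print_steps SRA12345 INDEXNAME 's3://serratus-public/out/...'
```

Printing the commands first makes it easy to review the flags before any data is downloaded; once they look right, each printed line can be run as-is inside the container.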
That actually adds a lot of complexity. Using Docker, we simply build an image with all of those executables installed. Then when we create a container, they're ready to go instantly. I'll see if I can guess the parameters right for
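Whichever way the executables end up in the container, a quick check at startup can confirm they are all on PATH before any work begins. This is a minimal sketch, not part of the PR: `missing_tools` is a hypothetical helper, and the tool list is taken from the names mentioned in this thread.

```shell
#!/bin/sh
# Print every tool from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tool names come from the thread; adjust to whatever the image really installs.
missing=$(missing_tools prefetch fastq-dump bowtie2 samtools aws python3)
if [ -n "$missing" ]; then
  # A real entrypoint would probably exit non-zero here.
  echo "missing tools:" $missing >&2
fi
```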
I have no background in Docker, so my bad on that; I'm trying to learn but am struggling so far. I think this command line for bowtie2 should work with unpaired FASTQ from a pipe, sending SAM output to a pipe:

    bowtie2 -x INDEXNAME --very-sensitive-local -U /dev/stdin

Contact me by email [email protected] or the serratus-bioinformatics slack channel if you need help with the informatics pipe.
All good, happy to help clear 🐳 stuff up 👍 Where can I find the value from I think that's the piece I'm missing to get this running. I'll ping you over on the serratus slack 👍. Thanks!
You'll need a genome/sequence file and an index of that genome to run bowtie. In essence, it takes short little bits of DNA and tries to place them in a big piece of DNA. Kind of like a fuzzy regex. Genome + Bowtie2 Index Files: As long as those files are in the same directory as
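Since bowtie2 locates its index by basename, a small pre-flight check can confirm the index files are all present before the pipe starts. The sketch below assumes a standard small index, which consists of six files ending in `.1.bt2` through `.4.bt2`, `.rev.1.bt2` and `.rev.2.bt2`; large indexes use `.bt2l` instead and are not handled here. `index_complete` is a hypothetical helper, and `INDEXNAME` is the placeholder used earlier in the thread.

```shell
#!/bin/sh
# Succeed only if all six small-index files for the given basename exist.
# Large indexes use the .bt2l suffix instead; this sketch ignores that case.
index_complete() {
  prefix=$1
  for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
    [ -f "$prefix.$suffix" ] || { echo "missing: $prefix.$suffix" >&2; return 1; }
  done
}

# Hypothetical usage with the placeholder basename from the thread.
if index_complete ./INDEXNAME; then
  echo "index looks complete"
else
  echo "index incomplete; run bowtie2-build on the genome FASTA first"
fi
```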
a448262 to 6bec9d0
This is the beginnings of a container that can run the serratus pipeline end-to-end. I'm not sure what settings I need for bowtie2 though, so I haven't been able to get past those runs. I'm also not sure if it's appropriate to write to all 3 pipes and then try to run both flavors of bowtie (paired and unpaired), or if we need to figure out which scenario we're in and only run the one bowtie process.
I think the input to the container is good though, so hopefully we're on the right track.