🚧 WIP: Batch run #125

itzamna314 · 2020-05-24T23:09:10Z

This is the beginning of a container that can run the serratus pipeline end-to-end. I'm not sure what settings I need for bowtie2, though, so I haven't been able to get past those runs.

I'm also not sure if it's appropriate to write to all 3 pipes and then try to run both flavors of bowtie (paired and unpaired?), or if we need to figure out which scenario we're in and only run the one bowtie process.

I think the input to the container is good though, so hopefully we're on the right track.

rcedgar · 2020-05-24T23:11:08Z

I believe we can and should simplify by always using the unpaired mode of bowtie2 and not using the --split-files option of fastq-dump. That way, the same command line should work for any SRA dataset AFAIK. Artem can correct me if I'm wrong here. It also means we only need one pipe, with no need for named pipes.

rcedgar · 2020-05-24T23:13:10Z

I believe the only option we need for bowtie2 is --very-sensitive-local with /dev/stdin for unpaired fastq input.

rcedgar · 2020-05-24T23:17:32Z

I would suggest the following simplification & optimization. Combine the bowtie2, prefetch, fastq-dump and samtools binary files, summarizer.py and the bowtie2 index files into one tarball on S3. When the container starts, install only the aws cli and base python3. Then copy the tarball and decompress it. At that point the container is ready to run:

prefetch SRA12345

fastq-dump SRA12345 | bowtie2 | summarizer.py | samtools > output.bam # single pipe

aws s3 cp output.bam s3://serratus-public/out/...
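The single-pipe idea above can be sketched as a small shell function. This is only an illustration under assumptions: `run_accession` is a hypothetical name, the tools are assumed to be on PATH, `fastq-dump -Z` is used to write FASTQ to stdout, and the summarizer.py stage is left out because its command-line interface is not described in this thread.

```shell
# Sketch: stream one SRA accession through unpaired bowtie2 into a BAM file.
# run_accession is a hypothetical helper, not part of the serratus repo.
# Assumes prefetch, fastq-dump, bowtie2 and samtools are on PATH.
run_accession() {
  acc="${1:?usage: run_accession ACCESSION INDEX_PREFIX [OUT_BAM]}"
  index="${2:?usage: run_accession ACCESSION INDEX_PREFIX [OUT_BAM]}"
  out="${3:-$acc.bam}"

  # Download the run first so fastq-dump reads a local copy.
  prefetch "$acc" || return 1

  # One pipe: FASTQ on stdout -> unpaired alignment -> SAM converted to BAM.
  # summarizer.py would sit between bowtie2 and samtools; it is omitted here
  # because its interface is not shown in this thread.
  fastq-dump -Z "$acc" \
    | bowtie2 -x "$index" --very-sensitive-local -U /dev/stdin \
    | samtools view -b -o "$out" -
}
```

Calling `run_accession SRA12345 cov3a output.bam` would then leave `output.bam` ready for the `aws s3 cp` step. Note that a plain pipe reports only the exit status of its last stage, so a real job would probably want extra error checking.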

itzamna314 · 2020-05-24T23:37:08Z

> I would suggest the following simplification & optimization. Combine the bowtie2, prefetch, fastq-dump and samtools binary files, summarizer.py and the bowtie2 index files into one tarball on S3. When the container starts, install aws cli and python3 base only. Then copy the tarball and decompress it. At that point the container is ready to do

That actually adds a lot of complexity. Using Docker, we simply build an image with all of those executables installed. Then when we create a container, they're ready to go instantly.

I'll see if I can guess the right parameters for bowtie2. I've never used it before, though; I have no background in biology. I know how to get the executables where they need to be, but not so much what they do or how to run them.

rcedgar · 2020-05-24T23:41:39Z

I have no background in Docker, so my bad on that -- I'm trying to learn but am struggling so far.

I think this command-line for bowtie2 should work with unpaired FASTQ from a pipe, sending SAM output to a pipe:

bowtie2 -x INDEXNAME --very-sensitive-local -U /dev/stdin

Contact me by email [email protected] or the serratus-bioinformatics slack channel if you need help with the informatics pipe.

itzamna314 · 2020-05-24T23:43:55Z

All good, happy to help clear 🐳 stuff up 👍

Where can I find the value for INDEXNAME in this scenario? It comes from a JOB_JSON file in the full serratus pipeline.

I think that's the piece I'm missing to get this running. I'll ping you over on the serratus slack 👍. Thanks!

ababaian · 2020-05-25T00:44:21Z

You'll need a genome/sequence file and an index of that genome to run bowtie. In essence, it takes short little bits of DNA and tries to place them in a big piece of DNA. Kind of like a fuzzy regex.

Genome + Bowtie2 Index Files : aws s3 sync s3://serratus-public/seq/cov3a/ ./

As long as those files are in the directory you run bowtie2 from, you can pass -x cov3a (or whatever the prefix of the .bt2 files is).
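If the prefix is not known in advance, it can be read off the index file names. The sketch below assumes the standard small-index naming (`PREFIX.1.bt2`, `PREFIX.rev.1.bt2`, and so on) and does not handle large `.bt2l` indexes; `index_prefix` is a hypothetical helper name, not something from the serratus repo.

```shell
# Sketch: print the bowtie2 index prefix found in a directory.
# index_prefix is a hypothetical helper; it assumes small-index file
# names of the form PREFIX.1.bt2 and ignores large (.bt2l) indexes.
index_prefix() {
  dir="${1:-.}"
  for f in "$dir"/*.1.bt2; do
    # Skip the unexpanded glob when no index files are present.
    [ -e "$f" ] || continue
    # PREFIX.rev.1.bt2 also matches the glob; it is not the forward index.
    case "$f" in
      *.rev.1.bt2) continue ;;
    esac
    base="${f##*/}"
    printf '%s\n' "${base%.1.bt2}"
    return 0
  done
  return 1
}
```

After the `aws s3 sync` above, `bowtie2 -x "$(index_prefix .)" --very-sensitive-local -U /dev/stdin` would then pick up whatever prefix was downloaded, which should be `cov3a` here if the files follow that naming.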

itzamna314 requested review from ababaian and brietaylor May 24, 2020 23:09

itzamna314 force-pushed the run-batch branch 2 times, most recently from a448262 to 6bec9d0 Compare May 25, 2020 01:14

Kyl Wellman added 2 commits May 24, 2020 19:15

📝 Add instructions to run batch container

85ff975

🚧 WIP: Batch run

e66b40a

itzamna314 force-pushed the run-batch branch from 6bec9d0 to e66b40a Compare May 25, 2020 01:15

✨ Data pre-loading support, stdout option

8583941

itzamna314 marked this pull request as ready for review May 26, 2020 16:21

brietaylor removed their request for review August 31, 2022 16:07
