Reduce the number of published files from alevin-fry output #178

Open
tomsing1 opened this issue Nov 6, 2022 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

tomsing1 commented Nov 6, 2022

Description of feature

I am using the alevin-fry quantitation method. For downstream analysis, I am mainly interested in the final count matrix, i.e.

  1. The content of the af_quant/alevin directory and
  2. The af_quant/quant.json file

Right now, the workflow also publishes many other files, some of them very large, e.g. the af_map output directory or af_quant/alevin/map.collated.rad, which can be tens of gigabytes in size for large experiments.

It would be great to be able to whittle down the published files to reduce the size of the pipeline's output. (After all, the intermediate files are still available in the working directory.)
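To make the request concrete, here is a minimal sketch of the selection rule in Python. It is for illustration only: the pipeline itself is written in Nextflow, and the names `KEEP_PATTERNS`, `DROP_PATTERNS` and `should_publish` are hypothetical, not part of the workflow. The paths are the ones mentioned above.

```python
from fnmatch import fnmatch

# Hypothetical allow-list: only the outputs named in this issue.
KEEP_PATTERNS = (
    "af_quant/alevin/*",    # contents of the count matrix directory
    "af_quant/quant.json",  # quantification metadata
)

# Large intermediate file that should stay in the working directory only.
DROP_PATTERNS = (
    "af_quant/alevin/map.collated.rad",
)


def should_publish(rel_path: str) -> bool:
    """Return True if a path (relative to the sample output) should be published."""
    # Exclusions win over the allow-list.
    if any(fnmatch(rel_path, pattern) for pattern in DROP_PATTERNS):
        return False
    return any(fnmatch(rel_path, pattern) for pattern in KEEP_PATTERNS)


if __name__ == "__main__":
    # Example file names are illustrative.
    for path in (
        "af_quant/quant.json",
        "af_quant/alevin/quants_mat.mtx",
        "af_quant/alevin/map.collated.rad",
        "af_map/map.rad",
    ):
        print(path, should_publish(path))
```

In Nextflow, the same idea could presumably be expressed through the `publishDir` directive, for instance with a `saveAs` closure that returns `null` for files that should not be published.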

For example, I am running Nextflow on AWS Batch with an S3 bucket as the publish directory. Copying the output files to the bucket takes many times longer than running the actual workflow, because the publishing is not parallelized.*

*If there is a way to speed this up, I would love to learn!
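One generic way to parallelize such copies outside of Nextflow is a thread pool, sketched below in Python. This is an assumption-laden illustration, not something the pipeline or Nextflow provides: `publish_parallel` is a hypothetical helper, and `upload_one` is a placeholder for whatever performs a single transfer (for example, a wrapper around an S3 client's upload call).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional


def publish_parallel(
    paths: Iterable[str],
    upload_one: Callable[[str], None],
    max_workers: int = 8,
) -> Dict[str, Optional[BaseException]]:
    """Call upload_one(path) for every path concurrently.

    Returns a dict mapping each path to None on success, or to the
    exception raised by upload_one for that path.
    """
    results: Dict[str, Optional[BaseException]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all transfers up front; threads suit I/O-bound copies.
        futures = {pool.submit(upload_one, path): path for path in paths}
        for future in as_completed(futures):
            # exception() is None when the transfer finished cleanly.
            results[futures[future]] = future.exception()
    return results
```

Collecting the exceptions instead of re-raising lets the caller retry only the failed files.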

@tomsing1 tomsing1 added the enhancement New feature or request label Nov 6, 2022
Member

grst commented Nov 17, 2022

I agree we don't need all these intermediate files. Happy to accept a PR!

We could add an option --save_align_intermediates, as in the rnaseq workflow.

Member

grst commented Nov 7, 2024

Depends on #361

@grst grst added this to scrnaseq Nov 7, 2024
@grst grst moved this from Todo high priority to Todo - medium priority in scrnaseq Nov 7, 2024
@github-project-automation github-project-automation bot moved this to Todo high priority in scrnaseq Nov 7, 2024
Projects
Status: Todo - medium priority
Development

No branches or pull requests

2 participants