Reduce the number of published files from alevin-fry output #178

tomsing1 · 2022-11-06T21:23:06Z

Description of feature

I am using the alevin-fry quantitation method. For downstream analysis, I am mainly interested in the final count matrix, namely:

- the contents of the af_quant/alevin directory, and
- the af_quant/quant.json file

Right now, the workflow also publishes many other files, some of them very large, e.g. the af_map output directory or af_quant/alevin/map.collated.rad, which can be tens of gigabytes in size for large experiments.

It would be great to be able to whittle down the published files to reduce the size of the pipeline's output. (After all, the intermediate files are still available in the working directory.)
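As a rough illustration of the selection described above (not the pipeline's actual configuration), the following Python sketch filters a list of relative output paths down to the two items named in the request. The names `KEEP_DIRS`, `KEEP_FILES`, `EXCLUDE_SUFFIXES`, `should_publish` and `select_published` are hypothetical, and excluding every `.rad` file is an assumption based on the map.collated.rad example.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list, taken from the two items named in the request.
KEEP_DIRS = (PurePosixPath("af_quant/alevin"),)
KEEP_FILES = (PurePosixPath("af_quant/quant.json"),)
# Assumption: large intermediate RAD files are never wanted in the published output.
EXCLUDE_SUFFIXES = (".rad",)


def should_publish(rel_path: str) -> bool:
    """Return True if a path, relative to the sample's output directory, should be published."""
    path = PurePosixPath(rel_path)
    if path.suffix in EXCLUDE_SUFFIXES:
        return False
    if path in KEEP_FILES:
        return True
    # Keep a file when one of the allowed directories is among its ancestors.
    return any(keep_dir in path.parents for keep_dir in KEEP_DIRS)


def select_published(rel_paths):
    """Filter an iterable of relative paths, preserving their order."""
    return [p for p in rel_paths if should_publish(p)]
```

In a real pipeline this selection would more likely be expressed through the workflow's own publishing settings; the sketch only makes the intended keep/drop rule explicit.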

For example, I am running Nextflow on AWS Batch with an S3 bucket as the publish directory. It takes many times longer to copy the output files to the bucket than to run the actual workflow (because the publishing is not parallelized*).

*If there is a way to speed this up, I would love to learn!

grst · 2022-11-17T15:12:29Z

I agree we don't need all these intermediate files. Happy to accept a PR!

We could add an option --save_align_intermediates, as in the rnaseq workflow.

grst · 2024-11-07T13:12:54Z

Depends on #361

tomsing1 added the enhancement New feature or request label Nov 6, 2022

grst added this to scrnaseq Feb 21, 2023

github-project-automation bot moved this to Todo in scrnaseq Feb 21, 2023

grst mentioned this issue Feb 21, 2023

STARsolo Workflow Doesn't Output Solo.out counts #187

Closed

grst added the good first issue Good for newcomers label Oct 26, 2023

grst removed this from scrnaseq Nov 7, 2024

grst added this to scrnaseq Nov 7, 2024

grst moved this from Todo high priority to Todo - medium priority in scrnaseq Nov 7, 2024

github-project-automation bot moved this to Todo high priority in scrnaseq Nov 7, 2024
