do not put all csv string in memory #66

LuyiTian · 2018-05-02T03:07:49Z

Currently, in the sc_demultiplex step, the program keeps all the reads in memory and only then opens the output CSV files and writes to them one by one (because we can't have too many files open at the same time). This becomes a problem with very large datasets.

To solve this, we need to write to the CSV files while we process the BAM file. For example, every 100,000 reads we could trigger a write_to_csv step and clear the cache.
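A minimal Python sketch of this buffered approach follows. It assumes the BAM parsing has already produced (cell_barcode, csv_row) pairs. The names demultiplex_streaming, _flush and flush_every, and the one-file-per-barcode layout, are hypothetical; this is not scPipe's actual implementation. At most flush_every rows are held in memory, and each flush opens, writes and closes one file at a time, so the open-file limit is never approached. Each file is truncated on its first write and appended to afterwards, so a rerun does not mix old and new rows.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def _flush(buffers: Dict[str, List[str]], out_dir: str, started: Set[str]) -> None:
    """Write every buffered row to its cell's CSV, one open file at a time."""
    for barcode, rows in buffers.items():
        path = os.path.join(out_dir, f"{barcode}.csv")
        # Hypothetical layout: one CSV per barcode.
        # Truncate on the first write for this cell in this run, append afterwards.
        mode = "a" if barcode in started else "w"
        with open(path, mode) as fh:
            fh.writelines(row + "\n" for row in rows)
        started.add(barcode)
    buffers.clear()  # release the cached rows


def demultiplex_streaming(
    records: Iterable[Tuple[str, str]],
    out_dir: str,
    flush_every: int = 100_000,
) -> int:
    """Route (barcode, csv_row) records to per-barcode CSV files.

    At most `flush_every` rows are cached; once that many reads have
    accumulated, the cache is written out and cleared. Returns the
    total number of rows written.
    """
    os.makedirs(out_dir, exist_ok=True)
    buffers: Dict[str, List[str]] = defaultdict(list)
    started: Set[str] = set()
    pending = 0
    total = 0
    for barcode, row in records:
        buffers[barcode].append(row)
        pending += 1
        total += 1
        if pending >= flush_every:
            _flush(buffers, out_dir, started)
            pending = 0
    _flush(buffers, out_dir, started)  # write whatever is left at the end
    return total
```

The trade-off is that each flush reopens every file that received reads in that window: a larger flush_every means fewer reopenings but more memory held between writes.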


cpattaroni · 2021-01-24T22:57:01Z

I currently have a massive dataset (6 billion reads to start with). Memory was an issue, but I managed to get it running with 900 GB of RAM. It completed the step and reached the next line, but the output was completely empty. Any idea why?

cpattaroni · 2021-01-26T05:42:32Z

I can confirm there is an issue with big files (starting at 6 billion reads). Even with enough RAM, it does not write the .csv files properly.

LuyiTian · 2021-02-08T05:50:54Z

@cpattaroni What protocol did you use? If it is a combination of multiple runs, you could split the data and run each one separately.

cpattaroni · 2021-02-08T06:28:47Z

@LuyiTian It's QIAseq UPX 3' Transcriptome, which is not very common. It worked fine a year ago with a small run and previous versions of R and scPipe.

I've tried splitting it into 4 parts, but it is still not writing the output. I've stopped using it for this reason and am trying umi-tools at the moment.

LuyiTian self-assigned this May 2, 2018

LuyiTian added the enhancement label May 3, 2018
