do not put all csv string in memory #66

Open
LuyiTian opened this issue May 2, 2018 · 4 comments
@LuyiTian
Owner

LuyiTian commented May 2, 2018

Currently, in the sc_demultiplex step, the program keeps all the reads in memory, then opens the CSV files and writes to them one by one (because we can't have too many files open at the same time). This becomes a problem when we have a very large dataset.

To solve this, we need to write to the CSV files while we process the BAM file. For example, every 100,000 reads we could trigger a write to CSV and clear the cache.
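
A minimal sketch of that periodic-flush idea, written in Python purely for illustration. It is not scPipe's actual code: the class name `BufferedCellWriter`, the column names, and the one-CSV-per-barcode layout are all assumptions. Reads are buffered per cell barcode, and once the buffer holds `flush_every` reads each pending file is opened in turn, appended to, and closed, so only one file handle is open at any moment.

```python
import csv
import os
import tempfile
from collections import defaultdict


class BufferedCellWriter:
    """Hypothetical helper: buffer rows per cell barcode, flush periodically."""

    def __init__(self, out_dir, flush_every=100_000):
        self.out_dir = out_dir
        self.flush_every = flush_every      # reads to buffer before writing
        self.buffer = defaultdict(list)     # barcode -> list of pending rows
        self.buffered = 0                   # total rows currently in memory
        self.seen = set()                   # barcodes whose file has a header
        os.makedirs(out_dir, exist_ok=True)

    def add(self, barcode, gene_id, umi, position):
        """Queue one read; trigger a flush when the buffer is full."""
        self.buffer[barcode].append((gene_id, umi, position))
        self.buffered += 1
        if self.buffered >= self.flush_every:
            self.flush()

    def flush(self):
        """Write pending rows, opening one file at a time, then clear the cache."""
        for barcode, rows in self.buffer.items():
            path = os.path.join(self.out_dir, f"{barcode}.csv")
            is_new = barcode not in self.seen
            # First touch truncates any stale file; later flushes append.
            with open(path, "w" if is_new else "a", newline="") as fh:
                writer = csv.writer(fh)
                if is_new:
                    writer.writerow(["gene_id", "UMI", "position"])
                writer.writerows(rows)
            self.seen.add(barcode)
        self.buffer.clear()
        self.buffered = 0


if __name__ == "__main__":
    # Toy usage: a real run would iterate over BAM records instead.
    with tempfile.TemporaryDirectory() as tmp:
        out = BufferedCellWriter(tmp, flush_every=2)
        out.add("AAAC", "gene1", "UMI1", 100)
        out.add("TTTG", "gene2", "UMI2", 200)   # second read triggers a flush
        out.add("AAAC", "gene3", "UMI3", 300)
        out.flush()                             # final flush for the remainder
        print(sorted(os.listdir(tmp)))
```

The final explicit `flush()` matters: without it, whatever is still buffered when the input ends never reaches disk. Memory use is bounded by `flush_every` rather than by the size of the dataset, at the cost of reopening files on each flush.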

@LuyiTian LuyiTian self-assigned this May 2, 2018
@cpattaroni

I currently have a massive dataset (6 billion reads to start with). Memory was an issue, but I managed to get it running with 900 GB of RAM. It completed the task and moved on to the next line, but the output was completely empty. Any idea why?

@cpattaroni

I can confirm there is an issue with big files (starting with 6 billion reads). Even with enough RAM, it does not write the .csv files properly.

@LuyiTian
Owner Author

LuyiTian commented Feb 8, 2021

@cpattaroni What protocol did you use? If it is a combination of multiple runs, you could split the data and run each one separately.

@cpattaroni

@LuyiTian It's QIAseq UPX 3' Transcriptome, not very common. It worked fine a year ago with a small run and previous versions of R and scPipe.

I've tried splitting it into 4, and it still doesn't write the output. I've stopped using it for this reason and am trying umi-tools at the moment.
