[1.2.0 - Ancient Destiny] Chunk fasta reads for better parallelization for revio pacbio data #285

DLBPointon · 2024-03-20T12:32:31Z

Description of feature

Revio data sets are huge, so the reads need to be split into n = (reads / 10 million) files. Map each file separately, then merge the output.
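A rough sketch of the chunk-count arithmetic, in case it helps. The function name `chunk_count` is made up for illustration, and rounding up (so that no file goes over 10 million reads) is an assumption; the description only says n = reads / 10 million.

```python
READS_PER_CHUNK = 10_000_000  # the 10 million figure from the description


def chunk_count(total_reads: int, reads_per_chunk: int = READS_PER_CHUNK) -> int:
    """Return how many files are needed so no file exceeds reads_per_chunk reads.

    Rounding up is an assumption; small inputs still get one file.
    """
    if total_reads < 0 or reads_per_chunk <= 0:
        raise ValueError("total_reads must be >= 0 and reads_per_chunk > 0")
    # Integer ceiling division, avoids float rounding on very large counts.
    return max(1, -(-total_reads // reads_per_chunk))
```

For example, 25 million reads would give 3 files under this rounding choice.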

yumisims · 2024-03-20T13:25:06Z

If the FASTA is larger than 10 GB, split the fasta.gz into N chunks, where N = round(size_of_fasta / 10) with the size in GB:
pyfasta split -n N {sample}.fasta.gz
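The size rule could be sketched like this. `chunks_for_size` is a hypothetical helper, and treating "10G" as 10 × 10^9 bytes of the compressed file is an assumption, since the comment does not say whether the size is compressed or uncompressed.

```python
import os

BYTES_PER_GB = 10**9  # assumption: "G" means decimal gigabytes


def chunks_for_size(size_bytes: int, threshold_gb: float = 10.0) -> int:
    """Return N = round(size_in_GB / 10) above the threshold, else 1 (no split)."""
    size_gb = size_bytes / BYTES_PER_GB
    if size_gb <= threshold_gb:
        return 1
    return max(1, round(size_gb / threshold_gb))


def chunks_for_file(path: str) -> int:
    """Apply the same rule to a file on disk (size of the file as stored)."""
    return chunks_for_size(os.path.getsize(path))
```

Note that with plain rounding a file between 10 and 15 GB still comes out as N = 1, and Python's `round` sends exact halves to the nearest even number, so rounding up may be the safer choice if the 10 GB limit is meant to be strict.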

mcshane · 2024-03-20T13:35:58Z

@yumisims @DLBPointon. Maybe use https://nf-co.re/modules/seqkit_split2 ?

yumisims · 2024-03-20T13:39:23Z

or just zcat {sample}.fasta.gz | awk -v N=10000000 '/^>/{n++} { print > ("chunk_" int((n-1)/N) ".fasta") }' (here N is the number of reads per chunk, not the number of chunks)
let's see
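For comparison, the same streaming idea (start a new file every fixed number of `>` headers) might look like this in Python. This is only a sketch: `split_fasta_gz` and the `chunk_<i>.fasta.gz` naming are illustrative, and unlike the awk version it writes gzipped chunks.

```python
import gzip
from pathlib import Path


def split_fasta_gz(src, out_dir, reads_per_chunk):
    """Stream a gzipped FASTA and write chunk_<i>.fasta.gz files.

    Each chunk holds at most reads_per_chunk records; returns the chunk paths.
    """
    if reads_per_chunk <= 0:
        raise ValueError("reads_per_chunk must be > 0")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    out = None
    n_records = 0
    try:
        with gzip.open(src, "rt") as fin:
            for line in fin:
                if line.startswith(">"):
                    # Open the next chunk when the current one is full.
                    if n_records % reads_per_chunk == 0:
                        if out is not None:
                            out.close()
                        path = out_dir / f"chunk_{n_records // reads_per_chunk}.fasta.gz"
                        out = gzip.open(path, "wt")
                        outputs.append(path)
                    n_records += 1
                # Lines before the first header (if any) are skipped.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return outputs
```

Records are never split across files because a new chunk is only opened on a header line, so multi-line sequences stay intact.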

mcshane · 2024-03-20T13:52:29Z

seqkit split2 is multithreaded and will output gzipped chunks
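A possible way to drive it from a script, as a sketch only. `seqkit_split2_cmd` is a hypothetical helper; the 10 million reads per chunk comes from the issue description, and the thread count and output directory are placeholders. It assumes `seqkit` is on `PATH` and only uses the `--by-size`, `--threads` and `--out-dir` options.

```python
import subprocess


def seqkit_split2_cmd(fasta, out_dir, reads_per_chunk=10_000_000, threads=4):
    """Build the argument list for a seqkit split2 run (nothing is executed here)."""
    return [
        "seqkit", "split2",
        "--by-size", str(reads_per_chunk),  # number of sequences per output file
        "--threads", str(threads),
        "--out-dir", str(out_dir),
        str(fasta),
    ]


def run_split(fasta, out_dir, **kwargs):
    """Run the command; raises CalledProcessError if seqkit exits non-zero."""
    subprocess.run(seqkit_split2_cmd(fasta, out_dir, **kwargs), check=True)
```

Building the argument list separately keeps the command easy to inspect or log before anything is run.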

DLBPointon changed the title ~~Chunk fasta reads for better parallelization for revio pacbio data~~ [1.2.0 - Ancient Destiny]Chunk fasta reads for better parallelization for revio pacbio data Mar 21, 2024

DLBPointon changed the title ~~[1.2.0 - Ancient Destiny]Chunk fasta reads for better parallelization for revio pacbio data~~ [1.2.0 - Ancient Destiny] Chunk fasta reads for better parallelization for revio pacbio data Mar 21, 2024
