"Initializing" takes forever #103

Open
olgabot opened this issue Feb 8, 2019 · 4 comments

Comments

@olgabot
Contributor

olgabot commented Feb 8, 2019

Hello,

I'm running this workflow:

param (
    // S3 path to 10x folder
    tenx string

    // Full s3 file location to put the sourmash signature
    output string

    // Size of kmer(s) to use
    ksizes = "21,33,51"

    // choose number of hashes as 1/scaled of input k-mers
    scaled = 0

    // Number of kmer hashes to use
    num_hashes = 1000

    // Calculate protein signature
    protein = true

    // Calculate DNA signature
    dna = true

    // Number of processes
    processes = 8

    // Name of the bam file in the tenx folder
    BAM_FILENAME = "possorted_genome_bam.bam"

    // Name of the single-column barcodes file in the tenx folder
    BARCODES = "barcodes.tsv"
)

// Instantiate the system modules "files" (system modules begin
// with $), assigning its instance to the "files" identifier. To
// view the documentation for this module, run "reflow doc
// $/files".
val files = make("$/files")
val dirs = make("$/dirs")


sourmash := make("./sourmash.rf")


// bam2fastx Docker image
val bam2fastx = "czbiohub/bam2fastx"


// Convert a 10x BAM folder into a single fasta containing all cells
@requires(cpu := 4, mem := 16*GiB, disk := 4*GiB)
func TenXBamToFasta(tenx dir) = {
    outdir := exec(image := bam2fastx) (output dir) {"
            bam2fastx fasta {{tenx}} --all-cells-in-one-file --output {{output}}
    "}

    val (fasta, _) = dirs.Pick(outdir, "*.fasta")

    // Return single fasta
    fasta
}



// Instantiate Go system module "strings"
val strings = make("$/strings")



@requires(cpu := 1, mem := 16*GiB)
val Main = {
    val tenx_folder = dir(tenx)
    val (bam, _) = dirs.Pick(tenx_folder, "*.bam")
    val (bai, _) = dirs.Pick(tenx_folder, "*.bai")
    val (barcodes, _) = dirs.Pick(tenx_folder, BARCODES)

    val renamed = map([(BAM_FILENAME, bam), 
        (BAM_FILENAME + ".bai", bai), 
        (BARCODES, barcodes)])
    val minimal_tenx_dir = dirs.Make(renamed)

    fasta := TenXBamToFasta(minimal_tenx_dir)
    reads := [fasta]

    singleton := false

    sourmash_sketch := sourmash.Compute(reads, scaled, ksizes, protein, 
        dna, singleton)
    files.Copy(sourmash_sketch, output)
}

The data gets transferred just fine, but then the reflow run command claims the job is running, while the reflow ps command shows it as initializing. Which one is right? I've been stuck at the "initializing" phase for many hours with this file; the run below is just a fresh example to show the inputs.

Below is a screenshot of the output from this command:

reflow -log=debug -cache=off run -trace /home/olga/code/kmer-hashing/sourmash/maca/10x_spleen_kidney/../../../reflow/sourmash_compute_10x.rf -tenx s3://czbiohub-maca/10x_data/10X_P4_7 -output s3://olgabot-maca/10x/sourmash_compute/ksizes=21,27,33,51_num_hashes=5000/Spleen_10X_P4_7.sig -ksizes 21,27,33,51 -num_hashes 5000

[Screenshot: output of the command above, 2019-02-08 at 8:20:07 am]

Thank you!
Warmest,
Olga

@olgabot
Contributor Author

olgabot commented Feb 12, 2019

Update: this is still "initializing" ...

[Screenshot: 2019-02-12 at 7:42:13 am]

@olgabot
Contributor Author

olgabot commented Feb 12, 2019

Here's the end of that text:

2019/02/12 07:32:10 ec2cluster: pending{}
2019/02/12 07:33:10 ec2cluster: pending{}
2019/02/12 07:34:10 ec2cluster: pending{}
2019/02/12 07:35:10 ec2cluster: pending{}
2019/02/12 07:36:10 ec2cluster: pending{}
2019/02/12 07:37:10 ec2cluster: pending{}
2019/02/12 07:38:10 ec2cluster: pending{}
2019/02/12 07:39:10 ec2cluster: pending{}
2019/02/12 07:40:10 ec2cluster: pending{}
2019/02/12 07:41:10 ec2cluster: pending{}
2019/02/12 07:42:10 ec2cluster: pending{}
2019/02/12 07:43:10 ec2cluster: pending{}
2019/02/12 07:44:10 ec2cluster: pending{}
2019/02/12 07:45:10 ec2cluster: pending{}
2019/02/12 07:46:10 ec2cluster: pending{}
2019/02/12 07:47:10 ec2cluster: pending{}
2019/02/12 07:48:10 ec2cluster: pending{}
2019/02/12 07:49:10 ec2cluster: pending{}
2019/02/12 07:50:10 ec2cluster: pending{}
2019/02/12 07:51:10 ec2cluster: pending{}
2019/02/12 07:52:10 ec2cluster: pending{}
2019/02/12 07:53:10 ec2cluster: pending{}
2019/02/12 07:54:10 ec2cluster: pending{}
2019/02/12 07:55:10 ec2cluster: pending{}
ec2cluster: 1 instances: r4.xlarge:1 (<=$0.3/hr), total{mem:28.4GiB cpu:4 disk:2.4TiB intel_avx:4 intel_avx2:4}, waiting{}, pend
48cfcb94: elapsed: 96h0m, running:1, completed: 1/3
  sourmash_compute_10x.TenXBamToFasta.outdir:  exec czbiohub/bam2fastx bam2fastx fasta {{tenx}} --al..-one-file --outp  96h6m18s
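
In case it helps anyone quantify a stall like this from a saved log, here is a minimal Python sketch that measures how long the ec2cluster status line has gone unchanged. It assumes only the timestamped line format shown above; the function name longest_unchanged_run and the sample list are made up for illustration and are not part of reflow.

```python
import re
from datetime import datetime

# Matches timestamped status lines such as
# "2019/02/12 07:32:10 ec2cluster: pending{}" (format taken from the log above).
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ec2cluster: (.*)$")


def longest_unchanged_run(lines):
    """Return (status, seconds) for the longest run of identical,
    consecutive ec2cluster status lines. Hypothetical helper, not a reflow API."""
    best = (None, 0.0)
    run_status, run_start = None, None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a timestamped ec2cluster status
        ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        status = m.group(2)
        if status != run_status:
            # Status changed: start timing a new run from this line.
            run_status, run_start = status, ts
        span = (ts - run_start).total_seconds()
        if span > best[1]:
            best = (run_status, span)
    return best


# Illustrative sample: the first three status lines from the excerpt above.
sample = [
    "2019/02/12 07:32:10 ec2cluster: pending{}",
    "2019/02/12 07:33:10 ec2cluster: pending{}",
    "2019/02/12 07:34:10 ec2cluster: pending{}",
]
status, seconds = longest_unchanged_run(sample)
print(status, seconds)
```

Run over the full debug log rather than the three-line sample, this would report how long the cluster status stayed at an empty pending{} without changing.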

@prasadgopal
Collaborator

prasadgopal commented Feb 12, 2019 via email

@olgabot
Contributor Author

olgabot commented Feb 13, 2019

There was a bug with 0.6.8 so I don't use it. This is 0.6.7:

 reflow version
0.6.7 (go1.10)
