Built-in CnView workaround does not prevent errors as intended #205

GrubLord · 2016-05-05T02:26:33Z

Hi guys,

I've been encountering an error in genome/lib/perl/Genome/Model/Tools/CopyNumber/CnView.pm.R

My process grinds to a halt with the following error:

Error in `[<-.data.frame`(`*tmp*`, which(segments[, "Adjusted_CN_DIFF"] >  :
  replacement has 1 row, data has 0
Calls: [<- -> [<-.data.frame
Execution halted

Now, at first I thought it was because the segments file was empty, but the problem traces back to this code here...

#Make sure segments file has at least one row of data, otherwise following commands will choke, need to be skipped
if (length(rownames(segments))>0){
  #Add "chr" to the chromosome names in the segments object
  segments[,"CHR"]=paste("chr", segments[,"CHR"], sep="")
  #Calculate the adjusted CN difference for segments
  segments[,"Adjusted_CN_DIFF"]=segments[,"Adjusted_CN1"]-segments[,"Adjusted_CN2"]
} else { #Just add colnames to empty matrix to prevent further downstream errors
  segments=as.data.frame(setNames(replicate(length(colnames(segments))+1,numeric(0), simplify = F), c(colnames(segments),"Adjusted_CN_DIFF")))
}

The code clearly already accounts for an empty segments file. However, the else branch, which builds an empty substitute with the expected column names, seems to be creating that substitute with the wrong number of rows.

Could I get your feedback, please, on how to address this, and what the matrix needs to look like to prevent these downstream errors?

It's clear the thinking has been done, and a measure is already in place... but unfortunately, the error-prevention substitute data doesn't do what it ought to do.

Alternatively, could I just set "segments" to some particular raw data that will work every time, and prevent errors?

Thanks kindly in advance.

GrubLord · 2016-05-06T02:40:13Z

We resolved this issue by changing the following section of code:

#Reset values larger than 20 to be 20 (arbitrary - for display purposes).  Single outliers, usually false positives near the centromeres obscure the data by jacking up the scale...
if (length(which(cnvs[,"DIFF"] > hard_cap_upper)) > 0){
  cnvs[which(cnvs[,"DIFF"] > hard_cap_upper),"DIFF"] = hard_cap_upper
  if (length(rownames(segments))>0) {
    segments[which(segments[,"Adjusted_CN_DIFF"] > hard_cap_upper),"Adjusted_CN_DIFF"] = hard_cap_upper
  }
}

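The culprit was the line segments[which(segments[,"Adjusted_CN_DIFF"] > hard_cap_upper),"Adjusted_CN_DIFF"] = hard_cap_upper, which is the one that actually generates the error. When segments has no rows, assigning the single value hard_cap_upper into it fails, so we now run that line only if segments has at least one row.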
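For anyone wanting to reproduce this outside CnView, here is a minimal sketch on a zero-row data frame. The column name comes from the code above and the cap of 20 from its comment; the single-column data frame is a simplified stand-in for the real fallback object, so treat it as an illustration rather than the actual CnView data.

```r
# Zero-row stand-in for the fallback 'segments' object (simplified assumption)
segments <- data.frame(Adjusted_CN_DIFF = numeric(0))
hard_cap_upper <- 20  # cap value taken from the comment in the code above

# Unguarded assignment: which() yields no indices, yet assigning a
# length-1 value into a zero-row data frame still raises an error
err <- tryCatch({
  segments[which(segments[, "Adjusted_CN_DIFF"] > hard_cap_upper), "Adjusted_CN_DIFF"] <- hard_cap_upper
  NULL
}, error = function(e) conditionMessage(e))
print(err)

# Guarded assignment: skipped entirely when there are no rows
if (nrow(segments) > 0) {
  segments[which(segments[, "Adjusted_CN_DIFF"] > hard_cap_upper), "Adjusted_CN_DIFF"] <- hard_cap_upper
}
```

The captured message should correspond to the error quoted in the original report, and the guarded version leaves the empty data frame untouched.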

I expect the workaround referenced above was implemented before this line was added, and that it stopped being sufficient once the line went in. The breakage went unnoticed because the error only occurs when segment data is missing.

It would be worth adding a unit test for this case, so this does not occur again in future.
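A rough sketch of what such a test could check, using base R only. The helper cap_segment_diff is hypothetical (it does not exist in CnView.pm.R) and simply isolates the capping step so the empty and non-empty cases can be exercised; the genome project's own test harness may well look different.

```r
# Hypothetical helper, not part of CnView.pm.R: cap Adjusted_CN_DIFF at 'cap'
cap_segment_diff <- function(segments, cap) {
  if (nrow(segments) == 0) return(segments)  # nothing to cap in an empty frame
  over <- which(segments[, "Adjusted_CN_DIFF"] > cap)
  if (length(over) > 0) segments[over, "Adjusted_CN_DIFF"] <- cap
  segments
}

# Case 1: empty segments must pass through without error
empty <- data.frame(CHR = character(0), Adjusted_CN_DIFF = numeric(0))
stopifnot(nrow(cap_segment_diff(empty, 20)) == 0)

# Case 2: values above the cap are reset, others are left alone
filled <- data.frame(CHR = c("chr1", "chr2"), Adjusted_CN_DIFF = c(3, 57))
capped <- cap_segment_diff(filled, 20)
stopifnot(all(capped$Adjusted_CN_DIFF == c(3, 20)))
```

The first case is the regression described in this issue; the second confirms the guard does not change behaviour when segment data is present.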

GrubLord · 2016-05-06T02:40:31Z

Suggest our fix be integrated into the genome code.
