This repository has been archived by the owner on Jan 31, 2020. It is now read-only.

Built-in CnView workaround does not prevent errors as intended #205

Open
GrubLord opened this issue May 5, 2016 · 2 comments

GrubLord commented May 5, 2016

Hi guys,

I've been encountering an error in genome/lib/perl/Genome/Model/Tools/CopyNumber/CnView.pm.R

My process grinds to a halt with the following error:

Error in `[<-.data.frame`(`*tmp*`, which(segments[, "Adjusted_CN_DIFF"] >  :
  replacement has 1 row, data has 0
Calls: [<- -> [<-.data.frame
Execution halted

Now, at first I thought it was because the segments file was empty, but the problem traces back to this code here...

#Make sure segments file has at least one row of data, otherwise following commands will choke, need to be skipped
if (length(rownames(segments))>0){
  #Add "chr" to the chromosome names in the segments object
  segments[,"CHR"]=paste("chr", segments[,"CHR"], sep="")
  #Calculate the adjusted CN difference for segments
  segments[,"Adjusted_CN_DIFF"]=segments[,"Adjusted_CN1"]-segments[,"Adjusted_CN2"]
} else { #Just add colnames to empty matrix to prevent further downstream errors
  segments=as.data.frame(setNames(replicate(length(colnames(segments))+1,numeric(0), simplify = F), c(colnames(segments),"Adjusted_CN_DIFF")))
}

The code clearly already accounts for an empty segments file. However, the else branch, which adds column names to an empty data frame for use as a substitute, seems to be creating a substitute with the wrong number of rows.
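For anyone who wants to inspect what that else branch actually builds, here is a minimal sketch. The empty `segments` input and its three column names are assumptions for illustration (only the names used in the snippet above), not the real file layout.

```r
# Hypothetical empty input: only the column names the snippet above relies on
segments <- data.frame(CHR = character(0),
                       Adjusted_CN1 = numeric(0),
                       Adjusted_CN2 = numeric(0))

# Same construction as the else branch, stored under a new name for inspection
fallback <- as.data.frame(setNames(
  replicate(length(colnames(segments)) + 1, numeric(0), simplify = FALSE),
  c(colnames(segments), "Adjusted_CN_DIFF")))

print(dim(fallback))       # number of rows and columns in the substitute
print(colnames(fallback))  # original names plus "Adjusted_CN_DIFF"
```

Under these assumptions the substitute has zero rows and one extra column, so any later statement that assigns a value into it has to cope with a zero-row data frame.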

Could I get your feedback, please, on how to address this, and what the matrix needs to look like to prevent these downstream errors?

It's clear the thinking has been done, and a measure is already in place... but unfortunately, the error-prevention substitute data doesn't do what it ought to do.

Alternatively, could I just set "segments" to some fixed placeholder data that will always work and prevent the errors?

Thanks kindly in advance.


GrubLord commented May 6, 2016

We resolved this issue by changing the following section of code:

#Reset values larger than 20 to be 20 (arbitrary - for display purposes).  Single outliers, usually false positives near the centromeres obscure the data by jacking up the scale...
if (length(which(cnvs[,"DIFF"] > hard_cap_upper)) > 0){
  cnvs[which(cnvs[,"DIFF"] > hard_cap_upper),"DIFF"] = hard_cap_upper
  if (length(rownames(segments))>0) {
    segments[which(segments[,"Adjusted_CN_DIFF"] > hard_cap_upper),"Adjusted_CN_DIFF"] = hard_cap_upper
  }
}

The culprit was the line segments[which(segments[,"Adjusted_CN_DIFF"] > hard_cap_upper),"Adjusted_CN_DIFF"] = hard_cap_upper, which turns out to be the one that actually generates the error.
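A small reproduction of the failure, as a sketch only: the data frames and the `cap_unguarded` helper are hypothetical stand-ins, and the cap of 20 is taken from the code comment above. It runs the unguarded assignment against a zero-row data frame and against a non-empty one in which nothing exceeds the cap.

```r
hard_cap_upper <- 20  # assumed value, from the "larger than 20" comment

# Zero-row data frame, like the substitute built for an empty segments file
empty_segments <- data.frame(Adjusted_CN_DIFF = numeric(0))
# Non-empty data frame in which no value exceeds the cap
small_segments <- data.frame(Adjusted_CN_DIFF = c(1, 2, 3))

# Hypothetical helper: run the unguarded capping line and return "ok",
# or the error message if the assignment fails
cap_unguarded <- function(segments) {
  tryCatch({
    segments[which(segments[, "Adjusted_CN_DIFF"] > hard_cap_upper),
             "Adjusted_CN_DIFF"] <- hard_cap_upper
    "ok"
  }, error = function(e) conditionMessage(e))
}

empty_result <- cap_unguarded(empty_segments)
small_result <- cap_unguarded(small_segments)
print(empty_result)
print(small_result)
```

The non-empty case passes even though no rows match, while the zero-row case fails. That is why the guard checks the row count of `segments` rather than the number of matches.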

I expect the workaround referenced above was implemented before this line existed. The functionality then broke when the line was added, and the breakage went unnoticed because it only occurs when segment data is missing.

It would be worth adding a unit test for this case, so that it does not recur in future.
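A rough idea of what such a test could look like, using only base R. The `cap_segments` function is a hypothetical wrapper around the guarded assignment and is not part of the genome code; the test data is made up.

```r
# Hypothetical wrapper around the guarded capping step
cap_segments <- function(segments, cap) {
  # Skip the assignment entirely when there are no rows to modify
  if (nrow(segments) > 0) {
    over <- which(segments[, "Adjusted_CN_DIFF"] > cap)
    segments[over, "Adjusted_CN_DIFF"] <- cap
  }
  segments
}

# Made-up inputs: one empty, one with values above the cap
empty <- data.frame(Adjusted_CN_DIFF = numeric(0))
filled <- data.frame(Adjusted_CN_DIFF = c(5, 25, 40))

capped_empty <- cap_segments(empty, 20)
capped_filled <- cap_segments(filled, 20)

# Regression checks: empty input must pass through without error,
# and values above the cap must be reset to the cap
stopifnot(nrow(capped_empty) == 0)
stopifnot(identical(capped_filled[, "Adjusted_CN_DIFF"], c(5, 20, 20)))
```

Pulling the capping step into its own function is just one way to make it testable in isolation; the same checks could be run against the full script with an empty segments file as input.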


GrubLord commented May 6, 2016

Suggest our fix be integrated into the genome code.
