Unexpected error during Ridge Regression: LAPACK GETRF error code: -4 #788

Open
alexdupre opened this issue Sep 26, 2024 · 6 comments

@alexdupre

Describe the bug
During ridge regression, for some slices of the original training data, the dpotrf function returns error code -4, even though the lda argument is greater than the n argument.

Expected behavior
No error. FWIW, ridge regression with scikit-learn on the same data works fine.

Actual behavior

Iteration: 0
Iteration: 1
Iteration: 2
10:16:43.137 [main] ERROR smile.math.matrix.Matrix - LAPACK GETRF error code: -4
Exception in thread "main" java.lang.ArithmeticException: LAPACK GETRF error code: -4
	at smile.math.matrix.Matrix.cholesky(Matrix.java:1732)
	at smile.regression.RidgeRegression.fit(RidgeRegression.java:206)
	at smile.regression.RidgeRegression.fit(RidgeRegression.java:116)
	at smile.regression.package$.$anonfun$ridge$1(package.scala:115)
	at smile.util.package$time$.apply(package.scala:67)
	at smile.regression.package$.ridge(package.scala:115)
	at Bug$.$anonfun$new$1(Bug.scala:15)


Code snippet

import smile._
import smile.data.formula._
import smile.regression._

import scala.language.postfixOps

object Bug extends App {

  val df = read.csv("bug.csv")
  (0 to 3).foreach { i =>
    println(s"Iteration: $i")
    val trainData = df.slice(i, 50 + i) // 50-row window starting at row i
    ridge("Label" ~, trainData, 1)      // lambda = 1
  }

}

Input data
bug.csv

Additional context

  • OpenJDK 21 on WSL2 and OpenJDK 11 on Ubuntu
  • Smile 3.1.1
@alexdupre
Author

After digging into the issue, it looks like the problem is caused by a feature column whose values are all equal.

In theory, this scenario should have been caught before calling the dpotrf function, here:

for (int j = 0; j < scale.length; j++) {
    if (MathEx.isZero(scale[j])) {
        throw new IllegalArgumentException(String.format("The column '%s' is constant", X.colName(j)));
    }
}

In practice, the constant check passed because the calculated standard deviation was NaN instead of a value that is zero within machine precision.

This snippet, simulating a vector of length 48 filled with the constant value 62571.43, shows the issue:

  import smile.math.MathEx

  val c = 62571.43
  val m = 48
  val column = Array.fill(m)(c)

  val sum = column.sum
  val mean = sum / m // 62571.43000000003
  val sumsq = column.map(v => v * v).sum // 1.8792882490775534E11
  val variance = sumsq / m - mean * mean // -4.76837158203125E-7

  val sd = Math.sqrt(variance) // NaN
  val isZero = MathEx.isZero(sd) // false

The number that gets passed to Math.sqrt is negative instead of zero because of floating-point arithmetic and the algorithm used by colSds, and so the result is NaN. I'm not sure what the correct way to fix the standard deviation function is for this scenario, but the algorithm implemented by the Commons Math library correctly returns a variance of 0 in this case: https://github.com/apache/commons-math/blob/e580cde5f77019bb1b60c195a38de56ca4f5dcb0/commons-math-legacy/src/main/java/org/apache/commons/math4/legacy/stat/descriptive/moment/Variance.java#L399-L425
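
For reference, here is a minimal Scala sketch of that corrected two-pass computation (an illustration of the Commons Math approach, not Smile's or Commons Math's actual code), applied to the same constant column:

  import scala.math.sqrt

  // Two-pass variance: compute the mean first, then average the squared deviations,
  // subtracting the correction term accum2^2 / n (accum2 would be 0 in exact arithmetic).
  // The correction cancels the gross rounding error, so the result for the constant
  // column is ~0 rather than -4.77E-7. Divides by n to match the snippet above.
  def twoPassVariance(xs: Array[Double]): Double = {
    val n = xs.length
    val mean = xs.sum / n
    var accum = 0.0   // sum of squared deviations from the mean
    var accum2 = 0.0  // sum of deviations from the mean
    for (v <- xs) {
      val d = v - mean
      accum += d * d
      accum2 += d
    }
    (accum - accum2 * accum2 / n) / n
  }

  println(twoPassVariance(Array.fill(48)(62571.43))) // ~0.0, not -4.76837158203125E-7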

@haifengl
Owner

haifengl commented Oct 3, 2024

Thanks for the deep dive! I have added a safeguard to the colSds computation. Please try the master branch to see if it fixes the issue.
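
(A sketch of the kind of safeguard described, not the actual commit: assuming it clamps a variance that rounding has pushed slightly negative back to zero before taking the square root.)

  // Hypothetical per-column guard: a variance driven slightly negative by rounding is
  // clamped to 0.0, so sqrt returns 0.0 (which MathEx.isZero catches) instead of NaN.
  def colSd(column: Array[Double]): Double = {
    val n = column.length
    val mean = column.sum / n
    val sumsq = column.map(v => v * v).sum
    math.sqrt(math.max(0.0, sumsq / n - mean * mean))
  }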

@alexdupre
Author

I haven't tried it yet, but the fix seems fine for my case (I don't know whether, with some data, the floating-point error could produce a small positive variance that is then not considered zero).

While I was experimenting with ridge regression I found another unexpected error: if I use a lambda equal to 0, to emulate a simple linear regression, I get another LAPACK error code from the same potrf function. Looking at the following code, it seems 0 should be a supported value:

for (int i = 0; i < p; i++) {
    if (lambda[i] < 0.0) {
        throw new IllegalArgumentException(String.format("Invalid lambda[%d] = %f", i, lambda[i]));
    }
}

@haifengl
Owner

haifengl commented Oct 4, 2024

My guess is that your data is (very close to) collinear so that potrf fails when lambda = 0.0.
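
To illustrate that guess with a toy example (plain Scala, no Smile calls): if two feature columns are proportional, the Gram matrix X'X is singular, so the Cholesky factorization that potrf computes fails, while adding lambda * I with lambda > 0 restores positive definiteness:

  val x1 = Array(1.0, 2.0, 3.0, 4.0)
  val x2 = x1.map(_ * 2.0) // perfectly collinear with x1

  def dot(a: Array[Double], b: Array[Double]) = a.zip(b).map { case (u, v) => u * v }.sum

  // 2x2 Gram matrix X'X of the two columns
  val g = Array(Array(dot(x1, x1), dot(x1, x2)),
                Array(dot(x2, x1), dot(x2, x2)))

  def det2(m: Array[Array[Double]]) = m(0)(0) * m(1)(1) - m(0)(1) * m(1)(0)

  println(det2(g)) // 0.0 -> X'X is singular, so potrf cannot factor it

  val lambda = 1.0
  val ridged = Array(Array(g(0)(0) + lambda, g(0)(1)),
                     Array(g(1)(0), g(1)(1) + lambda))
  println(det2(ridged)) // 151.0 > 0 -> X'X + lambda*I is positive definite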

@alexdupre
Author

It might be. My experiment was to port some code from scikit-learn to Scala (where I'm more proficient) and verify that it achieves similar results for the same datasets with the "same" algorithm. But it seems Smile is a bit pickier about the dataset, while sklearn accepted everything. I'll continue my tests.

@haifengl
Owner

haifengl commented Oct 4, 2024

In the case of lambda = 0.0, you should use OLS, which has a different way to handle collinearity. RidgeRegression is meant to handle collinear data with a non-zero lambda.
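
A sketch of that suggestion against the reproduction code above (assuming the ols helper in smile.regression's Scala API; argument defaults may vary by version):

  import smile._
  import smile.data.formula._
  import smile.regression._

  import scala.language.postfixOps

  val df = read.csv("bug.csv")
  val trainData = df.slice(0, 50)
  // Fit ordinary least squares instead of ridge(..., lambda = 0); per the comment
  // above, OLS handles (near-)collinear data in a different way.
  val model = ols("Label" ~, trainData)
  println(model)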
