Unexpected error during Ridge Regression: LAPACK GETRF error code: -4
#788
Comments
After digging into the issue, it looks like the problem is caused by a feature column whose values are all equal. In theory, this scenario should have been caught before the LAPACK routine is called (smile/core/src/main/java/smile/regression/RidgeRegression.java, lines 184 to 188 in 76f79ec).
In practice, the constant check passed because the calculated standard deviation was NaN instead of zero within machine precision.
This snippet, simulating a vector of length 48 with a constant value 62571.43, can easily show the issue:
The number that gets passed to |
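As a minimal sketch of how a constant column can end up with a NaN standard deviation (this is not the snippet from the comment above and not smile's actual colSds code; the class and method names are hypothetical): with the one-pass "sum of squares" formula, rounding can leave the variance slightly negative, and the square root of a negative number is NaN. Clamping the variance at zero, or using a two-pass formula, avoids the NaN.

```java
import java.util.Arrays;

final class ConstantColumnSd {
    // One-pass sample variance: (sum(x^2) - sum(x)^2 / n) / (n - 1).
    // For (near-)constant data the two terms nearly cancel, so rounding
    // can make the result slightly negative or slightly positive.
    static double onePassVariance(double[] x) {
        double sum = 0.0, sumsq = 0.0;
        for (double v : x) {
            sum += v;
            sumsq += v * v;
        }
        int n = x.length;
        return (sumsq - sum * sum / n) / (n - 1);
    }

    // No safeguard: Math.sqrt returns NaN when the variance is negative.
    static double naiveSd(double[] x) {
        return Math.sqrt(onePassVariance(x));
    }

    // Assumed form of a safeguard: clamp a negative variance to zero.
    static double guardedSd(double[] x) {
        return Math.sqrt(Math.max(0.0, onePassVariance(x)));
    }

    // Two-pass formula: subtract the mean first, then sum squared deviations.
    // The sum of squares is never negative, so the result is never NaN
    // for finite input.
    static double twoPassSd(double[] x) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;
        double ss = 0.0;
        for (double v : x) ss += (v - mean) * (v - mean);
        return Math.sqrt(ss / (n - 1));
    }

    public static void main(String[] args) {
        double[] x = new double[48];
        Arrays.fill(x, 62571.43);
        System.out.println("naive    = " + naiveSd(x));
        System.out.println("guarded  = " + guardedSd(x));
        System.out.println("two-pass = " + twoPassSd(x));
    }
}
```

Note that clamping only removes the NaN. If the rounding error happens to be positive, the guarded value is a small positive number rather than zero, so a downstream constant-column check still needs a tolerance.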
Thanks for the deep dive! I have added a safeguard to the colSds computation. Please try the master branch to see if it fixes the issue. |
I haven't tried it yet, but the fix seems fine for my case. (I don't know whether, with some data, the rounding error could instead produce a small positive variance that is then not treated as zero.)
While I was experimenting with the Ridge Regression I found another unexpected error: if I use a smile/core/src/main/java/smile/regression/RidgeRegression.java Lines 167 to 171 in 528ab52
|
My guess is that your data is (very close to) collinear so that potrf fails when lambda = 0.0. |
It might be. My experiment was to port some code from scikit-learn to Scala (where I'm more proficient), and verify that it achieves similar results for the same datasets with the "same" algorithm. But it seems smile is a bit more picky about the dataset, while sklearn accepted everything. I'll continue my tests. |
When lambda = 0.0, you should use OLS, which handles collinearity differently. RidgeRegression is meant to handle collinear data with a non-zero lambda. |
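To illustrate the point about collinearity, here is a minimal, self-contained sketch (hypothetical class and method names, plain Java arrays rather than smile's matrix classes). It builds X^T X + lambda * I for a design matrix whose second column is exactly twice the first, and runs a textbook Cholesky factorization that reports the first non-positive pivot, similar in spirit to the positive info code of potrf.

```java
final class RidgeCholeskySketch {
    // Returns X^T X + lambda * I for a row-major design matrix x.
    static double[][] gram(double[][] x, double lambda) {
        int p = x[0].length;
        double[][] a = new double[p][p];
        for (double[] row : x) {
            for (int i = 0; i < p; i++) {
                for (int j = 0; j < p; j++) {
                    a[i][j] += row[i] * row[j];
                }
            }
        }
        for (int i = 0; i < p; i++) a[i][i] += lambda;
        return a;
    }

    // Textbook Cholesky A = L L^T on a symmetric matrix.
    // Returns 0 on success, or the 1-based index of the first pivot
    // that is not strictly positive (this also catches NaN).
    static int cholesky(double[][] a) {
        int n = a.length;
        double[][] l = new double[n][n];
        for (int j = 0; j < n; j++) {
            double d = a[j][j];
            for (int k = 0; k < j; k++) d -= l[j][k] * l[j][k];
            if (!(d > 0.0)) return j + 1;
            l[j][j] = Math.sqrt(d);
            for (int i = j + 1; i < n; i++) {
                double s = a[i][j];
                for (int k = 0; k < j; k++) s -= l[i][k] * l[j][k];
                l[i][j] = s / l[j][j];
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Second column is exactly 2x the first, so X^T X is singular.
        double[][] x = {{1, 2}, {1, 2}, {1, 2}, {1, 2}};
        System.out.println("lambda = 0.0 -> " + cholesky(gram(x, 0.0)));
        System.out.println("lambda = 1.0 -> " + cholesky(gram(x, 1.0)));
    }
}
```

With this exactly representable data the second pivot is exactly zero when lambda = 0.0, so the factorization stops there, while any positive lambda makes the matrix positive definite. Real data that is only nearly collinear may instead produce a tiny positive or negative pivot depending on rounding, which is why the failure can show up for some slices of a dataset and not others.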
Describe the bug
During the Ridge regression, for some slices of the original training data, the dpotrf function returns the error code -4, even if the lda argument is greater than the n argument.
Expected behavior
No error. FWIW, the ridge regression with scikit-learn on the same data works fine.
Actual behavior
Code snippet
Input data
bug.csv
Additional context
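For reference, LAPACK routines report their status through an integer info code: zero means success, a negative value -i means the i-th argument had an illegal value, and a positive value i signals a routine-specific numerical failure (for potrf, the leading minor of order i is not positive definite). Which argument counts as the 4th depends on the interface being called, so reading -4 as the lda argument is an assumption. The helper below is a hypothetical illustration of the sign convention only, not part of smile:

```java
final class LapackInfo {
    // Decodes the sign convention of a LAPACK info code.
    static String describe(int info) {
        if (info == 0) {
            return "success";
        }
        if (info < 0) {
            // Argument positions are 1-based and interface-specific.
            return "argument " + (-info) + " had an illegal value";
        }
        // The exact meaning of a positive code depends on the routine.
        return "numerical failure at index " + info;
    }

    public static void main(String[] args) {
        System.out.println(describe(-4));
    }
}
```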