NotRegularMatrix exception for certain dataframes #32

Open
lokeshh opened this issue Jul 13, 2016 · 6 comments
@lokeshh
Member

lokeshh commented Jul 13, 2016

Statsample::GLM.compute is failing for certain dataframes.

> try = Daru::DataFrame.from_csv 'try.csv'
> Statsample::GLM.compute try, 'y', :logistic
ExceptionForMatrix::ErrNotRegular: Not Regular Matrix
from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/backports-3.6.8/lib/backports/1.9.2/stdlib/matrix.rb:933:in `block in inverse_from'

Get dataframe used in the above code here

@v0dro v0dro added the bug label Jul 13, 2016
@v0dro
Member

v0dro commented Jul 13, 2016

Weird bug.

@v0dro
Member

v0dro commented Jul 13, 2016

I think it's failing because a matrix inverse is being computed and the determinant is possibly very close to zero, which is why it raises ErrNotRegular. If I'm right, changing the matrix inverse computation algorithm should make it work.

@lokeshh
Member Author

lokeshh commented Jul 14, 2016

Here's some info I found.

I printed all the matrices whose inverse the algorithm was computing. Here's the result:

...
Matrix[[-8.459899447643453e-14, -5.75239855749016e-12], [-5.75239855749016e-12, -10927.800950741155]]
Matrix[[-3.1308289294429086e-14, -2.128675014034775e-12], [-2.128675014034775e-12, -10927.800950740906]]
Matrix[[-1.1546319456101584e-14, -7.842171356742226e-13], [-7.842171356742226e-13, -10927.800950740813]]
Matrix[[-4.218847493575589e-15, -2.865041537347675e-13], [-2.865041537347675e-13, -10927.800950740779]]
Matrix[[-1.3322676295501873e-15, -8.997247391562266e-14], [-8.997247391562266e-14, -10927.800950740766]]
Matrix[[-6.661338147750937e-16, -4.4986236957811335e-14], [-4.4986236957811335e-14, -10927.800950740762]]
Matrix[[-0.0, -0.0], [-0.0, -10927.80095074076]]
ExceptionForMatrix::ErrNotRegular: Not Regular Matrix
from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/backports-3.6.8/lib/backports/1.9.2/stdlib/matrix.rb:933:in `block in inverse_from'

In the end it tries to compute the inverse of Matrix[[-0.0, -0.0], [-0.0, -10927.80095074076]], which is singular, so no inverse exists.
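
The failure can be reproduced in isolation with Ruby's standard `matrix` library, using the last matrix printed above. This is only a minimal sketch, not the code path inside Statsample::GLM, and the variable names are illustrative.

```ruby
require 'matrix'

# Last matrix printed before the exception (copied from the log above).
last_matrix = Matrix[[-0.0, -0.0], [-0.0, -10927.80095074076]]

# The first row and column are all zero, so the determinant is zero.
singular = last_matrix.singular?

begin
  last_matrix.inverse
  inverse_failed = false
rescue ExceptionForMatrix::ErrNotRegular => e
  # Matrix#inverse refuses to invert a singular matrix.
  inverse_failed = true
  puts "#{e.class}: #{e.message}"
end
```

So the exception itself is expected behaviour for this input; the open question is why the iteration drives the matrix to exactly zero in its first row and column.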

@v0dro
Member

v0dro commented Jul 21, 2016

@agisga might this be an issue with the algorithm or is it loss of precision in some of the calculations?

@agisga

agisga commented Jul 23, 2016

It seems to me that the algorithm is theoretically okay, because it gives correct results most of the time. Maybe it fails because it accumulates numerical error quickly, when the input matrix is not well conditioned.
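
To illustrate how poorly conditioned these matrices become, here is a rough sketch that estimates the 2-norm condition number of the last invertible matrix from the log above. The helper `condition_number_2x2` is hypothetical (it is not part of any library) and is only valid for symmetric 2x2 matrices.

```ruby
require 'matrix'

# Hypothetical helper: |lambda_max / lambda_min| for a symmetric 2x2 matrix.
def condition_number_2x2(m)
  a, b, d = m[0, 0], m[0, 1], m[1, 1]
  mean   = (a + d) / 2.0
  radius = Math.hypot((a - d) / 2.0, b)
  # Eigenvalue of larger magnitude; pick the sign that avoids cancellation.
  large = mean >= 0 ? mean + radius : mean - radius
  # The product of the eigenvalues equals the determinant.
  small = m.determinant / large
  (large / small).abs
end

# Second-to-last matrix from the log above (the last one that is invertible).
m = Matrix[[-6.661338147750937e-16, -4.4986236957811335e-14],
           [-4.4986236957811335e-14, -10927.800950740762]]

cond = condition_number_2x2(m)
puts cond
```

The estimate should come out on the order of 10^19, far above `1 / Float::EPSILON` (about 4.5e15), so double-precision results computed from this matrix cannot be trusted.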

In particular, since you mention matrix inverses, it sounds to me like the algorithm is not well optimized. It should be changed so that it solves linear systems instead of computing matrix inverses (here is a very concise summary why). Solving a linear system is faster and numerically more stable than finding a matrix inverse.
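
As a minimal sketch of that suggestion, the standard library's `Matrix#lup` decomposition can solve `A x = b` directly, without forming the inverse. The matrix and right-hand side below are made-up illustrative values, not taken from the GLM code, and a singular matrix will still raise an error with this approach.

```ruby
require 'matrix'

# Illustrative, well-conditioned system a * x = b (values are assumptions).
a = Matrix[[4.0, 1.0], [1.0, 3.0]]
b = Vector[1.0, 2.0]

# Solve via LUP decomposition; no explicit inverse is formed.
x_solve = a.lup.solve(b)

# The same system via an explicit inverse, for comparison only.
x_inverse = a.inverse * b

# Residual norm of the LUP solution; should be close to zero.
residual = (a * x_solve - b).norm
puts x_solve, residual
```

In an iterative fit, the update step would call `solve` with the current matrix and right-hand side rather than multiplying by an inverse.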

Unfortunately, I don't have the time right now to look at the algorithm in detail; I hope I can eventually. It would probably be best to rewrite it so that it uses the matrix decompositions and linear solvers provided by nmatrix-lapacke.

@dansbits

Thanks for the explanation. I'm getting the same error, in case another example is helpful.
Data is available here: https://dl.dropboxusercontent.com/u/97188721/recruitment_failures.csv

data = Daru::DataFrame.from_csv 'recruitment_failures.csv'
glm = Statsample::GLM.compute data, 'failed_recruitment', :logistic
