Add weights parameter as mentioned in section 4.6 "Subsampling Training Data" #3
Comments
Hi. I did this a couple of days ago - see #2. So now
Great! Can you also update the docs and add an example of how to generate and use the weights?
The idea is to set the class weights inversely proportional to class size. For example, suppose you have a dataset with 1000 examples, 10 of which are positive and 990 negative. In this case it is generally a good idea to set weight 1 for the positive examples and ~0.01 (10/990) for the negative examples.
Yeah, I get that, I just wanted to see an actual code implementation example in the docs.
Let's keep it open as a reminder to update the docs.
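A minimal R sketch of the weighting scheme described above, i.e. down-weighting the majority class so that both classes contribute equal total weight. The function name is illustrative, not part of the package:

```r
# Illustrative helper: weight 1 for the positive (minority) class,
# n_pos / n_neg for the negative (majority) class, so the total
# weight of each class is equal.
make_class_weights <- function(y) {
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  ifelse(y == 1, 1, n_pos / n_neg)
}

# The 10-positive / 990-negative example from the comment above:
y <- c(rep(1, 10), rep(0, 990))
w <- make_class_weights(y)
# negatives get weight 10/990 ~ 0.0101; sum(w[y == 1]) == sum(w[y == 0])
```

The resulting vector would then be passed as the proposed `weights` argument when fitting.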
First of all, thanks for the great effort: it looks great. The combination of `sparseMatrix` with Rcpp (instead of R's memory-expensive `model.matrix`) looks very promising! Though, as mentioned many times in the paper, in the real world we face very sparse data with a very small number of successes, so the data is highly unbalanced. A standard logistic regression implementation can't handle this well (it may achieve very high accuracy while finding no true positives), so it is crucial to re-balance the data using some type of weights.
Section 4.6 of the paper introduces a fairly straightforward implementation of a subsampling correction.
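One common form of subsampling correction (a hedged sketch, not necessarily the exact procedure in section 4.6): keep all positives, sample a fraction `rate` of the negatives, and weight the sampled negatives by `1 / rate` so they stand in for the full negative class. All names here are illustrative:

```r
# Illustrative subsampling-with-correction sketch: the returned
# `index` selects the subsample and `weights` compensates the
# down-sampled negatives by the inverse sampling rate.
subsample_with_weights <- function(y, rate, seed = 1) {
  set.seed(seed)
  pos_idx  <- which(y == 1)
  neg_idx  <- which(y == 0)
  keep_neg <- sample(neg_idx, size = round(rate * length(neg_idx)))
  idx <- c(pos_idx, keep_neg)
  list(index = idx, weights = ifelse(y[idx] == 1, 1, 1 / rate))
}
```

With 10 positives, 990 negatives, and `rate = 0.1`, this keeps 10 + 99 rows, and the 99 kept negatives carry weight 10 each, so their total weight matches the original 990 negatives.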