Add weights parameter as mentioned in section 4.6 "Subsampling Training Data" #3

Open
DavidArenburg opened this issue Jun 6, 2017 · 5 comments

Comments

@DavidArenburg

DavidArenburg commented Jun 6, 2017

First of all, thanks for the great effort; it looks great. The combination of sparseMatrix with Rcpp (instead of R's memory-expensive model.matrix) looks very promising!

Though, as the paper mentions many times, real-world data is very sparse and contains very few successes, so it is highly unbalanced. A plain logistic regression implementation can't handle this (it reports very high accuracy but finds no true positives), so it is crucial to re-balance the data using some type of weights.

In section 4.6 of the paper, they introduce a pretty straightforward implementation of subsampling correction.
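
For reference, a minimal sketch of that correction. It assumes the paper's scheme is: keep every positive example, keep each negative with probability r, and give every kept negative an importance weight of 1/r so the expected loss is unchanged. The paper subsamples at the query level; the per-example version below, the function name `subsample_with_weights`, and the use of Python are illustrative assumptions, not this package's API.

```python
import random


def subsample_with_weights(labels, r, seed=0):
    """Return (index, weight) pairs for the examples that are kept.

    Assumed scheme: positives (label 1) are always kept with weight 1.0;
    negatives are kept with probability r and weighted by 1 / r.
    """
    if not 0 < r <= 1:
        raise ValueError("r must be in (0, 1]")
    rng = random.Random(seed)
    kept = []
    for i, y in enumerate(labels):
        if y == 1:
            kept.append((i, 1.0))
        elif rng.random() < r:
            kept.append((i, 1.0 / r))
    return kept


# Hypothetical data: 5 positives followed by 1000 negatives.
labels = [1] * 5 + [0] * 1000
kept = subsample_with_weights(labels, r=0.1, seed=42)
```

Each kept index would then be fed to the learner together with its weight.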

@dselivanov
Owner

Hi. I did this a couple of days ago, see #2. The partial_fit method now takes an additional argument for weights. I've tried it myself and it seems to work pretty well.

@DavidArenburg
Author

Great! Can you also update the docs and add an example of how to generate and use the weights?
Thanks

@dselivanov
Owner

The idea is to weight the classes inversely proportional to their frequency, keeping weight 1 for the minority class. For example, say you have a dataset with 1000 examples, 10 of which are positive and 990 negative. In this case it is generally a good idea to set weight 1 for positive examples and ~0.01 (10/990) for negative examples.
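
A minimal sketch of that rule, written in Python for illustration only (the package itself is R). The helper name `class_weights` is hypothetical and not part of the package; the resulting vector is what would be passed as the weights argument of partial_fit, whose exact name is not stated in this thread.

```python
from collections import Counter


def class_weights(labels):
    """Return one weight per example for a binary label vector.

    Minority-class examples get weight 1.0; majority-class examples get
    n_minority / n_majority, so both classes carry equal total weight.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    # most_common sorts by count, largest first.
    (_, n_major), (minor, n_minor) = counts.most_common(2)
    ratio = n_minor / n_major
    return [1.0 if y == minor else ratio for y in labels]


# The example from the comment: 10 positives and 990 negatives.
labels = [1] * 10 + [0] * 990
weights = class_weights(labels)
# Positives get 1.0; negatives get 10 / 990, roughly 0.0101.
```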

@DavidArenburg
Author

Yeah, I get that. I just wanted to see an actual code example in the docs.

@dselivanov
Owner

Let's keep it open as a reminder to update the docs.

dselivanov reopened this Jun 11, 2017