Add weights parameter as mentioned in section 4.6 "Subsampling Training Data" #3

DavidArenburg · 2017-06-06T12:03:19Z

First of all, thanks for the great effort; it looks great. The combination of sparseMatrix with Rcpp (instead of R's memory-expensive model.matrix) looks very promising!

However, as the paper mentions many times, real-world data is very sparse and contains very few successes, so it is highly unbalanced. A standard logistic regression implementation can't handle this (it reports very high accuracy but finds no true positives), so it is crucial to re-balance the data using some type of weights.

In section 4.6 of the paper, they introduce a pretty straightforward implementation of subsampling correction.


dselivanov · 2017-06-06T12:30:03Z

Hi. I did this a couple of days ago, see #2. The partial_fit method now takes an additional argument for weights. I've tried it myself and it seems to work pretty well.

DavidArenburg · 2017-06-07T11:39:24Z

Great! Can you also update the docs and add an example of how to generate and use the weights?
Thanks

dselivanov · 2017-06-07T12:49:41Z

The idea is to down-weight the major class in inverse proportion to its size relative to the minor class. For example, say you have a dataset with 1000 examples, 10 of which are positive and 990 negative. In this case it is generally a good idea to set the weight to 1 for positive examples and to ~0.01 (10/990) for negative examples.
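
A minimal sketch of that arithmetic in base R, using the 10-versus-990 example above. The label vector `y` and the names `n_pos`, `n_neg` and `w` are made up for illustration and are not part of the package.

```r
# Hypothetical toy labels: 10 positives and 990 negatives.
y <- c(rep(1, 10), rep(0, 990))

n_pos <- sum(y == 1)
n_neg <- sum(y == 0)

# Positives keep weight 1; negatives get n_pos / n_neg (about 0.0101),
# so both classes end up with roughly the same total weight.
w <- ifelse(y == 1, 1, n_pos / n_neg)

# Sanity check: the total weight per class should now be about equal.
sum(w[y == 1])
sum(w[y == 0])
```

The resulting vector `w` has one entry per training example and is what would be handed to the weights argument of partial_fit; the exact argument name is not spelled out in this thread, so check the function's documentation before relying on it.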

DavidArenburg · 2017-06-08T06:52:11Z

Yeah, I get that, I just wanted to see an actual code implementation example in the docs

dselivanov · 2017-06-11T05:49:06Z

Let's keep it open as a reminder to update the docs.

DavidArenburg closed this as completed Jun 11, 2017

dselivanov reopened this Jun 11, 2017
