Feature to weigh samples differently #87
Comments
I agree, this is something we also considered at some point. It would not be too difficult to code it for maximum likelihood. Do you dare to do it? :) I could help you out. ;)
I can do it for MLE. Can you help? I already made some changes to the sampler.
Yes, sure! Let me think a bit about which would be a simple design. But first, a question: do you want to provide specific weights for each instance as an extra attribute in your data set?
Yes, another column with the weights in the ARFF file might be good.
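For illustration, an extra per-instance weight column in the ARFF file might look like this (relation name, attributes, and values are made up; only the idea of a weight attribute comes from the discussion above):

```
@relation toy-data

@attribute A {0,1}
@attribute B numeric
@attribute WEIGHT numeric

@data
1,0.3,1.0
0,2.7,3.5
1,0.1,1.0
```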
Hi, I've been thinking about how to do it. Here is my proposal:

a) Modify the Attributes class to include "WEIGHT" as a special attribute name, like "TIME_ID" and "SEQUENCE_ID". When an attribute has this specific name, it will be recognized by AMIDST and used to weight the instances during learning.

b) Within the "core" module, create a new Java class in the learning.parametric package (e.g. ParallelMLWeighted). This class should inherit from ParallelMaximumLikelihood. Override the method "double updateModel(DataOnMemory batch)" to account for the weights of the instances; the weights can be accessed through the special attribute handling created in point (a). See the sketch below.

c) Within the "core-dynamic" module, create a new Java class in the learning.parametric package (e.g. ParallelMLWeighted). This class should inherit from ParallelMaximumLikelihood and be based on its static counterpart created in point (b).

I recommend performing independent pull requests for each of these steps.
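For step (b), here is a minimal, self-contained sketch of the weighting idea. It is plain Java with made-up names and does not reproduce the real AMIDST classes, package paths, or the internals of ParallelMaximumLikelihood; it only illustrates how each instance's contribution to the sufficient statistics would be scaled by its weight inside the overridden updateModel(DataOnMemory batch).

```java
import java.util.List;

// Illustrative only: weighted maximum likelihood for a single discrete variable.
// Counts (the sufficient statistics) are accumulated per state and scaled by
// each instance's weight instead of being incremented by 1. All names here are
// hypothetical; the real change would live inside the overridden updateModel.
public class WeightedMLExample {

    /** One data instance: an observed state plus its weight (hypothetical type). */
    record WeightedInstance(int state, double weight) {}

    /** Weighted ML estimate of P(state) for a variable with numStates states. */
    static double[] weightedMLE(List<WeightedInstance> batch, int numStates) {
        double[] weightedCounts = new double[numStates];
        double totalWeight = 0.0;
        for (WeightedInstance instance : batch) {
            weightedCounts[instance.state()] += instance.weight(); // weight replaces "+1"
            totalWeight += instance.weight();
        }
        double[] probs = new double[numStates];
        for (int s = 0; s < numStates; s++) {
            probs[s] = weightedCounts[s] / totalWeight; // normalize the weighted counts
        }
        return probs;
    }

    public static void main(String[] args) {
        // An undersampled, frequent case can carry a larger weight instead of
        // duplicating its rows in the data file.
        List<WeightedInstance> batch = List.of(
                new WeightedInstance(0, 1.0),
                new WeightedInstance(1, 5.0),   // stands in for 5 identical rows
                new WeightedInstance(0, 1.0));
        System.out.println(java.util.Arrays.toString(weightedMLE(batch, 2)));
    }
}
```

Scaling the sufficient statistics rather than duplicating rows keeps the data file small and allows fractional weights as well.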
Will work on this in 2 weeks when we have to scale up.
It would be useful to weight different samples differently. I have to undersample parts of the data because they occur too often; it would be better to just give those samples more weight rather than keep all the raw data in the file.