Added weighting of silverman and scott #77

Open · wants to merge 5 commits into master

Conversation

tommyod (Owner) commented Nov 23, 2020

No description provided.

[Two review comments on KDEpy/utils.py, outdated and resolved]
tommyod (Owner, Author) commented Nov 26, 2020

Thanks for the comments @lukedyer-peak.

This was not as straightforward as I first thought. If you have any more thoughts, let me know.

  • The standard deviation is computed using ddof = 1, i.e. the sample standard deviation with n - 1 in the denominator. With weights, my immediate generalization was sum(weights) - 1, but the weights often sum to unity, which makes the denominator zero. I'm considering scaling the weights so the smallest weight equals one; this way the sample standard deviation subtracts the smallest weight. But I don't think that's a common way of doing it. (See the sketch after this list.)
  • Weighted percentiles were also non-trivial. I found some code snippets online, but none that were very good. Many failed the property that repeated observations should be equivalent to integer weights, i.e. that data = [0, 1, 1] should equal data = [0, 1] with weights = [1, 2].
  • I believe the intuitive property that data = [0, 1, 1] should equal data = [0, 1] with weights = [1, 2] should hold throughout the entire KDEpy library. I don't see any other interpretation that makes sense.
  • Weights should probably not be allowed to be zero (a zero weight is equivalent to the observation not being there in the first place). This choice should be consistent, but it's most important in the first check of the weights. (Many sub-routines also check the weights, just for sanity.)
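
To make the first and third bullets concrete, here is a minimal sketch of the frequency-weight reading, where weights act as repeat counts and the denominator generalizes to sum(weights) - ddof. The function name weighted_std_freq is hypothetical, not KDEpy's API:

```python
import numpy as np

def weighted_std_freq(data, weights, ddof=1):
    """Sample std where weights act as repeat counts (frequency weights).

    Note the failure mode from the first bullet: if the weights sum to
    unity, the denominator sum(weights) - ddof is zero.
    """
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(data, weights=weights)
    variance = np.sum(weights * (data - mean) ** 2) / (np.sum(weights) - ddof)
    return np.sqrt(variance)

# The repeated-observation property: [0, 1] with weights [1, 2] == [0, 1, 1]
assert np.isclose(weighted_std_freq([0, 1], [1, 2]), np.std([0, 1, 1], ddof=1))
```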

lukedyer-peak (Contributor) commented
  • The standard deviation is computed using ddof = 1, i.e. the sample standard deviation with n - 1 in the denominator. With weights, my immediate generalization was sum(weights) - 1, but the weights often sum to unity, which makes the denominator zero. I'm considering scaling the weights so the smallest weight equals one; this way the sample standard deviation subtracts the smallest weight. But I don't think that's a common way of doing it.

I think it would be helpful to define what is meant by the weights. I'm not a statistical expert, but there are two different meanings the weights can have here. Restricting to one case or the other might help, and documenting what is meant by these weights would be useful too. Wikipedia describes two different ways of calculating a weighted standard deviation, with either frequency or reliability weights (note that in some formulas on that wiki page they assume the weights have been normalised to sum to 1). I personally think it might be best to go with reliability weights, which GNU also uses in its Scientific Library. In some places reliability weights are simply called weights and frequency weights are called frequencies; see this explanation in a SAS blog.
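
Not the GSL code itself, but a minimal sketch of the reliability-weight estimator described on that Wikipedia page, with a hypothetical function name:

```python
import numpy as np

def weighted_std_reliability(data, weights):
    """Unbiased std under reliability weights, using the correction
    factor V1 / (V1**2 - V2) with V1 = sum(w), V2 = sum(w**2).

    Scale-invariant in the weights, so it is well defined even when
    the weights sum to unity.
    """
    data = np.asarray(data, dtype=float)
    w = np.asarray(weights, dtype=float)
    v1, v2 = np.sum(w), np.sum(w**2)
    mean = np.average(data, weights=w)
    variance = np.sum(w * (data - mean) ** 2) * v1 / (v1**2 - v2)
    return np.sqrt(variance)

x, w = np.array([0.0, 1.0, 3.0]), np.array([1.0, 2.0, 1.0])
# Rescaling the weights does not change the estimate ...
assert np.isclose(weighted_std_reliability(x, w), weighted_std_reliability(x, 10 * w))
# ... and unit weights reduce to the ordinary ddof=1 estimator.
assert np.isclose(weighted_std_reliability(x, np.ones(3)), np.std(x, ddof=1))
```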

  • Weighted percentiles were also non-trivial. I found some code snippets online, but none that were very good. Many failed the property that repeated observations should be equivalent to integer weights, i.e. that data = [0, 1, 1] should equal data = [0, 1] with weights = [1, 2].

I think this logic (of using reliability weights) should follow through naturally to calculating quantiles. One could think of sampling with these weights and taking quantiles from the sampled distributions. Following that logic through leads to something like this code snippet from SO.
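
This is not the linked SO snippet, but one convention that does preserve the repeated-observation property from the original comment: the non-interpolating inverse-CDF quantile. Interpolating variants need extra care to keep that property. A sketch, with a hypothetical function name:

```python
import numpy as np

def weighted_quantile(data, weights, q):
    """Inverse-CDF (non-interpolating) quantile: the smallest x whose
    cumulative normalised weight reaches q.

    The empirical CDF depends only on the total weight at each value,
    so data = [0, 1, 1] agrees with data = [0, 1], weights = [1, 2].
    """
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(data)
    data, weights = data[order], weights[order]
    cdf = np.cumsum(weights) / np.sum(weights)
    idx = min(np.searchsorted(cdf, q), len(data) - 1)  # guard the q = 1 edge
    return data[idx]

assert weighted_quantile([0, 1, 1], [1, 1, 1], 0.5) == weighted_quantile([0, 1], [1, 2], 0.5)
```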

  • Weights should probably not be allowed to be zero (a zero weight is equivalent to the observation not being there in the first place). This choice should be consistent, but it's most important in the first check of the weights. (Many sub-routines also check the weights, just for sanity.)

I have some personal motivation to allow 0 weighting, which would correspond to ignoring that observation, since that's how I'm planning on using this package. (I can implement this logic on my side, though.) There is evidence for this approach being "standard" or "expected" too, as numpy allows weights to be 0 (and probabilities to be 0 in the random module).
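
If zero weights are allowed at the API boundary, one low-cost way to keep the sub-routines simple is to filter them out once, up front. drop_zero_weights is a hypothetical helper, not something in KDEpy:

```python
import numpy as np

def drop_zero_weights(data, weights):
    """Treat weight 0 as 'observation absent' by filtering up front,
    so downstream routines never see zero weights."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mask = weights > 0
    return data[mask], weights[mask]

x, w = np.array([0.0, 1.0, 5.0]), np.array([1.0, 2.0, 0.0])
xf, wf = drop_zero_weights(x, w)
# Zero-weight entries do not affect weighted statistics, e.g. the mean:
assert np.isclose(np.average(x, weights=w), np.average(xf, weights=wf))
```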
