-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AbstractWeights
restriction to Real
is too restrictive
#745
Comments
What would be the use case for complex weights or weights with units? I've never seen weights that aren't dimensionless, positive reals. |
I don't understand what would be the meaning of the code you posted above either. Do you have any examples of other software that allows non-real weights? |
Sorry I thought I had included a link in an earlier reply. Here is a detailed Literate example that uses Here is a book chapter that discusses the MRI use case in detail. The MWE above is a minimal simplification of the essence of what is needed. I realize that it may be hard to convince you that complex weights are useful, but perhaps you would at least consider weights with units? In a classical histogram, the weights are simply integers (counts). This package has already taken a useful step by generalizing to allow floating point weights. (I'm not quite sure why that generalization is there but I'm glad it is.) In scientific computing, in many places where we have floating point values it is because they are associated with physical measurements that have units. The Unitful.jl package is designed to support such units, and a growing number of packages in the Julia ecosystem have endeavored to support Unitful data, often by relaxing type restrictions to be |
Thanks for the links, but I must say I don't really understand them. Do you mean that you would like to pass complex weights so that the real part of the weights applies to the real part of data and the imaginary part of weights to the imaginary part of data? Are you sure that if complex numbers were allowed for histograms, it would give what you expect, or would more changes be needed in StatsBase? Could you give a self-contained example which can easily be understood without the context of other packages? Do you have any examples of other software that allows this? Same for the case of unitful weights. Something worth noting is that weights are not just used for histograms in StatsBase, so if we relax the assumptions we have to make sure that would give correct results everywhere. But it would be possible to allow any |
I gave a MWE in the OP. If you have any examples of use cases where one needs |
Sorry, I just realized that I reported the error in the OP, but I did not describe well enough the desired behavior. Here is a MWE of approximately how I'd like it to work: using StatsBase
w = rand(ComplexF64, 100) # weights
data = randn(100)
tmp1 = fit(Histogram, data, weights(real(w)))
tmp2 = fit(Histogram, data, weights(imag(w)))
result = complex.(tmp1.weights, tmp2.weights) Having to split and the recombine is tolerable but inelegant, and the result is a Vector, not a |
Thanks for the self-contained example. AFAICT Also you haven't confirmed that this behavior is supported in other existing implementations.
No, absolutely not. For example, |
The data here are real; it is the weights that are complex valued. The edges come from the data, not from the weights. Certainly the data should still be real.
Here is Matlab weighted histogram code that works perfectly fine with complex weights: https://www.mathworks.com/matlabcentral/fileexchange/42493-generate-weighted-histogram I confirmed that it works as expected by Matlab does not support units (AFAIK) but that code would work fine with weights that have units if Matlab did support units. |
Ah sorry I thought that the choice of bins depends on the weights, but looking at the code again that's not the case. I wish you had mentioned the MATLAB precedent earlier when I asked! :-) It's generally much easier to think about new features when previous implementations exist. Could you check whether replacing |
the unit should go with data right? "weight", is dimensionless, it's "counting", by default it's "one item per one data point" |
Good point, but supporting units for the data is another issue for later.
If it were always "one item" then the type should be
I have checked this and it does work perfectly for my use case, but it causes a separate issue.
I've made a PR #755 to show a step towards what this would look like. |
no, because to get better distribution, many sampling processes generate <1 weight, while keeping the integration "correct", but in any case it's still counting, thus real. I've never seen non-real weights, and I personally can't imagine a situation where the "unit" is in weights not in data |
I hope that the packages I am developing in Julia will be used in ways I have never seen or imagined. And I've been around long enough to have the modesty to understand that I will never know or imagine most things. |
Let me try to elaborate here. Once we go from I believe that "anything Matlab can do, Julia can do better" so I would hope that the existence of Matlab code (linked above) that can do a weighted histogram with complex weights, along with my pointers above to a book chapter and a Julia repo that needs this functionality would be sufficiently compelling. Of course one could create a separate "BinnedAccumulation.jl" package but I just realized that my use case for |
we're still doing the same thing, as I said, the most important reason to support non-int real number is this to get the correct distribution, you can even have negative weight: https://arxiv.org/abs/2110.15211 . but it's still counting, imagine you're generating an approximation to a gaussian, you can't use int weight because then you can only have 1 bin count representing a gaussian distribution -- doesn't work. One can think of it as first generate histogram then normalize integral to 1, but some complex sampling process doesn't yield equal-weight samples, so you really need non-int weight as oppose to normalize histogram post-hoc. I think Julia community doesn't believe in "doing something, anything is always better than error", that sounds like javascript. So unless there's a conceptually sound / existing domain-specific application, I don't think weight can be non-real number. tbh, not the hill I'd die on, but I just want to be clear why int -> real doesn't imply real -> complex. |
Above I have 1) provided pointers to a Julia code repo that needs complex weights, 2) linked a related book chapter that describes why, and 3) linked to a Matlab function that works for complex numbers as needed. |
I'd be OK with widening the signature, but the dispatch ambiguity is problematic. For a long time I've wanted to move from a positional to a keyword arguments for So unfortunately I don't see a short-term way forward, apart from making progress in the merge of Statistics and StatsBase. |
We could have all the weights implemented in StatsBase require a real input, but let users add weights of any |
If we do that we should probably allow |
The type of
AbstractWeights
is restricted toReal
which precludes complex numbers, numbers with units from Unitful, etc. UsingNumber
for the most general type would be more inclusive.StatsBase.jl/src/weights.jl
Line 2 in b0ca0b0
Of course many subtypes like
ProbabilityWeights
must be Real, but general weights do not, especially for how weights are used in thefit
method for aHistogram
. The follow MWE ideally would work, but fails due to theweights()
method restrictions:An alternative to generalizing the
AbstractWeights
type would be to generalize thefit
method so that the weights can simply be any Number type.I would be eventually willing to help with a PR if there is openness to such a generalization, but it would helpful to know if there is a preference between generalizing
Weights
vs generalizingfit
.The text was updated successfully, but these errors were encountered: