Support for non-interpolating quantile computation #185

Open
andreasnoack opened this issue Jan 6, 2025 · 2 comments

Comments

@andreasnoack
Member

In the taxonomy of https://www.tandfonline.com/doi/abs/10.1080/00031305.1996.10473566 (see the summary on Wikipedia), this would be the R-1 definition, which is identical to the "population quantile" defined on Wikipedia. We currently support only definitions 4-9, through the alpha and beta keyword arguments.
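For reference, here is a sketch of the two families in the notation of the linked paper, which the alpha and beta keywords follow. It assumes sorted data $x_{(1)} \le \dots \le x_{(n)}$ and $p \in [0, 1]$:

```math
\begin{aligned}
\text{R-1:}\quad & Q(p) = x_{(j)}, && j = \max\bigl(1, \lceil np \rceil\bigr),\\
\text{R-4 to R-9:}\quad & Q(p) = (1-\gamma)\,x_{(j)} + \gamma\,x_{(j+1)}, && j = \lfloor np + m \rfloor,\ \ \gamma = np + m - j,\\
& m = \alpha + p\,(1 - \alpha - \beta). &&
\end{aligned}
```

For definitions 4-9, $j < 1$ is taken to give $x_{(1)}$ and $j \ge n$ to give $x_{(n)}$. Definitions 4 through 9 correspond to $(\alpha, \beta) = (0, 1), (\tfrac12, \tfrac12), (0, 0), (1, 1), (\tfrac13, \tfrac13), (\tfrac38, \tfrac38)$, and $(1, 1)$ (definition 7) is the current default. Every member of that family is continuous in $p$, while R-1 is a step function.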

The motivation is that it is sometimes useful for quantiles to be actual values from the input data. We have discussed this before in the context of the median function, and it recently came up again in JuliaData/CategoricalArrays.jl#381 (comment).
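To make the motivation concrete, here is a minimal Python sketch of R-1 next to the alpha/beta family. It is not Statistics.jl code: `quantile_r1` and `quantile_ab` are hypothetical helper names, and it assumes the usual convention of clamping to the smallest or largest observation at the ends. With an even number of observations, R-1 returns one of them as the median, whereas the default definition 7 averages the two middle values:

```python
import math
from typing import Sequence


def quantile_r1(data: Sequence[float], p: float) -> float:
    """Hypothetical helper: R-1 (inverse empirical CDF), no interpolation.

    The result is always one of the input values.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(data)
    if not x:
        raise ValueError("data must be non-empty")
    n = len(x)
    # 1-based order statistic j = max(1, ceil(n * p)); n * p is a float, so a
    # production version may want a small tolerance before taking ceil.
    j = max(1, math.ceil(n * p))
    return x[j - 1]


def quantile_ab(data: Sequence[float], p: float,
                alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical helper: definitions 4-9 selected via (alpha, beta).

    alpha = beta = 1 is definition 7, the current default.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(data)
    if not x:
        raise ValueError("data must be non-empty")
    n = len(x)
    m = alpha + p * (1.0 - alpha - beta)
    h = n * p + m              # fractional 1-based position
    j = math.floor(h)
    gamma = h - j
    # Clamp to the smallest/largest observation outside [1, n).
    if j < 1:
        return x[0]
    if j >= n:
        return x[-1]
    return x[j - 1] + gamma * (x[j] - x[j - 1])


data = [1, 2, 3, 4]
print(quantile_r1(data, 0.5))  # 2   (an observed value)
print(quantile_ab(data, 0.5))  # 2.5 (interpolated, definition 7)
```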

Since the R-1 definition can't be obtained from any particular choice of alpha and beta, supporting it would require a separate interface. One possibility is to introduce a new keyword argument, say method or definition, which takes Enum values and throws an error if set together with alpha/beta.
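A minimal sketch of such a keyword, written in Python for illustration. The enum `Definition`, its member names and `resolve` are hypothetical, not an existing Statistics.jl API. Each member carries the standard (alpha, beta) pair of its definition, and R-1 carries `None` because it has no such pair. As a point of comparison, NumPy's `numpy.quantile` (since 1.22) takes a string-valued `method` argument, e.g. `"inverted_cdf"` for R-1 and `"linear"` (the default) for R-7.

```python
from enum import Enum
from typing import Optional, Tuple


class Definition(Enum):
    """Hypothetical enum: each member maps to (alpha, beta), or None for R-1.

    Definitions 2 and 3 are omitted; like R-1 they cannot be expressed
    through alpha/beta and would need their own code path.
    """
    R1 = None              # inverse empirical CDF, returns data values
    R4 = (0.0, 1.0)
    R5 = (0.5, 0.5)
    R6 = (0.0, 0.0)
    R7 = (1.0, 1.0)        # current default
    R8 = (1 / 3, 1 / 3)
    R9 = (3 / 8, 3 / 8)


def resolve(method: Optional[Definition] = None,
            alpha: Optional[float] = None,
            beta: Optional[float] = None) -> Optional[Tuple[float, float]]:
    """Return the (alpha, beta) to interpolate with, or None for R-1.

    Raises ValueError if `method` is combined with `alpha` or `beta`.
    """
    if method is not None:
        if alpha is not None or beta is not None:
            raise ValueError("pass either `method` or `alpha`/`beta`, not both")
        return method.value
    # No `method`: keep the existing alpha/beta interface (default R-7).
    return (1.0 if alpha is None else alpha,
            1.0 if beta is None else beta)


print(resolve())                  # (1.0, 1.0)
print(resolve(Definition.R1))     # None
print(resolve(Definition.R5))     # (0.5, 0.5)
```

Keeping `method=None` as the default means existing calls that pass only alpha and beta would behave as before.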

@nalimilan
Member

FWIW, R's quantile function calls that argument type, which takes an integer from 1 to 9.

We have two options for that argument:

  • Do the same as R, adjusting alpha and beta automatically according to the value of that argument. The advantage is that it's easy for people to replicate these definitions, which seem to be well established. Passing both type and alpha/beta would be disallowed.
  • Only allow type/method/whatever to choose between the R-1 definition and definitions 4-9 (linear interpolation); in the latter case, alpha and beta would be used. Other values can be added later, of course.

I tend to prefer option 1.
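Option 1 effectively makes type a lookup from R's integer to a fixed (alpha, beta) pair, or to a separate discontinuous rule for types 1-3. For contrast, here is a hypothetical Python sketch of the argument handling option 2 implies (the names `R1`, `INTERPOLATE` and `resolve_option2` are made up for illustration): the new argument only switches interpolation off, and definitions 4-9 are still chosen through alpha and beta.

```python
from typing import Optional, Tuple, Union

# Hypothetical method names for option 2: `method` only chooses between
# the non-interpolating R-1 rule and alpha/beta interpolation.
R1 = "R-1"
INTERPOLATE = "interpolate"


def resolve_option2(method: str = INTERPOLATE,
                    alpha: Optional[float] = None,
                    beta: Optional[float] = None
                    ) -> Union[str, Tuple[float, float]]:
    """Return R1, or the (alpha, beta) pair to interpolate with."""
    if method == R1:
        # alpha/beta have no meaning without interpolation, so reject them.
        if alpha is not None or beta is not None:
            raise ValueError("alpha/beta only apply when interpolating")
        return R1
    if method == INTERPOLATE:
        # Definitions 4-9 are still selected through alpha/beta (default R-7).
        return (1.0 if alpha is None else alpha,
                1.0 if beta is None else beta)
    raise ValueError(f"unknown method: {method!r}")


print(resolve_option2())                      # (1.0, 1.0)
print(resolve_option2(alpha=0.5, beta=0.5))   # (0.5, 0.5)
print(resolve_option2(R1))                    # R-1
```

Under this scheme a user wanting definition 5 writes alpha=0.5, beta=0.5 rather than type=5, which is the convenience option 1 offers.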

@nalimilan
Member

I'm working on a PR.
