Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is At() Necessary for Symbol+String Inputs? #338

Open
ParadaCarleton opened this issue Nov 14, 2021 · 8 comments
Open

Is At() Necessary for Symbol+String Inputs? #338

ParadaCarleton opened this issue Nov 14, 2021 · 8 comments
Labels
needs thought question Further information is requested

Comments

@ParadaCarleton
Copy link
Contributor

Would it make sense to infer, when a string or symbol is passed to getindex, that the user probably wants At(string)? Adding this would probably make it easier to transition a bunch of AxisArrays code to use DimensionalData.jl.

@rafaqz
Copy link
Owner

rafaqz commented Nov 14, 2021

Yeah, this has been a contentious thing for a while. It initially required At - I chose it because its the shortest thing that could mean that to minimise how annoying this is... but then I switched it so what you want actually worked up until a few months ago... and then went back to requiring At! because automatic At is confusing. But you're right, it's mostly numbers that are weird without At when using DimArray.

Symbol can also be weird with DimStack - a single Symbol getindex is essentially NamedTuple indexing to get DimArray layers by name. But At(:symbol) is indexing the values of all the layers at the same time - which is a very different thing. And its nice if DimArray and DimStack have the same syntax. Of course with a Dimension wrapper its obvious what the Symbol means even for DimStack.

So to answer your question, no it's not totally necesary, but the semantics starts to feel a bit crowded without At wrappers.

I have thought about adding a trait somewhere for automatic At so people can choose that if they need it.

@ParadaCarleton
Copy link
Contributor Author

ParadaCarleton commented Nov 14, 2021

Symbol can also be weird with DimStack - a single Symbol getindex is essentially NamedTuple indexing to get DimArray layers by name. But At(:symbol) is indexing the values of all the layers at the same time - which is a very different thing. And its nice if DimArray and DimStack have the same syntax. Of course with a Dimension wrapper its obvious what the Symbol means even for DimStack.

I think that makes some sense, but it definitely causes problems for anyone trying to transition their codebase from AxisArrays. That being said, isn't not having to use At() a way of keeping the interface more consistent between DimStack and DimArray? That way it's possible to index with a symbol regardless of whether DimArray or DimStack is being used. (Even though it's not using a dimension struct, I would naturally tend to think of the names for DimStack as being something like an extra dimension which can be indexed along.)

Alternatively, given that DimStack is probably going to see a lot less use than DimArray, the syntax for DimStack could be changed, and then symbols could be treated the same as symbols for DimArray; perhaps AtStack()? Or perhaps the dimension name could be used; indexing a DimStack as x[:dim_array] (not providing a dimension name) treats x as a named tuple, while x[person=:name] indexes all the layers at the same time regardless of whether you're using a DimStack or DimArray.

@ParadaCarleton
Copy link
Contributor Author

Yeah, this has been a contentious thing for a while. It initially required At - I chose it because its the shortest thing that could mean that to minimise how annoying this is... but then I switched it so what you want actually worked up until a few months ago... and then went back to requiring At! because automatic At is confusing. But you're right, it's mostly numbers that are weird without At when using DimArray.

Symbol can also be weird with DimStack - a single Symbol getindex is essentially NamedTuple indexing to get DimArray layers by name. But At(:symbol) is indexing the values of all the layers at the same time - which is a very different thing. And its nice if DimArray and DimStack have the same syntax. Of course with a Dimension wrapper its obvious what the Symbol means even for DimStack.

So to answer your question, no it's not totally necesary, but the semantics starts to feel a bit crowded without At wrappers.

I have thought about adding a trait somewhere for automatic At so people can choose that if they need it.

It's worth noting that if symbol keys are automatically wrapped in At, switching from AxisKeys becomes a lot more viable for other packages -- for instance, we're considering making this swap in Turing, but having to change all the indexing-related syntax would be a definite hassle.

@rafaqz
Copy link
Owner

rafaqz commented Nov 19, 2021

I'm open to working on a change if this will be blocking to use cases elsewhere. But we have to be careful with the syntax as we're trying to do more things than AxisArrays.jl does.

DimStack is actually pretty central to the reason for writing this package, mostly through RasterStack <: AbstractDimStack in Rasters.jl. It can represent a netcdf and similar structures with multiple layers with mixed number of dims in a form that is easy to use and fast to index. And it's a table...

The specific syntax clash is this edge case, which is unlikely to happen very often, but needs handling:

julia> using DimensionalData

julia> st = DimStack((a=rand(10), b=falses(10)), (X(Symbol.('a':'j')),))
DimStack with dimensions:
  X Categorical Symbol[a, b, , i, j] ForwardOrdered
and 2 layers:
  :a Float64 dims: X (10)
  :b Bool dims: X (10)


julia> st[:a]
10-element DimArray{Float64,1} a with dimensions:
  X Categorical Symbol[a, b, , i, j] ForwardOrdered
 0.764328
 0.810431
 0.37886
 0.689809
 0.368838
 0.284454
 0.665395
 0.326088
 0.762186
 0.596156

julia> st[At(:a)]
(a = 0.7643275682479944, b = false)

@rafaqz
Copy link
Owner

rafaqz commented Nov 19, 2021

I may have a solution to this. We can use getproperty to access DimStack layers, instead of getindex, like:

julia> st.a
10-element DimArray{Float64,1} a with dimensions:
  X Categorical Symbol[a, b, , i, j] ForwardOrdered
 0.764328
 0.810431
 0.37886
 0.689809
 0.368838
 0.284454
 0.665395
 0.326088
 0.762186
 0.596156

julia> st[:a]
(a = 0.7643275682479944, b = false)

That might even be nice to separate out the syntax a little more anyway - so that getindex is always array-like, rather than dict-like.

@ParadaCarleton
Copy link
Contributor Author

I may have a solution to this. We can use getproperty to access DimStack layers, instead of getindex, like:

julia> st.a
10-element DimArray{Float64,1} a with dimensions:
  X Categorical Symbol[a, b, , i, j] ForwardOrdered
 0.764328
 0.810431
 0.37886
 0.689809
 0.368838
 0.284454
 0.665395
 0.326088
 0.762186
 0.596156

julia> st[:a]
(a = 0.7643275682479944, b = false)

That might even be nice to separate out the syntax a little more anyway - so that getindex is always array-like, rather than dict-like.

Definitely like that solution! Sounds like a good idea to me.

@rafaqz
Copy link
Owner

rafaqz commented Nov 20, 2021

This is going to break a LOT of scripts built on Rasters.jl.

I will have to check in with people and see what they think, so it might take a while.

@ParadaCarleton
Copy link
Contributor Author

ParadaCarleton commented Dec 10, 2021

This is going to break a LOT of scripts built on Rasters.jl.

I will have to check in with people and see what they think, so it might take a while.

Let me know what people think when you've talked to them about this.

I think this is a good idea, though; in general, transitioning from AxisArrays to DimensionalData should be as smooth as possible, maybe even automatic (as in, we could write a script that takes all code written with AxisArrays and converts it into DimensionalData code).

@rafaqz rafaqz added the question Further information is requested label Feb 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs thought question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants