@jpivarski this discussion sets the scene for a tentative PR #2179. At some point, it would be good to get your thoughts on this. Not urgent, so could do so during a 1:1.
-
I agree with your assessment: the original The idea that dimensions/list nodes are "more first-class" than everything else is a point I didn't appreciate in the original design. It comes up repeatedly, for example, here: Cloud-Drift/ragged-array-idioms#4 A naive attempt to preserve xarray dimension names as For existing "well known parameters" like In the above, it sounds like you're suggesting that we introduce a new, higher-level kind of parameter, which is attached to nested list dimensions, rather than layout nodes. That is a good idea: the need for it keeps coming up. I think that the higher-level thing can be implemented in terms of
-
Whenever we perform an Awkward operation, we ultimately have to choose how parameters are merged, and where to put them. In #2179, I have made some changes to the existing rules in order to fix a bug with string merging. These changes would make `Indexed[Option]Array` nodes more distinct, i.e. not merging their own parameters with their contents.

The purpose of this discussion is to formally discuss where we want our parameter system to go, to serve as a reference for PRs that change these rules, and to leave room for #1391.
The parameter system in Awkward is very broad, and used in a variety of ways internally. In particular, we have two kinds of `parameters`:

User-defined Parameters

We want to be able to associate unique metadata with every layout node. Users can set any parameter, but we don't know how these should survive various Awkward operations, so we use `check_equal` as our metadata merging policy. This ensures that merging parameters only preserves identical key-value pairs. Unlike xarray-style "attrs" (global and per-record field, #1391), most of this metadata is easily destroyed, i.e. it's scoped to the layout node.

Merging of layouts does not care if user-defined parameters don't match; they're just dropped at the parameter merge step.
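To make the "only identical key-value pairs survive" policy concrete, here is a minimal sketch in plain Python. The function name `merge_check_equal` and the example keys are hypothetical, and this is not Awkward's actual implementation; it only models the behaviour described above.

```python
def merge_check_equal(left, right):
    """Keep only the key-value pairs that are identical in both dicts.

    Hypothetical helper: it models the policy described above, it is not
    taken from Awkward's source.
    """
    left = left or {}
    right = right or {}
    return {
        key: value
        for key, value in left.items()
        if key in right and right[key] == value
    }


# Two layouts carrying user-defined metadata (made-up keys and values).
a = {"units": "cm", "source": "run-1"}
b = {"units": "cm", "source": "run-2"}

merged = merge_check_equal(a, b)
print(merged)  # {'units': 'cm'}
```

The mismatched `source` key is dropped silently rather than raising, which is the "just dropped at the parameter merge step" behaviour.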
Awkward-defined Parameters
In addition to user-defined parameters, we have a set of reserved (special) names, e.g. `__array__` and `__record__`. These parameters are required to match in order for layouts to be mergeable.

Array lookup problem
Of particular importance is `__array__`, whose location Awkward is highly sensitive to. Unlike `__record__`, which walks through the layout tree recursively, `__array__` is taken from the first-encountered layout node. This causes the following observation:

The reason that the former returns `ak.highlevel.Array` in the first example is that `__array__` is resolved against the `IndexedArray` node, which has no parameters.

In many scenarios, we use parameters for user-facing operations like setting a behaviour class. It's my assertion that the user should only be required to reason about the array type, which does not mention the index. Therefore, I argue that the above behaviour is undesirable.
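As a rough illustration of why the lookup strategy matters, here is a toy model of a layout tree. The `Node` class and both helper functions are hypothetical stand-ins, not Awkward's real layout classes or lookup code, and the recursive walk is deliberately simplified.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Toy layout node: a kind, its own parameters, and one content."""
    kind: str
    parameters: dict = field(default_factory=dict)
    content: Optional["Node"] = None


def first_node_parameter(node, key):
    """Look only at the outermost node (how `__array__` is described above)."""
    return node.parameters.get(key)


def recursive_parameter(node, key):
    """Walk down through the contents until the key is found (simplified)."""
    while node is not None:
        if key in node.parameters:
            return node.parameters[key]
        node = node.content
    return None


# A string-like list, and the same list hidden behind an index node
# that carries no parameters of its own.
strings = Node(
    "ListOffsetArray",
    {"__array__": "string"},
    Node("NumpyArray", {"__array__": "char"}),
)
indexed = Node("IndexedArray", {}, strings)

direct = first_node_parameter(strings, "__array__")
hidden = first_node_parameter(indexed, "__array__")
walked = recursive_parameter(indexed, "__array__")
print(direct, hidden, walked)  # string None string
```

In this model the index node hides `__array__` from a first-node lookup, even though the array type is unchanged by the wrapping.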
To fix this, we have discussed introducing something like `dimension_parameter` that behaves like `purelist_parameter` within a particular dimension (i.e., stops resolution at list boundaries). A generalisation of this would be `type_parameters`, which are scoped to the dimension and option type.

Clearly, the reserved parameters are special, and therefore we can impose our own custom lifetime rules. An alternative solution, therefore, would be to "promote" the `__array__` options, so that they're always set on both the indexed node and its content. I do not prefer this approach.

Array class of option?
If we treat the `type` as the mechanism for users to configure behaviours, then:

1. Should `var * int64` have the same array class as `Option[var * int64]`?
2. Should `var * int64` have the same array class as `var * int64?`?

The current rules are (1) no, (2) yes.
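A sketch of how a lookup scoped to one dimension might treat these two cases, compared with a first-node lookup. Everything here is hypothetical: the `Node` class, the `LIST_KINDS` set, the `"my_list"` name, and the `dimension_parameter` function are illustrative only, and the assumption that the first-node rule is what produces "(1) no, (2) yes" is mine.

```python
from dataclasses import dataclass, field
from typing import Optional

# Node kinds treated as list (dimension) boundaries in this toy model.
LIST_KINDS = {"ListOffsetArray", "ListArray", "RegularArray"}


@dataclass
class Node:
    kind: str
    parameters: dict = field(default_factory=dict)
    content: Optional["Node"] = None


def first_node_parameter(node, key):
    """Look only at the outermost node."""
    return node.parameters.get(key)


def dimension_parameter(node, key):
    """Search within one dimension: see through option/index nodes,
    but never descend past the first list node."""
    while node is not None:
        if key in node.parameters:
            return node.parameters[key]
        if node.kind in LIST_KINDS:
            return None  # list boundary reached, stop resolving
        node = node.content
    return None


# var * int64, with a made-up array name on the list node
plain = Node("ListOffsetArray", {"__array__": "my_list"}, Node("NumpyArray"))
# Case (1): the option sits outside the list
option_outside = Node("IndexedOptionArray", {}, plain)
# Case (2): the option sits inside the list
option_inside = Node(
    "ListOffsetArray",
    {"__array__": "my_list"},
    Node("IndexedOptionArray", {}, Node("NumpyArray")),
)

first = [first_node_parameter(n, "__array__") for n in (plain, option_outside, option_inside)]
scoped = [dimension_parameter(n, "__array__") for n in (plain, option_outside, option_inside)]
print(first)   # ['my_list', None, 'my_list']
print(scoped)  # ['my_list', 'my_list', 'my_list']
```

Under the scoped lookup, case (1) would flip to "yes" in this model; whether that is desirable is exactly the open question.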
Merging arrays with & without `__array__`
Right now, we treat two arrays as mergeable if and only if they have the same `__array__`. I wonder whether it makes sense to loosen this; if the contents are mergeable, then we need the arrays to have compatible `__array__`, i.e. not `__array__=X, __array__=Y`. If the second array does not have an `__array__` parameter, is it not mergeable?
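The two candidate rules can be written down side by side. This is only a sketch over plain parameter dicts: both function names are hypothetical, and the loosened rule shown is one possible reading of "compatible" (a missing `__array__` is compatible with anything), not a settled answer to the question above.

```python
def mergeable_strict(left, right):
    """Current rule as described: `__array__` must be the same on both
    sides, where a missing `__array__` only matches another missing one."""
    return left.get("__array__") == right.get("__array__")


def mergeable_loose(left, right):
    """One possible loosening: only reject when both sides set
    `__array__` and the values differ."""
    a = left.get("__array__")
    b = right.get("__array__")
    return a is None or b is None or a == b


with_x = {"__array__": "X"}
with_y = {"__array__": "Y"}
without = {}

strict = [mergeable_strict(with_x, p) for p in (with_x, with_y, without)]
loose = [mergeable_loose(with_x, p) for p in (with_x, with_y, without)]
print(strict)  # [True, False, False]
print(loose)   # [True, False, True]
```

The only case where the two rules disagree is the last one, an array with `__array__` merged against one without it.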