Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent and problematic behavior in Variable indexing #1350

Open
RandallPittmanOrSt opened this issue Jul 12, 2024 · 0 comments
Open

Inconsistent and problematic behavior in Variable indexing #1350

RandallPittmanOrSt opened this issue Jul 12, 2024 · 0 comments

Comments

@RandallPittmanOrSt
Copy link
Contributor

RandallPittmanOrSt commented Jul 12, 2024

There are a number of footguns, inconsistencies, and suspicious behaviors in how _StartCountStride handles various keys for __getitem__ and __setitem__. I found these when working on the type stubs.

I created this script to test a bunch of combinations. I only post the 1-D variable results below, as the 2-D results didn't any more information. (I added a 2D case below that provides some more information.) See the Notes column of the table for footnotes to each problem I found.

Note - I use the term "integral float" to refer to a float whose value is an integer, e.g. 1.0.

1-D

Data

arr1:

array([5, 6, 7])

var1[...]:

array([5, 6, 7])

Test results

key type key ds[varname][key] result Notes
int 0 array(5)
integral float 0.0 array(5) Note: Allowing integral floats does make some sense to avoid obnoxious forced type conversions.
Non-integral float 0.8 array(5) Footgun: A float key is cooerced to an integer, rather than being detected as a logic error. I think this should be prevented by something like the _is_int() check used as was done for VLEN in #757.
integer string '1' array(6) Note: Ok, we allow integer strings for keys. That's weird, but convenient I guess.
integral float string '1.0' IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')
Non-integral float string '1.5' IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')
bool True array(6) Footgun: A single boolean key for a dimensioned variable is treated like an integer 1. Booleans keys only make sense as a mask for one or more dimensions.
list of int [0, 1] array([5, 6])
list of bool (same shape as dimension) [True, False, True] array([5, 7])
list of bool (wrong shape) [True, False] IndexError('\nBoolean array must have the same shape as the data along this dimension.')
list of integral float [0.0, 1.0] IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices') Inconsistent: Why do we allow float keys but not lists of floats?
list of non-integral float [0.6, 1.7] ValueError('slicing expression exceeds the number of dimensions of the variable') Suspicious: Why is this a ValueError about having the wrong number of dimensions in the key?
list of integer str ['0', '1'] ValueError('slicing expression exceeds the number of dimensions of the variable') Inconsistent/suspicious: We allow integer strings but not lists of integer strings, and the error is not an IndexError but a ValueError.
list of integral float str ['0.0', '1.0'] IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')
list of non-integral float str ['0.6', '1.7'] IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')
single-valued list of int [1] array([6])
single-valued list of integral float [1.0] IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')
single-valued list of non-integral float [1.7] array(6) Inconsistent/suspicious: A list of float keys is allowed only if there's only one value and that value is not integral! And it returns a scalar instead of an array (compare to single-valued list of int).
single-valued list of int str ['1'] array(6) Inconsistent/suspicious: A list of int str keys is allowed only if there's only one value. And it returns a scalar instead of an array (compare to single-valued list of int).
single-valued list of integral float str ['1.0'] IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')
single-valued list of non-integral float str ['1.7'] IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')

2D case

It appears that a list of (non-integral!) floats or a list of strings of integers is interpreted as row, column indices, whereas a list of int is interpreted as multiple indexes for one dimension.

Data

var2[...]:

array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])

Test and result

var2[[1, 2]]
# returns the second and third rows
# array([[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
#        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])

var2[['1']]
# returns the second row
# array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

ds["var2"][['1', '2']]
# returns the third value in the second column
# array(22)

ds["var2"][[1.0, 2.0]]
# raises IndexError: only integers, slices (`:`), ellipsis (`...`), and 1-d integer or boolean arrays are valid indices

ds["var2"][[1.2, 2.2]]
# returns the third value in the second column
# array(22)
@RandallPittmanOrSt RandallPittmanOrSt changed the title Variable.__setitem__ allows non-integral floats for the key Inconsistencies in allowed types for slicing keys Jul 18, 2024
@RandallPittmanOrSt RandallPittmanOrSt changed the title Inconsistencies in allowed types for slicing keys Inconsistencies in allowed types for Varaiable __getattr__ and __setattr__ keys Jul 18, 2024
@RandallPittmanOrSt RandallPittmanOrSt changed the title Inconsistencies in allowed types for Varaiable __getattr__ and __setattr__ keys Inconsistencies in allowed types for Variable __getattr__ and __setattr__ keys Jul 18, 2024
@RandallPittmanOrSt RandallPittmanOrSt changed the title Inconsistencies in allowed types for Variable __getattr__ and __setattr__ keys Inconsistencies in allowed types for Variable __getitem__ and __setitem__ keys Jul 18, 2024
@RandallPittmanOrSt RandallPittmanOrSt changed the title Inconsistencies in allowed types for Variable __getitem__ and __setitem__ keys Inconsistencies in allowed types for Variable __getitem__ and __setitem__ keys (indexing) Jul 19, 2024
@RandallPittmanOrSt RandallPittmanOrSt changed the title Inconsistencies in allowed types for Variable __getitem__ and __setitem__ keys (indexing) Inconsistencies in allowed types for Variable indexing Jul 19, 2024
@RandallPittmanOrSt RandallPittmanOrSt changed the title Inconsistencies in allowed types for Variable indexing Footguns, Inconsistencies, and suspicious behavior in Variable indexing Jul 22, 2024
@RandallPittmanOrSt RandallPittmanOrSt changed the title Footguns, Inconsistencies, and suspicious behavior in Variable indexing Inconsistent and problematic behavior in Variable indexing Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant