`bool(x == x)` when `x` is masked? #28

mdhaber · 2024-11-25T22:54:01Z

According to NumPy:

import numpy as np
y = np.ma.MaskedArray(1, mask=True)
bool(y == y)  # False

Yet:

x = np.ma.MaskedArray([1], mask=True)
bool(x == x)  # True
bool(np.ma.all(x==x))  # False
bool(np.all([]))  # True

Let me try to interpret this:

bool(y == y) is False because y == y is a masked 0D thing, and any masked 0D thing is "falsy".
bool(np.ma.all(x==x)) is False because np.ma.all(x==x) reduces to a masked 0D thing.
bool(np.all([])) is True for the same reason that all([]) is True.
bool(x == x) being True is erroneous all the ways I try to look at it. x == x is a 1D thing with one masked element. I think bool(x == x) is most comparable to bool([]), which should be False.

All that said, I think it was choice to make bool of a masked 0D thing "falsy". Similarly, what should be returned when we try to get an int/float/complex out of a masked number? It would probably be most consistent to raise an error about the ambiguity whenever bool/int/float/complex is used on a masked array with only a masked element.

All this came up when I was thinking about whether masked elements should be treated as the same or distinct when computing unique_ functions.

The note in the documentation (e.g. here) of these functions states:

Uniqueness should be determined based on value equality (see equal()).

But I think this is still ambiguous for the reasons above, so I think we have a choice.

I'll just say that treating all masked elements as the same is easier because then the implementation is as simple as replacing the masked values with a sentinel that is not already in the data. If the masked elements are distinct, I see a few options:

use a different sentinel value for each masked element (which may require promoting the type - e.g. bool has only two values)
use np.nan as the sentinel value because nans are treated as distinct (but this only works for inexact types when there are no NaNs in the data),
performing the calculation on the compressed array, then append to the results:
- values - one masked element for each masked element of the input
- indices - the indices of all the masked elements of the input (e.g. nonzero(mask))
- count - a separate 1 for each masked element

inverse_indices is a bit more complicated because we can't just append to the results of the compressed array. I think we have to:

create a new array the size of the argument.
wherever there was a non-masked value, use inverse_indices from the compressed array
replace the elements where there was a masked value with arange(n_nonmasked, n_total) (because in values, the masked elements appear after n_nonmasked values).

And all that headache for something that doesn't make much sense to begin with, IMO!

The text was updated successfully, but these errors were encountered:

keewis · 2024-12-07T12:43:37Z

I think bool(x == x) is most comparable to bool([]), which should be False.

As far as I can tell, the Array API only defines the __<python scalar type>__ methods for 0D arrays, so I think we can safely assume this means we should raise (which is what numpy.ndarray does)

It would probably be most consistent to raise an error about the ambiguity whenever bool/int/float/complex is used on a masked array with only a masked element

I think this is where we can come up with our own strategy. Raising does make sense to me, but we could just as well return _sentinel (if we still have that) as the requested type.

I will need to take some more time to understand what equality would mean in the context of unique_* and inverse_indices.

mdhaber · 2024-12-28T22:37:31Z

@lucascolley can you weigh in from the perspective of whether masked values should be treated as distinct in the set functions? It's easier to implement the set functions if they are the same; see bottom of #28 (comment).

lucascolley · 2024-12-28T22:46:08Z

I think they should be treated as absent from the set. So if a value is masked, it shouldn't contribute to a count in unique_counts. Does that make sense, or am I missing something?

mdhaber · 2024-12-28T22:55:51Z

Maybe. If we want to be able to recreate the original set (including masked elements) from the inverse_indices, we can't just strip the masked elements out completely, right?

(Otherwise, it sounds like what you're proposing is that the set functions are as simple as removing the masked values completely, performing the function, and returning the results as MArrays with no elements masked, right?)

keewis · 2024-12-28T22:58:43Z

If we want to be able to recreate the original set (including masked elements) from the inverse_indices, we can't just strip the masked elements out completely, right?

that, or the result of inverse_indices needs to contain information on the missing values. However, for that we'd have to allow indexing with a masked array, which we wanted to avoid before.

lucascolley · 2024-12-28T23:33:40Z

(Otherwise, it sounds like what you're proposing is that the set functions are as simple as removing the masked values completely, performing the function, and returning the results as MArrays with no elements masked, right?)

That is what I was proposing, but I hadn't considered that you may want to reconstruct the original masked array.

mdhaber · 2024-12-28T23:55:11Z

Right. The standard says

the indices from the set of unique elements that reconstruct x

With that in mind, do you prefer one of the other options?

lucascolley · 2024-12-29T00:05:21Z

And all that headache for something that doesn't make much sense to begin with, IMO!

My preference would be to give up on being able to reconstruct the original array (just throw masked elements away), until somebody makes a feature request. I think it would be a cool thing to think through if there is a use-case. But probably not worth the hassle without one!

mdhaber · 2024-12-29T00:20:32Z

There is an option on the table for allowing reconstruction that is simple: treat all masked elements as the same. No headache. I think I will do that for now.

mdhaber mentioned this issue Dec 3, 2024

MAINT: remove NumPy dependency #12

Merged

mdhaber mentioned this issue Dec 11, 2024

Fix/Test set functions #34

Closed

mdhaber mentioned this issue Dec 28, 2024

TST: run array-api-tests in CI #47

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`bool(x == x)` when `x` is masked? #28

`bool(x == x)` when `x` is masked? #28

mdhaber commented Nov 25, 2024 •

edited

Loading

keewis commented Dec 7, 2024

mdhaber commented Dec 28, 2024

lucascolley commented Dec 28, 2024

mdhaber commented Dec 28, 2024 •

edited

Loading

keewis commented Dec 28, 2024

lucascolley commented Dec 28, 2024

mdhaber commented Dec 28, 2024

lucascolley commented Dec 29, 2024

mdhaber commented Dec 29, 2024

bool(x == x) when x is masked? #28

bool(x == x) when x is masked? #28

Comments

mdhaber commented Nov 25, 2024 • edited Loading

keewis commented Dec 7, 2024

mdhaber commented Dec 28, 2024

lucascolley commented Dec 28, 2024

mdhaber commented Dec 28, 2024 • edited Loading

keewis commented Dec 28, 2024

lucascolley commented Dec 28, 2024

mdhaber commented Dec 28, 2024

lucascolley commented Dec 29, 2024

mdhaber commented Dec 29, 2024

`bool(x == x)` when `x` is masked? #28

`bool(x == x)` when `x` is masked? #28

mdhaber commented Nov 25, 2024 •

edited

Loading

mdhaber commented Dec 28, 2024 •

edited

Loading