-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: remove NumPy dependency #12
Conversation
I'll replace this with simply displaying the two separately. |
OK, after removing the hack: import marray
import numpy as np
xp = marray.masked_array(np)
x = xp.asarray(np.arange(5))
x
# MaskedArray(array([0, 1, 2, 3, 4]), array([False, False, False, False, False]))
print(x)
# MaskedArray([0 1 2 3 4], [False False False False False])
x = xp.asarray(np.arange(10))
x
# MaskedArray(
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
# array([False, False, False, False, False, False, False, False, False,
# False])
# )
print(x)
# MaskedArray(
# [0 1 2 3 4 5 6 7 8 9],
# [False False False False False False False False False False]
# )
x = xp.asarray(np.arange(10000))
x
# MaskedArray(
# array([ 0, 1, 2, ..., 9997, 9998, 9999]),
# array([False, False, False, ..., False, False, False])
# )
print(x)
# MaskedArray(
# [ 0 1 2 ... 9997 9998 9999],
# [False False False ... False False False]
# ) I can live with that for a while. For debugging, it might even be useful to see what value is inside the data under the mask. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me in general, although I have some ideas about the implementation.
Given that this is a temporary measure until we have decided what to do about #5, how important are tests?
marray/__init__.py
Outdated
textwrap.indent(data_str, " " * 4), | ||
textwrap.indent(mask_str, " " * 4), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have probably gone with 2 spaces since we don't need to follow the same syntax as python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not follow the Python convention, though, especially if we do want it to be possible to recreate the array from the output?
marray/__init__.py
Outdated
join_char = "" | ||
components = ["MaskedArray(", data_str, mask_str, ")"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of using str.join
I would have gone with an f-string because that makes it easier to understand the end result. Something like
join_char = "" | |
components = ["MaskedArray(", data_str, mask_str, ")"] | |
result = f"MaskedArray({data_str}, {mask_str})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose that looks better here. I am not as convinced about the multiline version, though.
marray/__init__.py
Outdated
join_char = "\n" | ||
components = ["MaskedArray(", | ||
textwrap.indent(data_str, " " * 4), | ||
textwrap.indent(mask_str, " " * 4), | ||
")"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the same here:
join_char = "\n" | |
components = ["MaskedArray(", | |
textwrap.indent(data_str, " " * 4), | |
textwrap.indent(mask_str, " " * 4), | |
")"] | |
result = textwrap.dedent( | |
f""" | |
MaskedArray( | |
{data_str}, | |
{mask_str}, | |
) | |
""".lstrip("\n").rstrip() | |
) |
(and then obviously data_str
should not already contain a trailing comma)
data = np.asarray(self.data) | ||
mask = np.asarray(self.mask) | ||
return np.ma.masked_array(data, mask).__str__() | ||
return self._data_mask_string(str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should always be repr
? That would mean there's no difference between __str__
and __repr__
, but given that at least numpy
is always a pseudo-repr
anyways (so recreating objects is not possible) this might not be too much of an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should always be repr?
But why not follow the convention of the underlying library, whatever it is?
import numpy as np
import torch
import jax.numpy as jnp
import cupy as cp
for xp in [np, torch, jnp, cp]:
x = xp.asarray([1, 2, 3, 4])
print(xp.__name__)
print(str(x))
print(repr(x))
# numpy
# [1 2 3 4]
# array([1, 2, 3, 4])
#
# torch
# tensor([1, 2, 3, 4])
# tensor([1, 2, 3, 4])
#
# jax.numpy
# [1 2 3 4]
# Array([1, 2, 3, 4], dtype=int32)
#
# cupy
# [1 2 3 4]
# array([1, 2, 3, 4])
Most of them have different str
vs repr
, so I don't see a reason not to do what they do.
but given that at least numpy is always a pseudo-repr anyways (so recreating objects is not possible) this might not be too much of an issue?
I don't think it's much of an issue, especially since this is just a placeholder until it's time to polish things.
But if we were to expose MaskedArray
(and maybe accept dtype
as a parameter), then with the repr
output, the object could be recreated to the extent that the NumPy array
object can be recreated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how important are tests?
Not very, IMO. I'd rather not add any at this point.
data = np.asarray(self.data) | ||
mask = np.asarray(self.mask) | ||
return np.ma.masked_array(data, mask).__str__() | ||
return self._data_mask_string(str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should always be repr?
But why not follow the convention of the underlying library, whatever it is?
import numpy as np
import torch
import jax.numpy as jnp
import cupy as cp
for xp in [np, torch, jnp, cp]:
x = xp.asarray([1, 2, 3, 4])
print(xp.__name__)
print(str(x))
print(repr(x))
# numpy
# [1 2 3 4]
# array([1, 2, 3, 4])
#
# torch
# tensor([1, 2, 3, 4])
# tensor([1, 2, 3, 4])
#
# jax.numpy
# [1 2 3 4]
# Array([1, 2, 3, 4], dtype=int32)
#
# cupy
# [1 2 3 4]
# array([1, 2, 3, 4])
Most of them have different str
vs repr
, so I don't see a reason not to do what they do.
but given that at least numpy is always a pseudo-repr anyways (so recreating objects is not possible) this might not be too much of an issue?
I don't think it's much of an issue, especially since this is just a placeholder until it's time to polish things.
But if we were to expose MaskedArray
(and maybe accept dtype
as a parameter), then with the repr
output, the object could be recreated to the extent that the NumPy array
object can be recreated.
marray/__init__.py
Outdated
textwrap.indent(data_str, " " * 4), | ||
textwrap.indent(mask_str, " " * 4), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not follow the Python convention, though, especially if we do want it to be possible to recreate the array from the output?
marray/__init__.py
Outdated
join_char = "" | ||
components = ["MaskedArray(", data_str, mask_str, ")"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose that looks better here. I am not as convinced about the multiline version, though.
I switched to f-strings; not exactly what you suggested, but definitely an improvement. |
sorry for being absent for a week, I've been busy with other things. Looks good to me, though! |
Closes gh-2.
Uses hack to eliminate NumPy dependency.Let's keep gh-5 on the table, though.