Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: remove NumPy dependency #12

Merged
merged 5 commits into from
Dec 1, 2024
Merged

MAINT: remove NumPy dependency #12

merged 5 commits into from
Dec 1, 2024

Conversation

mdhaber
Copy link
Owner

@mdhaber mdhaber commented Nov 21, 2024

Closes gh-2.

Uses hack to eliminate NumPy dependency. Let's keep gh-5 on the table, though.

marray/__init__.py Outdated Show resolved Hide resolved
@mdhaber
Copy link
Owner Author

mdhaber commented Nov 21, 2024

I'll replace this with simply displaying the two separately.

@mdhaber
Copy link
Owner Author

mdhaber commented Nov 21, 2024

OK, after removing the hack:

import marray
import numpy as np
xp = marray.masked_array(np)
x = xp.asarray(np.arange(5))

x
# MaskedArray(array([0, 1, 2, 3, 4]), array([False, False, False, False, False]))

print(x)
# MaskedArray([0 1 2 3 4], [False False False False False])

x = xp.asarray(np.arange(10))

x
# MaskedArray(
#     array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), 
#     array([False, False, False, False, False, False, False, False, False,
#            False])
# )

print(x)
# MaskedArray(
#     [0 1 2 3 4 5 6 7 8 9], 
#     [False False False False False False False False False False]
# )

x = xp.asarray(np.arange(10000))

x
# MaskedArray(
#     array([   0,    1,    2, ..., 9997, 9998, 9999]), 
#     array([False, False, False, ..., False, False, False])
# )

print(x)
# MaskedArray(
#     [   0    1    2 ... 9997 9998 9999], 
#     [False False False ... False False False]
# )

I can live with that for a while. For debugging, it might even be useful to see what value is inside the data under the mask.

Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me in general, although I have some ideas about the implementation.

Given that this is a temporary measure until we have decided what to do about #5, how important are tests?

Comment on lines 91 to 92
textwrap.indent(data_str, " " * 4),
textwrap.indent(mask_str, " " * 4),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have probably gone with 2 spaces since we don't need to follow the same syntax as python

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not follow the Python convention, though, especially if we do want it to be possible to recreate the array from the output?

Comment on lines 86 to 87
join_char = ""
components = ["MaskedArray(", data_str, mask_str, ")"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of using str.join I would have gone with an f-string because that makes it easier to understand the end result. Something like

Suggested change
join_char = ""
components = ["MaskedArray(", data_str, mask_str, ")"]
result = f"MaskedArray({data_str}, {mask_str})"

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that looks better here. I am not as convinced about the multiline version, though.

Comment on lines 89 to 93
join_char = "\n"
components = ["MaskedArray(",
textwrap.indent(data_str, " " * 4),
textwrap.indent(mask_str, " " * 4),
")"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the same here:

Suggested change
join_char = "\n"
components = ["MaskedArray(",
textwrap.indent(data_str, " " * 4),
textwrap.indent(mask_str, " " * 4),
")"]
result = textwrap.dedent(
f"""
MaskedArray(
{data_str},
{mask_str},
)
""".lstrip("\n").rstrip()
)

(and then obviously data_str should not already contain a trailing comma)

data = np.asarray(self.data)
mask = np.asarray(self.mask)
return np.ma.masked_array(data, mask).__str__()
return self._data_mask_string(str)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should always be repr? That would mean there's no difference between __str__ and __repr__, but given that at least numpy is always a pseudo-repr anyways (so recreating objects is not possible) this might not be too much of an issue?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should always be repr?

But why not follow the convention of the underlying library, whatever it is?

import numpy as np
import torch
import jax.numpy as jnp
import cupy as cp
for xp in [np, torch, jnp, cp]:
    x = xp.asarray([1, 2, 3, 4])
    print(xp.__name__)
    print(str(x))
    print(repr(x))

# numpy
# [1 2 3 4]
# array([1, 2, 3, 4])
#
# torch
# tensor([1, 2, 3, 4])
# tensor([1, 2, 3, 4])
#
# jax.numpy
# [1 2 3 4]
# Array([1, 2, 3, 4], dtype=int32)
#
# cupy
# [1 2 3 4]
# array([1, 2, 3, 4])

Most of them have different str vs repr, so I don't see a reason not to do what they do.

but given that at least numpy is always a pseudo-repr anyways (so recreating objects is not possible) this might not be too much of an issue?

I don't think it's much of an issue, especially since this is just a placeholder until it's time to polish things.
But if we were to expose MaskedArray (and maybe accept dtype as a parameter), then with the repr output, the object could be recreated to the extent that the NumPy array object can be recreated.

Copy link
Owner Author

@mdhaber mdhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how important are tests?

Not very, IMO. I'd rather not add any at this point.

data = np.asarray(self.data)
mask = np.asarray(self.mask)
return np.ma.masked_array(data, mask).__str__()
return self._data_mask_string(str)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should always be repr?

But why not follow the convention of the underlying library, whatever it is?

import numpy as np
import torch
import jax.numpy as jnp
import cupy as cp
for xp in [np, torch, jnp, cp]:
    x = xp.asarray([1, 2, 3, 4])
    print(xp.__name__)
    print(str(x))
    print(repr(x))

# numpy
# [1 2 3 4]
# array([1, 2, 3, 4])
#
# torch
# tensor([1, 2, 3, 4])
# tensor([1, 2, 3, 4])
#
# jax.numpy
# [1 2 3 4]
# Array([1, 2, 3, 4], dtype=int32)
#
# cupy
# [1 2 3 4]
# array([1, 2, 3, 4])

Most of them have different str vs repr, so I don't see a reason not to do what they do.

but given that at least numpy is always a pseudo-repr anyways (so recreating objects is not possible) this might not be too much of an issue?

I don't think it's much of an issue, especially since this is just a placeholder until it's time to polish things.
But if we were to expose MaskedArray (and maybe accept dtype as a parameter), then with the repr output, the object could be recreated to the extent that the NumPy array object can be recreated.

Comment on lines 91 to 92
textwrap.indent(data_str, " " * 4),
textwrap.indent(mask_str, " " * 4),
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not follow the Python convention, though, especially if we do want it to be possible to recreate the array from the output?

Comment on lines 86 to 87
join_char = ""
components = ["MaskedArray(", data_str, mask_str, ")"]
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that looks better here. I am not as convinced about the multiline version, though.

marray/__init__.py Outdated Show resolved Hide resolved
@mdhaber
Copy link
Owner Author

mdhaber commented Nov 25, 2024

I switched to f-strings; not exactly what you suggested, but definitely an improvement.

@mdhaber mdhaber requested a review from keewis November 25, 2024 00:12
@mdhaber mdhaber merged commit 2adcfae into main Dec 1, 2024
12 checks passed
@keewis
Copy link
Collaborator

keewis commented Dec 3, 2024

sorry for being absent for a week, I've been busy with other things. Looks good to me, though!

@mdhaber
Copy link
Owner Author

mdhaber commented Dec 3, 2024

No problem. I went ahead and merged everything to avoid more conflicts, but there is plenty left to do. Can I get your thoughts on gh-6, gh-24, gh-28, and gh-30 when you have a chance?

@mdhaber mdhaber mentioned this pull request Dec 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: remove numpy dependency
2 participants