Unexpected conversion int to float type and crash with complex type since release 0.7 #271

Open
ikondov opened this issue Jan 7, 2025 · 6 comments


ikondov commented Jan 7, 2025

Relevant package versions

  • python 3.10.12
  • pint 0.24.4
  • pandas 2.2.3

Unexpected type conversion in version 0.7

With pint-pandas 0.6.2:

In [1]: import pandas

In [2]: import pint_pandas

In [3]: a = pandas.Series([1, 2], dtype=pint_pandas.PintType('meter'))

In [4]: a
Out[4]:
0    1
1    2
dtype: pint[meter]

In [5]: a.pint.magnitude
Out[5]:
0    1
1    2
dtype: int64

With pint-pandas 0.7.1 (the same with version 0.7):

In [1]: import pandas

In [2]: import pint_pandas

In [3]: a = pandas.Series([1, 2], dtype=pint_pandas.PintType('meter'))

In [4]: a
Out[4]:
0    1.0
1    2.0
dtype: pint[meter][Float64]

In [5]: a.pint.magnitude
Out[5]:
0    1.0
1    2.0
dtype: float64

Crash with complex type with version 0.7

Another issue I have with complex types:

With pint-pandas 0.6.2:

In [1]: import pandas

In [2]: import pint_pandas

In [3]: a = pandas.Series([1.3+1.1j, 2.5j], dtype=pint_pandas.PintType('meter'))

In [4]: a
Out[4]:
0    (1.3+1.1j)
1          2.5j
dtype: pint[meter]

In [5]: a.pint.magnitude
Out[5]:
0    1.3+1.1j
1    0.0+2.5j
dtype: complex128

With pint-pandas 0.7.1:

In [3]: a = pandas.Series([1.3+1.1j, 2.5j], dtype=pint_pandas.PintType('meter'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 a = pandas.Series([1.3+1.1j, 2.5j], dtype=pint_pandas.PintType('meter'))

File /mnt/data/ubuntu/work/python-3.10.12/lib/python3.10/site-packages/pandas/core/series.py:584, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
    582         data = data.copy()
    583 else:
--> 584     data = sanitize_array(data, index, dtype, copy)
    586     manager = _get_option("mode.data_manager", silent=True)
    587     if manager == "block":

File /mnt/data/ubuntu/work/python-3.10.12/lib/python3.10/site-packages/pandas/core/construction.py:596, in sanitize_array(data, index, dtype, copy, allow_2d)
    594     _sanitize_non_ordered(data)
    595     cls = dtype.construct_array_type()
--> 596     subarr = cls._from_sequence(data, dtype=dtype, copy=copy)
    598 # GH#846
    599 elif isinstance(data, np.ndarray):

File /mnt/data/ubuntu/work/python-3.10.12/lib/python3.10/site-packages/pint_pandas/pint_array.py:696, in PintArray._from_sequence(cls, scalars, dtype, copy)
    690 if isinstance(master_scalar, _Quantity):
    691     scalars = [
    692         (item.to(units).magnitude if hasattr(item, "to") else item)
    693         for item in scalars
    694     ]
--> 696 values = pd.array(scalars, dtype=subdtype)
    697 return cls(
    698     values, dtype=PintType(units=units, subdtype=values.dtype), copy=copy
    699 )

File /mnt/data/ubuntu/work/python-3.10.12/lib/python3.10/site-packages/pandas/core/construction.py:321, in array(data, dtype, copy)
    319 if isinstance(dtype, ExtensionDtype):
    320     cls = dtype.construct_array_type()
--> 321     return cls._from_sequence(data, dtype=dtype, copy=copy)
    323 if dtype is None:
    324     inferred_dtype = lib.infer_dtype(data, skipna=True)

File /mnt/data/ubuntu/work/python-3.10.12/lib/python3.10/site-packages/pandas/core/arrays/masked.py:152, in BaseMaskedArray._from_sequence(cls, scalars, dtype, copy)
    150 @classmethod
    151 def _from_sequence(cls, scalars, *, dtype=None, copy: bool = False) -> Self:
--> 152     values, mask = cls._coerce_to_array(scalars, dtype=dtype, copy=copy)
    153     return cls(values, mask)

File /mnt/data/ubuntu/work/python-3.10.12/lib/python3.10/site-packages/pandas/core/arrays/numeric.py:272, in NumericArray._coerce_to_array(cls, value, dtype, copy)
    270 dtype_cls = cls._dtype_cls
    271 default_dtype = dtype_cls._default_np_dtype
--> 272 values, mask, _, _ = _coerce_to_data_and_mask(
    273     value, dtype, copy, dtype_cls, default_dtype
    274 )
    275 return values, mask

File /mnt/data/ubuntu/work/python-3.10.12/lib/python3.10/site-packages/pandas/core/arrays/numeric.py:181, in _coerce_to_data_and_mask(values, dtype, copy, dtype_cls, default_dtype)
    179 elif values.dtype.kind not in "iuf":
    180     name = dtype_cls.__name__.strip("_")
--> 181     raise TypeError(f"{values.dtype} cannot be converted to {name}")
    183 if values.ndim != 1:
    184     raise TypeError("values must be a 1D list-like")

TypeError: complex128 cannot be converted to FloatingDtype
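
Both symptoms above can be reproduced with pandas alone, which may help narrow things down. This is only a sketch: it assumes, based on the `pint[meter][Float64]` repr and the `pd.array(scalars, dtype=subdtype)` frame in the traceback, that 0.7 passes a `Float64` subdtype to `pd.array`. The variable names are made up for illustration.

```python
import pandas as pd

# Requesting the nullable Float64 dtype turns integer input into floats.
ints_as_float = pd.array([1, 2], dtype="Float64")
print(ints_as_float.dtype)  # Float64

# Complex input cannot be stored as Float64, so pandas raises TypeError.
error_message = None
try:
    pd.array([1.3 + 1.1j, 2.5j], dtype="Float64")
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

If this assumption holds, neither behaviour is specific to units: it follows from the subdtype that reaches `pd.array`.
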
Collaborator

andrewgsavage commented Jan 7, 2025 via email

Author

ikondov commented Jan 7, 2025

@andrewgsavage Thanks a lot for your prompt answer! Indeed, supplying "pint[meter][int]" and "pint[meter][complex]", respectively, "solves" the issue. As far as I understand, the subdtype is no longer inferred automatically from the dtype of the numpy array when constructing a Series. Unfortunately, this means our entire interface to pint-pandas/pandas has to be modified and extended with more code. Is there any chance to get the old behavior of 0.6.2 back, i.e. no default subdtype but inference from the actual dtype of the numpy arrays, without breaking the interface? This would help us a lot. For the time being, we will pin to ==0.6.2 and consider rewriting our pandas interface in the future so that we can move on to >=0.7.

Collaborator

andrewgsavage commented Jan 7, 2025 via email

Author

ikondov commented Jan 7, 2025

> The change was to fix issues like this: #205 which are difficult for users to debug. Inference of datatypes is something I would like to leave out of pint-pandas.

I understand.

> We could have a setting that could change the behaviour to tell pandas to infer the subdtype when it's passed into pd.array, instead of specifying Float64.

This would be nice and save a lot of fixing our interface layer.
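
To make the difference concrete, here is a small pandas-only sketch of what "letting pandas infer" means when `pd.array` is called without a dtype. It says nothing about how pint-pandas would wire such a setting internally. The variable names are illustrative.

```python
import pandas as pd

# With no dtype given, pd.array picks one from the data itself.
inferred_int = pd.array([1, 2])
inferred_complex = pd.array([1.3 + 1.1j, 2.5j])

print(inferred_int.dtype)      # Int64
print(inferred_complex.dtype)  # complex128
```

Integers stay integers (as the nullable Int64 type) and complex values are accepted, which is the behaviour our interface relied on with 0.6.2.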

@andrewgsavage
Collaborator

you can do pint_pandas.pint_array.DEFAULT_SUBDTYPE = None

Author

ikondov commented Jan 10, 2025

@andrewgsavage: Thanks a lot for this solution! It has worked and the issue can be closed.


2 participants