[DEP] IndexError when using pandas==2.0.2 #806

Closed
AlexanderJuestel opened this issue Jun 17, 2023 · 10 comments

@AlexanderJuestel
Contributor

Describe the bug
The error occurs after updating pandas from 2.0.1 to 2.0.2 using pip. It already appears when creating a new model with gp.create_model('Model1').

IndexError                                Traceback (most recent call last)
Cell In[8], line 1
----> 1 geo_model = gp.create_model('Model1')
      2 geo_model

File ~\Documents\02_Weisweiler\00_Geomechanical_Model\02_Python\../../../gempy_v2023.1.0/gempy-gempy_v2023.1.0/gempy-gempy_v2023.1.0\gempy\gempy_api.py:114, in create_model(project_name)
     99 def create_model(project_name='default_project') -> Project:
    100     """Create a Project object.
    101 
    102     Args:
   (...)
    112         TODO: Adding saving address
    113     """
--> 114     return Project(project_name)

File ~\Documents\02_Weisweiler\00_Geomechanical_Model\02_Python\../../../gempy_v2023.1.0/gempy-gempy_v2023.1.0/gempy-gempy_v2023.1.0\gempy\core\model.py:1628, in Project.__init__(self, project_name)
   1625 def __init__(self, project_name='default_project'):
   1627     self.meta = MetaData(project_name=project_name)
-> 1628     super().__init__()

File ~\Documents\02_Weisweiler\00_Geomechanical_Model\02_Python\../../../gempy_v2023.1.0/gempy-gempy_v2023.1.0/gempy-gempy_v2023.1.0\gempy\core\model.py:85, in ImplicitCoKriging.__init__(self)
     78 self._rescaling = ScalingSystem(self._surface_points, self._orientations,
     79                                 self._grid)
     80 self._additional_data = AdditionalData(self._surface_points,
     81                                        self._orientations, self._grid,
     82                                        self._faults,
     83                                        self._surfaces, self._rescaling)
---> 85 self._interpolator = InterpolatorModel(self._surface_points,
     86                                        self._orientations, self._grid,
     87                                        self._surfaces,
     88                                        self._stack, self._faults,
     89                                        self._additional_data)
     91 self.solutions = Solution(self._grid, self._surfaces, self._stack)
     93 # Previous values of sfai.

File ~\Documents\02_Weisweiler\00_Geomechanical_Model\02_Python\../../../gempy_v2023.1.0/gempy-gempy_v2023.1.0/gempy-gempy_v2023.1.0\gempy\core\interpolator.py:650, in InterpolatorModel.__init__(self, surface_points, orientations, grid, surfaces, series, faults, additional_data, **kwargs)
    646 def __init__(self, surface_points: "SurfacePoints", orientations: "Orientations", grid: "Grid",
    647              surfaces: "Surfaces", series, faults: "Faults", additional_data: "AdditionalData",
    648              **kwargs):
--> 650     super().__init__(surface_points, orientations, grid, surfaces, series, faults,
    651                      additional_data, **kwargs)
    652     self.len_series_i = np.zeros(1)
    653     self.len_series_o = np.zeros(1)

File ~\Documents\02_Weisweiler\00_Geomechanical_Model\02_Python\../../../gempy_v2023.1.0/gempy-gempy_v2023.1.0/gempy-gempy_v2023.1.0\gempy\core\interpolator.py:66, in Interpolator.__init__(self, surface_points, orientations, grid, surfaces, series, faults, additional_data, **kwargs)
     63 self.aesara_graph = self.create_aesara_graph(additional_data, inplace=False)
     64 self.aesara_function = None
---> 66 self._compute_len_series()

File ~\Documents\02_Weisweiler\00_Geomechanical_Model\02_Python\../../../gempy_v2023.1.0/gempy-gempy_v2023.1.0/gempy-gempy_v2023.1.0\gempy\core\interpolator.py:822, in InterpolatorModel._compute_len_series(self)
    817 self.len_series_f = np.atleast_1d(len_series_f_.astype(
    818     'int32'))  # [:self.additional_data.get_additional_data()['values']['Structure', 'number series']]
    820 self._old_len_series = self.len_series_i
--> 822 self.len_series_i = self.len_series_i[non_zero]
    823 self.len_series_o = self.len_series_o[non_zero]
    824 # self.len_series_f = self.len_series_f[non_zero]

IndexError: invalid index to scalar variable.

To Reproduce
Provide detailed steps to reproduce the behavior:

Update pandas from 2.0.1 to 2.0.2 while using the latest version of the development branch.
...

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: (e.g. iOS)
  • GemPy Version
    • if installed via pip: provide GemPy version (e.g. 2.0.1)
    • if cloned from GitHub: provide Git commit id (e.g. 839bf85)
  • Jupyter Version (if applicable)

Additional context
Add any other context about the problem here.

@AlexanderJuestel
Contributor Author

The error remains in a freshly installed environment and with all packages installed manually.

@AlexanderJuestel AlexanderJuestel changed the title from "IndexError: invalid index to scalar variable." to "[DEP] IndexError when using pandas==2.0.2" Jun 20, 2023
@phasyn8

phasyn8 commented Jul 21, 2023

What is the workaround for this? I ran into it after having a working install, and downgrading pandas does not seem to get me around the issue.

@Japhiolite
Collaborator

So, you still have this issue after downgrading pandas?
Or does the downgrade not work in your environment?
GemPy won't work with pandas 2.0.2, but it does work with 2.0 and 2.0.1.

@AlexanderJuestel AlexanderJuestel pinned this issue Aug 12, 2023
@AlexanderJuestel
Contributor Author

Tracking down the issue further: the function that is broken is _compute_len_series.

    def _compute_len_series(self):


        self.len_series_i = self.additional_data.structure_data.df.loc[
                                'values', 'len series surface_points'] - \
                            self.additional_data.structure_data.df.loc[
                                'values', 'number surfaces per series']


        self.len_series_o = self.additional_data.structure_data.df.loc[
            'values', 'len series orientations'].astype(
            'int32')

self.len_series_i and self.len_series_o are both of type np.int32 and both equal 0 when the model is run for the first time.

        # Remove series without data
        non_zero_i = self.len_series_i.nonzero()[0]
        non_zero_o = self.len_series_o.nonzero()[0]
        non_zero = np.intersect1d(non_zero_i, non_zero_o)


        self.non_zero = non_zero

non_zero is therefore array([], dtype=int64), an empty array.

        self.len_series_u = self.additional_data.kriging_data.df.loc[
            'values', 'drift equations'].astype('int32')
        try:
            len_series_f_ = self.faults.faults_relations_df.values[non_zero][:, non_zero].sum(
                axis=0)


        except np.AxisError:
            print('np.axis error')
            len_series_f_ = self.faults.faults_relations_df.values.sum(axis=0)


        self.len_series_f = np.atleast_1d(len_series_f_.astype(
            'int32'))  # [:self.additional_data.get_additional_data()['values']['Structure', 'number series']]


        self._old_len_series = self.len_series_i


        self.len_series_i = self.len_series_i[non_zero]
        self.len_series_o = self.len_series_o[non_zero]
        # self.len_series_f = self.len_series_f[non_zero]
        self.len_series_u = self.len_series_u[non_zero]

Indexing self.len_series_i (type np.int32) with non_zero results in the error seen in this issue.
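The failure described above can be reproduced without GemPy or pandas. The following is a minimal sketch, assuming the values reported in this comment (an np.int32 scalar and an empty int64 index array); the variable names are stand-ins and try_index is a hypothetical helper used only for illustration.

```python
import numpy as np

# Stand-ins for the values reported in this thread (assumptions, not GemPy code).
len_series_i = np.int32(0)               # NumPy scalar instead of a 1-D array
non_zero = np.array([], dtype=np.int64)  # empty index array


def try_index(value, index):
    """Return (result, None) on success or (None, message) on IndexError."""
    try:
        return value[index], None
    except IndexError as exc:
        return None, str(exc)


# Indexing the scalar fails; indexing a 1-D view of the same value succeeds.
scalar_result, scalar_error = try_index(len_series_i, non_zero)
array_result, array_error = try_index(np.atleast_1d(len_series_i), non_zero)

print(scalar_error)
print(array_result)
```

The second call succeeds because np.atleast_1d turns the scalar into a one-element array, which accepts an empty integer index and returns an empty array.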

Reproducing the error locally:
[screenshot not preserved]

@AlexanderJuestel
Contributor Author

# IndexError raised since pandas==2.0.2
        try:
            self.len_series_i = self.len_series_i[non_zero]
            self.len_series_o = self.len_series_o[non_zero]
            # self.len_series_f = self.len_series_f[non_zero]
            self.len_series_u = self.len_series_u[non_zero]

            if self.len_series_i.shape[0] == 0:
                self.len_series_i = np.zeros(1, dtype=int)
                self._old_len_series = self.len_series_i

            if self.len_series_o.shape[0] == 0:
                self.len_series_o = np.zeros(1, dtype=int)
            if self.len_series_u.shape[0] == 0:
                self.len_series_u = np.zeros(1, dtype=int)
            if self.len_series_f.shape[0] == 0:
                self.len_series_f = np.zeros(1, dtype=int)

        except IndexError:
            self.len_series_i = np.array([self.len_series_i])
            self.len_series_o = np.array([self.len_series_o])
            # self.len_series_f = np.array([self.len_series_f])
            self.len_series_u = np.array([self.len_series_u])

@AlexanderJuestel
Contributor Author


# TypeError raised since pandas==2.0.2
        try:
            if len(self.kriging_data.df.loc['values', 'drift equations']) < \
                    self.structure_data.df.loc['values', 'number series']:
                self.kriging_data.set_u_grade()
        except TypeError:
            if int(self.kriging_data.df.loc['values', 'drift equations']) < \
                    self.structure_data.df.loc['values', 'number series']:
                self.kriging_data.set_u_grade()
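The TypeError in this patch comes from calling len() on a NumPy scalar. One possible way to count entries without a try/except is np.size, which returns 1 for a scalar and the element count for an array. This is only a sketch: n_entries is a hypothetical helper, and it is not equivalent to the int(...) fallback used in the patch above, which compares the value itself rather than a count.

```python
import numpy as np


def n_entries(value):
    """Count the elements in `value`: 1 for a scalar, the length for a 1-D array."""
    return int(np.size(value))


# Works for both shapes the cell might hold (hypothetical example values).
print(n_entries(np.int32(3)))          # scalar cell
print(n_entries(np.array([3, 3, 3])))  # array cell
```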

@AlexanderJuestel
Contributor Author

There seems to be a change in data type from pandas 2.0.1 to 2.0.2. So we need to track down the location where the DataFrame is constructed.

[screenshots not preserved]

This can be traced to the following value:

[screenshots not preserved]

Now I just need to find the place where the values are assigned to the DataFrame....

@AlexanderJuestel
Contributor Author

Opened an issue in the pandas repo: pandas-dev/pandas#54519

@Japhiolite
Collaborator

There seems to be a change in data type from pandas 2.0.1 to 2.0.2. So we need to track down the location where the DataFrame is constructed.

[screenshots not preserved]

This can be traced to the following value:

[screenshots not preserved]

Now I just need to find the place where the values are assigned to the DataFrame....

As a fix or workaround, it might already be enough to put the int representing the geo_model object into a numpy.ndarray.
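A minimal sketch of that idea, assuming the value only needs to be wrapped before it is indexed: select_series is a hypothetical helper, not part of GemPy, and the fallback to a single zero mirrors the empty-result handling in the patch posted earlier in this thread.

```python
import numpy as np


def select_series(value, non_zero):
    """Index `value` with `non_zero`, tolerating scalar input.

    The value is first wrapped into a 1-D array so that indexing cannot
    raise "invalid index to scalar variable". If nothing is selected,
    a single zero is returned instead of an empty array.
    """
    selected = np.atleast_1d(value)[non_zero]
    if selected.shape[0] == 0:
        return np.zeros(1, dtype=int)
    return selected


# Scalar input with an empty index, as on the first model creation.
print(select_series(np.int32(0), np.array([], dtype=np.int64)))
# Array input with a non-empty index.
print(select_series(np.array([3, 0, 2]), np.array([0, 2])))
```

Whether the resulting dtype matches what the rest of the interpolator expects would still need to be checked against the GemPy code.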

@Leguark
Member

Leguark commented Apr 16, 2024

GemPy v3 does not depend on pandas anymore

@Leguark Leguark closed this as completed Apr 16, 2024