Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Squelch some setting with copy warnings #1100

Open
wants to merge 11 commits into
base: dev
Choose a base branch
from

Conversation

gnn
Copy link
Collaborator

@gnn gnn commented Feb 9, 2023

Fixes #1097.

Before merging into dev-branch, please make sure that

  • the CHANGELOG.rst was updated.
  • new and adjusted code is formatted using black and isort.
  • the Dataset-version is updated when existing datasets are adjusted.
  • the branch was merged into the
    continuous-integration/run-everything-over-the-weekend-2022-11-10-branch.
  • the workflow is running successful in Schleswig-Holstein mode.
  • the workflow is running successful in Everything mode.

On "src/egon/data/datasets/pypsaeursec/__init__.py" and with version
`23.1.0`. Using `--preview` first followed by a normal invocation pulls
in new style fixes but also makes sure, that only those remain, which
will not conflict with normal `black` usage, e.g. when pre-commit hooks
are run.
This one removes some extraneous blank lines and wraps overly long
ones.
Take advantage of the fact that Python automatically joins adjacent
strings by breaking the string at appropriate places to stay within the
line length limit. Then wrap the two strings in parenthesis, making it
one expression which can be spread over multiple lines. That way the
strings can each be on individual lines and we're staying within 79
characters.
Actually, it's just to make the code shorter while also making it a bit
more readable. (IMHO of course.)
Using `== False` can not only be unnecessarily verbose, but also
surprisingly incorrect, as `0 == False` is `True`. Since `pandas`
doesn't want us to use `not` with boolean `Series`, `~` is the correct
choice for flipping those `Series`.
This also squelches `flake8` complaints.
@gnn gnn force-pushed the features/#1097-squelch-some-setting-with-copy-warnings branch from aa8bed9 to 0b3c8dc Compare February 16, 2023 14:16
Replace single and dual letter variable names with descriptive ones.
Remove comments as the variable names now serve a purpose comparable to
that of the comments.
Feel free to reintroduce the comments if you feel that they serve an
additional informatory purpose. But please expand them a bit, at least
noting that they are examples of what's happening in specific exemplary
loop iterations. That is, if I understood them correctly. If I'm wrong,
please expand them to prevent that misunderstanding.
The `component_t` object is an instance of `pypsa.descriptors.Dict`,
which allows both, attribute access and key lookup. Therefore it's not
necessary to use `getattr`.
The usage of `i` to store `DataFrame` column names was very confusing
to me. It correctly suggested (to me) that `i` is an integer index so I
got that `x["foo"][i]` is accessing column `"foo"` of the `DataFrame`
stored in `x` and then pulling out the value at index `i` of the
resulting `Series`. But since I now associated `i` with being an integer
index, the left hand side of `x["foo"][i] = y[i].values.tolist()`, i.e.
`y[i].values.tolist()`, didn't make sense to me, because `y[i]` looks
like it's pulling out the single value of a `Series` or list at an
integer index. But it doesn't. It actually pulls the column `i` out of
the `DataFrame` `y`. That can then be converted to a list and is
assigned to the `DataFrame` cell `x["foo"][i]`.
Figuring this out took me hours, so I decided to change the variable
name to `column` to convey that it's meant to store a column, even if it
is sometimes used to index a row. What is indexed is then made explicit
by using `.at` and `.loc` as appropriate.
Using `.at[row, column]` also serves the purpose of squelching some
instances of

  ```
  SettingWithCopyWarning:
  A value is trying to be set on a copy of a slice from a DataFrame.
  Try using .loc[row_indexer,col_indexer] = value instead

  See the caveats in the documentation: https://pandas.pydata.org/pandas-doc#
  ```

Using `.at[r, c]` is necessary in order to tell `pandas` that we want to
assign a sequence to a single `DataFrame` location. Using `.loc[r, c]`
`pandas` would try to somehow align the sequence with the cell's
neighbourhood which usually results in an error.
Instead of `[c]`. This hopefully squelches a few instances of

  ```
  SettingWithCopyWarning:
  A value is trying to be set on a copy of a slice from a DataFrame.
  Try using .loc[row_indexer,col_indexer] = value instead

  See the caveats in the documentation: https://pandas.pydata.org/pandas-doc#
  ```

It should also make sure that the assignment always happens and isn't
lost in an ephemeral copy.
At least in the places where I noticed.
This makes it easier to see, whether the indexing is done into a
dictionary, `Series` or list, i.e. a normally one dimensional structure,
or a `DataFrame`, which is always two dimensional.
@gnn gnn force-pushed the features/#1097-squelch-some-setting-with-copy-warnings branch from 0b3c8dc to e908207 Compare February 16, 2023 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PypsaEurSec's neighbor_reduction task emits SettingWithCopyWarnings
1 participant