Squelch some setting with copy warnings #1100

gnn · 2023-02-09T14:39:19Z

Before merging into `dev`-branch, please make sure that

the CHANGELOG.rst was updated.
new and adjusted code is formatted using black and isort.
the Dataset-version is updated when existing datasets are adjusted.
the branch was merged into the
continuous-integration/run-everything-over-the-weekend-2022-11-10-branch.
the workflow is running successful in Schleswig-Holstein mode.
the workflow is running successful in Everything mode.

On "src/egon/data/datasets/pypsaeursec/__init__.py" and with version `23.1.0`. Using `--preview` first followed by a normal invocation pulls in new style fixes but also makes sure, that only those remain, which will not conflict with normal `black` usage, e.g. when pre-commit hooks are run. This one removes some extraneous blank lines and wraps overly long ones.

Take advantage of the fact that Python automatically joins adjacent strings by breaking the string at appropriate places to stay within the line length limit. Then wrap the two strings in parenthesis, making it one expression which can be spread over multiple lines. That way the strings can each be on individual lines and we're staying within 79 characters.

Actually, it's just to make the code shorter while also making it a bit more readable. (IMHO of course.)

Using `== False` can not only be unnecessarily verbose, but also surprisingly incorrect, as `0 == False` is `True`. Since `pandas` doesn't want us to use `not` with boolean `Series`, `~` is the correct choice for flipping those `Series`. This also squelches `flake8` complaints.

Replace single and dual letter variable names with descriptive ones. Remove comments as the variable names now serve a purpose comparable to that of the comments. Feel free to reintroduce the comments if you feel that they serve an additional informatory purpose. But please expand them a bit, at least noting that they are examples of what's happening in specific exemplary loop iterations. That is, if I understood them correctly. If I'm wrong, please expand them to prevent that misunderstanding.

The `component_t` object is an instance of `pypsa.descriptors.Dict`, which allows both, attribute access and key lookup. Therefore it's not necessary to use `getattr`.

The usage of `i` to store `DataFrame` column names was very confusing to me. It correctly suggested (to me) that `i` is an integer index so I got that `x["foo"][i]` is accessing column `"foo"` of the `DataFrame` stored in `x` and then pulling out the value at index `i` of the resulting `Series`. But since I now associated `i` with being an integer index, the left hand side of `x["foo"][i] = y[i].values.tolist()`, i.e. `y[i].values.tolist()`, didn't make sense to me, because `y[i]` looks like it's pulling out the single value of a `Series` or list at an integer index. But it doesn't. It actually pulls the column `i` out of the `DataFrame` `y`. That can then be converted to a list and is assigned to the `DataFrame` cell `x["foo"][i]`. Figuring this out took me hours, so I decided to change the variable name to `column` to convey that it's meant to store a column, even if it is sometimes used to index a row. What is indexed is then made explicit by using `.at` and `.loc` as appropriate. Using `.at[row, column]` also serves the purpose of squelching some instances of ``` SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-doc# ``` Using `.at[r, c]` is necessary in order to tell `pandas` that we want to assign a sequence to a single `DataFrame` location. Using `.loc[r, c]` `pandas` would try to somehow align the sequence with the cell's neighbourhood which usually results in an error.

Instead of `[c]`. This hopefully squelches a few instances of ``` SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-doc# ``` It should also make sure that the assignment always happens and isn't lost in an ephemeral copy.

At least in the places where I noticed. This makes it easier to see, whether the indexing is done into a dictionary, `Series` or list, i.e. a normally one dimensional structure, or a `DataFrame`, which is always two dimensional.

gnn added 5 commits February 9, 2023 13:57

Remove unused import

1ca6e70

Use f-strings for fun and profit

7f0b6aa

Actually, it's just to make the code shorter while also making it a bit more readable. (IMHO of course.)

gnn force-pushed the features/#1097-squelch-some-setting-with-copy-warnings branch from aa8bed9 to 0b3c8dc Compare February 16, 2023 14:16

gnn added 5 commits February 16, 2023 15:18

Replace getattr with key lookup

54ac272

The `component_t` object is an instance of `pypsa.descriptors.Dict`, which allows both, attribute access and key lookup. Therefore it's not necessary to use `getattr`.

Replace [column] access with .loc[:, column]

e908207

At least in the places where I noticed. This makes it easier to see, whether the indexing is done into a dictionary, `Series` or list, i.e. a normally one dimensional structure, or a `DataFrame`, which is always two dimensional.

gnn force-pushed the features/#1097-squelch-some-setting-with-copy-warnings branch from 0b3c8dc to e908207 Compare February 16, 2023 14:20

Bump PypsaEurSec's dataset version

59e17b7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Squelch some setting with copy warnings #1100

Squelch some setting with copy warnings #1100

gnn commented Feb 9, 2023 •

edited

Loading

Squelch some setting with copy warnings #1100

Are you sure you want to change the base?

Squelch some setting with copy warnings #1100

Conversation

gnn commented Feb 9, 2023 • edited Loading

Before merging into dev-branch, please make sure that

gnn commented Feb 9, 2023 •

edited

Loading

Before merging into `dev`-branch, please make sure that