Create credible temp score using RMI data #16

Open
wants to merge 6 commits into
base: develop

MichaelTiemannOSC
Contributor

If I have read the code, the diagram, and the methodology documentation correctly, there was considerable ambiguity in how emissions vs. production data was being handled, and a variety of other problems hiding therein. A sign that something was wrong was temperature scores that ranged from a fraction of a degree to several hundred degrees(!).

This PR pervasively changes the EScope (emissions scope) type label to PScope (projection scope). This can be renamed further if another wording is preferred (or if there's actually a second branch of logic that we should fully pull apart from the main branch).

Regardless of the EScope/PScope naming, there were several other changes needed to make production projections their own thing (and not confused with emissions projections).

The to_numpy() casts in DataWarehouse were covering up an egregious error in keeping production projections and emissions/intensity projections aligned. The rows are no longer scrambled.
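
To illustrate the kind of bug described above, here is a minimal sketch of how dropping the index with to_numpy() can silently pair the wrong rows. The frames, company ids, and values are hypothetical and are not taken from DataWarehouse; the only assumption is that the two projections are pandas DataFrames indexed by company id in different row orders.

```python
import pandas as pd

# Hypothetical data: two companies, indexed by company id, stored in different row order.
intensity = pd.DataFrame({2019: [0.5, 2.0]}, index=["A", "B"])      # emissions per unit produced
production = pd.DataFrame({2019: [10.0, 100.0]}, index=["B", "A"])  # units produced

# Casting to NumPy discards the index, so row 0 is multiplied with row 0
# no matter which company each row belongs to.
scrambled = pd.DataFrame(
    intensity.to_numpy() * production.to_numpy(),
    index=intensity.index,
    columns=intensity.columns,
)

# Keeping the pandas objects lets the multiplication align on the index labels.
aligned = intensity * production

print(scrambled.loc["A", 2019])  # company A's intensity times company B's production
print(aligned.loc["A", 2019])    # company A's intensity times company A's production
```

The scrambled result raises no error, which is why this class of bug can go unnoticed until the downstream numbers look implausible.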

Included are files based on the original portfolio data as well as files based on real RMI data (using 2019 as a base year). As you can see in the Notebook file, there are some US-based utilities that are not Paris-aligned (or the data we have doesn't show them in a very good light), but most of the temperature scores are in the 2-3 degree range, which is credible.

Comments welcome!

Also disable Steel sector, for which RMI provides no data.
The previous S1+S2 did not correct MMT vs MT as the unit of measure for NOX emissions.
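
A minimal sketch of the unit issue mentioned in the commit message above, assuming MMT stands for million metric tons and MT for metric tons. The table and the helper name to_mmt are illustrative and do not come from the PR.

```python
# Assumed conversion factors to metric tons; the names are illustrative only.
TONNES_PER_UNIT = {"MT": 1.0, "MMT": 1_000_000.0}


def to_mmt(value: float, unit: str) -> float:
    """Convert a quantity in the given unit to million metric tons."""
    return value * TONNES_PER_UNIT[unit] / TONNES_PER_UNIT["MMT"]


# A figure reported in MT is a factor of one million smaller once expressed in MMT,
# so summing MT and MMT values without converting overstates the MT part.
nox_in_mmt = to_mmt(2500.0, "MT")
co2_in_mmt = to_mmt(3.0, "MMT")
```

Under that assumption, an unconverted MT figure added to MMT figures would be off by six orders of magnitude.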
@MichaelTiemannOSC
Contributor Author

As @BertKramer explained to me, it may be that I was overzealous in terms of preferring company production projections over sectoral benchmark projections. If that's the case, then this pull request can be greatly simplified to just the part that fixes the calculations in get_preprocessed_company_data in the file data_warehouse.py.

If, however, there always was a plan to use company production projections, as an option or as a preference, this pull request lays groundwork for that implementation.

ITR data pipeline now properly uses ISO3166 to put countries in the correct regions (with the help of ESSD's UN region definitions).
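
A rough sketch of what a country-to-region lookup of this kind could look like. Everything here is an assumption for illustration: the dictionaries are a tiny hand-written subset rather than the ESSD table, the benchmark region names are hypothetical, and benchmark_region is not a function from the ITR code base.

```python
# Illustrative subset only: ISO 3166-1 alpha-2 code -> UN subregion.
UN_SUBREGION_BY_ALPHA2 = {
    "US": "Northern America",
    "CA": "Northern America",
    "DE": "Western Europe",
    "FR": "Western Europe",
    "JP": "Eastern Asia",
}

# Hypothetical benchmark regions; any subregion not listed falls back to "Global".
BENCHMARK_REGION_BY_SUBREGION = {
    "Northern America": "North America",
    "Western Europe": "Europe",
}


def benchmark_region(alpha2: str) -> str:
    """Map an alpha-2 country code to a benchmark region, defaulting to "Global"."""
    subregion = UN_SUBREGION_BY_ALPHA2.get(alpha2.strip().upper())
    return BENCHMARK_REGION_BY_SUBREGION.get(subregion, "Global")


print(benchmark_region("us"))
print(benchmark_region("JP"))
```

Keying on the standardized country code rather than a free-text country name avoids mismatches caused by spelling or naming variants.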
Collaborator

@dp90 left a comment

I also talked to Bert Kramer about the production projections, and it seems best, for now, to not merge that part back, until the relevant methodology parts have been discussed.
Not sure what the easiest way is to merge part of this PR back, while keeping the remainder, but I'd be happy to help out in any way I can.

As I already started, I did finish the review of the entire PR, so hopefully it's still useful for future reference.

self.column_config.GHG_SCOPE12]]
ei_at_base = self._get_company_intensity_at_year(base_year, company_ids).rename(self.column_config.BASE_EI)
# print(f"BA: company_info.loc[] = {company_info.loc['US0185223007']}")
Collaborator

Remove commented code

get the projected productions for list of companies in ghg_scope12
:param ghg_scope12: DataFrame with at least the following columns :
ColumnsConfig.COMPANY_ID,ColumnsConfig.GHG_SCOPE12, ColumnsConfig.SECTOR and ColumnsConfig.REGION
get the projected productions for list of companies (PRODUCTIONS not S1S2)
Collaborator

"(PRODUCTIONS not S1S2)" might be superfluous

@@ -1,4 +1,4 @@
-from abc import ABC
+from abc import ABC # _project
Collaborator

Consider removing "# _project"

assert pd.Series(company_ids).isin(df_company_data.loc[:, self.column_config.COMPANY_ID]).all(), \
"some of the company ids are not included in the fundamental data"

company_info_at_base_year = self.company_data.get_company_intensity_and_production_at_base_year(company_ids)
# print(f"DW: company_info_at_base_year.loc[] = {company_info_at_base_year.loc['US0185223007']}")
Collaborator

Remove commented code

# print(f"BUDG:\n{df_company_data.loc[df_company_data.index<40,['company_id',self.column_config.CUMULATIVE_BUDGET]]}\n\n")
# print(f"CIABY:\n{company_info_at_base_year.loc[df_company_data.index<40,:]}\n\n")
# print(f"""SDA:\n{self.benchmarks_projected_emission_intensity.get_SDA_intensity_benchmarks(
# company_info_at_base_year).loc[df_company_data.index<40,:]}\n\n""")
Collaborator

Remove commented code

# print(projected_emission_intensity.index[0:3])
# print(projected_emission_intensity.iloc[0:3])
# print(projected_production.index[0:3])
# print(projected_production.iloc[0:3])
Collaborator

Remove commented code

@@ -30,7 +30,7 @@ def convert_benchmark_excel_to_model(df_excel: pd.DataFrame, sheetname: str, col
result.append(bm)
return IBenchmarks(benchmarks=result)


# ??? This duplicates info from
Collaborator

I don't think it duplicates: it refers to tabs of an Excel file rather than columns. We could consider moving it to configs.py.
The comment should be removed, I think.

-use_S1S2 = (data[self.c.COLS.SCOPE] == EScope.S1S2) | (data[self.c.COLS.SCOPE] == EScope.S1S2S3)
-use_S3 = (data[self.c.COLS.SCOPE] == EScope.S3) | (data[self.c.COLS.SCOPE] == EScope.S1S2S3)
+use_S1S2 = (data[self.c.COLS.SCOPE] == PScope.S1S2) | (data[self.c.COLS.SCOPE] == PScope.S1S2S3)
+use_S3 = (data[self.c.COLS.SCOPE] == PScope.S3) | (data[self.c.COLS.SCOPE] == PScope.S1S2S3)
Collaborator

I like what you did on line 95. We could consider doing the same here.
